From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 01/76] ssdfs: introduce SSDFS on-disk layout
Date: Fri, 24 Feb 2023 17:08:12 -0800
Message-Id: <20230225010927.813929-2-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

The SSDFS architecture is based on the segment concept. A segment is a
portion of the file system volume that is aligned on the erase block
size and can include one or several erase blocks. The segment is the
basic unit for allocating and managing the free space of the file
system volume. The erase block is the basic unit for storing metadata
and user data.

Every erase block contains a sequence of logs. A log starts with a
segment header (struct ssdfs_segment_header) or a partial log header
(struct ssdfs_partial_log_header). A full log can be finished with a
log footer (struct ssdfs_log_footer). The log's header (+ footer)
contains all the metadata necessary to describe the log's payload.
The log's metadata includes:

(1) block bitmap (struct ssdfs_block_bitmap_fragment) +
    (struct ssdfs_block_bitmap_header): tracks the state of logical
    blocks (free, pre-allocated, valid, invalid) in the segment.
(2) offset translation table (struct ssdfs_blk2off_table_header) +
    (struct ssdfs_phys_offset_table_header) +
    (struct ssdfs_area_block_table): converts a logical block into a
    position inside a particular erase block.

Additionally, the log's header is a copy of the superblock that keeps
the knowledge of the location of all SSDFS metadata structures.
SSDFS has:

(1) mapping table (struct ssdfs_leb_table_fragment_header) +
    (struct ssdfs_peb_table_fragment_header): implements the mapping
    of logical erase blocks into "physical" ones.
(2) mapping table cache (struct ssdfs_maptbl_cache_header): a copy of
    the mapping table's content for some types of erase blocks. The
    cache is used to convert a logical erase block ID into a
    "physical" erase block ID when the corresponding fragment of the
    mapping table is not initialized yet.
(3) segment bitmap (struct ssdfs_segbmap_fragment_header): tracks the
    state (clean, using, used, pre-dirty, dirty, reserved) of segments
    for searching, allocation, erase, and garbage collection.
(4) b-tree (struct ssdfs_btree_descriptor) +
    (struct ssdfs_btree_index_key) + (struct ssdfs_btree_node_header):
    all the rest of the metadata structures are represented by b-trees.
(5) inodes b-tree (struct ssdfs_inodes_btree) +
    (struct ssdfs_inodes_btree_node_header): keeps raw inodes of
    existing file system objects (struct ssdfs_inode).
(6) dentries b-tree (struct ssdfs_dentries_btree_descriptor) +
    (struct ssdfs_dentries_btree_node_header): keeps directory entries
    (struct ssdfs_dir_entry).
(7) extents b-tree (struct ssdfs_extents_btree_descriptor) +
    (struct ssdfs_extents_btree_node_header): keeps raw extents that
    describe the location of pieces of data (struct ssdfs_raw_fork) +
    (struct ssdfs_raw_extent).
(8) xattr b-tree (struct ssdfs_xattr_btree_descriptor) +
    (struct ssdfs_xattrs_btree_node_header): keeps extended attributes
    of a file or folder (struct ssdfs_xattr_entry).
(9) invalidated extents b-tree (struct ssdfs_invalidated_extents_btree) +
    (struct ssdfs_invextree_node_header): keeps information about
    invalidated extents for the ZNS SSD and SMR HDD use cases.
(10) shared dictionary b-tree (struct ssdfs_shared_dictionary_btree) +
     (struct ssdfs_shared_dictionary_node_header): keeps long names
     (more than 12 symbols) in the form of tries.
(11) snapshots b-tree (struct ssdfs_snapshots_btree) +
     (struct ssdfs_snapshots_btree_node_header): keeps snapshot info
     (struct ssdfs_snapshot) and the association of erase block IDs
     with timestamps (struct ssdfs_peb2time_set) +
     (struct ssdfs_peb2time_pair).

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 include/linux/ssdfs_fs.h | 3468 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 3468 insertions(+)
 create mode 100644 include/linux/ssdfs_fs.h

diff --git a/include/linux/ssdfs_fs.h b/include/linux/ssdfs_fs.h
new file mode 100644
index 000000000000..a41725234982
--- /dev/null
+++ b/include/linux/ssdfs_fs.h
@@ -0,0 +1,3468 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/linux/ssdfs_fs.h - SSDFS on-disk structures and common declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _LINUX_SSDFS_H
+#define _LINUX_SSDFS_H
+
+#include <linux/types.h>
+
+typedef u8 __le8;
+
+struct ssdfs_inode;
+
+/*
+ * struct ssdfs_revision - metadata structure version
+ * @major: major version number
+ * @minor: minor version number
+ */
+struct ssdfs_revision {
+/* 0x0000 */
+	__le8 major;
+	__le8 minor;
+
+/* 0x0002 */
+} __packed;
+
+/*
+ * struct ssdfs_signature - metadata structure magic signature
+ * @common: common magic value
+ * @key: detailed magic value
+ */
+struct ssdfs_signature {
+/* 0x0000 */
+	__le32 common;
+	__le16 key;
+	struct ssdfs_revision version;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_metadata_check - metadata structure checksum
+ * @bytes: bytes count of CRC calculation for the structure
+ * @flags: flags
+ * @csum: checksum
+ */
+struct ssdfs_metadata_check {
+/* 0x0000 */
+	__le16 bytes;
+#define SSDFS_CRC32			(1 << 0)
+#define SSDFS_ZLIB_COMPRESSED		(1 << 1)
+#define SSDFS_LZO_COMPRESSED		(1 << 2)
+	__le16 flags;
+	__le32 csum;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_raw_extent - raw (on-disk) extent
+ * @seg_id: segment number
+ * @logical_blk: logical block number
+ * @len: count of blocks in extent
+ */
+struct ssdfs_raw_extent {
+/* 0x0000 */
+	__le64 seg_id;
+	__le32 logical_blk;
+	__le32 len;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_meta_area_extent - metadata area extent
+ * @start_id: starting identification number
+ * @len: count of items in metadata area
+ * @type: item's type
+ * @flags: flags
+ */
+struct ssdfs_meta_area_extent {
+/* 0x0000 */
+	__le64 start_id;
+	__le32 len;
+	__le16 type;
+	__le16 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Type of item in metadata area */
+enum {
+	SSDFS_EMPTY_EXTENT_TYPE,
+	SSDFS_SEG_EXTENT_TYPE,
+	SSDFS_PEB_EXTENT_TYPE,
+	SSDFS_BLK_EXTENT_TYPE,
+};
+
+/* Type of segbmap's segments */
+enum {
+	SSDFS_MAIN_SEGBMAP_SEG,
+	SSDFS_COPY_SEGBMAP_SEG,
+	SSDFS_SEGBMAP_SEG_COPY_MAX,
+};
+
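Every on-disk structure that embeds struct ssdfs_metadata_check is covered
by a CRC32 over its first @bytes bytes. As a minimal sketch (not part of
this patch) of how a reader could verify such a structure, assuming the
kernel's crc32() helper from <linux/crc32.h>; the ~0 seed and the
convention of excluding the csum field are illustrative assumptions:

#include <linux/crc32.h>

/*
 * Illustrative sketch only: verify the checksum of a metadata
 * structure whose raw bytes begin at @buf and which contains @check.
 */
static inline bool ssdfs_example_verify_csum(struct ssdfs_metadata_check *check,
					     void *buf)
{
	u16 bytes = le16_to_cpu(check->bytes);
	__le32 saved = check->csum;
	u32 calculated;

	if (!(le16_to_cpu(check->flags) & SSDFS_CRC32))
		return true;	/* nothing to verify */

	check->csum = 0;			/* exclude csum field itself */
	calculated = crc32(~0, buf, bytes);	/* seed is an assumption */
	check->csum = saved;

	return saved == cpu_to_le32(calculated);
}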
+#define SSDFS_SEGBMAP_SEGS	8
+
+/*
+ * struct ssdfs_segbmap_sb_header - superblock's segment bitmap header
+ * @fragments_count: fragments count in segment bitmap
+ * @fragments_per_seg: segbmap's fragments per segment
+ * @fragments_per_peb: segbmap's fragments per PEB
+ * @fragment_size: size of fragment in bytes
+ * @bytes_count: size of segment bitmap in bytes (payload part)
+ * @flags: segment bitmap's flags
+ * @segs_count: count of really reserved segments in one chain
+ * @segs: array of segbmap's segment numbers
+ */
+struct ssdfs_segbmap_sb_header {
+/* 0x0000 */
+	__le16 fragments_count;
+	__le16 fragments_per_seg;
+	__le16 fragments_per_peb;
+	__le16 fragment_size;
+
+/* 0x0008 */
+	__le32 bytes_count;
+	__le16 flags;
+	__le16 segs_count;
+
+/* 0x0010 */
+	__le64 segs[SSDFS_SEGBMAP_SEGS][SSDFS_SEGBMAP_SEG_COPY_MAX];
+
+/* 0x0090 */
+} __packed;
+
+/* Segment bitmap's flags */
+#define SSDFS_SEGBMAP_HAS_COPY		(1 << 0)
+#define SSDFS_SEGBMAP_ERROR		(1 << 1)
+#define SSDFS_SEGBMAP_MAKE_ZLIB_COMPR	(1 << 2)
+#define SSDFS_SEGBMAP_MAKE_LZO_COMPR	(1 << 3)
+#define SSDFS_SEGBMAP_FLAGS_MASK	(0xF)
+
+enum {
+	SSDFS_MAIN_MAPTBL_SEG,
+	SSDFS_COPY_MAPTBL_SEG,
+	SSDFS_MAPTBL_SEG_COPY_MAX,
+};
+
+#define SSDFS_MAPTBL_RESERVED_EXTENTS	(3)
+
+/*
+ * struct ssdfs_maptbl_sb_header - superblock's mapping table header
+ * @fragments_count: count of fragments in mapping table
+ * @fragment_bytes: bytes in one mapping table's fragment
+ * @last_peb_recover_cno: checkpoint of last trying to recover PEBs
+ * @lebs_count: count of Logical Erase Blocks (LEBs) described by table
+ * @pebs_count: count of Physical Erase Blocks (PEBs) described by table
+ * @fragments_per_seg: count of mapping table's fragments in segment
+ * @fragments_per_peb: count of mapping table's fragments in PEB
+ * @flags: mapping table's flags
+ * @pre_erase_pebs: count of PEBs in pre-erase state
+ * @lebs_per_fragment: count of LEBs described by fragment
+ * @pebs_per_fragment: count of PEBs described by fragment
+ * @pebs_per_stripe: count of PEBs described by stripe
+ * @stripes_per_fragment: count of stripes in fragment
+ * @extents: metadata extents that describe mapping table location
+ */
+struct ssdfs_maptbl_sb_header {
+/* 0x0000 */
+	__le32 fragments_count;
+	__le32 fragment_bytes;
+	__le64 last_peb_recover_cno;
+
+/* 0x0010 */
+	__le64 lebs_count;
+	__le64 pebs_count;
+
+/* 0x0020 */
+	__le16 fragments_per_seg;
+	__le16 fragments_per_peb;
+	__le16 flags;
+	__le16 pre_erase_pebs;
+
+/* 0x0028 */
+	__le16 lebs_per_fragment;
+	__le16 pebs_per_fragment;
+	__le16 pebs_per_stripe;
+	__le16 stripes_per_fragment;
+
+/* 0x0030 */
+#define MAPTBL_LIMIT1	(SSDFS_MAPTBL_RESERVED_EXTENTS)
+#define MAPTBL_LIMIT2	(SSDFS_MAPTBL_SEG_COPY_MAX)
+	struct ssdfs_meta_area_extent extents[MAPTBL_LIMIT1][MAPTBL_LIMIT2];
+
+/* 0x0090 */
+} __packed;
+
+/* Mapping table's flags */
+#define SSDFS_MAPTBL_HAS_COPY		(1 << 0)
+#define SSDFS_MAPTBL_ERROR		(1 << 1)
+#define SSDFS_MAPTBL_MAKE_ZLIB_COMPR	(1 << 2)
+#define SSDFS_MAPTBL_MAKE_LZO_COMPR	(1 << 3)
+#define SSDFS_MAPTBL_UNDER_FLUSH	(1 << 4)
+#define SSDFS_MAPTBL_FLAGS_MASK		(0x1F)
+
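Each flags field above comes with a *_FLAGS_MASK define that covers every
bit currently assigned. A hedged sketch of the sanity check this
convention enables (the helper name is hypothetical, not from this patch):

/*
 * Hypothetical validation helper: any bit set outside the published
 * mask indicates a flag this reader does not understand.
 */
static inline bool
ssdfs_example_segbmap_flags_valid(struct ssdfs_segbmap_sb_header *hdr)
{
	u16 flags = le16_to_cpu(hdr->flags);

	return (flags & ~SSDFS_SEGBMAP_FLAGS_MASK) == 0;
}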
+/*
+ * struct ssdfs_btree_descriptor - generic btree descriptor
+ * @magic: magic signature
+ * @flags: btree flags
+ * @type: btree type
+ * @log_node_size: log2(node size in bytes)
+ * @pages_per_node: physical pages per btree node
+ * @node_ptr_size: size in bytes of pointer on btree node
+ * @index_size: size in bytes of btree's index
+ * @item_size: size in bytes of btree's item
+ * @index_area_min_size: minimal size in bytes of index area in btree node
+ *
+ * The goal of a btree descriptor is to keep
+ * the main features of a tree.
+ */
+struct ssdfs_btree_descriptor {
+/* 0x0000 */
+	__le32 magic;
+#define SSDFS_BTREE_DESC_INDEX_AREA_RESIZABLE	(1 << 0)
+#define SSDFS_BTREE_DESC_FLAGS_MASK		0x1
+	__le16 flags;
+	__le8 type;
+	__le8 log_node_size;
+
+/* 0x0008 */
+	__le8 pages_per_node;
+	__le8 node_ptr_size;
+	__le16 index_size;
+	__le16 item_size;
+	__le16 index_area_min_size;
+
+/* 0x0010 */
+} __packed;
+
+/* Btree types */
+enum {
+	SSDFS_BTREE_UNKNOWN_TYPE,
+	SSDFS_INODES_BTREE,
+	SSDFS_DENTRIES_BTREE,
+	SSDFS_EXTENTS_BTREE,
+	SSDFS_SHARED_EXTENTS_BTREE,
+	SSDFS_XATTR_BTREE,
+	SSDFS_SHARED_XATTR_BTREE,
+	SSDFS_SHARED_DICTIONARY_BTREE,
+	SSDFS_SNAPSHOTS_BTREE,
+	SSDFS_INVALIDATED_EXTENTS_BTREE,
+	SSDFS_BTREE_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_dentries_btree_descriptor - dentries btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_dentries_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_extents_btree_descriptor - extents btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_extents_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_xattr_btree_descriptor - extended attr btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_xattr_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/* Type of superblock segments */
+enum {
+	SSDFS_MAIN_SB_SEG,
+	SSDFS_COPY_SB_SEG,
+	SSDFS_SB_SEG_COPY_MAX,
+};
+
+/* Different phases of superblock segment */
+enum {
+	SSDFS_CUR_SB_SEG,
+	SSDFS_NEXT_SB_SEG,
+	SSDFS_RESERVED_SB_SEG,
+	SSDFS_PREV_SB_SEG,
+	SSDFS_SB_CHAIN_MAX,
+};
+
+/*
+ * struct ssdfs_leb2peb_pair - LEB/PEB numbers association
+ * @leb_id: LEB ID number
+ * @peb_id: PEB ID number
+ */
+struct ssdfs_leb2peb_pair {
+/* 0x0000 */
+	__le64 leb_id;
+	__le64 peb_id;
+
+/* 0x0010 */
+} __packed;
+
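Several geometry fields in this layout store log2 of a size in bytes
(log_node_size here; log_pagesize, log_erasesize and log_segsize in the
volume header below). A minimal decoding sketch (the helper name is
hypothetical):

/* Hypothetical decoding helper for the log2-encoded node size. */
static inline u32
ssdfs_example_btree_node_size(struct ssdfs_btree_descriptor *desc)
{
	return 1U << desc->log_node_size;	/* e.g. 12 -> 4096 bytes */
}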
+/*
+ * struct ssdfs_btree_index - btree index
+ * @hash: hash value
+ * @extent: btree node's extent
+ *
+ * The goal of a btree index is to provide the way to search
+ * a proper btree node by means of a hash value. The hash
+ * value could be an inode_id, a string hash and so on.
+ */
+struct ssdfs_btree_index {
+/* 0x0000 */
+	__le64 hash;
+
+/* 0x0008 */
+	struct ssdfs_raw_extent extent;
+
+/* 0x0018 */
+} __packed;
+
+#define SSDFS_BTREE_NODE_INVALID_ID	(U32_MAX)
+
+/*
+ * struct ssdfs_btree_index_key - node identification key
+ * @node_id: node identification key
+ * @node_type: type of the node
+ * @height: node's height
+ * @flags: index flags
+ * @index: node's index
+ */
+struct ssdfs_btree_index_key {
+/* 0x0000 */
+	__le32 node_id;
+	__le8 node_type;
+	__le8 height;
+#define SSDFS_BTREE_INDEX_HAS_VALID_EXTENT		(1 << 0)
+#define SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE		(1 << 1)
+#define SSDFS_BTREE_INDEX_SHOW_FREE_ITEMS		(1 << 2)
+#define SSDFS_BTREE_INDEX_HAS_CHILD_WITH_FREE_ITEMS	(1 << 3)
+#define SSDFS_BTREE_INDEX_SHOW_PREALLOCATED_CHILD	(1 << 4)
+#define SSDFS_BTREE_INDEX_FLAGS_MASK			0x1F
+	__le16 flags;
+
+/* 0x0008 */
+	struct ssdfs_btree_index index;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_BTREE_ROOT_NODE_INDEX_COUNT	(2)
+
+/*
+ * struct ssdfs_btree_root_node_header - root node header
+ * @height: btree height
+ * @items_count: count of items in the root node
+ * @flags: root node flags
+ * @type: root node type
+ * @upper_node_id: last allocated node identification number
+ * @node_ids: root node's children IDs
+ */
+struct ssdfs_btree_root_node_header {
+/* 0x0000 */
+#define SSDFS_BTREE_LEAF_NODE_HEIGHT	(0)
+	__le8 height;
+	__le8 items_count;
+	__le8 flags;
+	__le8 type;
+
+/* 0x0004 */
+#define SSDFS_BTREE_ROOT_NODE_ID	(0)
+	__le32 upper_node_id;
+
+/* 0x0008 */
+	__le32 node_ids[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT];
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_btree_inline_root_node - btree root node
+ * @header: node header
+ * @indexes: root node's index array
+ *
+ * The goal of the root node is to live inside a 0x40 byte
+ * space and to keep the root index node of the tree.
+ * The inline root node can be part of the inode
+ * structure or part of the btree root. The inode has
+ * 0x80 bytes of space, but it needs to store both an
+ * extents/dentries tree and an extended attributes tree.
+ * So, the 0x80 bytes are used for storing two btrees.
+ *
+ * The root node's indexes have a pre-defined type.
+ * If the height of the tree is in the range 1 - 3, then the
+ * root node's indexes define hybrid nodes. Otherwise,
+ * if the tree's height is greater than 3, then the root node's
+ * indexes define pure index nodes.
+ */
+struct ssdfs_btree_inline_root_node {
+/* 0x0000 */
+	struct ssdfs_btree_root_node_header header;
+
+/* 0x0010 */
+#define SSDFS_ROOT_NODE_LEFT_LEAF_NODE	(0)
+#define SSDFS_ROOT_NODE_RIGHT_LEAF_NODE	(1)
+#define SSDFS_BTREE_ROOT_NODE_INDEX_COUNT	(2)
+	struct ssdfs_btree_index indexes[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT];
+
+/* 0x0040 */
+} __packed;
+
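The /* 0xNNNN */ markers document the expected byte offsets, and the
comment above states that the inline root node must fit into 0x40 bytes.
A sketch of compile-time checks that pin these sizes down, assuming
BUILD_BUG_ON() from <linux/build_bug.h> (the function is only a
hypothetical placement for the assertions):

#include <linux/build_bug.h>

static inline void ssdfs_example_check_layout(void)
{
	/* sizes derived from the offset markers in the definitions above */
	BUILD_BUG_ON(sizeof(struct ssdfs_btree_index) != 0x18);
	BUILD_BUG_ON(sizeof(struct ssdfs_btree_index_key) != 0x20);
	BUILD_BUG_ON(sizeof(struct ssdfs_btree_inline_root_node) != 0x40);
}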
+/*
+ * struct ssdfs_inodes_btree - inodes btree
+ * @desc: btree descriptor
+ * @allocated_inodes: count of allocated inodes
+ * @free_inodes: count of free inodes
+ * @inodes_capacity: count of inodes in the whole btree
+ * @leaf_nodes: count of leaf btree nodes
+ * @nodes_count: count of nodes in the whole btree
+ * @upper_allocated_ino: maximal allocated inode ID number
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_inodes_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le64 allocated_inodes;
+	__le64 free_inodes;
+
+/* 0x0020 */
+	__le64 inodes_capacity;
+	__le32 leaf_nodes;
+	__le32 nodes_count;
+
+/* 0x0030 */
+	__le64 upper_allocated_ino;
+	__le8 reserved[0x8];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_extents_btree - shared extents btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_extents_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_dictionary_btree - shared strings dictionary btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_dictionary_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_xattr_btree - shared extended attributes btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_xattr_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshots_btree - snapshots btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_snapshots_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_invalidated_extents_btree - invalidated extents btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_invalidated_extents_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+enum {
+	SSDFS_CUR_DATA_SEG,
+	SSDFS_CUR_LNODE_SEG,
+	SSDFS_CUR_HNODE_SEG,
+	SSDFS_CUR_IDXNODE_SEG,
+	SSDFS_CUR_DATA_UPDATE_SEG,	/* ZNS SSD case */
+	SSDFS_CUR_SEGS_COUNT,
+};
+
+/*
+ * struct ssdfs_blk_bmap_options - block bitmap options
+ * @flags: block bitmap's flags
+ * @compression: compression type
+ */
+struct ssdfs_blk_bmap_options {
+/* 0x0000 */
+#define SSDFS_BLK_BMAP_CREATE_COPY		(1 << 0)
+#define SSDFS_BLK_BMAP_MAKE_COMPRESSION		(1 << 1)
+#define SSDFS_BLK_BMAP_OPTIONS_MASK		(0x3)
+	__le16 flags;
+#define SSDFS_BLK_BMAP_NOCOMPR_TYPE		(0)
+#define SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE		(1)
+#define SSDFS_BLK_BMAP_LZO_COMPR_TYPE		(2)
+	__le8 compression;
+	__le8 reserved;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_blk2off_tbl_options - offset translation table options
+ * @flags: offset translation table's flags
+ * @compression: compression type
+ */
+struct ssdfs_blk2off_tbl_options {
+/* 0x0000 */
+#define SSDFS_BLK2OFF_TBL_CREATE_COPY		(1 << 0)
+#define SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION	(1 << 1)
+#define SSDFS_BLK2OFF_TBL_OPTIONS_MASK		(0x3)
+	__le16 flags;
+#define SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE		(0)
+#define SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE	(1)
+#define SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE	(2)
+	__le8 compression;
+	__le8 reserved;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_user_data_options - user data options
+ * @flags: user data's flags
+ * @compression: compression type
+ * @migration_threshold: default value of destination PEBs in migration
+ */
+struct ssdfs_user_data_options {
+/* 0x0000 */
+#define SSDFS_USER_DATA_MAKE_COMPRESSION	(1 << 0)
+#define SSDFS_USER_DATA_OPTIONS_MASK		(0x1)
+	__le16 flags;
+#define SSDFS_USER_DATA_NOCOMPR_TYPE		(0)
+#define SSDFS_USER_DATA_ZLIB_COMPR_TYPE		(1)
+#define SSDFS_USER_DATA_LZO_COMPR_TYPE		(2)
+	__le8 compression;
+	__le8 reserved1;
+	__le16 migration_threshold;
+	__le16 reserved2;
+
+/* 0x0008 */
+} __packed;
+
+#define SSDFS_INODE_HASNT_INLINE_FORKS	(0)
+#define SSDFS_INLINE_FORKS_COUNT	(2)
+#define SSDFS_INLINE_EXTENTS_COUNT	(3)
+
+/*
+ * struct ssdfs_raw_fork - contiguous sequence of raw (on-disk) extents
+ * @start_offset: start logical offset in pages (blocks) from file's beginning
+ * @blks_count: count of logical blocks in the fork (no holes)
+ * @extents: sequence of raw (on-disk) extents
+ */
+struct ssdfs_raw_fork {
+/* 0x0000 */
+	__le64 start_offset;
+	__le64 blks_count;
+
+/* 0x0010 */
+	struct ssdfs_raw_extent extents[SSDFS_INLINE_EXTENTS_COUNT];
+
+/* 0x0040 */
+} __packed;
+
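A fork covers a contiguous range of a file's logical blocks, and its
inline extents partition that range without holes. As a hedged sketch of
how a lookup inside one fork could walk the extents (the helper is
hypothetical and assumes the extents are laid out consecutively, which
the "no holes" remark supports; error handling elided):

/*
 * Hypothetical lookup: map a logical block (relative to the file's
 * beginning) to the extent that stores it. Returns NULL when the
 * block lies outside this fork.
 */
static inline struct ssdfs_raw_extent *
ssdfs_example_find_extent(struct ssdfs_raw_fork *fork, u64 blk)
{
	u64 start = le64_to_cpu(fork->start_offset);
	u64 cursor = start;
	int i;

	if (blk < start || blk >= start + le64_to_cpu(fork->blks_count))
		return NULL;

	for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) {
		struct ssdfs_raw_extent *extent = &fork->extents[i];
		u32 len = le32_to_cpu(extent->len);

		if (blk < cursor + len)
			return extent;
		cursor += len;
	}

	return NULL;
}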
+/*
+ * struct ssdfs_name_hash - hash of the name
+ * @raw: raw value of the hash64
+ *
+ * The name's hash is 64 bits wide (8 bytes), but the hash64 has
+ * a special structure. The first 4 bytes are the low hash (hash32_lo)
+ * of the name, and the second 4 bytes are the high hash (hash32_hi)
+ * of the name. If the name is less than or equal to 12 symbols
+ * (an inline name string), then hash32_hi will always be equal to
+ * zero. If the name is greater than 12 symbols, then hash32_hi
+ * will be the hash of the rest of the name (excluding the
+ * first 12 symbols). The hash32_lo is defined by the inline
+ * name's length. Inline names (up to 12 symbols long) are
+ * stored in dentries only. Regular names are stored
+ * partially in the dentry (12 symbols), and the whole name string
+ * is stored in the shared dictionary.
+ */
+struct ssdfs_name_hash {
+/* 0x0000 */
+	__le64 raw;
+
+/* 0x0008 */
+} __packed;
+
+/* Name hash related macros */
+#define SSDFS_NAME_HASH(hash32_lo, hash32_hi)({ \
+	u64 hash64 = (u32)hash32_lo; \
+	hash64 <<= 32; \
+	hash64 |= hash32_hi; \
+	hash64; \
+})
+#define SSDFS_NAME_HASH_LE64(hash32_lo, hash32_hi) \
+	(cpu_to_le64(SSDFS_NAME_HASH(hash32_lo, hash32_hi)))
+#define LE64_TO_SSDFS_HASH32_LO(hash_le64) \
+	((u32)(le64_to_cpu(hash_le64) >> 32))
+#define SSDFS_HASH32_LO(hash64) \
+	((u32)(hash64 >> 32))
+#define LE64_TO_SSDFS_HASH32_HI(hash_le64) \
+	((u32)(le64_to_cpu(hash_le64) & 0xFFFFFFFF))
+#define SSDFS_HASH32_HI(hash64) \
+	((u32)(hash64 & 0xFFFFFFFF))
+
+/*
+ * struct ssdfs_dir_entry - directory entry
+ * @ino: inode number
+ * @hash_code: name string's hash code
+ * @name_len: name length in bytes
+ * @dentry_type: dentry type
+ * @file_type: directory file types
+ * @flags: dentry's flags
+ * @inline_string: inline copy of the name or exclusive storage of short name
+ */
+struct ssdfs_dir_entry {
+/* 0x0000 */
+	__le64 ino;
+	__le64 hash_code;
+
+/* 0x0010 */
+	__le8 name_len;
+	__le8 dentry_type;
+	__le8 file_type;
+	__le8 flags;
+#define SSDFS_DENTRY_INLINE_NAME_MAX_LEN	(12)
+	__le8 inline_string[SSDFS_DENTRY_INLINE_NAME_MAX_LEN];
+
+/* 0x0020 */
+} __packed;
+
+/* Dentry types */
+enum {
+	SSDFS_DENTRY_UNKNOWN_TYPE,
+	SSDFS_INLINE_DENTRY,
+	SSDFS_REGULAR_DENTRY,
+	SSDFS_DENTRY_TYPE_MAX
+};
+
+/*
+ * SSDFS directory file types.
+ */
+enum {
+	SSDFS_FT_UNKNOWN,
+	SSDFS_FT_REG_FILE,
+	SSDFS_FT_DIR,
+	SSDFS_FT_CHRDEV,
+	SSDFS_FT_BLKDEV,
+	SSDFS_FT_FIFO,
+	SSDFS_FT_SOCK,
+	SSDFS_FT_SYMLINK,
+	SSDFS_FT_MAX
+};
+
+/* Dentry flags */
+#define SSDFS_DENTRY_HAS_EXTERNAL_STRING	(1 << 0)
+#define SSDFS_DENTRY_FLAGS_MASK			0x1
+
+/*
+ * struct ssdfs_blob_extent - blob's extent descriptor
+ * @hash: blob's hash
+ * @extent: blob's extent
+ */
+struct ssdfs_blob_extent {
+/* 0x0000 */
+	__le64 hash;
+	__le64 reserved;
+	struct ssdfs_raw_extent extent;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_XATTR_INLINE_BLOB_MAX_LEN		(32)
+#define SSDFS_XATTR_EXTERNAL_BLOB_MAX_LEN	(32768)
+
+/*
+ * struct ssdfs_blob_bytes - inline blob's byte stream
+ * @bytes: byte stream
+ */
+struct ssdfs_blob_bytes {
+/* 0x0000 */
+	__le8 bytes[SSDFS_XATTR_INLINE_BLOB_MAX_LEN];
+
+/* 0x0020 */
+} __packed;
+
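Taking the macros above at face value, hash32_lo lands in the upper
32 bits of the raw hash64 and hash32_hi in the lower 32 bits. A short
usage sketch (the hash values are made up for illustration; WARN_ON()
is the standard kernel helper):

/* Compose and decompose a name hash with the macros above. */
static inline void ssdfs_example_name_hash_usage(void)
{
	u32 hash32_lo = 0x12345678;	/* hash of the inline part */
	u32 hash32_hi = 0;		/* zero for names <= 12 symbols */
	__le64 on_disk = SSDFS_NAME_HASH_LE64(hash32_lo, hash32_hi);

	/* round-trips back to the original halves */
	WARN_ON(LE64_TO_SSDFS_HASH32_LO(on_disk) != hash32_lo);
	WARN_ON(LE64_TO_SSDFS_HASH32_HI(on_disk) != hash32_hi);
}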
+/*
+ * struct ssdfs_xattr_entry - extended attribute entry
+ * @name_hash: hash of the name
+ * @inline_index: index of the inline xattr
+ * @name_len: length of the name
+ * @name_type: type of the name
+ * @name_flags: flags of the name
+ * @blob_len: blob length in bytes
+ * @blob_type: type of the blob
+ * @blob_flags: flags of the blob
+ * @inline_string: inline string of the name
+ * @blob.descriptor.hash: hash of the blob
+ * @blob.descriptor.extent: extent of the blob
+ * @blob.inline_value: inline value of the blob
+ *
+ * An extended attribute is described by a fixed size
+ * descriptor. The name of the extended attribute can be inline
+ * or stored in the shared dictionary. If the name
+ * is greater than 16 symbols, then it is stored in the shared
+ * dictionary. The blob part can be stored inline;
+ * otherwise, the descriptor contains the hash of the blob,
+ * and the blob is stored as an ordinary file inside
+ * logical blocks.
+ */
+struct ssdfs_xattr_entry {
+/* 0x0000 */
+	__le64 name_hash;
+
+/* 0x0008 */
+	__le8 inline_index;
+	__le8 name_len;
+	__le8 name_type;
+	__le8 name_flags;
+
+/* 0x000C */
+	__le16 blob_len;
+	__le8 blob_type;
+	__le8 blob_flags;
+
+/* 0x0010 */
+#define SSDFS_XATTR_INLINE_NAME_MAX_LEN	(16)
+	__le8 inline_string[SSDFS_XATTR_INLINE_NAME_MAX_LEN];
+
+/* 0x0020 */
+	union {
+		struct ssdfs_blob_extent descriptor;
+		struct ssdfs_blob_bytes inline_value;
+	} blob;
+
+/* 0x0040 */
+} __packed;
+
+/* registered names' prefixes */
+enum {
+	SSDFS_USER_NS_INDEX,
+	SSDFS_TRUSTED_NS_INDEX,
+	SSDFS_SYSTEM_NS_INDEX,
+	SSDFS_SECURITY_NS_INDEX,
+	SSDFS_REGISTERED_NS_NUMBER
+};
+
+static const char * const SSDFS_NS_PREFIX[] = {
+	"user.",
+	"trusted.",
+	"system.",
+	"security.",
+};
+
+/* xattr name types */
+enum {
+	SSDFS_XATTR_NAME_UNKNOWN_TYPE,
+	SSDFS_XATTR_INLINE_NAME,
+	SSDFS_XATTR_USER_INLINE_NAME,
+	SSDFS_XATTR_TRUSTED_INLINE_NAME,
+	SSDFS_XATTR_SYSTEM_INLINE_NAME,
+	SSDFS_XATTR_SECURITY_INLINE_NAME,
+	SSDFS_XATTR_REGULAR_NAME,
+	SSDFS_XATTR_USER_REGULAR_NAME,
+	SSDFS_XATTR_TRUSTED_REGULAR_NAME,
+	SSDFS_XATTR_SYSTEM_REGULAR_NAME,
+	SSDFS_XATTR_SECURITY_REGULAR_NAME,
+	SSDFS_XATTR_NAME_TYPE_MAX
+};
+
+/* xattr name flags */
+#define SSDFS_XATTR_HAS_EXTERNAL_STRING	(1 << 0)
+#define SSDFS_XATTR_NAME_FLAGS_MASK	0x1
+
+/* xattr blob types */
+enum {
+	SSDFS_XATTR_BLOB_UNKNOWN_TYPE,
+	SSDFS_XATTR_INLINE_BLOB,
+	SSDFS_XATTR_REGULAR_BLOB,
+	SSDFS_XATTR_BLOB_TYPE_MAX
+};
+
+/* xattr blob flags */
+#define SSDFS_XATTR_HAS_EXTERNAL_BLOB	(1 << 0)
+#define SSDFS_XATTR_BLOB_FLAGS_MASK	0x1
+
+#define SSDFS_INLINE_DENTRIES_PER_AREA		(2)
+#define SSDFS_INLINE_STREAM_SIZE_PER_AREA	(64)
+#define SSDFS_DEFAULT_INLINE_XATTR_COUNT	(1)
+
+/*
+ * struct ssdfs_inode_inline_stream - inode's inline stream
+ * @bytes: bytes array
+ */
+struct ssdfs_inode_inline_stream {
+/* 0x0000 */
+	__le8 bytes[SSDFS_INLINE_STREAM_SIZE_PER_AREA];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inode_inline_dentries - inline dentries array
+ * @array: dentries array
+ */
+struct ssdfs_inode_inline_dentries {
+/* 0x0000 */
+	struct ssdfs_dir_entry array[SSDFS_INLINE_DENTRIES_PER_AREA];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inode_private_area - inode's private area
+ * @area1.inline_stream: inline file's content
+ * @area1.extents_root: extents btree root node
+ * @area1.fork: inline fork
+ * @area1.dentries_root: dentries btree root node
+ * @area1.dentries: inline dentries
+ * @area2.inline_stream: inline file's content
+ * @area2.inline_xattr: inline extended attribute
+ * @area2.xattr_root: extended attributes btree root node
+ * @area2.fork: inline fork
+ * @area2.dentries: inline dentries
+ */
+struct ssdfs_inode_private_area {
+/* 0x0000 */
+	union {
+		struct ssdfs_inode_inline_stream inline_stream;
+		struct ssdfs_btree_inline_root_node extents_root;
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_btree_inline_root_node dentries_root;
+		struct ssdfs_inode_inline_dentries dentries;
+	} area1;
+
+/* 0x0040 */
+	union {
+		struct ssdfs_inode_inline_stream inline_stream;
+		struct ssdfs_xattr_entry inline_xattr;
+		struct ssdfs_btree_inline_root_node xattr_root;
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_inode_inline_dentries dentries;
+	} area2;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_inode - raw (on-disk) inode
+ * @magic: inode magic
+ * @mode: file mode
+ * @flags: file attributes
+ * @uid: owner user ID
+ * @gid: owner group ID
+ * @atime: access time (seconds)
+ * @ctime: change time (seconds)
+ * @mtime: modification time (seconds)
+ * @birthtime: inode creation time (seconds)
+ * @atime_nsec: access time in nano scale
+ * @ctime_nsec: change time in nano scale
+ * @mtime_nsec: modification time in nano scale
+ * @birthtime_nsec: creation time in nano scale
+ * @generation: file version (for NFS)
+ * @size: file size in bytes
+ * @blocks: file size in blocks
+ * @parent_ino: parent inode number
+ * @refcount: links count
+ * @checksum: inode checksum
+ * @ino: inode number
+ * @hash_code: hash code of file name
+ * @name_len: length of file name
+ * @forks_count: count of forks
+ * @internal: array of inline private areas of inode
+ */
+struct ssdfs_inode {
+/* 0x0000 */
+	__le16 magic;			/* Inode magic */
+	__le16 mode;			/* File mode */
+	__le32 flags;			/* file attributes */
+
+/* 0x0008 */
+	__le32 uid;			/* user ID */
+	__le32 gid;			/* group ID */
+
+/* 0x0010 */
+	__le64 atime;			/* access time */
+	__le64 ctime;			/* change time */
+	__le64 mtime;			/* modification time */
+	__le64 birthtime;		/* inode creation time */
+
+/* 0x0030 */
+	__le32 atime_nsec;		/* access time in nano scale */
+	__le32 ctime_nsec;		/* change time in nano scale */
+	__le32 mtime_nsec;		/* modification time in nano scale */
+	__le32 birthtime_nsec;		/* creation time in nano scale */
+
+/* 0x0040 */
+	__le64 generation;		/* file version (for NFS) */
+	__le64 size;			/* file size in bytes */
+	__le64 blocks;			/* file size in blocks */
+	__le64 parent_ino;		/* parent inode number */
+
+/* 0x0060 */
+	__le32 refcount;		/* links count */
+	__le32 checksum;		/* inode checksum */
+
+/* 0x0068 */
+/* TODO: maybe use the hash code of file name as inode number */
+	__le64 ino;			/* Inode number */
+	__le64 hash_code;		/* hash code of file name */
+	__le16 name_len;		/* length of file name */
+#define SSDFS_INODE_HAS_INLINE_EXTENTS		(1 << 0)
+#define SSDFS_INODE_HAS_EXTENTS_BTREE		(1 << 1)
+#define SSDFS_INODE_HAS_INLINE_DENTRIES		(1 << 2)
+#define SSDFS_INODE_HAS_DENTRIES_BTREE		(1 << 3)
+#define SSDFS_INODE_HAS_INLINE_XATTR		(1 << 4)
+#define SSDFS_INODE_HAS_XATTR_BTREE		(1 << 5)
+#define SSDFS_INODE_HAS_INLINE_FILE		(1 << 6)
+#define SSDFS_INODE_PRIVATE_FLAGS_MASK		0x7F
+	__le16 private_flags;
+
+	union {
+		__le32 forks;
+		__le32 dentries;
+	} count_of __packed;
+
+/* 0x0080 */
+	struct ssdfs_inode_private_area internal[1];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_IFREG_PRIVATE_FLAG_MASK \
+	(SSDFS_INODE_HAS_INLINE_EXTENTS | \
+	 SSDFS_INODE_HAS_EXTENTS_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_XATTR | \
+	 SSDFS_INODE_HAS_XATTR_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_FILE)
+
+#define SSDFS_IFDIR_PRIVATE_FLAG_MASK \
+	(SSDFS_INODE_HAS_INLINE_DENTRIES | \
+	 SSDFS_INODE_HAS_DENTRIES_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_XATTR | \
+	 SSDFS_INODE_HAS_XATTR_BTREE)
+
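The private flags select which member of each union in the inode's
private area is live. A hedged sketch of the dispatch a reader would
perform (the function is illustrative, not from this patch; only the
regular-file content cases are shown):

/*
 * Illustrative dispatch on the inode's private flags: decide how
 * a regular file's content is stored.
 */
static inline const char *
ssdfs_example_file_content_kind(struct ssdfs_inode *raw)
{
	u16 flags = le16_to_cpu(raw->private_flags);

	if (flags & SSDFS_INODE_HAS_INLINE_FILE)
		return "content inlined in area1.inline_stream";
	else if (flags & SSDFS_INODE_HAS_INLINE_EXTENTS)
		return "inline fork(s) in the private area";
	else if (flags & SSDFS_INODE_HAS_EXTENTS_BTREE)
		return "extents btree root in area1.extents_root";
	return "empty file";
}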
+/*
+ * struct ssdfs_volume_header - static part of superblock
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @log_pagesize: log2(page size)
+ * @log_erasesize: log2(erase block size)
+ * @log_segsize: log2(segment size)
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @megabytes_per_peb: MBs in one PEB
+ * @pebs_per_seg: number of PEBs per segment
+ * @create_time: volume create timestamp (mkfs phase)
+ * @create_cno: volume create checkpoint
+ * @flags: volume creation flags
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @sb_pebs: array of prev, cur and next superblock's PEB numbers
+ * @segbmap: superblock's segment bitmap header
+ * @maptbl: superblock's mapping table header
+ * @sb_seg_log_pages: full log size in sb segment (pages count)
+ * @segbmap_log_pages: full log size in segbmap segment (pages count)
+ * @maptbl_log_pages: full log size in maptbl segment (pages count)
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @create_threads_per_seg: number of creation threads per segment
+ * @dentries_btree: descriptor of all dentries btrees
+ * @extents_btree: descriptor of all extents btrees
+ * @xattr_btree: descriptor of all extended attributes btrees
+ * @invextree: b-tree of invalidated extents (ZNS SSD)
+ */
+struct ssdfs_volume_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le8 log_pagesize;
+	__le8 log_erasesize;
+	__le8 log_segsize;
+	__le8 log_pebs_per_seg;
+	__le16 megabytes_per_peb;
+	__le16 pebs_per_seg;
+
+/* 0x0018 */
+	__le64 create_time;
+	__le64 create_cno;
+#define SSDFS_VH_ZNS_BASED_VOLUME	(1 << 0)
+#define SSDFS_VH_UNALIGNED_ZONE		(1 << 1)
+#define SSDFS_VH_FLAGS_MASK		(0x3)
+	__le32 flags;
+	__le32 lebs_per_peb_index;
+
+/* 0x0030 */
+#define VH_LIMIT1	SSDFS_SB_CHAIN_MAX
+#define VH_LIMIT2	SSDFS_SB_SEG_COPY_MAX
+	struct ssdfs_leb2peb_pair sb_pebs[VH_LIMIT1][VH_LIMIT2];
+
+/* 0x00B0 */
+	struct ssdfs_segbmap_sb_header segbmap;
+
+/* 0x0140 */
+	struct ssdfs_maptbl_sb_header maptbl;
+
+/* 0x01D0 */
+	__le16 sb_seg_log_pages;
+	__le16 segbmap_log_pages;
+	__le16 maptbl_log_pages;
+	__le16 lnodes_seg_log_pages;
+	__le16 hnodes_seg_log_pages;
+	__le16 inodes_seg_log_pages;
+	__le16 user_data_log_pages;
+	__le16 create_threads_per_seg;
+
+/* 0x01E0 */
+	struct ssdfs_dentries_btree_descriptor dentries_btree;
+
+/* 0x0200 */
+	struct ssdfs_extents_btree_descriptor extents_btree;
+
+/* 0x0220 */
+	struct ssdfs_xattr_btree_descriptor xattr_btree;
+
+/* 0x0240 */
+	struct ssdfs_invalidated_extents_btree invextree;
+
+/* 0x02C0 */
+	__le8 reserved4[0x140];
+
+/* 0x0400 */
+} __packed;
+
+#define SSDFS_LEBS_PER_PEB_INDEX_DEFAULT	(1)
+
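As with log_node_size earlier, the log-scaled fields decode to sizes
with a shift; megabytes_per_peb and pebs_per_seg carry the geometry in
plain units, presumably for zones whose size is not a power of two (see
the SSDFS_VH_UNALIGNED_ZONE flag). A small sketch, assuming the
power-of-two path (helper name is hypothetical):

/* Hypothetical decoding of the volume geometry (power-of-two path). */
static inline void
ssdfs_example_volume_geometry(struct ssdfs_volume_header *vh,
			      u32 *page_size, u64 *erase_size, u64 *seg_size)
{
	*page_size = 1U << vh->log_pagesize;	 /* e.g. 12 -> 4 KiB */
	*erase_size = 1ULL << vh->log_erasesize; /* e.g. 23 -> 8 MiB */
	*seg_size = 1ULL << vh->log_segsize;	 /* >= erase block size */
}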
+/*
+ * struct ssdfs_volume_state - changeable part of superblock
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @nsegs: segments count
+ * @free_pages: free pages count
+ * @timestamp: write timestamp
+ * @cno: write checkpoint
+ * @flags: volume flags
+ * @state: file system state
+ * @errors: behaviour when detecting errors
+ * @feature_compat: compatible feature set
+ * @feature_compat_ro: read-only compatible feature set
+ * @feature_incompat: incompatible feature set
+ * @uuid: 128-bit uuid for volume
+ * @label: volume name
+ * @cur_segs: array of current segment numbers
+ * @migration_threshold: default value of destination PEBs in migration
+ * @blkbmap: block bitmap options
+ * @blk2off_tbl: offset translation table options
+ * @user_data: user data options
+ * @open_zones: number of open/active zones
+ * @root_folder: copy of root folder's inode
+ * @inodes_btree: inodes btree root
+ * @shared_extents_btree: shared extents btree root
+ * @shared_dict_btree: shared dictionary btree root
+ * @snapshots_btree: snapshots btree root
+ */
+struct ssdfs_volume_state {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 nsegs;
+	__le64 free_pages;
+
+/* 0x0020 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0030 */
+#define SSDFS_HAS_INLINE_INODES_TREE	(1 << 0)
+#define SSDFS_VOLUME_STATE_FLAGS_MASK	0x1
+	__le32 flags;
+	__le16 state;
+	__le16 errors;
+
+/* 0x0038 */
+	__le64 feature_compat;
+	__le64 feature_compat_ro;
+	__le64 feature_incompat;
+
+/* 0x0050 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+	char label[SSDFS_VOLUME_LABEL_MAX];
+
+/* 0x0070 */
+	__le64 cur_segs[SSDFS_CUR_SEGS_COUNT];
+
+/* 0x0098 */
+	__le16 migration_threshold;
+	__le16 reserved1;
+
+/* 0x009C */
+	struct ssdfs_blk_bmap_options blkbmap;
+	struct ssdfs_blk2off_tbl_options blk2off_tbl;
+
+/* 0x00A4 */
+	struct ssdfs_user_data_options user_data;
+
+/* 0x00AC */
+	__le32 open_zones;
+
+/* 0x00B0 */
+	struct ssdfs_inode root_folder;
+
+/* 0x01B0 */
+	__le8 reserved3[0x50];
+
+/* 0x0200 */
+	struct ssdfs_inodes_btree inodes_btree;
+
+/* 0x0280 */
+	struct ssdfs_shared_extents_btree shared_extents_btree;
+
+/* 0x0300 */
+	struct ssdfs_shared_dictionary_btree shared_dict_btree;
+
+/* 0x0380 */
+	struct ssdfs_snapshots_btree snapshots_btree;
+
+/* 0x0400 */
+} __packed;
+
+/* Compatible feature flags */
+#define SSDFS_HAS_SEGBMAP_COMPAT_FLAG			(1 << 0)
+#define SSDFS_HAS_MAPTBL_COMPAT_FLAG			(1 << 1)
+#define SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG		(1 << 2)
+#define SSDFS_HAS_SHARED_XATTRS_COMPAT_FLAG		(1 << 3)
+#define SSDFS_HAS_SHARED_DICT_COMPAT_FLAG		(1 << 4)
+#define SSDFS_HAS_INODES_TREE_COMPAT_FLAG		(1 << 5)
+#define SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG		(1 << 6)
+#define SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG	(1 << 7)
+
+/* Read-Only compatible feature flags */
+#define SSDFS_ZLIB_COMPAT_RO_FLAG	(1 << 0)
+#define SSDFS_LZO_COMPAT_RO_FLAG	(1 << 1)
+
+#define SSDFS_FEATURE_COMPAT_SUPP \
+	(SSDFS_HAS_SEGBMAP_COMPAT_FLAG | SSDFS_HAS_MAPTBL_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_XATTRS_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_DICT_COMPAT_FLAG | \
+	 SSDFS_HAS_INODES_TREE_COMPAT_FLAG | \
+	 SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG | \
+	 SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG)
+
+#define SSDFS_FEATURE_COMPAT_RO_SUPP \
+	(SSDFS_ZLIB_COMPAT_RO_FLAG | SSDFS_LZO_COMPAT_RO_FLAG)
+
+#define SSDFS_FEATURE_INCOMPAT_SUPP	0ULL
+
+/*
+ * struct ssdfs_metadata_descriptor - metadata descriptor
+ * @offset: offset in bytes
+ * @size: size in bytes
+ * @check: metadata checksum
+ */
+struct ssdfs_metadata_descriptor {
+/* 0x0000 */
+	__le32 offset;
+	__le32 size;
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+} __packed;
+
+enum {
+	SSDFS_BLK_BMAP_INDEX,
+	SSDFS_SNAPSHOT_RULES_AREA_INDEX,
+	SSDFS_OFF_TABLE_INDEX,
+	SSDFS_COLD_PAYLOAD_AREA_INDEX,
+	SSDFS_WARM_PAYLOAD_AREA_INDEX,
+	SSDFS_HOT_PAYLOAD_AREA_INDEX,
+	SSDFS_BLK_DESC_AREA_INDEX,
+	SSDFS_MAPTBL_CACHE_INDEX,
+	SSDFS_LOG_FOOTER_INDEX,
+	SSDFS_SEG_HDR_DESC_MAX = SSDFS_LOG_FOOTER_INDEX + 1,
+	SSDFS_LOG_FOOTER_DESC_MAX = SSDFS_OFF_TABLE_INDEX + 1,
+};
+
+enum {
+	SSDFS_PREV_MIGRATING_PEB,
+	SSDFS_CUR_MIGRATING_PEB,
+	SSDFS_MIGRATING_PEBS_CHAIN
+};
+
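The three feature sets follow the familiar compat / ro-compat / incompat
pattern. A sketch of the mount-time gate this enables, under the
assumption that SSDFS applies the usual Linux policy (unknown incompat
bits refuse the mount, unknown ro-compat bits force read-only); this
excerpt does not show SSDFS's actual mount code:

#include <linux/errno.h>

/* Hypothetical mount-time feature gate, mirroring ext4-style policy. */
static inline int ssdfs_example_check_features(struct ssdfs_volume_state *vs,
					       bool readonly)
{
	if (le64_to_cpu(vs->feature_incompat) & ~SSDFS_FEATURE_INCOMPAT_SUPP)
		return -EOPNOTSUPP;	/* refuse the mount */

	if (!readonly &&
	    (le64_to_cpu(vs->feature_compat_ro) & ~SSDFS_FEATURE_COMPAT_RO_SUPP))
		return -EROFS;		/* allow read-only mount at most */

	return 0;
}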
+/*
+ * struct ssdfs_segment_header - header of segment
+ * @volume_hdr: copy of static part of superblock
+ * @timestamp: log creation timestamp
+ * @cno: log checkpoint
+ * @log_pages: size of log (partial segment) in pages count
+ * @seg_type: type of segment
+ * @seg_flags: flags of segment
+ * @desc_array: array of segment's metadata descriptors
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @peb_create_time: PEB creation timestamp
+ * @payload: space for segment header's payload
+ */
+struct ssdfs_segment_header {
+/* 0x0000 */
+	struct ssdfs_volume_header volume_hdr;
+
+/* 0x0400 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0410 */
+	__le16 log_pages;
+	__le16 seg_type;
+	__le32 seg_flags;
+
+/* 0x0418 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX];
+
+/* 0x04A8 */
+#define SSDFS_PEB_UNKNOWN_MIGRATION_ID	(0)
+#define SSDFS_PEB_MIGRATION_ID_START	(1)
+#define SSDFS_PEB_MIGRATION_ID_MAX	(U8_MAX)
+	__le8 peb_migration_id[SSDFS_MIGRATING_PEBS_CHAIN];
+
+/* 0x04AA */
+	__le64 peb_create_time;
+
+/* 0x04B2 */
+	__le8 payload[0x34E];
+
+/* 0x0800 */
+} __packed;
+
+/* Possible segment types */
+#define SSDFS_UNKNOWN_SEG_TYPE		(0)
+#define SSDFS_SB_SEG_TYPE		(1)
+#define SSDFS_INITIAL_SNAPSHOT_SEG_TYPE	(2)
+#define SSDFS_SEGBMAP_SEG_TYPE		(3)
+#define SSDFS_MAPTBL_SEG_TYPE		(4)
+#define SSDFS_LEAF_NODE_SEG_TYPE	(5)
+#define SSDFS_HYBRID_NODE_SEG_TYPE	(6)
+#define SSDFS_INDEX_NODE_SEG_TYPE	(7)
+#define SSDFS_USER_DATA_SEG_TYPE	(8)
+#define SSDFS_LAST_KNOWN_SEG_TYPE	SSDFS_USER_DATA_SEG_TYPE
+
+/* Segment flags' bits */
+#define SSDFS_BLK_BMAP_BIT		(0)
+#define SSDFS_OFFSET_TABLE_BIT		(1)
+#define SSDFS_COLD_PAYLOAD_BIT		(2)
+#define SSDFS_WARM_PAYLOAD_BIT		(3)
+#define SSDFS_HOT_PAYLOAD_BIT		(4)
+#define SSDFS_BLK_DESC_CHAIN_BIT	(5)
+#define SSDFS_MAPTBL_CACHE_BIT		(6)
+#define SSDFS_FOOTER_BIT		(7)
+#define SSDFS_PARTIAL_LOG_BIT		(8)
+#define SSDFS_PARTIAL_LOG_HEADER_BIT	(9)
+#define SSDFS_PLH_INSTEAD_FOOTER_BIT	(10)
+
+/* Segment flags */
+#define SSDFS_SEG_HDR_HAS_BLK_BMAP	(1 << SSDFS_BLK_BMAP_BIT)
+#define SSDFS_SEG_HDR_HAS_OFFSET_TABLE	(1 << SSDFS_OFFSET_TABLE_BIT)
+#define SSDFS_LOG_HAS_COLD_PAYLOAD	(1 << SSDFS_COLD_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_WARM_PAYLOAD	(1 << SSDFS_WARM_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_HOT_PAYLOAD	(1 << SSDFS_HOT_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_BLK_DESC_CHAIN	(1 << SSDFS_BLK_DESC_CHAIN_BIT)
+#define SSDFS_LOG_HAS_MAPTBL_CACHE	(1 << SSDFS_MAPTBL_CACHE_BIT)
+#define SSDFS_LOG_HAS_FOOTER		(1 << SSDFS_FOOTER_BIT)
+#define SSDFS_LOG_IS_PARTIAL		(1 << SSDFS_PARTIAL_LOG_BIT)
+#define SSDFS_LOG_HAS_PARTIAL_HEADER	(1 << SSDFS_PARTIAL_LOG_HEADER_BIT)
+#define SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER	(1 << SSDFS_PLH_INSTEAD_FOOTER_BIT)
+#define SSDFS_SEG_HDR_FLAG_MASK		0x7FF
+
+/* Segment flags manipulation functions */
+#define SSDFS_SEG_HDR_FNS(bit, name) \
+static inline void ssdfs_set_##name(struct ssdfs_segment_header *hdr) \
+{ \
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags); \
+	set_bit(SSDFS_##bit, &seg_flags); \
+	hdr->seg_flags = cpu_to_le32((u32)seg_flags); \
+} \
+static inline void ssdfs_clear_##name(struct ssdfs_segment_header *hdr) \
+{ \
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags); \
+	clear_bit(SSDFS_##bit, &seg_flags); \
+	hdr->seg_flags = cpu_to_le32((u32)seg_flags); \
+} \
+static inline int ssdfs_##name(struct ssdfs_segment_header *hdr) \
+{ \
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags); \
+	return test_bit(SSDFS_##bit, &seg_flags); \
+}
+
+/*
+ * ssdfs_set_seg_hdr_has_blk_bmap()
+ * ssdfs_clear_seg_hdr_has_blk_bmap()
+ * ssdfs_seg_hdr_has_blk_bmap()
+ */
+SSDFS_SEG_HDR_FNS(BLK_BMAP_BIT, seg_hdr_has_blk_bmap)
+
+/*
+ * ssdfs_set_seg_hdr_has_offset_table()
+ * ssdfs_clear_seg_hdr_has_offset_table()
+ * ssdfs_seg_hdr_has_offset_table()
+ */
+SSDFS_SEG_HDR_FNS(OFFSET_TABLE_BIT, seg_hdr_has_offset_table)
+
+/*
+ * ssdfs_set_log_has_cold_payload()
+ * ssdfs_clear_log_has_cold_payload()
+ * ssdfs_log_has_cold_payload()
+ */
+SSDFS_SEG_HDR_FNS(COLD_PAYLOAD_BIT, log_has_cold_payload)
+
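Each SSDFS_SEG_HDR_FNS() line expands to a set/clear/test triple for one
bit of seg_flags. For example, the first instantiation generates
ssdfs_set_seg_hdr_has_blk_bmap(), ssdfs_clear_seg_hdr_has_blk_bmap()
and ssdfs_seg_hdr_has_blk_bmap(), used like this (illustrative snippet
only):

/* Illustrative use of the generated accessors. */
static inline void ssdfs_example_seg_flags(struct ssdfs_segment_header *hdr)
{
	ssdfs_set_seg_hdr_has_blk_bmap(hdr);

	if (ssdfs_seg_hdr_has_blk_bmap(hdr)) {
		/* the log carries a block bitmap area */
	}
}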
+/*
+ * ssdfs_set_log_has_warm_payload()
+ * ssdfs_clear_log_has_warm_payload()
+ * ssdfs_log_has_warm_payload()
+ */
+SSDFS_SEG_HDR_FNS(WARM_PAYLOAD_BIT, log_has_warm_payload)
+
+/*
+ * ssdfs_set_log_has_hot_payload()
+ * ssdfs_clear_log_has_hot_payload()
+ * ssdfs_log_has_hot_payload()
+ */
+SSDFS_SEG_HDR_FNS(HOT_PAYLOAD_BIT, log_has_hot_payload)
+
+/*
+ * ssdfs_set_log_has_blk_desc_chain()
+ * ssdfs_clear_log_has_blk_desc_chain()
+ * ssdfs_log_has_blk_desc_chain()
+ */
+SSDFS_SEG_HDR_FNS(BLK_DESC_CHAIN_BIT, log_has_blk_desc_chain)
+
+/*
+ * ssdfs_set_log_has_maptbl_cache()
+ * ssdfs_clear_log_has_maptbl_cache()
+ * ssdfs_log_has_maptbl_cache()
+ */
+SSDFS_SEG_HDR_FNS(MAPTBL_CACHE_BIT, log_has_maptbl_cache)
+
+/*
+ * ssdfs_set_log_has_footer()
+ * ssdfs_clear_log_has_footer()
+ * ssdfs_log_has_footer()
+ */
+SSDFS_SEG_HDR_FNS(FOOTER_BIT, log_has_footer)
+
+/*
+ * ssdfs_set_log_is_partial()
+ * ssdfs_clear_log_is_partial()
+ * ssdfs_log_is_partial()
+ */
+SSDFS_SEG_HDR_FNS(PARTIAL_LOG_BIT, log_is_partial)
+
+/*
+ * ssdfs_set_log_has_partial_header()
+ * ssdfs_clear_log_has_partial_header()
+ * ssdfs_log_has_partial_header()
+ */
+SSDFS_SEG_HDR_FNS(PARTIAL_LOG_HEADER_BIT, log_has_partial_header)
+
+/*
+ * ssdfs_set_partial_header_instead_footer()
+ * ssdfs_clear_partial_header_instead_footer()
+ * ssdfs_partial_header_instead_footer()
+ */
+SSDFS_SEG_HDR_FNS(PLH_INSTEAD_FOOTER_BIT, partial_header_instead_footer)
+
+/*
+ * struct ssdfs_log_footer - footer of partial log
+ * @volume_state: changeable part of superblock
+ * @timestamp: writing timestamp
+ * @cno: writing checkpoint
+ * @log_bytes: payload size in bytes
+ * @log_flags: flags of log
+ * @reserved1: reserved field
+ * @desc_array: array of footer's metadata descriptors
+ * @peb_create_time: PEB creation timestamp
+ * @payload: space for log footer's payload
+ */
+struct ssdfs_log_footer {
+/* 0x0000 */
+	struct ssdfs_volume_state volume_state;
+
+/* 0x0400 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0410 */
+	__le32 log_bytes;
+	__le32 log_flags;
+	__le64 reserved1;
+
+/* 0x0420 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_LOG_FOOTER_DESC_MAX];
+
+/* 0x0450 */
+	__le64 peb_create_time;
+
+/* 0x0458 */
+	__le8 payload[0x3A8];
+
+/* 0x0800 */
+} __packed;
+
+/* Log footer flags' bits */
+#define __SSDFS_BLK_BMAP_BIT		(0)
+#define __SSDFS_OFFSET_TABLE_BIT	(1)
+#define __SSDFS_PARTIAL_LOG_BIT		(2)
+#define __SSDFS_ENDING_LOG_BIT		(3)
+#define __SSDFS_SNAPSHOT_RULE_AREA_BIT	(4)
+
+/* Log footer flags */
+#define SSDFS_LOG_FOOTER_HAS_BLK_BMAP		(1 << __SSDFS_BLK_BMAP_BIT)
+#define SSDFS_LOG_FOOTER_HAS_OFFSET_TABLE	(1 << __SSDFS_OFFSET_TABLE_BIT)
+#define SSDFS_PARTIAL_LOG_FOOTER		(1 << __SSDFS_PARTIAL_LOG_BIT)
+#define SSDFS_ENDING_LOG_FOOTER			(1 << __SSDFS_ENDING_LOG_BIT)
+#define SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES	(1 << __SSDFS_SNAPSHOT_RULE_AREA_BIT)
+#define SSDFS_LOG_FOOTER_FLAG_MASK		0x1F
+
+/* Log footer flags manipulation functions */
+#define SSDFS_LOG_FOOTER_FNS(bit, name) \
+static inline void ssdfs_set_##name(struct ssdfs_log_footer *footer) \
+{ \
+	unsigned long log_flags = le32_to_cpu(footer->log_flags); \
+	set_bit(__SSDFS_##bit, &log_flags); \
+	footer->log_flags = cpu_to_le32((u32)log_flags); \
+} \
+static inline void ssdfs_clear_##name(struct ssdfs_log_footer *footer) \
+{ \
+	unsigned long log_flags = le32_to_cpu(footer->log_flags); \
+	clear_bit(__SSDFS_##bit, &log_flags); \
+	footer->log_flags = cpu_to_le32((u32)log_flags); \
+} \
+static inline int ssdfs_##name(struct ssdfs_log_footer *footer) \
+{ \
+	unsigned long log_flags = le32_to_cpu(footer->log_flags); \
+	return test_bit(__SSDFS_##bit, &log_flags); \
+}
+
+/*
+ * ssdfs_set_log_footer_has_blk_bmap()
+ * ssdfs_clear_log_footer_has_blk_bmap()
+ * ssdfs_log_footer_has_blk_bmap()
+ */
+SSDFS_LOG_FOOTER_FNS(BLK_BMAP_BIT, log_footer_has_blk_bmap)
+
+/*
+ * ssdfs_set_log_footer_has_offset_table()
+ * ssdfs_clear_log_footer_has_offset_table()
+ * ssdfs_log_footer_has_offset_table()
+ */
+SSDFS_LOG_FOOTER_FNS(OFFSET_TABLE_BIT, log_footer_has_offset_table)
+
+/*
+ * ssdfs_set_partial_log_footer()
+ * ssdfs_clear_partial_log_footer()
+ * ssdfs_partial_log_footer()
+ */
+SSDFS_LOG_FOOTER_FNS(PARTIAL_LOG_BIT, partial_log_footer)
+
+/*
+ * ssdfs_set_ending_log_footer()
+ * ssdfs_clear_ending_log_footer()
+ * ssdfs_ending_log_footer()
+ */
+SSDFS_LOG_FOOTER_FNS(ENDING_LOG_BIT, ending_log_footer)
+
+/*
+ * ssdfs_set_log_footer_has_snapshot_rules()
+ * ssdfs_clear_log_footer_has_snapshot_rules()
+ * ssdfs_log_footer_has_snapshot_rules()
+ */
+SSDFS_LOG_FOOTER_FNS(SNAPSHOT_RULE_AREA_BIT, log_footer_has_snapshot_rules)
+
+/*
+ * struct ssdfs_partial_log_header - header of partial log
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @timestamp: writing timestamp
+ * @cno: writing checkpoint
+ * @log_pages: size of log in pages count
+ * @seg_type: type of segment
+ * @pl_flags: flags of log
+ * @log_bytes: payload size in bytes
+ * @flags: volume flags
+ * @desc_array: array of log's metadata descriptors
+ * @nsegs: segments count
+ * @free_pages: free pages count
+ * @root_folder: copy of root folder's inode
+ * @inodes_btree: inodes btree root
+ * @shared_extents_btree: shared extents btree root
+ * @shared_dict_btree: shared dictionary btree root
+ * @sequence_id: index of partial log in the sequence
+ * @log_pagesize: log2(page size)
+ * @log_erasesize: log2(erase block size)
+ * @log_segsize: log2(segment size)
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @create_threads_per_seg: number of creation threads per segment
+ * @snapshots_btree: snapshots btree root
+ * @open_zones: number of open/active zones
+ * @peb_create_time: PEB creation timestamp
+ * @invextree: invalidated extents btree root
+ *
+ * This header is used when the full log needs to be built from several
+ * partial logs. The header represents the combination of the most
+ * essential fields of the segment header and the log footer. The first
+ * partial log starts with the segment header and a partial log header.
+ * Every following partial log starts with a partial log header. Only
+ * the latest partial log ends with the log footer.
+ */
+struct ssdfs_partial_log_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0020 */
+	__le16 log_pages;
+	__le16 seg_type;
+	__le32 pl_flags;
+
+/* 0x0028 */
+	__le32 log_bytes;
+	__le32 flags;
+
+/* 0x0030 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX];
+
+/* 0x00C0 */
+	__le64 nsegs;
+	__le64 free_pages;
+
+/* 0x00D0 */
+	struct ssdfs_inode root_folder;
+
+/* 0x01D0 */
+	struct ssdfs_inodes_btree inodes_btree;
+
+/* 0x0250 */
+	struct ssdfs_shared_extents_btree shared_extents_btree;
+
+/* 0x02D0 */
+	struct ssdfs_shared_dictionary_btree shared_dict_btree;
+
+/* 0x0350 */
+	__le32 sequence_id;
+	__le8 log_pagesize;
+	__le8 log_erasesize;
+	__le8 log_segsize;
+	__le8 log_pebs_per_seg;
+	__le32 lebs_per_peb_index;
+	__le16 create_threads_per_seg;
+	__le8 reserved1[0x2];
+
+/* 0x0360 */
+	struct ssdfs_snapshots_btree snapshots_btree;
+
+/* 0x03E0 */
+	__le32 open_zones;
+	__le8 reserved2[0x4];
+	__le64 peb_create_time;
+	__le8 reserved3[0x10];
+
+/* 0x0400 */
+	struct ssdfs_invalidated_extents_btree invextree;
+
+/* 0x0480 */
+	__le8 payload[0x380];
+
+/* 0x0800 */
+} __packed;
+
+/* Partial log flags manipulation functions */
+#define SSDFS_PL_HDR_FNS(bit, name) \
+static inline void ssdfs_set_##name(struct ssdfs_partial_log_header *hdr) \
+{ \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags); \
+	set_bit(SSDFS_##bit, &pl_flags); \
+	hdr->pl_flags = cpu_to_le32((u32)pl_flags); \
+} \
+static inline void ssdfs_clear_##name(struct ssdfs_partial_log_header *hdr) \
+{ \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags); \
+	clear_bit(SSDFS_##bit, &pl_flags); \
+	hdr->pl_flags = cpu_to_le32((u32)pl_flags); \
+} \
+static inline int ssdfs_##name(struct ssdfs_partial_log_header *hdr) \
+{ \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags); \
+	return test_bit(SSDFS_##bit, &pl_flags); \
+}
+
+/*
+ * ssdfs_set_pl_hdr_has_blk_bmap()
+ * ssdfs_clear_pl_hdr_has_blk_bmap()
+ * ssdfs_pl_hdr_has_blk_bmap()
+ */
+SSDFS_PL_HDR_FNS(BLK_BMAP_BIT, pl_hdr_has_blk_bmap)
+
+/*
+ * ssdfs_set_pl_hdr_has_offset_table()
+ * ssdfs_clear_pl_hdr_has_offset_table()
+ * ssdfs_pl_hdr_has_offset_table()
+ */
+SSDFS_PL_HDR_FNS(OFFSET_TABLE_BIT, pl_hdr_has_offset_table)
+
+/*
+ * ssdfs_set_pl_has_cold_payload()
+ * ssdfs_clear_pl_has_cold_payload()
+ * ssdfs_pl_has_cold_payload()
+ */
+SSDFS_PL_HDR_FNS(COLD_PAYLOAD_BIT, pl_has_cold_payload)
+
+/*
+ * ssdfs_set_pl_has_warm_payload()
+ * ssdfs_clear_pl_has_warm_payload()
+ * ssdfs_pl_has_warm_payload()
+ */
+SSDFS_PL_HDR_FNS(WARM_PAYLOAD_BIT, pl_has_warm_payload)
+
+/*
+ * ssdfs_set_pl_has_hot_payload()
+ * ssdfs_clear_pl_has_hot_payload()
+ * ssdfs_pl_has_hot_payload()
+ */
+SSDFS_PL_HDR_FNS(HOT_PAYLOAD_BIT, pl_has_hot_payload)
+
+/*
+ * ssdfs_set_pl_has_blk_desc_chain()
+ * ssdfs_clear_pl_has_blk_desc_chain()
+ * ssdfs_pl_has_blk_desc_chain()
+ */
+SSDFS_PL_HDR_FNS(BLK_DESC_CHAIN_BIT, pl_has_blk_desc_chain)
+
+/*
+ * ssdfs_set_pl_has_maptbl_cache()
+ * ssdfs_clear_pl_has_maptbl_cache()
+ * ssdfs_pl_has_maptbl_cache()
+ */
+SSDFS_PL_HDR_FNS(MAPTBL_CACHE_BIT, pl_has_maptbl_cache)
+
+/*
+ * ssdfs_set_pl_has_footer()
+ * ssdfs_clear_pl_has_footer()
+ * ssdfs_pl_has_footer()
+ */
+SSDFS_PL_HDR_FNS(FOOTER_BIT, pl_has_footer)
+
+/*
+ * ssdfs_set_pl_is_partial()
+ * ssdfs_clear_pl_is_partial()
+ * ssdfs_pl_is_partial()
+ */
+SSDFS_PL_HDR_FNS(PARTIAL_LOG_BIT, pl_is_partial)
+
+/*
+ * ssdfs_set_pl_has_partial_header()
+ * ssdfs_clear_pl_has_partial_header()
+ * ssdfs_pl_has_partial_header()
+ */
+SSDFS_PL_HDR_FNS(PARTIAL_LOG_HEADER_BIT, pl_has_partial_header)
+
+/*
+ * ssdfs_set_pl_header_instead_footer()
+ * ssdfs_clear_pl_header_instead_footer()
+ * ssdfs_pl_header_instead_footer()
+ */
+SSDFS_PL_HDR_FNS(PLH_INSTEAD_FOOTER_BIT, pl_header_instead_footer)
+
+/*
+ * struct ssdfs_diff_blob_header - diff blob header
+ * @magic: diff blob's magic
+ * @type: diff blob's type
+ * @desc_size: size of diff blob's descriptor in bytes
+ * @blob_size: size of diff blob in bytes
+ * @flags: diff blob's flags
+ */
+struct ssdfs_diff_blob_header {
+/* 0x0000 */
+	__le16 magic;
+	__le8 type;
+	__le8 desc_size;
+	__le16 blob_size;
+	__le16 flags;
+
+/* 0x0008 */
+} __packed;
+
+/* Diff blob flags */
+#define SSDFS_DIFF_BLOB_HAS_BTREE_NODE_HEADER	(1 << 0)
+#define SSDFS_DIFF_CHAIN_CONTAINS_NEXT_BLOB	(1 << 1)
+#define SSDFS_DIFF_BLOB_FLAGS_MASK		(0x3)
+
+/*
+ * struct ssdfs_metadata_diff_blob_header - metadata diff blob header
+ * @diff: generic diff blob header
+ * @bits_count: count of bits in bitmap
+ * @item_start_bit: item starting bit in bitmap
+ * @index_start_bit: index starting bit in bitmap
+ * @item_size: size of item in bytes
+ */
+struct ssdfs_metadata_diff_blob_header {
+/* 0x0000 */
+	struct ssdfs_diff_blob_header diff;
+
+/* 0x0008 */
+	__le16 bits_count;
+	__le16 item_start_bit;
+	__le16 index_start_bit;
+	__le16 item_size;
+
+/* 0x0010 */
+} __packed;
+
+/* Diff blob types */
+enum {
+	SSDFS_UNKNOWN_DIFF_BLOB_TYPE,
+	SSDFS_BTREE_NODE_DIFF_BLOB,
+	SSDFS_USER_DATA_DIFF_BLOB,
+	SSDFS_DIFF_BLOB_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_fragments_chain_header - header of fragments' chain
+ * @compr_bytes: size of the whole fragments' chain in compressed state
+ * @uncompr_bytes: size of the whole fragments' chain in decompressed state
+ * @fragments_count: count of fragments in the chain
+ * @desc_size: size of one descriptor item
+ * @magic: fragments chain header magic
+ * @type: fragments chain header type
+ * @flags: flags of fragments' chain
+ */
+struct ssdfs_fragments_chain_header {
+/* 0x0000 */
+	__le32 compr_bytes;
+	__le32 uncompr_bytes;
+
+/* 0x0008 */
+	__le16 fragments_count;
+	__le16 desc_size;
+
+/* 0x000C */
+	__le8 magic;
+	__le8 type;
+	__le16 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Fragments chain types */
+#define SSDFS_UNKNOWN_CHAIN_HDR		0x0
+#define SSDFS_LOG_AREA_CHAIN_HDR	0x1
+#define SSDFS_BLK_STATE_CHAIN_HDR	0x2
+#define SSDFS_BLK_DESC_CHAIN_HDR	0x3
+#define SSDFS_BLK_DESC_ZLIB_CHAIN_HDR	0x4
+#define SSDFS_BLK_DESC_LZO_CHAIN_HDR	0x5
+#define SSDFS_BLK_BMAP_CHAIN_HDR	0x6
+#define SSDFS_CHAIN_HDR_TYPE_MAX	(SSDFS_BLK_BMAP_CHAIN_HDR + 1)
+
+/* Fragments chain flags */
+#define SSDFS_MULTIPLE_HDR_CHAIN	(1 << 0)
+#define SSDFS_CHAIN_HDR_FLAG_MASK	0x1
+
+/* Fragments chain constants */
+#define SSDFS_FRAGMENTS_CHAIN_MAX		14
+#define SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX	64
+
+/*
+ * struct ssdfs_fragment_desc - fragment descriptor
+ * @offset: fragment's offset
+ * @compr_size: size of fragment in compressed state
+ * @uncompr_size: size of fragment after decompression
+ * @checksum: fragment checksum
+ * @sequence_id: fragment's sequential id number
+ * @magic: fragment descriptor's magic
+ * @type: fragment descriptor's type
+ * @flags: fragment descriptor's flags
+ */
+struct ssdfs_fragment_desc {
+/* 0x0000 */
+	__le32 offset;
+	__le16 compr_size;
+	__le16 uncompr_size;
+
+/* 0x0008 */
+	__le32 checksum;
+	__le8 sequence_id;
+	__le8 magic;
+	__le8 type;
+	__le8 flags;
+
+/* 0x0010 */
+} __packed;
+
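A fragments chain is a header followed by fragments_count descriptors of
desc_size bytes each, and each descriptor may carry its own checksum.
A hedged sketch of walking the descriptor array, under the layout
assumption (suggested by the @desc_size field but not confirmed by this
excerpt) that the descriptors immediately follow the chain header:

/*
 * Hypothetical walk over fragment descriptors that are assumed to
 * immediately follow the chain header in memory.
 */
static inline void
ssdfs_example_walk_chain(struct ssdfs_fragments_chain_header *chain)
{
	u16 count = le16_to_cpu(chain->fragments_count);
	u16 desc_size = le16_to_cpu(chain->desc_size);
	u8 *cursor = (u8 *)(chain + 1);
	u16 i;

	for (i = 0; i < count; i++, cursor += desc_size) {
		struct ssdfs_fragment_desc *desc =
				(struct ssdfs_fragment_desc *)cursor;

		if (desc->flags & SSDFS_FRAGMENT_HAS_CSUM) {
			/* verify le32 checksum over the fragment payload */
		}
	}
}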
+/* Fragment descriptor types */
+#define SSDFS_UNKNOWN_FRAGMENT_TYPE 0
+#define SSDFS_FRAGMENT_UNCOMPR_BLOB 1
+#define SSDFS_FRAGMENT_ZLIB_BLOB 2
+#define SSDFS_FRAGMENT_LZO_BLOB 3
+#define SSDFS_DATA_BLK_STATE_DESC 4
+#define SSDFS_DATA_BLK_DESC 5
+#define SSDFS_DATA_BLK_DESC_ZLIB 6
+#define SSDFS_DATA_BLK_DESC_LZO 7
+#define SSDFS_NEXT_TABLE_DESC 8
+#define SSDFS_FRAGMENT_DESC_MAX_TYPE (SSDFS_NEXT_TABLE_DESC + 1)
+
+/* Fragment descriptor flags */
+#define SSDFS_FRAGMENT_HAS_CSUM (1 << 0)
+#define SSDFS_FRAGMENT_DESC_FLAGS_MASK 0x1
+
+/*
+ * struct ssdfs_block_bitmap_header - header of segment's block bitmap
+ * @magic: magic signature and flags
+ * @fragments_count: count of block bitmap's fragments
+ * @bytes_count: count of bytes in fragments' sequence
+ * @flags: block bitmap's flags
+ * @type: type of block bitmap
+ */
+struct ssdfs_block_bitmap_header {
+/* 0x0000 */
+ struct ssdfs_signature magic;
+
+/* 0x0008 */
+ __le16 fragments_count;
+ __le32 bytes_count;
+
+#define SSDFS_BLK_BMAP_BACKUP (1 << 0)
+#define SSDFS_BLK_BMAP_COMPRESSED (1 << 1)
+#define SSDFS_BLK_BMAP_FLAG_MASK 0x3
+ __le8 flags;
+
+#define SSDFS_BLK_BMAP_UNCOMPRESSED_BLOB (0)
+#define SSDFS_BLK_BMAP_ZLIB_BLOB (1)
+#define SSDFS_BLK_BMAP_LZO_BLOB (2)
+#define SSDFS_BLK_BMAP_TYPE_MAX (SSDFS_BLK_BMAP_LZO_BLOB + 1)
+ __le8 type;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_block_bitmap_fragment - block bitmap's fragment header
+ * @peb_index: PEB's index
+ * @sequence_id: ID of block bitmap's fragment in the sequence
+ * @flags: fragment's flags
+ * @type: fragment type
+ * @last_free_blk: last free logical block
+ * @metadata_blks: count of physical pages used by metadata
+ * @invalid_blks: count of invalid blocks
+ * @chain_hdr: descriptor of block bitmap's fragments' chain
+ */
+struct ssdfs_block_bitmap_fragment {
+/* 0x0000 */
+ __le16 peb_index;
+ __le8 sequence_id;
+
+#define SSDFS_MIGRATING_BLK_BMAP (1 << 0)
+#define SSDFS_PEB_HAS_EXT_PTR (1 << 1)
+#define SSDFS_PEB_HAS_RELATION (1 << 2)
+#define SSDFS_FRAG_BLK_BMAP_FLAG_MASK 0x7
+ __le8 flags : 6;
+
+#define SSDFS_SRC_BLK_BMAP (0)
+#define SSDFS_DST_BLK_BMAP (1)
+#define SSDFS_FRAG_BLK_BMAP_TYPE_MAX (SSDFS_DST_BLK_BMAP + 1)
+ __le8 type : 2;
+
+ __le32 last_free_blk;
+
+/* 0x0008 */
+ __le32 metadata_blks;
+ __le32 invalid_blks;
+
+/* 0x0010 */
+ struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * The block to offset table has the following structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |   Blk2Off table Header   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Translation extents    |
+ * |        sequence          |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |  Physical offsets table  |
+ * |         header           |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |     Physical offset      |
+ * |  descriptors sequence    |
+ * |                          |
+ * ----------------------------
+ */
+
+/* Possible log's area types */
+enum {
+ SSDFS_LOG_BLK_DESC_AREA,
+ SSDFS_LOG_MAIN_AREA,
+ SSDFS_LOG_DIFFS_AREA,
+ SSDFS_LOG_JOURNAL_AREA,
+ SSDFS_LOG_AREA_MAX,
+};
+
+/*
+ * struct ssdfs_peb_page_descriptor - PEB's page descriptor
+ * @logical_offset: logical offset from file's beginning in pages
+ * @logical_blk: logical number of the block in segment
+ * @peb_page: PEB's page index
+ */
+struct ssdfs_peb_page_descriptor {
+/* 0x0000 */
+ __le32 logical_offset;
+ __le16 logical_blk;
+ __le16 peb_page;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_blk_state_offset - block's state offset
+ * @log_start_page: start page of the log
+ * @log_area: identification number of log area
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @byte_offset: offset in bytes from area's beginning
+ */
+struct ssdfs_blk_state_offset {
+/* 0x0000 */
+ __le16 log_start_page;
+ __le8 log_area;
+ __le8 peb_migration_id;
+ __le32 byte_offset;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_phys_offset_descriptor - descriptor of physical offset
+ * @page_desc: PEB's page descriptor
+ * @blk_state: logical block's state offset
+ */
+struct ssdfs_phys_offset_descriptor {
+/* 0x0000 */
+ struct ssdfs_peb_page_descriptor page_desc;
+ struct ssdfs_blk_state_offset blk_state;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_phys_offset_table_header - physical offset table header
+ * @start_id: start id in the table's fragment
+ * @id_count: number of unique physical offsets in log's fragments chain
+ * @byte_size: size in bytes of table's fragment
+ * @peb_index: PEB index
+ * @sequence_id: table's fragment's sequential id number
+ * @type: table's type
+ * @flags: table's flags
+ * @magic: table's magic
+ * @checksum: table checksum
+ * @used_logical_blks: count of allocated logical blocks
+ * @free_logical_blks: count of free logical blocks
+ * @last_allocated_blk: last allocated block (hint for allocation)
+ * @next_fragment_off: offset till next table's fragment
+ *
+ * This table contains offsets of block descriptors in a segment.
+ * Generally speaking, the table can be represented as an array of
+ * ssdfs_phys_offset_descriptor structures ordered by id numbers.
+ * The whole table can be split into several fragments. Every
+ * table's fragment begins with a header.
+ */
+struct ssdfs_phys_offset_table_header {
+/* 0x0000 */
+ __le16 start_id;
+ __le16 id_count;
+ __le32 byte_size;
+
+/* 0x0008 */
+ __le16 peb_index;
+ __le16 sequence_id;
+ __le16 type;
+ __le16 flags;
+
+/* 0x0010 */
+ __le32 magic;
+ __le32 checksum;
+
+/* 0x0018 */
+ __le16 used_logical_blks;
+ __le16 free_logical_blks;
+ __le16 last_allocated_blk;
+ __le16 next_fragment_off;
+
+/* 0x0020 */
+} __packed;
+
+/* Physical offset table types */
+#define SSDFS_UNKNOWN_OFF_TABLE_TYPE 0
+#define SSDFS_SEG_OFF_TABLE 1
+#define SSDFS_OFF_TABLE_MAX_TYPE (SSDFS_SEG_OFF_TABLE + 1)
+
+/* Physical offset table flags */
+#define SSDFS_OFF_TABLE_HAS_CSUM (1 << 0)
+#define SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT (1 << 1)
+#define SSDFS_BLK_DESC_TBL_COMPRESSED (1 << 2)
+#define SSDFS_OFF_TABLE_FLAGS_MASK 0x7
+
+/*
+ * struct ssdfs_translation_extent - logical block to offset id translation
+ * @logical_blk: starting logical block
+ * @offset_id: starting offset id
+ * @len: count of items in extent
+ * @sequence_id: id in sequence of extents
+ * @state: logical blocks' sequence state
+ */
+struct ssdfs_translation_extent {
+/* 0x0000 */
+ __le16 logical_blk;
+#define SSDFS_INVALID_OFFSET_ID (U16_MAX)
+ __le16 offset_id;
+ __le16 len;
+ __le8 sequence_id;
+ __le8 state;
+
+/* 0x0008 */
+} __packed;
+
+enum {
+ SSDFS_LOGICAL_BLK_UNKNOWN_STATE,
+ SSDFS_LOGICAL_BLK_FREE,
+ SSDFS_LOGICAL_BLK_USED,
+ SSDFS_LOGICAL_BLK_STATE_MAX,
+};
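How a translation extent is applied can be sketched as follows. The helper below is illustrative only and is not from the patch; it rests on the assumption that an extent maps @len consecutive logical blocks, starting at @logical_blk, onto consecutive offset ids starting at @offset_id, and that only extents in the SSDFS_LOGICAL_BLK_USED state carry valid offset ids.

static u16 ssdfs_resolve_offset_id(struct ssdfs_translation_extent *extents,
				   u16 extents_count, u16 logical_blk)
{
	u16 i;

	for (i = 0; i < extents_count; i++) {
		u16 start = le16_to_cpu(extents[i].logical_blk);
		u16 len = le16_to_cpu(extents[i].len);
		u16 offset_id = le16_to_cpu(extents[i].offset_id);

		/* assumption: free/unknown ranges have no offset ids */
		if (extents[i].state != SSDFS_LOGICAL_BLK_USED)
			continue;

		if (logical_blk >= start && logical_blk < start + len)
			return offset_id + (logical_blk - start);
	}

	return SSDFS_INVALID_OFFSET_ID;
}

The resolved offset id then indexes into the physical offsets table, whose ssdfs_phys_offset_descriptor finally points at the block's state inside a particular log area.
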
+
+/*
+ * struct ssdfs_blk2off_table_header - translation table header
+ * @magic: magic signature
+ * @check: metadata checksum + flags
+ * @extents_off: offset in bytes from header begin till extents sequence
+ * @extents_count: count of extents in the sequence
+ * @offset_table_off: offset in bytes from header begin till phys offsets table
+ * @fragments_count: count of table's fragments for the whole PEB
+ * @sequence: first translation extent in the sequence
+ */
+struct ssdfs_blk2off_table_header {
+/* 0x0000 */
+ struct ssdfs_signature magic;
+
+/* 0x0008 */
+#define SSDFS_BLK2OFF_TBL_ZLIB_COMPR (1 << 1)
+#define SSDFS_BLK2OFF_TBL_LZO_COMPR (1 << 2)
+ struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+ __le16 extents_off;
+ __le16 extents_count;
+ __le16 offset_table_off;
+ __le16 fragments_count;
+
+/* 0x0018 */
+ struct ssdfs_translation_extent sequence[1];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * The block's descriptor table has the following structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |   Area block table #0    |
+ * |  Fragment descriptor #0  |
+ * |           ***            |
+ * |  Fragment descriptor #14 |
+ * |   Next area block table  |
+ * |        descriptor        |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Block descriptor #0    |
+ * |           ***            |
+ * |   Block descriptor #N    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |           ***            |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Block descriptor #0    |
+ * |           ***            |
+ * |   Block descriptor #N    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |           ***            |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Area block table #N    |
+ * |  Fragment descriptor #0  |
+ * |           ***            |
+ * |  Fragment descriptor #14 |
+ * |   Next area block table  |
+ * |        descriptor        |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Block descriptor #0    |
+ * |           ***            |
+ * |   Block descriptor #N    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |           ***            |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Block descriptor #0    |
+ * |           ***            |
+ * |   Block descriptor #N    |
+ * |                          |
+ * ----------------------------
+ */
+
+#define SSDFS_BLK_STATE_OFF_MAX 6
+
+/*
+ * struct ssdfs_block_descriptor - block descriptor
+ * @ino: inode identification number
+ * @logical_offset: logical offset from file's beginning in pages
+ * @peb_index: PEB's index
+ * @peb_page: PEB's page index
+ * @state: array of fragment's offsets
+ */
+struct ssdfs_block_descriptor {
+/* 0x0000 */
+ __le64 ino;
+ __le32 logical_offset;
+ __le16 peb_index;
+ __le16 peb_page;
+
+/* 0x0010 */
+ struct ssdfs_blk_state_offset state[SSDFS_BLK_STATE_OFF_MAX];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_area_block_table - descriptor of block state sequence in area
+ * @chain_hdr: descriptor of block states' chain
+ * @blk: table of fragment descriptors
+ *
+ * This table describes the block state sequence in a PEB's area.
+ * The table can consist of several parts. Every part can describe
+ * 14 blocks in a partial sequence. If the sequence contains more
+ * block descriptors, the last fragment descriptor describes the
+ * placement of the next part of the block table, and so on.
+ */ +struct ssdfs_area_block_table { +/* 0x0000 */ + struct ssdfs_fragments_chain_header chain_hdr; + +/* 0x0010 */ +#define SSDFS_NEXT_BLK_TABLE_INDEX SSDFS_FRAGMENTS_CHAIN_MAX +#define SSDFS_BLK_TABLE_MAX (SSDFS_FRAGMENTS_CHAIN_MAX + 1) + struct ssdfs_fragment_desc blk[SSDFS_BLK_TABLE_MAX]; + +/* 0x0100 */ +} __packed; + +/* + * The data (diff, journaling) area has structure: + * ----------------------------- + * | | + * | Block state descriptor #0 | + * | Fragment descriptor #0 | + * | *** | + * | Fragment descriptor #N | + * | | + * ----------------------------- + * | | + * | Data portion #0 | + * | *** | + * | Data portion #N | + * | | + * ----------------------------- + * | | + * | *** | + * | | + * ----------------------------- + * | | + * | Block state descriptor #N | + * | Fragment descriptor #0 | + * | *** | + * | Fragment descriptor #N | + * | | + * ----------------------------- + * | | + * | Data portion #0 | + * | *** | + * | Data portion #N | + * | | + * ----------------------------- + */ + +/* + * ssdfs_block_state_descriptor - block's state descriptor + * @cno: checkpoint + * @parent_snapshot: parent snapshot + * @chain_hdr: descriptor of data fragments' chain + */ +struct ssdfs_block_state_descriptor { +/* 0x0000 */ + __le64 cno; + __le64 parent_snapshot; + +/* 0x0010 */ + struct ssdfs_fragments_chain_header chain_hdr; + +/* 0x0020 */ +} __packed; + +/* + * struct ssdfs_segbmap_fragment_header - segment bitmap fragment header + * @magic: magic signature + * @seg_index: segment index in segment bitmap fragments' chain + * @peb_index: PEB's index in segment + * @flags: fragment's flags + * @seg_type: segment type (main/backup) + * @start_item: fragment's start item number + * @sequence_id: fragment identification number + * @fragment_bytes: bytes count in fragment + * @checksum: fragment checksum + * @total_segs: count of total segments in fragment + * @clean_or_using_segs: count of clean or using segments in fragment + * @used_or_dirty_segs: count of used or dirty segments in fragment + * @bad_segs: count of bad segments in fragment + */ +struct ssdfs_segbmap_fragment_header { +/* 0x0000 */ + __le16 magic; + __le16 seg_index; + __le16 peb_index; +#define SSDFS_SEGBMAP_FRAG_ZLIB_COMPR (1 << 0) +#define SSDFS_SEGBMAP_FRAG_LZO_COMPR (1 << 1) + __le8 flags; + __le8 seg_type; + +/* 0x0008 */ + __le64 start_item; + +/* 0x0010 */ + __le16 sequence_id; + __le16 fragment_bytes; + __le32 checksum; + +/* 0x0018 */ + __le16 total_segs; + __le16 clean_or_using_segs; + __le16 used_or_dirty_segs; + __le16 bad_segs; + +/* 0x0020 */ +} __packed; + +/* + * struct ssdfs_peb_descriptor - descriptor of PEB + * @erase_cycles: count of P/E cycles of PEB + * @type: PEB's type + * @state: PEB's state + * @flags: PEB's flags + * @shared_peb_index: index of external shared destination PEB + */ +struct ssdfs_peb_descriptor { +/* 0x0000 */ + __le32 erase_cycles; + __le8 type; + __le8 state; + __le8 flags; + __le8 shared_peb_index; + +/* 0x0008 */ +} __packed; + +/* PEB's types */ +enum { + SSDFS_MAPTBL_UNKNOWN_PEB_TYPE, + SSDFS_MAPTBL_DATA_PEB_TYPE, + SSDFS_MAPTBL_LNODE_PEB_TYPE, + SSDFS_MAPTBL_HNODE_PEB_TYPE, + SSDFS_MAPTBL_IDXNODE_PEB_TYPE, + SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE, + SSDFS_MAPTBL_SBSEG_PEB_TYPE, + SSDFS_MAPTBL_SEGBMAP_PEB_TYPE, + SSDFS_MAPTBL_MAPTBL_PEB_TYPE, + SSDFS_MAPTBL_PEB_TYPE_MAX +}; + +/* PEB's states */ +enum { + SSDFS_MAPTBL_UNKNOWN_PEB_STATE, + SSDFS_MAPTBL_BAD_PEB_STATE, + SSDFS_MAPTBL_CLEAN_PEB_STATE, + SSDFS_MAPTBL_USING_PEB_STATE, + SSDFS_MAPTBL_USED_PEB_STATE, + 
+ SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE,
+ SSDFS_MAPTBL_DIRTY_PEB_STATE,
+ SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE,
+ SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE,
+ SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE,
+ SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE,
+ SSDFS_MAPTBL_MIGRATION_DST_USING_STATE,
+ SSDFS_MAPTBL_MIGRATION_DST_USED_STATE,
+ SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE,
+ SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE,
+ SSDFS_MAPTBL_PRE_ERASE_STATE,
+ SSDFS_MAPTBL_UNDER_ERASE_STATE,
+ SSDFS_MAPTBL_SNAPSHOT_STATE,
+ SSDFS_MAPTBL_RECOVERING_STATE,
+ SSDFS_MAPTBL_PEB_STATE_MAX
+};
+
+/* PEB's flags */
+#define SSDFS_MAPTBL_SHARED_DESTINATION_PEB (1 << 0)
+#define SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR (1 << 1)
+#define SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR (1 << 2)
+
+#define SSDFS_PEBTBL_BMAP_SIZE \
+ ((PAGE_SIZE / sizeof(struct ssdfs_peb_descriptor)) / \
+ BITS_PER_BYTE)
+
+/* PEB table's bitmap types */
+enum {
+ SSDFS_PEBTBL_USED_BMAP,
+ SSDFS_PEBTBL_DIRTY_BMAP,
+ SSDFS_PEBTBL_RECOVER_BMAP,
+ SSDFS_PEBTBL_BADBLK_BMAP,
+ SSDFS_PEBTBL_BMAP_MAX
+};
+
+/*
+ * struct ssdfs_peb_table_fragment_header - header of PEB table fragment
+ * @magic: signature of PEB table's fragment
+ * @flags: flags of PEB table's fragment
+ * @recover_months: recovering duration in months
+ * @recover_threshold: recover threshold
+ * @checksum: checksum of PEB table's fragment
+ * @start_peb: starting PEB number
+ * @pebs_count: count of PEB's descriptors in table's fragment
+ * @last_selected_peb: index of last selected unused PEB
+ * @reserved_pebs: count of reserved PEBs in table's fragment
+ * @stripe_id: stripe identification number
+ * @portion_id: sequential ID of mapping table fragment
+ * @fragment_id: sequential ID of PEB table fragment in the portion
+ * @bytes_count: table's fragment size in bytes
+ * @bmaps: PEB table fragment's bitmaps
+ */
+struct ssdfs_peb_table_fragment_header {
+/* 0x0000 */
+ __le16 magic;
+ __le8 flags;
+ __le8 recover_months : 4;
+ __le8 recover_threshold : 4;
+ __le32 checksum;
+
+/* 0x0008 */
+ __le64 start_peb;
+
+/* 0x0010 */
+ __le16 pebs_count;
+ __le16 last_selected_peb;
+ __le16 reserved_pebs;
+ __le16 stripe_id;
+
+/* 0x0018 */
+ __le16 portion_id;
+ __le16 fragment_id;
+ __le32 bytes_count;
+
+/* 0x0020 */
+ __le8 bmaps[SSDFS_PEBTBL_BMAP_MAX][SSDFS_PEBTBL_BMAP_SIZE];
+
+/* 0x0120 */
+} __packed;
+
+/* PEB table fragment's flags */
+#define SSDFS_PEBTBL_FRAG_ZLIB_COMPR (1 << 0)
+#define SSDFS_PEBTBL_FRAG_LZO_COMPR (1 << 1)
+#define SSDFS_PEBTBL_UNDER_RECOVERING (1 << 2)
+#define SSDFS_PEBTBL_BADBLK_EXIST (1 << 3)
+#define SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN (1 << 4)
+#define SSDFS_PEBTBL_FIND_RECOVERING_PEBS \
+ (SSDFS_PEBTBL_UNDER_RECOVERING | SSDFS_PEBTBL_BADBLK_EXIST)
+#define SSDFS_PEBTBL_FLAGS_MASK 0x1F
+
+/* PEB table recover thresholds */
+#define SSDFS_PEBTBL_FIRST_RECOVER_TRY (0)
+#define SSDFS_PEBTBL_SECOND_RECOVER_TRY (1)
+#define SSDFS_PEBTBL_THIRD_RECOVER_TRY (2)
+#define SSDFS_PEBTBL_FOURTH_RECOVER_TRY (3)
+#define SSDFS_PEBTBL_FIFTH_RECOVER_TRY (4)
+#define SSDFS_PEBTBL_SIX_RECOVER_TRY (5)
+#define SSDFS_PEBTBL_BADBLK_THRESHOLD (6)
+
+#define SSDFS_PEBTBL_FRAGMENT_HDR_SIZE \
+ (sizeof(struct ssdfs_peb_table_fragment_header))
+
+#define SSDFS_PEB_DESC_PER_FRAGMENT(fragment_size) \
+ ((fragment_size - SSDFS_PEBTBL_FRAGMENT_HDR_SIZE) / \
+ sizeof(struct ssdfs_peb_descriptor))
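As a quick consistency check of the layout above, assuming PAGE_SIZE is 4096: SSDFS_PEBTBL_BMAP_SIZE is (4096 / 8) / 8 = 64 bytes per bitmap, so the four bitmaps occupy 256 bytes and the header ends at 0x20 + 0x100 = 0x120, matching the offset comments; SSDFS_PEB_DESC_PER_FRAGMENT(4096) then yields (4096 - 288) / 8 = 476 descriptors per fragment. A hypothetical helper (not part of the patch) for turning a descriptor index back into a PEB number could look like:

/*
 * Illustrative sketch only: since @start_peb is the starting PEB
 * number of the fragment, the PEB described by item @item_index is
 * assumed to be start_peb + item_index (a contiguous range).
 */
static inline u64 ssdfs_pebtbl_peb_id(struct ssdfs_peb_table_fragment_header *hdr,
				      u16 item_index)
{
	if (item_index >= le16_to_cpu(hdr->pebs_count))
		return U64_MAX; /* invalid index */

	return le64_to_cpu(hdr->start_peb) + item_index;
}
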
+
+/*
+ * struct ssdfs_leb_descriptor - logical descriptor of erase block
+ * @physical_index: PEB table's offset till PEB's descriptor
+ * @relation_index: PEB table's offset till associated PEB's descriptor
+ */
+struct ssdfs_leb_descriptor {
+/* 0x0000 */
+ __le16 physical_index;
+ __le16 relation_index;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_leb_table_fragment_header - header of LEB table fragment
+ * @magic: signature of LEB table's fragment
+ * @flags: flags of LEB table's fragment
+ * @checksum: checksum of LEB table's fragment
+ * @start_leb: starting LEB number
+ * @lebs_count: count of LEB's descriptors in table's fragment
+ * @mapped_lebs: count of LEBs mapped onto PEBs
+ * @migrating_lebs: count of LEBs under migration
+ * @portion_id: sequential ID of mapping table fragment
+ * @fragment_id: sequential ID of LEB table fragment in the portion
+ * @bytes_count: table's fragment size in bytes
+ */
+struct ssdfs_leb_table_fragment_header {
+/* 0x0000 */
+ __le16 magic;
+#define SSDFS_LEBTBL_FRAG_ZLIB_COMPR (1 << 0)
+#define SSDFS_LEBTBL_FRAG_LZO_COMPR (1 << 1)
+ __le16 flags;
+ __le32 checksum;
+
+/* 0x0008 */
+ __le64 start_leb;
+
+/* 0x0010 */
+ __le16 lebs_count;
+ __le16 mapped_lebs;
+ __le16 migrating_lebs;
+ __le16 reserved1;
+
+/* 0x0018 */
+ __le16 portion_id;
+ __le16 fragment_id;
+ __le32 bytes_count;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_LEBTBL_FRAGMENT_HDR_SIZE \
+ (sizeof(struct ssdfs_leb_table_fragment_header))
+
+#define SSDFS_LEB_DESC_PER_FRAGMENT(fragment_size) \
+ ((fragment_size - SSDFS_LEBTBL_FRAGMENT_HDR_SIZE) / \
+ sizeof(struct ssdfs_leb_descriptor))
+
+/*
+ * The mapping table cache is a copy of the mapping table's
+ * content for some types of PEBs. The goal of the cache is to
+ * provide space for storing copies of LEB_ID/PEB_ID pairs with
+ * their PEB state records. The cache is used for converting a
+ * LEB ID into a PEB ID and retrieving the PEB state record in
+ * the case when the fragment of the mapping table is not
+ * initialized yet. The cache is also needed for storing the
+ * modified PEB state during the mapping table's destruction.
+ * A fragment of the mapping table cache has the following
+ * structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |          Header          |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   LEB_ID/PEB_ID pairs    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    PEB state records     |
+ * |                          |
+ * ----------------------------
+ */
+
+/*
+ * struct ssdfs_maptbl_cache_header - maptbl cache header
+ * @magic: magic signature
+ * @sequence_id: ID of fragment in the sequence
+ * @flags: maptbl cache header's flags
+ * @items_count: count of items in maptbl cache's fragment
+ * @bytes_count: size of fragment in bytes
+ * @start_leb: start LEB ID in fragment
+ * @end_leb: ending LEB ID in fragment
+ */
+struct ssdfs_maptbl_cache_header {
+/* 0x0000 */
+ struct ssdfs_signature magic;
+
+/* 0x0008 */
+ __le16 sequence_id;
+#define SSDFS_MAPTBL_CACHE_ZLIB_COMPR (1 << 0)
+#define SSDFS_MAPTBL_CACHE_LZO_COMPR (1 << 1)
+ __le16 flags;
+ __le16 items_count;
+ __le16 bytes_count;
+
+/* 0x0010 */
+ __le64 start_leb;
+ __le64 end_leb;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_maptbl_cache_peb_state - PEB state descriptor
+ * @consistency: PEB state consistency type
+ * @state: PEB's state
+ * @flags: PEB's flags
+ * @shared_peb_index: index of external shared destination PEB
+ *
+ * The mapping table cache is a copy of the mapping table's
+ * content for some types of PEBs. If the mapping table cache and
+ * the mapping table contain the same content for the PEB, then
+ * the PEB state record is consistent. Otherwise, the PEB state
+ * record is inconsistent.
For example, the inconsistency takes + * place if a PEB state record was modified in the mapping table + * cache during the destruction of the mapping table. + */ +struct ssdfs_maptbl_cache_peb_state { +/* 0x0000 */ + __le8 consistency; + __le8 state; + __le8 flags; + __le8 shared_peb_index; + +/* 0x0004 */ +} __packed; + +/* PEB state consistency type */ +enum { + SSDFS_PEB_STATE_UNKNOWN, + SSDFS_PEB_STATE_CONSISTENT, + SSDFS_PEB_STATE_INCONSISTENT, + SSDFS_PEB_STATE_PRE_DELETED, + SSDFS_PEB_STATE_MAX +}; + +#define SSDFS_MAPTBL_CACHE_HDR_SIZE \ + (sizeof(struct ssdfs_maptbl_cache_header)) +#define SSDFS_LEB2PEB_PAIR_SIZE \ + (sizeof(struct ssdfs_leb2peb_pair)) +#define SSDFS_PEB_STATE_SIZE \ + (sizeof(struct ssdfs_maptbl_cache_peb_state)) + +#define SSDFS_LEB2PEB_PAIR_PER_FRAGMENT(fragment_size) \ + ((fragment_size - SSDFS_MAPTBL_CACHE_HDR_SIZE - \ + SSDFS_PEB_STATE_SIZE) / \ + (SSDFS_LEB2PEB_PAIR_SIZE + SSDFS_PEB_STATE_SIZE)) + +/* + * struct ssdfs_btree_node_header - btree's node header + * @magic: magic signature + revision + * @check: metadata checksum + * @height: btree node's height + * @log_node_size: log2(node size) + * @log_index_area_size: log2(index area size) + * @type: btree node type + * @flags: btree node flags + * @index_area_offset: offset of index area in bytes + * @index_count: count of indexes in index area + * @index_size: size of index in bytes + * @min_item_size: min size of item in bytes + * @max_item_size: max possible size of item in bytes + * @items_capacity: capacity of items in the node + * @start_hash: start hash value + * @end_hash: end hash value + * @create_cno: create checkpoint + * @node_id: node identification number + * @item_area_offset: offset of items area in bytes + */ +struct ssdfs_btree_node_header { +/* 0x0000 */ + struct ssdfs_signature magic; + +/* 0x0008 */ + struct ssdfs_metadata_check check; + +/* 0x0010 */ + __le8 height; + __le8 log_node_size; + __le8 log_index_area_size; + __le8 type; + +/* 0x0014 */ +#define SSDFS_BTREE_NODE_HAS_INDEX_AREA (1 << 0) +#define SSDFS_BTREE_NODE_HAS_ITEMS_AREA (1 << 1) +#define SSDFS_BTREE_NODE_HAS_L1TBL (1 << 2) +#define SSDFS_BTREE_NODE_HAS_L2TBL (1 << 3) +#define SSDFS_BTREE_NODE_HAS_HASH_TBL (1 << 4) +#define SSDFS_BTREE_NODE_PRE_ALLOCATED (1 << 5) +#define SSDFS_BTREE_NODE_FLAGS_MASK 0x3F + __le16 flags; + __le16 index_area_offset; + +/* 0x0018 */ + __le16 index_count; + __le8 index_size; + __le8 min_item_size; + __le16 max_item_size; + __le16 items_capacity; + +/* 0x0020 */ + __le64 start_hash; + __le64 end_hash; + +/* 0x0030 */ + __le64 create_cno; + __le32 node_id; + __le32 item_area_offset; + +/* 0x0040 */ +} __packed; + +/* Index of btree node in node's items sequence */ +#define SSDFS_BTREE_NODE_HEADER_INDEX (0) + +/* Btree node types */ +enum { + SSDFS_BTREE_NODE_UNKNOWN_TYPE, + SSDFS_BTREE_ROOT_NODE, + SSDFS_BTREE_INDEX_NODE, + SSDFS_BTREE_HYBRID_NODE, + SSDFS_BTREE_LEAF_NODE, + SSDFS_BTREE_NODE_TYPE_MAX +}; + +#define SSDFS_DENTRIES_PAGES_PER_NODE_MAX (32) +#define SSDFS_DENTRIES_BMAP_SIZE \ + (((SSDFS_DENTRIES_PAGES_PER_NODE_MAX * PAGE_SIZE) / \ + sizeof(struct ssdfs_dir_entry)) / BITS_PER_BYTE) + +/* + * struct ssdfs_dentries_btree_node_header - directory entries node's header + * @node: generic btree node's header + * @parent_ino: parent inode number + * @dentries_count: count of allocated dentries in the node + * @inline_names: count of dentries with inline names + * @flags: dentries node's flags + * @free_space: free space of the node in bytes + * @lookup_table: table for clustering 
search in the node
+ *
+ * The @lookup_table has goal to provide the way of clustering
+ * the dentries in the node with the goal to speed-up the search.
+ */
+struct ssdfs_dentries_btree_node_header {
+/* 0x0000 */
+ struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+ __le64 parent_ino;
+
+/* 0x0048 */
+ __le16 dentries_count;
+ __le16 inline_names;
+ __le16 flags;
+ __le16 free_space;
+
+/* 0x0050 */
+#define SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE (22)
+ __le64 lookup_table[SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX (32)
+#define SSDFS_SHARED_DICT_BMAP_SIZE \
+ (((SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+ SSDFS_DENTRY_INLINE_NAME_MAX_LEN) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_shdict_search_key - generalized search key
+ * @name.hash_lo: low hash32 value
+ * @name.hash_hi: tail hash of the name
+ * @range.prefix_len: prefix length in bytes
+ * @range.start_index: starting index into lookup table2
+ * @range.reserved: private part of concrete structure
+ *
+ * This key is a generalized version of the first part of any
+ * item in the lookup1, lookup2 and hash tables. This structure
+ * is needed for the generic way of searching in all tables.
+ */
+struct ssdfs_shdict_search_key {
+/* 0x0000 */
+ union {
+ __le32 hash_lo;
+ __le32 hash_hi;
+ } name __packed;
+
+/* 0x0004 */
+ union {
+ __le8 prefix_len;
+ __le16 start_index;
+ __le32 reserved;
+ } range __packed;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_ltbl1_item - shared dictionary lookup table1 item
+ * @hash_lo: low hash32 value
+ * @start_index: starting index into lookup table2
+ * @range_len: number of items in the range of lookup table2
+ *
+ * The header of shared dictionary node contains the lookup table1.
+ * This table is responsible for clustering the items in lookup
+ * table2. The @hash_lo is the hash32 of the first part of the name.
+ * The length of the first part is the inline name length.
+ */
+struct ssdfs_shdict_ltbl1_item {
+/* 0x0000 */
+ __le32 hash_lo;
+ __le16 start_index;
+ __le16 range_len;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_ltbl2_item - shared dictionary lookup table2 item
+ * @hash_lo: low hash32 value
+ * @prefix_len: prefix length in bytes
+ * @str_count: count of strings in the range
+ * @hash_index: index of the hash in the hash table
+ *
+ * The lookup table2 is located at the end of the node. It begins
+ * at the bottom and grows toward the node's beginning.
+ * Every item of the lookup table2 describes a position of the starting
+ * keyword of a name. The goal of such a descriptor is to describe
+ * the starting position of the deduplicated keyword that is shared by
+ * several following names. But the keyword is used only in the beginning
+ * of the sequence because the rest of the names are represented by
+ * suffixes only (for example, the sequence of names "absurd, abcissa,
+ * abacus" can be represented by the "abacuscissasurd" deduplicated range
+ * of names).
+ */
+struct ssdfs_shdict_ltbl2_item {
+/* 0x0000 */
+ __le32 hash_lo;
+ __le8 prefix_len;
+ __le8 str_count;
+ __le16 hash_index;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_htbl_item - shared dictionary hash table item
+ * @hash_hi: tail hash of the name
+ * @str_offset: offset in bytes to string
+ * @str_len: string length
+ * @type: string type
+ *
+ * The hash table contains descriptors of all strings in
+ * string area.
The @str_offset is the offset in bytes from + * the items (strings) area's beginning. + */ +struct ssdfs_shdict_htbl_item { +/* 0x0000 */ + __le32 hash_hi; + __le16 str_offset; + __le8 str_len; + __le8 type; + +/* 0x0008 */ +} __packed; + +/* Name string types */ +enum { + SSDFS_UNKNOWN_NAME_TYPE, + SSDFS_NAME_PREFIX, + SSDFS_NAME_SUFFIX, + SSDFS_FULL_NAME, + SSDFS_NAME_TYPE_MAX +}; + +/* + * struct ssdfs_shared_dict_area - area descriptor + * @offset: area offset in bytes + * @size: area size in bytes + * @free_space: free space in bytes + * @items_count: count of items in area + */ +struct ssdfs_shared_dict_area { +/* 0x0000 */ + __le16 offset; + __le16 size; + __le16 free_space; + __le16 items_count; + +/* 0x0008 */ +} __packed; + +/* + * struct ssdfs_shared_dictionary_node_header - shared dictionary node header + * @node: generic btree node's header + * @str_area: string area descriptor + * @hash_table: hash table descriptor + * @lookup_table2: lookup2 table descriptor + * @flags: private flags + * @lookup_table1_items: number of valid items in the lookup1 table + * @lookup_table1: lookup1 table + */ +struct ssdfs_shared_dictionary_node_header { +/* 0x0000 */ + struct ssdfs_btree_node_header node; + +/* 0x0040 */ + struct ssdfs_shared_dict_area str_area; + +/* 0x0048 */ + struct ssdfs_shared_dict_area hash_table; + +/* 0x0050 */ + struct ssdfs_shared_dict_area lookup_table2; + +/* 0x0058 */ + __le16 flags; + __le16 lookup_table1_items; + __le32 reserved2; + +/* 0x0060 */ +#define SSDFS_SHDIC_LTBL1_SIZE (20) + struct ssdfs_shdict_ltbl1_item lookup_table1[SSDFS_SHDIC_LTBL1_SIZE]; + +/* 0x0100 */ +} __packed; + +#define SSDFS_EXTENT_PAGES_PER_NODE_MAX (32) +#define SSDFS_EXTENT_MAX_BMAP_SIZE \ + (((SSDFS_EXTENT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \ + sizeof(struct ssdfs_raw_fork)) / BITS_PER_BYTE) + +/* + * ssdfs_extents_btree_node_header - extents btree node's header + * @node: generic btree node's header + * @parent_ino: parent inode number + * @blks_count: count of blocks in all valid extents + * @forks_count: count of forks in the node + * @allocated_extents: count of allocated extents in all forks + * @valid_extents: count of valid extents + * @max_extent_blks: maximal number of blocks in one extent + * @lookup_table: table for clustering search in the node + * + * The @lookup_table has goal to provide the way of clustering + * the forks in the node with the goal to speed-up the search. + */ +struct ssdfs_extents_btree_node_header { +/* 0x0000 */ + struct ssdfs_btree_node_header node; + +/* 0x0040 */ + __le64 parent_ino; + __le64 blks_count; + +/* 0x0050 */ + __le32 forks_count; + __le32 allocated_extents; + __le32 valid_extents; + __le32 max_extent_blks; + +/* 0x0060 */ +#define SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE (20) + __le64 lookup_table[SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE]; + +/* 0x0100 */ +} __packed; + +#define SSDFS_XATTRS_PAGES_PER_NODE_MAX (32) +#define SSDFS_XATTRS_BMAP_SIZE \ + (((SSDFS_XATTRS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \ + sizeof(struct ssdfs_xattr_entry)) / BITS_PER_BYTE) + +/* + * struct ssdfs_xattrs_btree_node_header - xattrs node's header + * @node: generic btree node's header + * @parent_ino: parent inode number + * @xattrs_count: count of allocated xattrs in the node + * @flags: xattrs node's flags + * @free_space: free space of the node in bytes + * @lookup_table: table for clustering search in the node + * + * The @lookup_table has goal to provide the way of clustering + * the xattrs in the node with the goal to speed-up the search. 
+ */
+struct ssdfs_xattrs_btree_node_header {
+/* 0x0000 */
+ struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+ __le64 parent_ino;
+
+/* 0x0048 */
+ __le16 xattrs_count;
+ __le16 reserved;
+ __le16 flags;
+ __le16 free_space;
+
+/* 0x0050 */
+#define SSDFS_XATTRS_BTREE_LOOKUP_TABLE_SIZE (22)
+ __le64 lookup_table[SSDFS_XATTRS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_index_area - index area info
+ * @start_hash: start hash value
+ * @end_hash: end hash value
+ */
+struct ssdfs_index_area {
+/* 0x0000 */
+ __le64 start_hash;
+ __le64 end_hash;
+
+/* 0x0010 */
+} __packed;
+
+#define SSDFS_INODE_PAGES_PER_NODE_MAX (32)
+#define SSDFS_INODE_BMAP_SIZE \
+ (((SSDFS_INODE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+ sizeof(struct ssdfs_inode)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_inodes_btree_node_header - inodes btree node's header
+ * @node: generic btree node's header
+ * @inodes_count: count of inodes in the node
+ * @valid_inodes: count of valid inodes in the node
+ * @index_area: index area info (hybrid node)
+ * @bmap: bitmap of valid/invalid inodes in the node
+ */
+struct ssdfs_inodes_btree_node_header {
+/* 0x0000 */
+ struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+ __le16 inodes_count;
+ __le16 valid_inodes;
+ __le8 reserved1[0xC];
+
+/* 0x0050 */
+ struct ssdfs_index_area index_area;
+
+/* 0x0060 */
+ __le8 reserved2[0x60];
+
+/* 0x00C0 */
+ __le8 bmap[SSDFS_INODE_BMAP_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshot_rule_info - snapshot rule info
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @type: snapshot type (PERIODIC|ONE-TIME)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @frequency: taking snapshot frequency (SYNCFS|HOUR|DAY|WEEK)
+ * @snapshots_threshold: max number of simultaneously available snapshots
+ * @snapshots_number: current number of created snapshots
+ * @ino: root object inode ID
+ * @uuid: snapshot UUID
+ * @name: snapshot rule name
+ * @name_hash: name hash
+ * @last_snapshot_cno: latest snapshot checkpoint
+ */
+struct ssdfs_snapshot_rule_info {
+/* 0x0000 */
+ __le8 mode;
+ __le8 type;
+ __le8 expiration;
+ __le8 frequency;
+ __le16 snapshots_threshold;
+ __le16 snapshots_number;
+
+/* 0x0008 */
+ __le64 ino;
+
+/* 0x0010 */
+ __le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x0020 */
+ char name[SSDFS_MAX_SNAP_RULE_NAME_LEN];
+
+/* 0x0030 */
+ __le64 name_hash;
+ __le64 last_snapshot_cno;
+
+/* 0x0040 */
+} __packed;
+
+/* Snapshot mode */
+enum {
+ SSDFS_UNKNOWN_SNAPSHOT_MODE,
+ SSDFS_READ_ONLY_SNAPSHOT,
+ SSDFS_READ_WRITE_SNAPSHOT,
+ SSDFS_SNAPSHOT_MODE_MAX
+};
+
+#define SSDFS_READ_ONLY_MODE_STR "READ_ONLY"
+#define SSDFS_READ_WRITE_MODE_STR "READ_WRITE"
+
+/* Snapshot type */
+enum {
+ SSDFS_UNKNOWN_SNAPSHOT_TYPE,
+ SSDFS_ONE_TIME_SNAPSHOT,
+ SSDFS_PERIODIC_SNAPSHOT,
+ SSDFS_SNAPSHOT_TYPE_MAX
+};
+
+#define SSDFS_ONE_TIME_TYPE_STR "ONE-TIME"
+#define SSDFS_PERIODIC_TYPE_STR "PERIODIC"
+
+/* Snapshot expiration */
+enum {
+ SSDFS_UNKNOWN_EXPIRATION_POINT,
+ SSDFS_EXPIRATION_IN_WEEK,
+ SSDFS_EXPIRATION_IN_MONTH,
+ SSDFS_EXPIRATION_IN_YEAR,
+ SSDFS_NEVER_EXPIRED,
+ SSDFS_EXPIRATION_POINT_MAX
+};
+
+#define SSDFS_WEEK_EXPIRATION_POINT_STR "WEEK"
+#define SSDFS_MONTH_EXPIRATION_POINT_STR "MONTH"
+#define SSDFS_YEAR_EXPIRATION_POINT_STR "YEAR"
+#define SSDFS_NEVER_EXPIRED_STR "NEVER"
+
+/* Snapshot creation frequency */
+enum {
+ SSDFS_UNKNOWN_FREQUENCY,
+ SSDFS_SYNCFS_FREQUENCY,
+ SSDFS_HOUR_FREQUENCY,
+ SSDFS_DAY_FREQUENCY,
+ SSDFS_WEEK_FREQUENCY,
+ SSDFS_MONTH_FREQUENCY,
+ SSDFS_CREATION_FREQUENCY_MAX
+};
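The enums above bound the valid values of the corresponding ssdfs_snapshot_rule_info fields. A hypothetical sanity check, not part of the patch, could be written as:

/*
 * Hypothetical sketch (not from the patch): every categorical field
 * of a snapshot rule must fall strictly between its UNKNOWN value
 * and its *_MAX limit.
 */
static bool ssdfs_snapshot_rule_valid(struct ssdfs_snapshot_rule_info *rule)
{
	return rule->mode > SSDFS_UNKNOWN_SNAPSHOT_MODE &&
		rule->mode < SSDFS_SNAPSHOT_MODE_MAX &&
		rule->type > SSDFS_UNKNOWN_SNAPSHOT_TYPE &&
		rule->type < SSDFS_SNAPSHOT_TYPE_MAX &&
		rule->expiration > SSDFS_UNKNOWN_EXPIRATION_POINT &&
		rule->expiration < SSDFS_EXPIRATION_POINT_MAX &&
		rule->frequency > SSDFS_UNKNOWN_FREQUENCY &&
		rule->frequency < SSDFS_CREATION_FREQUENCY_MAX;
}
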
+
+#define SSDFS_SYNCFS_FREQUENCY_STR "SYNCFS"
+#define SSDFS_HOUR_FREQUENCY_STR "HOUR"
+#define SSDFS_DAY_FREQUENCY_STR "DAY"
+#define SSDFS_WEEK_FREQUENCY_STR "WEEK"
+#define SSDFS_MONTH_FREQUENCY_STR "MONTH"
+
+#define SSDFS_INFINITE_SNAPSHOTS_NUMBER U16_MAX
+#define SSDFS_UNDEFINED_SNAPSHOTS_NUMBER (0)
+
+/*
+ * struct ssdfs_snapshot_rules_header - snapshot rules table's header
+ * @magic: magic signature
+ * @item_size: snapshot rule's size in bytes
+ * @flags: various flags
+ * @items_count: number of snapshot rules in table
+ * @items_capacity: capacity of the snapshot rules table
+ * @area_size: size of table in bytes
+ */
+struct ssdfs_snapshot_rules_header {
+/* 0x0000 */
+ __le32 magic;
+ __le16 item_size;
+ __le16 flags;
+
+/* 0x0008 */
+ __le16 items_count;
+ __le16 items_capacity;
+ __le32 area_size;
+
+/* 0x0010 */
+ __le8 padding[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshot - snapshot info
+ * @magic: magic signature of snapshot
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @flags: snapshot's flags
+ * @name: snapshot name
+ * @uuid: snapshot UUID
+ * @create_time: snapshot's timestamp
+ * @create_cno: snapshot's checkpoint
+ * @ino: root object inode ID
+ * @name_hash: name hash
+ */
+struct ssdfs_snapshot {
+/* 0x0000 */
+ __le16 magic;
+ __le8 mode : 4;
+ __le8 expiration : 4;
+ __le8 flags;
+ char name[SSDFS_MAX_SNAPSHOT_NAME_LEN];
+
+/* 0x0010 */
+ __le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x0020 */
+ __le64 create_time;
+ __le64 create_cno;
+
+/* 0x0030 */
+ __le64 ino;
+ __le64 name_hash;
+
+/* 0x0040 */
+} __packed;
+
+/* snapshot flags */
+#define SSDFS_SNAPSHOT_HAS_EXTERNAL_STRING (1 << 0)
+#define SSDFS_SNAPSHOT_FLAGS_MASK 0x1
+
+/*
+ * struct ssdfs_peb2time_pair - PEB to timestamp pair
+ * @peb_id: PEB ID
+ * @last_log_time: last log creation time
+ */
+struct ssdfs_peb2time_pair {
+/* 0x0000 */
+ __le64 peb_id;
+ __le64 last_log_time;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_peb2time_set - PEB to timestamp set
+ * @magic: magic signature of set
+ * @pairs_count: number of valid pairs in the set
+ * @create_time: create time of the first PEB in pair set
+ * @array: array of PEB to timestamp pairs
+ */
+struct ssdfs_peb2time_set {
+/* 0x0000 */
+ __le16 magic;
+ __le8 pairs_count;
+ __le8 padding[0x5];
+
+/* 0x0008 */
+ __le64 create_time;
+
+/* 0x0010 */
+#define SSDFS_PEB2TIME_ARRAY_CAPACITY (3)
+ struct ssdfs_peb2time_pair array[SSDFS_PEB2TIME_ARRAY_CAPACITY];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * union ssdfs_snapshot_item - snapshot item
+ * @magic: magic signature
+ * @snapshot: snapshot info
+ * @peb2time: PEB to timestamp set
+ */
+union ssdfs_snapshot_item {
+/* 0x0000 */
+ __le16 magic;
+ struct ssdfs_snapshot snapshot;
+ struct ssdfs_peb2time_set peb2time;
+
+/* 0x0040 */
+} __packed;
+
+#define SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX (32)
+#define SSDFS_SNAPSHOTS_BMAP_SIZE \
+ (((SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+ sizeof(struct ssdfs_snapshot_info)) / BITS_PER_BYTE)
+
+/*
+ * struct ssdfs_snapshots_btree_node_header - snapshots node's header
+ * @node: generic btree node's header
+ * @snapshots_count: snapshots count in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table has goal to provide the way of clustering
+ * the snapshots in the node with the goal to speed-up the search.
+ */ +struct ssdfs_snapshots_btree_node_header { +/* 0x0000 */ + struct ssdfs_btree_node_header node; + +/* 0x0040 */ + __le32 snapshots_count; + __le8 padding[0x0C]; + +/* 0x0050 */ +#define SSDFS_SNAPSHOTS_BTREE_LOOKUP_TABLE_SIZE (22) + __le64 lookup_table[SSDFS_SNAPSHOTS_BTREE_LOOKUP_TABLE_SIZE]; + +/* 0x0100 */ +} __packed; + +/* + * struct ssdfs_shared_extent - shared extent + * @fingerprint: fingerprint of shared extent + * @extent: position of the extent on volume + * @fingerprint_len: length of fingerprint + * @fingerprint_type: type of fingerprint + * @flags: various flags + * @ref_count: reference counter of shared extent + */ +struct ssdfs_shared_extent { +/* 0x0000 */ +#define SSDFS_FINGERPRINT_LENGTH_MAX (32) + __le8 fingerprint[SSDFS_FINGERPRINT_LENGTH_MAX]; + +/* 0x0020 */ + struct ssdfs_raw_extent extent; + +/* 0x0030 */ + __le8 fingerprint_len; + __le8 fingerprint_type; + __le16 flags; + __le8 padding[0x4]; + +/* 0x0038 */ + __le64 ref_count; + +/* 0x0040 */ +} __packed; + +#define SSDFS_SHEXTREE_PAGES_PER_NODE_MAX (32) +#define SSDFS_SHEXTREE_BMAP_SIZE \ + (((SSDFS_SHEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \ + sizeof(struct ssdfs_shared_extent)) / BITS_PER_BYTE) + +/* + * struct ssdfs_shextree_node_header - shared extents btree node's header + * @node: generic btree node's header + * @shared_extents: number of shared extents in the node + * @lookup_table: table for clustering search in the node + * + * The @lookup_table has goal to provide the way of clustering + * the shared extents in the node with the goal to speed-up the search. + */ +struct ssdfs_shextree_node_header { +/* 0x0000 */ + struct ssdfs_btree_node_header node; + +/* 0x0040 */ + __le32 shared_extents; + __le8 padding[0x0C]; + +/* 0x0050 */ +#define SSDFS_SHEXTREE_LOOKUP_TABLE_SIZE (22) + __le64 lookup_table[SSDFS_SHEXTREE_LOOKUP_TABLE_SIZE]; + +/* 0x0100 */ +} __packed; + +#define SSDFS_INVEXTREE_PAGES_PER_NODE_MAX (32) +#define SSDFS_INVEXTREE_BMAP_SIZE \ + (((SSDFS_INVEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \ + sizeof(struct ssdfs_raw_extent)) / BITS_PER_BYTE) + +/* + * struct ssdfs_invextree_node_header - invalidated extents btree node's header + * @node: generic btree node's header + * @extents_count: number of invalidated extents in the node + * @lookup_table: table for clustering search in the node + * + * The @lookup_table has goal to provide the way of clustering + * the invalidated extents in the node with the goal to speed-up the search. 
+ */ +struct ssdfs_invextree_node_header { +/* 0x0000 */ + struct ssdfs_btree_node_header node; + +/* 0x0040 */ + __le32 extents_count; + __le8 padding[0x0C]; + +/* 0x0050 */ +#define SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE (22) + __le64 lookup_table[SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE]; + +/* 0x0100 */ +} __packed; + +#endif /* _LINUX_SSDFS_H */ From patchwork Sat Feb 25 01:08:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC164C64ED8 for ; Sat, 25 Feb 2023 01:15:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229595AbjBYBPs (ORCPT ); Fri, 24 Feb 2023 20:15:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjBYBPp (ORCPT ); Fri, 24 Feb 2023 20:15:45 -0500 Received: from mail-oi1-x231.google.com (mail-oi1-x231.google.com [IPv6:2607:f8b0:4864:20::231]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAD6E125AF for ; Fri, 24 Feb 2023 17:15:40 -0800 (PST) Received: by mail-oi1-x231.google.com with SMTP id e21so827842oie.1 for ; Fri, 24 Feb 2023 17:15:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=bP0b9LEh+czsntuQtuFO394YmoIhlhhyVn8RIeEz6ik=; b=YQDOmiA63tCuMr9s3l3Rd9pC93S3yVLf70fQp57U+S855K152uWEaAa56mHRR6rTD/ FfxXb3260CBSFB3FVYNgxvUO1RwGcEHS4D97hHkMkymVHDJKGF9DDRBY6vjYhGbYtVPG kTiKP4gLZL5Q/2dpl+e/E7iGIH5sDxeQnUlK+4viXwLTCmNc/bf7hXuUfIOL18Q80fFX sEjmitXRqvJZNPcDlqyqIbIcg1FWLMl6O9vtSQDjx8TigxokNDUOFJUOi2niYfDiwshv 4I3FWnYOCGSb4nos2i/k20J9NWfH5JMsvzNgCg1dOI8VbrimuHdL8zk6ANL0ziT3Smzm zoxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bP0b9LEh+czsntuQtuFO394YmoIhlhhyVn8RIeEz6ik=; b=3mXOImxPPaa17rKrfZ/aEjfzPx4aMc/gab0VKMDONUDFYMcUUHEoNzkKoldVXk/Zjj HdoX9lMsWENyokwWQ7iFJnrK2uCk54fqR3Ie0FFMkc8gAoaGVeUPTAX2FuE1mBiOQUoi iRruRe36R3vRi0O98nk4E1mXOZyp3g3a0/82LmkN+rKBpWJyLe/lNxjBnHFptTe3hzkk WQryddmw0TLVLpKojCN6iOi9IZJuRG/GhIU5EGGoAUQEOieUdJ9RrKSbG7dCmAyszYGh aMRrhpGuLL9RtRR4rOqvuSrz2eW7q9BNstCO2gCWJilRdNWDc+LHynXbxHB2BPK1aZ4f 7ZPQ== X-Gm-Message-State: AO0yUKVTPcY69fgZ3Ureg6maO1raVW6XGBz1KlfcEGhmuJw70kWScqlL A2Q7wr34yY2yJMlVZSQobh0k/Fh8yXAETu5+ X-Google-Smtp-Source: AK7set+Vu9I4XFZx8dWNNlH2AuFuwZpw3FssR6NKUq/tg4GrKO5VuhdCb20ehxM2pYpK6+F1h7QnwA== X-Received: by 2002:a05:6808:424e:b0:37f:9b35:1880 with SMTP id dp14-20020a056808424e00b0037f9b351880mr5556720oib.27.1677287738824; Fri, 24 Feb 2023 17:15:38 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. 
[172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.15.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:15:37 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 02/76] ssdfs: key file system declarations Date: Fri, 24 Feb 2023 17:08:13 -0800 Message-Id: <20230225010927.813929-3-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch contains declarations of key constants, macros, inline functions implementations and function declarations. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/ssdfs.h | 411 ++++++++++ fs/ssdfs/ssdfs_constants.h | 81 ++ fs/ssdfs/ssdfs_fs_info.h | 412 ++++++++++ fs/ssdfs/ssdfs_inline.h | 1346 +++++++++++++++++++++++++++++++++ fs/ssdfs/ssdfs_inode_info.h | 143 ++++ fs/ssdfs/ssdfs_thread_info.h | 42 + fs/ssdfs/version.h | 7 + include/trace/events/ssdfs.h | 255 +++++++ include/uapi/linux/ssdfs_fs.h | 117 +++ 9 files changed, 2814 insertions(+) create mode 100644 fs/ssdfs/ssdfs.h create mode 100644 fs/ssdfs/ssdfs_constants.h create mode 100644 fs/ssdfs/ssdfs_fs_info.h create mode 100644 fs/ssdfs/ssdfs_inline.h create mode 100644 fs/ssdfs/ssdfs_inode_info.h create mode 100644 fs/ssdfs/ssdfs_thread_info.h create mode 100644 fs/ssdfs/version.h create mode 100644 include/trace/events/ssdfs.h create mode 100644 include/uapi/linux/ssdfs_fs.h diff --git a/fs/ssdfs/ssdfs.h b/fs/ssdfs/ssdfs.h new file mode 100644 index 000000000000..c0d5d7ace2eb --- /dev/null +++ b/fs/ssdfs/ssdfs.h @@ -0,0 +1,411 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/ssdfs.h - in-core declarations. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + */ + +#ifndef _SSDFS_H +#define _SSDFS_H + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include + +#include "ssdfs_constants.h" +#include "ssdfs_thread_info.h" +#include "ssdfs_inode_info.h" +#include "snapshot.h" +#include "snapshot_requests_queue.h" +#include "snapshot_rules.h" +#include "ssdfs_fs_info.h" +#include "ssdfs_inline.h" + +/* + * struct ssdfs_value_pair - value/position pair + * @value: some value + * @pos: position of value + */ +struct ssdfs_value_pair { + int value; + int pos; +}; + +/* + * struct ssdfs_min_max_pair - minimum and maximum values pair + * @min: minimum value/position pair + * @max: maximum value/position pair + */ +struct ssdfs_min_max_pair { + struct ssdfs_value_pair min; + struct ssdfs_value_pair max; +}; + +/* + * struct ssdfs_block_bmap_range - block bitmap items range + * @start: begin item + * @len: count of items in the range + */ +struct ssdfs_block_bmap_range { + u32 start; + u32 len; +}; + +struct ssdfs_peb_info; +struct ssdfs_peb_container; +struct ssdfs_segment_info; +struct ssdfs_peb_blk_bmap; + +/* btree_node.c */ +void ssdfs_zero_btree_node_obj_cache_ptr(void); +int ssdfs_init_btree_node_obj_cache(void); +void ssdfs_shrink_btree_node_obj_cache(void); +void ssdfs_destroy_btree_node_obj_cache(void); + +/* btree_search.c */ +void ssdfs_zero_btree_search_obj_cache_ptr(void); +int ssdfs_init_btree_search_obj_cache(void); +void ssdfs_shrink_btree_search_obj_cache(void); +void ssdfs_destroy_btree_search_obj_cache(void); + +/* compression.c */ +int ssdfs_compressors_init(void); +void ssdfs_free_workspaces(void); +void ssdfs_compressors_exit(void); + +/* dev_bdev.c */ +struct bio *ssdfs_bdev_bio_alloc(struct block_device *bdev, + unsigned int nr_iovecs, + unsigned int op, + gfp_t gfp_mask); +void ssdfs_bdev_bio_put(struct bio *bio); +int ssdfs_bdev_bio_add_page(struct bio *bio, struct page *page, + unsigned int len, unsigned int offset); +int ssdfs_bdev_readpage(struct super_block *sb, struct page *page, + loff_t offset); +int ssdfs_bdev_readpages(struct super_block *sb, struct pagevec *pvec, + loff_t offset); +int ssdfs_bdev_read(struct super_block *sb, loff_t offset, + size_t len, void *buf); +int ssdfs_bdev_can_write_page(struct super_block *sb, loff_t offset, + bool need_check); +int ssdfs_bdev_writepage(struct super_block *sb, loff_t to_off, + struct page *page, u32 from_off, size_t len); +int ssdfs_bdev_writepages(struct super_block *sb, loff_t to_off, + struct pagevec *pvec, + u32 from_off, size_t len); + +/* dev_zns.c */ +u64 ssdfs_zns_zone_size(struct super_block *sb, loff_t offset); +u64 ssdfs_zns_zone_capacity(struct super_block *sb, loff_t offset); + +/* dir.c */ +int ssdfs_inode_by_name(struct inode *dir, + const struct qstr *child, + ino_t *ino); +int ssdfs_create(struct user_namespace *mnt_userns, + struct inode *dir, struct dentry *dentry, + umode_t mode, bool excl); + +/* file.c */ +int ssdfs_allocate_inline_file_buffer(struct inode *inode); +void ssdfs_destroy_inline_file_buffer(struct inode *inode); +int ssdfs_fsync(struct file *file, loff_t start, loff_t end, int datasync); + +/* fs_error.c */ +extern __printf(5, 6) +void ssdfs_fs_error(struct super_block *sb, const char *file, + const char *function, unsigned int line, + const char *fmt, ...); +int ssdfs_set_page_dirty(struct page *page); +int __ssdfs_clear_dirty_page(struct page *page); +int ssdfs_clear_dirty_page(struct page *page); +void 
ssdfs_clear_dirty_pages(struct address_space *mapping); + +/* inode.c */ +bool is_raw_inode_checksum_correct(struct ssdfs_fs_info *fsi, + void *buf, size_t size); +struct inode *ssdfs_iget(struct super_block *sb, ino_t ino); +struct inode *ssdfs_new_inode(struct inode *dir, umode_t mode, + const struct qstr *qstr); +int ssdfs_getattr(struct user_namespace *mnt_userns, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags); +int ssdfs_setattr(struct user_namespace *mnt_userns, + struct dentry *dentry, struct iattr *attr); +void ssdfs_evict_inode(struct inode *inode); +int ssdfs_write_inode(struct inode *inode, struct writeback_control *wbc); +int ssdfs_statfs(struct dentry *dentry, struct kstatfs *buf); +void ssdfs_set_inode_flags(struct inode *inode); + +/* inodes_tree.c */ +void ssdfs_zero_free_ino_desc_cache_ptr(void); +int ssdfs_init_free_ino_desc_cache(void); +void ssdfs_shrink_free_ino_desc_cache(void); +void ssdfs_destroy_free_ino_desc_cache(void); + +/* ioctl.c */ +long ssdfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg); + +/* log_footer.c */ +bool __is_ssdfs_log_footer_magic_valid(struct ssdfs_signature *magic); +bool is_ssdfs_log_footer_magic_valid(struct ssdfs_log_footer *footer); +bool is_ssdfs_log_footer_csum_valid(void *buf, size_t buf_size); +bool is_ssdfs_volume_state_info_consistent(struct ssdfs_fs_info *fsi, + void *buf, + struct ssdfs_log_footer *footer, + u64 dev_size); +int ssdfs_read_unchecked_log_footer(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + void *buf, bool silent, + u32 *log_pages); +int ssdfs_check_log_footer(struct ssdfs_fs_info *fsi, + void *buf, + struct ssdfs_log_footer *footer, + bool silent); +int ssdfs_read_checked_log_footer(struct ssdfs_fs_info *fsi, void *log_hdr, + u64 peb_id, u32 bytes_off, void *buf, + bool silent); +int ssdfs_prepare_current_segment_ids(struct ssdfs_fs_info *fsi, + __le64 *array, + size_t size); +int ssdfs_prepare_volume_state_info_for_commit(struct ssdfs_fs_info *fsi, + u16 fs_state, + __le64 *cur_segs, + size_t size, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_volume_state *vs); +int ssdfs_prepare_log_footer_for_commit(struct ssdfs_fs_info *fsi, + u32 log_pages, + u32 log_flags, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_log_footer *footer); + +/* offset_translation_table.c */ +void ssdfs_zero_blk2off_frag_obj_cache_ptr(void); +int ssdfs_init_blk2off_frag_obj_cache(void); +void ssdfs_shrink_blk2off_frag_obj_cache(void); +void ssdfs_destroy_blk2off_frag_obj_cache(void); + +/* options.c */ +int ssdfs_parse_options(struct ssdfs_fs_info *fs_info, char *data); +void ssdfs_initialize_fs_errors_option(struct ssdfs_fs_info *fsi); +int ssdfs_show_options(struct seq_file *seq, struct dentry *root); + +/* peb_migration_scheme.c */ +int ssdfs_peb_start_migration(struct ssdfs_peb_container *pebc); +bool is_peb_under_migration(struct ssdfs_peb_container *pebc); +bool is_pebs_relation_alive(struct ssdfs_peb_container *pebc); +bool has_peb_migration_done(struct ssdfs_peb_container *pebc); +bool should_migration_be_finished(struct ssdfs_peb_container *pebc); +int ssdfs_peb_finish_migration(struct ssdfs_peb_container *pebc); +bool has_ssdfs_source_peb_valid_blocks(struct ssdfs_peb_container *pebc); +int ssdfs_peb_prepare_range_migration(struct ssdfs_peb_container *pebc, + u32 range_len, int blk_type); +int ssdfs_peb_migrate_valid_blocks_range(struct ssdfs_segment_info *si, + struct ssdfs_peb_container *pebc, + struct ssdfs_peb_blk_bmap *peb_blkbmap, + 
struct ssdfs_block_bmap_range *range); + +/* readwrite.c */ +int ssdfs_read_page_from_volume(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + struct page *page); +int ssdfs_read_pagevec_from_volume(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + struct pagevec *pvec); +int ssdfs_aligned_read_buffer(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + void *buf, size_t size, + size_t *read_bytes); +int ssdfs_unaligned_read_buffer(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + void *buf, size_t size); +int ssdfs_can_write_sb_log(struct super_block *sb, + struct ssdfs_peb_extent *sb_log); +int ssdfs_unaligned_read_pagevec(struct pagevec *pvec, + u32 offset, u32 size, + void *buf); +int ssdfs_unaligned_write_pagevec(struct pagevec *pvec, + u32 offset, u32 size, + void *buf); + +/* recovery.c */ +int ssdfs_init_sb_info(struct ssdfs_fs_info *fsi, + struct ssdfs_sb_info *sbi); +void ssdfs_destruct_sb_info(struct ssdfs_sb_info *sbi); +void ssdfs_backup_sb_info(struct ssdfs_fs_info *fsi); +void ssdfs_restore_sb_info(struct ssdfs_fs_info *fsi); +int ssdfs_gather_superblock_info(struct ssdfs_fs_info *fsi, int silent); + +/* segment.c */ +void ssdfs_zero_seg_obj_cache_ptr(void); +int ssdfs_init_seg_obj_cache(void); +void ssdfs_shrink_seg_obj_cache(void); +void ssdfs_destroy_seg_obj_cache(void); +int ssdfs_segment_get_used_data_pages(struct ssdfs_segment_info *si); + +/* sysfs.c */ +int ssdfs_sysfs_init(void); +void ssdfs_sysfs_exit(void); +int ssdfs_sysfs_create_device_group(struct super_block *sb); +void ssdfs_sysfs_delete_device_group(struct ssdfs_fs_info *fsi); +int ssdfs_sysfs_create_seg_group(struct ssdfs_segment_info *si); +void ssdfs_sysfs_delete_seg_group(struct ssdfs_segment_info *si); +int ssdfs_sysfs_create_peb_group(struct ssdfs_peb_container *pebc); +void ssdfs_sysfs_delete_peb_group(struct ssdfs_peb_container *pebc); + +/* volume_header.c */ +bool __is_ssdfs_segment_header_magic_valid(struct ssdfs_signature *magic); +bool is_ssdfs_segment_header_magic_valid(struct ssdfs_segment_header *hdr); +bool is_ssdfs_partial_log_header_magic_valid(struct ssdfs_signature *magic); +bool is_ssdfs_volume_header_csum_valid(void *vh_buf, size_t buf_size); +bool is_ssdfs_partial_log_header_csum_valid(void *plh_buf, size_t buf_size); +bool is_ssdfs_volume_header_consistent(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh, + u64 dev_size); +int ssdfs_check_segment_header(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_header *hdr, + bool silent); +int ssdfs_read_checked_segment_header(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 pages_off, + void *buf, bool silent); +int ssdfs_check_partial_log_header(struct ssdfs_fs_info *fsi, + struct ssdfs_partial_log_header *hdr, + bool silent); +void ssdfs_create_volume_header(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh); +int ssdfs_prepare_volume_header_for_commit(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh); +int ssdfs_prepare_segment_header_for_commit(struct ssdfs_fs_info *fsi, + u32 log_pages, + u16 seg_type, + u32 seg_flags, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_segment_header *hdr); +int ssdfs_prepare_partial_log_header_for_commit(struct ssdfs_fs_info *fsi, + int sequence_id, + u32 log_pages, + u16 seg_type, + u32 flags, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_partial_log_header *hdr); + +/* memory leaks checker */ +void ssdfs_acl_memory_leaks_init(void); +void ssdfs_acl_check_memory_leaks(void); +void ssdfs_block_bmap_memory_leaks_init(void); +void 
ssdfs_block_bmap_check_memory_leaks(void); +void ssdfs_blk2off_memory_leaks_init(void); +void ssdfs_blk2off_check_memory_leaks(void); +void ssdfs_btree_memory_leaks_init(void); +void ssdfs_btree_check_memory_leaks(void); +void ssdfs_btree_hierarchy_memory_leaks_init(void); +void ssdfs_btree_hierarchy_check_memory_leaks(void); +void ssdfs_btree_node_memory_leaks_init(void); +void ssdfs_btree_node_check_memory_leaks(void); +void ssdfs_btree_search_memory_leaks_init(void); +void ssdfs_btree_search_check_memory_leaks(void); +void ssdfs_lzo_memory_leaks_init(void); +void ssdfs_lzo_check_memory_leaks(void); +void ssdfs_zlib_memory_leaks_init(void); +void ssdfs_zlib_check_memory_leaks(void); +void ssdfs_compr_memory_leaks_init(void); +void ssdfs_compr_check_memory_leaks(void); +void ssdfs_cur_seg_memory_leaks_init(void); +void ssdfs_cur_seg_check_memory_leaks(void); +void ssdfs_dentries_memory_leaks_init(void); +void ssdfs_dentries_check_memory_leaks(void); +void ssdfs_dev_bdev_memory_leaks_init(void); +void ssdfs_dev_bdev_check_memory_leaks(void); +void ssdfs_dev_zns_memory_leaks_init(void); +void ssdfs_dev_zns_check_memory_leaks(void); +void ssdfs_dev_mtd_memory_leaks_init(void); +void ssdfs_dev_mtd_check_memory_leaks(void); +void ssdfs_dir_memory_leaks_init(void); +void ssdfs_dir_check_memory_leaks(void); +void ssdfs_diff_memory_leaks_init(void); +void ssdfs_diff_check_memory_leaks(void); +void ssdfs_ext_queue_memory_leaks_init(void); +void ssdfs_ext_queue_check_memory_leaks(void); +void ssdfs_ext_tree_memory_leaks_init(void); +void ssdfs_ext_tree_check_memory_leaks(void); +void ssdfs_file_memory_leaks_init(void); +void ssdfs_file_check_memory_leaks(void); +void ssdfs_fs_error_memory_leaks_init(void); +void ssdfs_fs_error_check_memory_leaks(void); +void ssdfs_inode_memory_leaks_init(void); +void ssdfs_inode_check_memory_leaks(void); +void ssdfs_ino_tree_memory_leaks_init(void); +void ssdfs_ino_tree_check_memory_leaks(void); +void ssdfs_invext_tree_memory_leaks_init(void); +void ssdfs_invext_tree_check_memory_leaks(void); +void ssdfs_parray_memory_leaks_init(void); +void ssdfs_parray_check_memory_leaks(void); +void ssdfs_page_vector_memory_leaks_init(void); +void ssdfs_page_vector_check_memory_leaks(void); +void ssdfs_flush_memory_leaks_init(void); +void ssdfs_flush_check_memory_leaks(void); +void ssdfs_gc_memory_leaks_init(void); +void ssdfs_gc_check_memory_leaks(void); +void ssdfs_map_queue_memory_leaks_init(void); +void ssdfs_map_queue_check_memory_leaks(void); +void ssdfs_map_tbl_memory_leaks_init(void); +void ssdfs_map_tbl_check_memory_leaks(void); +void ssdfs_map_cache_memory_leaks_init(void); +void ssdfs_map_cache_check_memory_leaks(void); +void ssdfs_map_thread_memory_leaks_init(void); +void ssdfs_map_thread_check_memory_leaks(void); +void ssdfs_migration_memory_leaks_init(void); +void ssdfs_migration_check_memory_leaks(void); +void ssdfs_peb_memory_leaks_init(void); +void ssdfs_peb_check_memory_leaks(void); +void ssdfs_read_memory_leaks_init(void); +void ssdfs_read_check_memory_leaks(void); +void ssdfs_recovery_memory_leaks_init(void); +void ssdfs_recovery_check_memory_leaks(void); +void ssdfs_req_queue_memory_leaks_init(void); +void ssdfs_req_queue_check_memory_leaks(void); +void ssdfs_seg_obj_memory_leaks_init(void); +void ssdfs_seg_obj_check_memory_leaks(void); +void ssdfs_seg_bmap_memory_leaks_init(void); +void ssdfs_seg_bmap_check_memory_leaks(void); +void ssdfs_seg_blk_memory_leaks_init(void); +void ssdfs_seg_blk_check_memory_leaks(void); +void 
ssdfs_seg_tree_memory_leaks_init(void); +void ssdfs_seg_tree_check_memory_leaks(void); +void ssdfs_seq_arr_memory_leaks_init(void); +void ssdfs_seq_arr_check_memory_leaks(void); +void ssdfs_dict_memory_leaks_init(void); +void ssdfs_dict_check_memory_leaks(void); +void ssdfs_shextree_memory_leaks_init(void); +void ssdfs_shextree_check_memory_leaks(void); +void ssdfs_snap_reqs_queue_memory_leaks_init(void); +void ssdfs_snap_reqs_queue_check_memory_leaks(void); +void ssdfs_snap_rules_list_memory_leaks_init(void); +void ssdfs_snap_rules_list_check_memory_leaks(void); +void ssdfs_snap_tree_memory_leaks_init(void); +void ssdfs_snap_tree_check_memory_leaks(void); +void ssdfs_xattr_memory_leaks_init(void); +void ssdfs_xattr_check_memory_leaks(void); + +#endif /* _SSDFS_H */ diff --git a/fs/ssdfs/ssdfs_constants.h b/fs/ssdfs/ssdfs_constants.h new file mode 100644 index 000000000000..d5ba89d8b272 --- /dev/null +++ b/fs/ssdfs/ssdfs_constants.h @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/ssdfs_constants.h - SSDFS constant declarations. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#ifndef _SSDFS_CONSTANTS_H +#define _SSDFS_CONSTANTS_H + +/* + * Thread types + */ +enum { + SSDFS_PEB_READ_THREAD, + SSDFS_PEB_FLUSH_THREAD, + SSDFS_PEB_GC_THREAD, + SSDFS_PEB_THREAD_TYPE_MAX, +}; + +enum { + SSDFS_SEG_USING_GC_THREAD, + SSDFS_SEG_USED_GC_THREAD, + SSDFS_SEG_PRE_DIRTY_GC_THREAD, + SSDFS_SEG_DIRTY_GC_THREAD, + SSDFS_GC_THREAD_TYPE_MAX, +}; + +enum { + SSDFS_256B = 256, + SSDFS_512B = 512, + SSDFS_1KB = 1024, + SSDFS_2KB = 2048, + SSDFS_4KB = 4096, + SSDFS_8KB = 8192, + SSDFS_16KB = 16384, + SSDFS_32KB = 32768, + SSDFS_64KB = 65536, + SSDFS_128KB = 131072, + SSDFS_256KB = 262144, + SSDFS_512KB = 524288, + SSDFS_1MB = 1048576, + SSDFS_2MB = 2097152, + SSDFS_8MB = 8388608, + SSDFS_16MB = 16777216, + SSDFS_32MB = 33554432, + SSDFS_64MB = 67108864, + SSDFS_128MB = 134217728, + SSDFS_256MB = 268435456, + SSDFS_512MB = 536870912, + SSDFS_1GB = 1073741824, + SSDFS_2GB = 2147483648, + SSDFS_8GB = 8589934592, + SSDFS_16GB = 17179869184, + SSDFS_32GB = 34359738368, + SSDFS_64GB = 68719476736, +}; + +enum { + SSDFS_UNKNOWN_PAGE_TYPE, + SSDFS_USER_DATA_PAGES, + SSDFS_METADATA_PAGES, + SSDFS_PAGES_TYPE_MAX +}; + +#define SSDFS_INVALID_CNO U64_MAX +#define SSDFS_SECTOR_SHIFT 9 +#define SSDFS_DEFAULT_TIMEOUT (msecs_to_jiffies(120000)) +#define SSDFS_NANOSECS_PER_SEC (1000000000) +#define SSDFS_SECS_PER_HOUR (60 * 60) +#define SSDFS_HOURS_PER_DAY (24) +#define SSDFS_DAYS_PER_WEEK (7) +#define SSDFS_WEEKS_PER_MONTH (4) + +#endif /* _SSDFS_CONSTANTS_H */ diff --git a/fs/ssdfs/ssdfs_fs_info.h b/fs/ssdfs/ssdfs_fs_info.h new file mode 100644 index 000000000000..18ba9c463af4 --- /dev/null +++ b/fs/ssdfs/ssdfs_fs_info.h @@ -0,0 +1,412 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/ssdfs_fs_info.h - in-core fs information. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ *
+ * Authors: Viacheslav Dubeyko
+ */
+
+#ifndef _SSDFS_FS_INFO_H
+#define _SSDFS_FS_INFO_H
+
+/* Global FS states */
+enum {
+	SSDFS_UNKNOWN_GLOBAL_FS_STATE,
+	SSDFS_REGULAR_FS_OPERATIONS,
+	SSDFS_METADATA_GOING_FLUSHING,
+	SSDFS_METADATA_UNDER_FLUSH,
+	SSDFS_GLOBAL_FS_STATE_MAX
+};
+
+/*
+ * struct ssdfs_volume_block - logical block
+ * @seg_id: segment ID
+ * @blk_index: block index in segment
+ */
+struct ssdfs_volume_block {
+	u64 seg_id;
+	u16 blk_index;
+};
+
+/*
+ * struct ssdfs_volume_extent - logical extent
+ * @start: initial logical block
+ * @len: extent length
+ */
+struct ssdfs_volume_extent {
+	struct ssdfs_volume_block start;
+	u16 len;
+};
+
+/*
+ * struct ssdfs_peb_extent - PEB's extent
+ * @leb_id: LEB ID
+ * @peb_id: PEB ID
+ * @page_offset: offset in pages
+ * @pages_count: pages count
+ */
+struct ssdfs_peb_extent {
+	u64 leb_id;
+	u64 peb_id;
+	u32 page_offset;
+	u32 pages_count;
+};
+
+/*
+ * struct ssdfs_zone_fragment - zone fragment
+ * @ino: inode identification number
+ * @logical_blk_offset: logical offset from file's beginning in blocks
+ * @extent: zone fragment descriptor
+ */
+struct ssdfs_zone_fragment {
+	u64 ino;
+	u64 logical_blk_offset;
+	struct ssdfs_raw_extent extent;
+};
+
+/*
+ * struct ssdfs_metadata_options - metadata options
+ * @blk_bmap.flags: block bitmap's flags
+ * @blk_bmap.compression: compression type
+ *
+ * @blk2off_tbl.flags: offset translation table's flags
+ * @blk2off_tbl.compression: compression type
+ *
+ * @user_data.flags: user data's flags
+ * @user_data.compression: compression type
+ * @user_data.migration_threshold: default value of destination PEBs in migration
+ */
+struct ssdfs_metadata_options {
+	struct {
+		u16 flags;
+		u8 compression;
+	} blk_bmap;
+
+	struct {
+		u16 flags;
+		u8 compression;
+	} blk2off_tbl;
+
+	struct {
+		u16 flags;
+		u8 compression;
+		u16 migration_threshold;
+	} user_data;
+};
+
+/*
+ * struct ssdfs_sb_info - superblock info
+ * @vh_buf: volume header buffer
+ * @vh_buf_size: size of volume header buffer in bytes
+ * @vs_buf: volume state buffer
+ * @vs_buf_size: size of volume state buffer in bytes
+ * @last_log: latest sb log
+ */
+struct ssdfs_sb_info {
+	void *vh_buf;
+	size_t vh_buf_size;
+	void *vs_buf;
+	size_t vs_buf_size;
+	struct ssdfs_peb_extent last_log;
+};
+
+/*
+ * struct ssdfs_device_ops - device operations
+ * @device_name: get device name
+ * @device_size: get device size in bytes
+ * @open_zone: open zone
+ * @reopen_zone: reopen closed zone
+ * @close_zone: close zone
+ * @read: read from device
+ * @readpage: read page
+ * @readpages: read sequence of pages
+ * @can_write_page: can we write into page?
+ * @writepage: write page to device + * @writepages: write sequence of pages to device + * @erase: erase block + * @trim: support of background erase operation + * @peb_isbad: check that physical erase block is bad + * @sync: synchronize page cache with device + */ +struct ssdfs_device_ops { + const char * (*device_name)(struct super_block *sb); + __u64 (*device_size)(struct super_block *sb); + int (*open_zone)(struct super_block *sb, loff_t offset); + int (*reopen_zone)(struct super_block *sb, loff_t offset); + int (*close_zone)(struct super_block *sb, loff_t offset); + int (*read)(struct super_block *sb, loff_t offset, size_t len, + void *buf); + int (*readpage)(struct super_block *sb, struct page *page, + loff_t offset); + int (*readpages)(struct super_block *sb, struct pagevec *pvec, + loff_t offset); + int (*can_write_page)(struct super_block *sb, loff_t offset, + bool need_check); + int (*writepage)(struct super_block *sb, loff_t to_off, + struct page *page, u32 from_off, size_t len); + int (*writepages)(struct super_block *sb, loff_t to_off, + struct pagevec *pvec, u32 from_off, size_t len); + int (*erase)(struct super_block *sb, loff_t offset, size_t len); + int (*trim)(struct super_block *sb, loff_t offset, size_t len); + int (*peb_isbad)(struct super_block *sb, loff_t offset); + int (*mark_peb_bad)(struct super_block *sb, loff_t offset); + void (*sync)(struct super_block *sb); +}; + +/* + * struct ssdfs_snapshot_subsystem - snapshots subsystem + * @reqs_queue: snapshot requests queue + * @rules_list: snapshot rules list + * @tree: snapshots btree + */ +struct ssdfs_snapshot_subsystem { + struct ssdfs_snapshot_reqs_queue reqs_queue; + struct ssdfs_snapshot_rules_list rules_list; + struct ssdfs_snapshots_btree_info *tree; +}; + +/* + * struct ssdfs_fs_info - in-core fs information + * @log_pagesize: log2(page size) + * @pagesize: page size in bytes + * @log_erasesize: log2(erase block size) + * @erasesize: physical erase block size in bytes + * @log_segsize: log2(segment size) + * @segsize: segment size in bytes + * @log_pebs_per_seg: log2(erase blocks per segment) + * @pebs_per_seg: physical erase blocks per segment + * @pages_per_peb: pages per physical erase block + * @pages_per_seg: pages per segment + * @leb_pages_capacity: maximal number of logical blocks per LEB + * @peb_pages_capacity: maximal number of NAND pages can be written per PEB + * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment + * @fs_ctime: volume create timestamp (mkfs phase) + * @fs_cno: volume create checkpoint + * @raw_inode_size: raw inode size in bytes + * @create_threads_per_seg: number of creation threads per segment + * @mount_opts: mount options + * @metadata_options: metadata options + * @volume_sem: volume semaphore + * @last_vh: buffer for last valid volume header + * @vh: volume header + * @vs: volume state + * @sbi: superblock info + * @sbi_backup: backup copy of superblock info + * @sb_seg_log_pages: full log size in sb segment (pages count) + * @segbmap_log_pages: full log size in segbmap segment (pages count) + * @maptbl_log_pages: full log size in maptbl segment (pages count) + * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count) + * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count) + * @inodes_seg_log_pages: full log size in index nodes segment (pages count) + * @user_data_log_pages: full log size in user data segment (pages count) + * @volume_state_lock: lock for mutable volume metadata + * @free_pages: free pages 
count on the volume + * @reserved_new_user_data_pages: reserved pages of growing files' content + * @updated_user_data_pages: number of updated pages of files' content + * @flushing_user_data_requests: number of user data processing flush request + * @pending_wq: wait queue for flush threads of user data segments + * @finish_user_data_flush_wq: wait queue for waiting the end of user data flush + * @fs_mount_time: file system mount timestamp + * @fs_mod_time: last write timestamp + * @fs_mount_cno: mount checkpoint + * @boot_vs_mount_timediff: difference between boottime and mounttime + * @fs_flags: file system flags + * @fs_state: file system state + * @fs_errors: behaviour when detecting errors + * @fs_feature_compat: compatible feature set + * @fs_feature_compat_ro: read-only compatible feature set + * @fs_feature_incompat: incompatible feature set + * @fs_uuid: 128-bit volume's uuid + * @fs_label: volume name + * @migration_threshold: default value of destination PEBs in migration + * @resize_mutex: resize mutex + * @nsegs: number of segments on the volume + * @sb_segs_sem: semaphore for superblock's array of LEB/PEB numbers + * @sb_lebs: array of LEB ID numbers + * @sb_pebs: array of PEB ID numbers + * @segbmap: segment bitmap object + * @segbmap_inode: segment bitmap inode + * @maptbl: PEB mapping table object + * @maptbl_cache: maptbl cache + * @segs_tree: tree of segment objects + * @segs_tree_inode: segment tree inode + * @cur_segs: array of current segments + * @shextree: shared extents tree + * @shdictree: shared dictionary + * @inodes_tree: inodes btree + * @invextree: invalidated extents btree + * @snapshots: snapshots subsystem + * @gc_thread: array of GC threads + * @gc_wait_queue: array of GC threads' wait queues + * @gc_should_act: array of counters that define necessity of GC activity + * @flush_reqs: current number of flush requests + * @sb: pointer on VFS superblock object + * @mtd: MTD info + * @devops: device access operations + * @pending_bios: count of pending BIOs (dev_bdev.c ONLY) + * @erase_page: page with content for erase operation (dev_bdev.c ONLY) + * @is_zns_device: file system volume is on ZNS device + * @zone_size: zone size in bytes + * @zone_capacity: zone capacity in bytes available for write operations + * @max_open_zones: open zones limitation (upper bound) + * @open_zones: current number of opened zones + * @dev_kobj: /sys/fs/ssdfs/ kernel object + * @dev_kobj_unregister: completion state for kernel object + * @maptbl_kobj: /sys/fs///maptbl kernel object + * @maptbl_kobj_unregister: completion state for maptbl kernel object + * @segbmap_kobj: /sys/fs///segbmap kernel object + * @segbmap_kobj_unregister: completion state for segbmap kernel object + * @segments_kobj: /sys/fs///segments kernel object + * @segments_kobj_unregister: completion state for segments kernel object + */ +struct ssdfs_fs_info { + u8 log_pagesize; + u32 pagesize; + u8 log_erasesize; + u32 erasesize; + u8 log_segsize; + u32 segsize; + u8 log_pebs_per_seg; + u32 pebs_per_seg; + u32 pages_per_peb; + u32 pages_per_seg; + u32 leb_pages_capacity; + u32 peb_pages_capacity; + u32 lebs_per_peb_index; + u64 fs_ctime; + u64 fs_cno; + u16 raw_inode_size; + u16 create_threads_per_seg; + + unsigned long mount_opts; + struct ssdfs_metadata_options metadata_options; + + struct rw_semaphore volume_sem; + struct ssdfs_volume_header last_vh; + struct ssdfs_volume_header *vh; + struct ssdfs_volume_state *vs; + struct ssdfs_sb_info sbi; + struct ssdfs_sb_info sbi_backup; + u16 sb_seg_log_pages; + u16 
segbmap_log_pages; + u16 maptbl_log_pages; + u16 lnodes_seg_log_pages; + u16 hnodes_seg_log_pages; + u16 inodes_seg_log_pages; + u16 user_data_log_pages; + + atomic_t global_fs_state; + + spinlock_t volume_state_lock; + u64 free_pages; + u64 reserved_new_user_data_pages; + u64 updated_user_data_pages; + u64 flushing_user_data_requests; + wait_queue_head_t pending_wq; + wait_queue_head_t finish_user_data_flush_wq; + u64 fs_mount_time; + u64 fs_mod_time; + u64 fs_mount_cno; + u64 boot_vs_mount_timediff; + u32 fs_flags; + u16 fs_state; + u16 fs_errors; + u64 fs_feature_compat; + u64 fs_feature_compat_ro; + u64 fs_feature_incompat; + unsigned char fs_uuid[SSDFS_UUID_SIZE]; + char fs_label[SSDFS_VOLUME_LABEL_MAX]; + u16 migration_threshold; + + struct mutex resize_mutex; + u64 nsegs; + + struct rw_semaphore sb_segs_sem; + u64 sb_lebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX]; + u64 sb_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX]; + + struct ssdfs_segment_bmap *segbmap; + struct inode *segbmap_inode; + + struct ssdfs_peb_mapping_table *maptbl; + struct ssdfs_maptbl_cache maptbl_cache; + + struct ssdfs_segment_tree *segs_tree; + struct inode *segs_tree_inode; + + struct ssdfs_current_segs_array *cur_segs; + + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_shared_dict_btree_info *shdictree; + struct ssdfs_inodes_btree_info *inodes_tree; + struct ssdfs_invextree_info *invextree; + + struct ssdfs_snapshot_subsystem snapshots; + + struct ssdfs_thread_info gc_thread[SSDFS_GC_THREAD_TYPE_MAX]; + wait_queue_head_t gc_wait_queue[SSDFS_GC_THREAD_TYPE_MAX]; + atomic_t gc_should_act[SSDFS_GC_THREAD_TYPE_MAX]; + atomic64_t flush_reqs; + + struct super_block *sb; + + struct mtd_info *mtd; + const struct ssdfs_device_ops *devops; + atomic_t pending_bios; /* for dev_bdev.c */ + struct page *erase_page; /* for dev_bdev.c */ + + bool is_zns_device; + u64 zone_size; + u64 zone_capacity; + u32 max_open_zones; + atomic_t open_zones; + + /* /sys/fs/ssdfs/ */ + struct kobject dev_kobj; + struct completion dev_kobj_unregister; + + /* /sys/fs///maptbl */ + struct kobject maptbl_kobj; + struct completion maptbl_kobj_unregister; + + /* /sys/fs///segbmap */ + struct kobject segbmap_kobj; + struct completion segbmap_kobj_unregister; + + /* /sys/fs///segments */ + struct kobject segments_kobj; + struct completion segments_kobj_unregister; + +#ifdef CONFIG_SSDFS_TESTING + struct address_space testing_pages; + struct inode *testing_inode; + bool do_fork_invalidation; +#endif /* CONFIG_SSDFS_TESTING */ +}; + +#define SSDFS_FS_I(sb) \ + ((struct ssdfs_fs_info *)(sb->s_fs_info)) + +/* + * GC thread functions + */ +int ssdfs_using_seg_gc_thread_func(void *data); +int ssdfs_used_seg_gc_thread_func(void *data); +int ssdfs_pre_dirty_seg_gc_thread_func(void *data); +int ssdfs_dirty_seg_gc_thread_func(void *data); +int ssdfs_start_gc_thread(struct ssdfs_fs_info *fsi, int type); +int ssdfs_stop_gc_thread(struct ssdfs_fs_info *fsi, int type); + +/* + * Device operations + */ +extern const struct ssdfs_device_ops ssdfs_mtd_devops; +extern const struct ssdfs_device_ops ssdfs_bdev_devops; +extern const struct ssdfs_device_ops ssdfs_zns_devops; + +#endif /* _SSDFS_FS_INFO_H */ diff --git a/fs/ssdfs/ssdfs_inline.h b/fs/ssdfs/ssdfs_inline.h new file mode 100644 index 000000000000..9c416438b291 --- /dev/null +++ b/fs/ssdfs/ssdfs_inline.h @@ -0,0 +1,1346 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/ssdfs_inline.h - inline functions and macros. 
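+ *
+ * (This header bundles the memory leak accounting hooks, the guarded
+ * page/pagevec helpers, checksum and magic validation, and the
+ * bounds-checked memcpy/memmove/memset wrappers used throughout SSDFS.)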
+ * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#ifndef _SSDFS_INLINE_H +#define _SSDFS_INLINE_H + +#include +#include + +#define SSDFS_CRIT(fmt, ...) \ + pr_crit("pid %d:%s:%d %s(): " fmt, \ + current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__) + +#define SSDFS_ERR(fmt, ...) \ + pr_err("pid %d:%s:%d %s(): " fmt, \ + current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__) + +#define SSDFS_WARN(fmt, ...) \ + do { \ + pr_warn("pid %d:%s:%d %s(): " fmt, \ + current->pid, __FILE__, __LINE__, \ + __func__, ##__VA_ARGS__); \ + dump_stack(); \ + } while (0) + +#define SSDFS_NOTICE(fmt, ...) \ + pr_notice(fmt, ##__VA_ARGS__) + +#define SSDFS_INFO(fmt, ...) \ + pr_info(fmt, ##__VA_ARGS__) + +#ifdef CONFIG_SSDFS_DEBUG + +#define SSDFS_DBG(fmt, ...) \ + pr_debug("pid %d:%s:%d %s(): " fmt, \ + current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__) + +#else /* CONFIG_SSDFS_DEBUG */ + +#define SSDFS_DBG(fmt, ...) \ + no_printk(KERN_DEBUG fmt, ##__VA_ARGS__) + +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +extern atomic64_t ssdfs_allocated_pages; +extern atomic64_t ssdfs_memory_leaks; + +extern atomic64_t ssdfs_locked_pages; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +static inline +void ssdfs_memory_leaks_increment(void *kaddr) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_inc(&ssdfs_memory_leaks); + + SSDFS_DBG("memory %p, allocation count %lld\n", + kaddr, + atomic64_read(&ssdfs_memory_leaks)); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static inline +void ssdfs_memory_leaks_decrement(void *kaddr) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_dec(&ssdfs_memory_leaks); + + SSDFS_DBG("memory %p, allocation count %lld\n", + kaddr, + atomic64_read(&ssdfs_memory_leaks)); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static inline +void *ssdfs_kmalloc(size_t size, gfp_t flags) +{ + void *kaddr = kmalloc(size, flags); + + if (kaddr) + ssdfs_memory_leaks_increment(kaddr); + + return kaddr; +} + +static inline +void *ssdfs_kzalloc(size_t size, gfp_t flags) +{ + void *kaddr = kzalloc(size, flags); + + if (kaddr) + ssdfs_memory_leaks_increment(kaddr); + + return kaddr; +} + +static inline +void *ssdfs_kvzalloc(size_t size, gfp_t flags) +{ + void *kaddr = kvzalloc(size, flags); + + if (kaddr) + ssdfs_memory_leaks_increment(kaddr); + + return kaddr; +} + +static inline +void *ssdfs_kcalloc(size_t n, size_t size, gfp_t flags) +{ + void *kaddr = kcalloc(n, size, flags); + + if (kaddr) + ssdfs_memory_leaks_increment(kaddr); + + return kaddr; +} + +static inline +void ssdfs_kfree(void *kaddr) +{ + if (kaddr) { + ssdfs_memory_leaks_decrement(kaddr); + kfree(kaddr); + } +} + +static inline +void ssdfs_kvfree(void *kaddr) +{ + if (kaddr) { + ssdfs_memory_leaks_decrement(kaddr); + kvfree(kaddr); + } +} + +static inline +void ssdfs_get_page(struct page *page) +{ + get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d, flags %#lx\n", + page, page_ref_count(page), page->flags); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static inline +void ssdfs_put_page(struct page *page) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_ref_count(page) < 1) { + SSDFS_WARN("page %p, 
count %d\n", + page, page_ref_count(page)); + } +} + +static inline +void ssdfs_lock_page(struct page *page) +{ + lock_page(page); + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_locked_pages) < 0) { + SSDFS_WARN("ssdfs_locked_pages %lld\n", + atomic64_read(&ssdfs_locked_pages)); + } + + atomic64_inc(&ssdfs_locked_pages); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static inline +void ssdfs_account_locked_page(struct page *page) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (!page) + return; + + if (!PageLocked(page)) { + SSDFS_WARN("page %p, page_index %llu\n", + page, (u64)page_index(page)); + } + + if (atomic64_read(&ssdfs_locked_pages) < 0) { + SSDFS_WARN("ssdfs_locked_pages %lld\n", + atomic64_read(&ssdfs_locked_pages)); + } + + atomic64_inc(&ssdfs_locked_pages); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static inline +void ssdfs_unlock_page(struct page *page) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (!PageLocked(page)) { + SSDFS_WARN("page %p, page_index %llu\n", + page, (u64)page_index(page)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + + unlock_page(page); + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_dec(&ssdfs_locked_pages); + + if (atomic64_read(&ssdfs_locked_pages) < 0) { + SSDFS_WARN("ssdfs_locked_pages %lld\n", + atomic64_read(&ssdfs_locked_pages)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static inline +struct page *ssdfs_alloc_page(gfp_t gfp_mask) +{ + struct page *page; + + page = alloc_page(gfp_mask); + if (unlikely(!page)) { + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d, " + "flags %#lx, page_index %lu\n", + page, page_ref_count(page), + page->flags, page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_inc(&ssdfs_allocated_pages); + + SSDFS_DBG("page %p, allocated_pages %lld\n", + page, atomic64_read(&ssdfs_allocated_pages)); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + + return page; +} + +static inline +void ssdfs_account_page(struct page *page) +{ + return; +} + +static inline +void ssdfs_forget_page(struct page *page) +{ + return; +} + +/* + * ssdfs_add_pagevec_page() - add page into pagevec + * @pvec: pagevec + * + * This function adds empty page into pagevec. + * + * RETURN: + * [success] - pointer on added page. + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + * %-E2BIG - pagevec is full. + */ +static inline +struct page *ssdfs_add_pagevec_page(struct pagevec *pvec) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(pvec) == 0) { + SSDFS_ERR("pagevec hasn't space\n"); + return ERR_PTR(-E2BIG); + } + + page = ssdfs_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + pagevec_add(pvec, page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pvec %p, pagevec count %u\n", + pvec, pagevec_count(pvec)); + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return page; +} + +static inline +void ssdfs_free_page(struct page *page) +{ + if (!page) + return; + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (PageLocked(page)) { + SSDFS_WARN("page %p is still locked\n", + page); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_ref_count(page) <= 0 || + page_ref_count(page) > 1) { + SSDFS_WARN("page %p, count %d\n", + page, page_ref_count(page)); + } + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_dec(&ssdfs_allocated_pages); + + SSDFS_DBG("page %p, allocated_pages %lld\n", + page, atomic64_read(&ssdfs_allocated_pages)); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + + __free_pages(page, 0); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d, " + "flags %#lx, page_index %lu\n", + page, page_ref_count(page), + page->flags, page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static inline +void ssdfs_pagevec_release(struct pagevec *pvec) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pvec %p\n", pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!pvec) + return; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pvec count %u\n", pagevec_count(pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *page = pvec->pages[i]; + + if (!page) + continue; + + ssdfs_free_page(page); + + pvec->pages[i] = NULL; + } + + pagevec_reinit(pvec); +} + +#define SSDFS_MEMORY_LEAKS_CHECKER_FNS(name) \ +static inline \ +void ssdfs_##name##_cache_leaks_increment(void *kaddr) \ +{ \ + atomic64_inc(&ssdfs_##name##_cache_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_cache_leaks)); \ + ssdfs_memory_leaks_increment(kaddr); \ +} \ +static inline \ +void ssdfs_##name##_cache_leaks_decrement(void *kaddr) \ +{ \ + atomic64_dec(&ssdfs_##name##_cache_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_cache_leaks)); \ + ssdfs_memory_leaks_decrement(kaddr); \ +} \ +static inline \ +void *ssdfs_##name##_kmalloc(size_t size, gfp_t flags) \ +{ \ + void *kaddr = ssdfs_kmalloc(size, flags); \ + if (kaddr) { \ + atomic64_inc(&ssdfs_##name##_memory_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_memory_leaks)); \ + } \ + return kaddr; \ +} \ +static inline \ +void *ssdfs_##name##_kzalloc(size_t size, gfp_t flags) \ +{ \ + void *kaddr = ssdfs_kzalloc(size, flags); \ + if (kaddr) { \ + atomic64_inc(&ssdfs_##name##_memory_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_memory_leaks)); \ + } \ + return kaddr; \ +} \ +static inline \ +void *ssdfs_##name##_kvzalloc(size_t size, gfp_t flags) \ +{ \ + void *kaddr = ssdfs_kvzalloc(size, flags); \ + if (kaddr) { \ + atomic64_inc(&ssdfs_##name##_memory_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_memory_leaks)); \ + } \ + return kaddr; \ +} \ +static inline \ +void *ssdfs_##name##_kcalloc(size_t 
n, size_t size, gfp_t flags) \ +{ \ + void *kaddr = ssdfs_kcalloc(n, size, flags); \ + if (kaddr) { \ + atomic64_inc(&ssdfs_##name##_memory_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_memory_leaks)); \ + } \ + return kaddr; \ +} \ +static inline \ +void ssdfs_##name##_kfree(void *kaddr) \ +{ \ + if (kaddr) { \ + atomic64_dec(&ssdfs_##name##_memory_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_memory_leaks)); \ + } \ + ssdfs_kfree(kaddr); \ +} \ +static inline \ +void ssdfs_##name##_kvfree(void *kaddr) \ +{ \ + if (kaddr) { \ + atomic64_dec(&ssdfs_##name##_memory_leaks); \ + SSDFS_DBG("memory %p, allocation count %lld\n", \ + kaddr, \ + atomic64_read(&ssdfs_##name##_memory_leaks)); \ + } \ + ssdfs_kvfree(kaddr); \ +} \ +static inline \ +struct page *ssdfs_##name##_alloc_page(gfp_t gfp_mask) \ +{ \ + struct page *page; \ + page = ssdfs_alloc_page(gfp_mask); \ + if (!IS_ERR_OR_NULL(page)) { \ + atomic64_inc(&ssdfs_##name##_page_leaks); \ + SSDFS_DBG("page %p, allocated_pages %lld\n", \ + page, \ + atomic64_read(&ssdfs_##name##_page_leaks)); \ + } \ + return page; \ +} \ +static inline \ +void ssdfs_##name##_account_page(struct page *page) \ +{ \ + if (page) { \ + atomic64_inc(&ssdfs_##name##_page_leaks); \ + SSDFS_DBG("page %p, allocated_pages %lld\n", \ + page, \ + atomic64_read(&ssdfs_##name##_page_leaks)); \ + } \ +} \ +static inline \ +void ssdfs_##name##_forget_page(struct page *page) \ +{ \ + if (page) { \ + atomic64_dec(&ssdfs_##name##_page_leaks); \ + SSDFS_DBG("page %p, allocated_pages %lld\n", \ + page, \ + atomic64_read(&ssdfs_##name##_page_leaks)); \ + } \ +} \ +static inline \ +struct page *ssdfs_##name##_add_pagevec_page(struct pagevec *pvec) \ +{ \ + struct page *page; \ + page = ssdfs_add_pagevec_page(pvec); \ + if (!IS_ERR_OR_NULL(page)) { \ + atomic64_inc(&ssdfs_##name##_page_leaks); \ + SSDFS_DBG("page %p, allocated_pages %lld\n", \ + page, \ + atomic64_read(&ssdfs_##name##_page_leaks)); \ + } \ + return page; \ +} \ +static inline \ +void ssdfs_##name##_free_page(struct page *page) \ +{ \ + if (page) { \ + atomic64_dec(&ssdfs_##name##_page_leaks); \ + SSDFS_DBG("page %p, allocated_pages %lld\n", \ + page, \ + atomic64_read(&ssdfs_##name##_page_leaks)); \ + } \ + ssdfs_free_page(page); \ +} \ +static inline \ +void ssdfs_##name##_pagevec_release(struct pagevec *pvec) \ +{ \ + int i; \ + if (pvec) { \ + for (i = 0; i < pagevec_count(pvec); i++) { \ + struct page *page = pvec->pages[i]; \ + if (!page) \ + continue; \ + atomic64_dec(&ssdfs_##name##_page_leaks); \ + SSDFS_DBG("page %p, allocated_pages %lld\n", \ + page, \ + atomic64_read(&ssdfs_##name##_page_leaks)); \ + } \ + } \ + ssdfs_pagevec_release(pvec); \ +} \ + +#define SSDFS_MEMORY_ALLOCATOR_FNS(name) \ +static inline \ +void ssdfs_##name##_cache_leaks_increment(void *kaddr) \ +{ \ + ssdfs_memory_leaks_increment(kaddr); \ +} \ +static inline \ +void ssdfs_##name##_cache_leaks_decrement(void *kaddr) \ +{ \ + ssdfs_memory_leaks_decrement(kaddr); \ +} \ +static inline \ +void *ssdfs_##name##_kmalloc(size_t size, gfp_t flags) \ +{ \ + return ssdfs_kmalloc(size, flags); \ +} \ +static inline \ +void *ssdfs_##name##_kzalloc(size_t size, gfp_t flags) \ +{ \ + return ssdfs_kzalloc(size, flags); \ +} \ +static inline \ +void *ssdfs_##name##_kvzalloc(size_t size, gfp_t flags) \ +{ \ + return ssdfs_kvzalloc(size, flags); \ +} \ +static inline \ +void *ssdfs_##name##_kcalloc(size_t n, size_t size, gfp_t flags) \ 
+{ \ + return ssdfs_kcalloc(n, size, flags); \ +} \ +static inline \ +void ssdfs_##name##_kfree(void *kaddr) \ +{ \ + ssdfs_kfree(kaddr); \ +} \ +static inline \ +void ssdfs_##name##_kvfree(void *kaddr) \ +{ \ + ssdfs_kvfree(kaddr); \ +} \ +static inline \ +struct page *ssdfs_##name##_alloc_page(gfp_t gfp_mask) \ +{ \ + return ssdfs_alloc_page(gfp_mask); \ +} \ +static inline \ +void ssdfs_##name##_account_page(struct page *page) \ +{ \ + ssdfs_account_page(page); \ +} \ +static inline \ +void ssdfs_##name##_forget_page(struct page *page) \ +{ \ + ssdfs_forget_page(page); \ +} \ +static inline \ +struct page *ssdfs_##name##_add_pagevec_page(struct pagevec *pvec) \ +{ \ + return ssdfs_add_pagevec_page(pvec); \ +} \ +static inline \ +void ssdfs_##name##_free_page(struct page *page) \ +{ \ + ssdfs_free_page(page); \ +} \ +static inline \ +void ssdfs_##name##_pagevec_release(struct pagevec *pvec) \ +{ \ + ssdfs_pagevec_release(pvec); \ +} \ + +static inline +__le32 ssdfs_crc32_le(void *data, size_t len) +{ + return cpu_to_le32(crc32(~0, data, len)); +} + +static inline +int ssdfs_calculate_csum(struct ssdfs_metadata_check *check, + void *buf, size_t buf_size) +{ + u16 bytes; + u16 flags; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!check || !buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + bytes = le16_to_cpu(check->bytes); + flags = le16_to_cpu(check->flags); + + if (bytes > buf_size) { + SSDFS_ERR("corrupted size %d of checked data\n", bytes); + return -EINVAL; + } + + if (flags & SSDFS_CRC32) { + check->csum = 0; + check->csum = ssdfs_crc32_le(buf, bytes); + } else { + SSDFS_ERR("unknown flags set %#x\n", flags); + return -EINVAL; + } + + return 0; +} + +static inline +bool is_csum_valid(struct ssdfs_metadata_check *check, + void *buf, size_t buf_size) +{ + __le32 old_csum; + __le32 calc_csum; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!check); +#endif /* CONFIG_SSDFS_DEBUG */ + + old_csum = check->csum; + + err = ssdfs_calculate_csum(check, buf, buf_size); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate checksum\n"); + return false; + } + + calc_csum = check->csum; + check->csum = old_csum; + + if (old_csum != calc_csum) { + SSDFS_ERR("old_csum %#x != calc_csum %#x\n", + __le32_to_cpu(old_csum), + __le32_to_cpu(calc_csum)); + return false; + } + + return true; +} + +static inline +bool is_ssdfs_magic_valid(struct ssdfs_signature *magic) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!magic); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (le32_to_cpu(magic->common) != SSDFS_SUPER_MAGIC) + return false; + if (magic->version.major > SSDFS_MAJOR_REVISION || + magic->version.minor > SSDFS_MINOR_REVISION) { + SSDFS_INFO("Volume has unsupported %u.%u version. " + "Driver expects %u.%u version.\n", + magic->version.major, + magic->version.minor, + SSDFS_MAJOR_REVISION, + SSDFS_MINOR_REVISION); + return false; + } + + return true; +} + +#define SSDFS_SEG_HDR(ptr) \ + ((struct ssdfs_segment_header *)(ptr)) +#define SSDFS_LF(ptr) \ + ((struct ssdfs_log_footer *)(ptr)) +#define SSDFS_VH(ptr) \ + ((struct ssdfs_volume_header *)(ptr)) +#define SSDFS_VS(ptr) \ + ((struct ssdfs_volume_state *)(ptr)) +#define SSDFS_PLH(ptr) \ + ((struct ssdfs_partial_log_header *)(ptr)) + +/* + * Flags for mount options. 
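+ *
+ * The ssdfs_*_opt() helpers below manipulate the mount_opts bit mask
+ * in struct ssdfs_fs_info by token-pasting the SSDFS_MOUNT_ prefix
+ * onto the option name. A minimal usage sketch (illustrative only,
+ * not taken from the option parsing code):
+ *
+ *	ssdfs_set_opt(fsi->mount_opts, COMPR_MODE_ZLIB);
+ *	if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_ZLIB))
+ *		SSDFS_INFO("zlib compression enabled\n");
+ *	ssdfs_clear_opt(fsi->mount_opts, COMPR_MODE_ZLIB);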
+ */
+#define SSDFS_MOUNT_COMPR_MODE_NONE	(1 << 0)
+#define SSDFS_MOUNT_COMPR_MODE_ZLIB	(1 << 1)
+#define SSDFS_MOUNT_COMPR_MODE_LZO	(1 << 2)
+#define SSDFS_MOUNT_ERRORS_CONT	(1 << 3)
+#define SSDFS_MOUNT_ERRORS_RO		(1 << 4)
+#define SSDFS_MOUNT_ERRORS_PANIC	(1 << 5)
+#define SSDFS_MOUNT_IGNORE_FS_STATE	(1 << 6)
+
+#define ssdfs_clear_opt(o, opt)	((o) &= ~SSDFS_MOUNT_##opt)
+#define ssdfs_set_opt(o, opt)		((o) |= SSDFS_MOUNT_##opt)
+#define ssdfs_test_opt(o, opt)		((o) & SSDFS_MOUNT_##opt)
+
+#define SSDFS_LOG_FOOTER_OFF(seg_hdr)({ \
+	u32 offset; \
+	int index; \
+	struct ssdfs_metadata_descriptor *desc; \
+	index = SSDFS_LOG_FOOTER_INDEX; \
+	desc = &SSDFS_SEG_HDR(seg_hdr)->desc_array[index]; \
+	offset = le32_to_cpu(desc->offset); \
+	offset; \
+})
+
+#define SSDFS_LOG_PAGES(seg_hdr) \
+	(le16_to_cpu(SSDFS_SEG_HDR(seg_hdr)->log_pages))
+#define SSDFS_SEG_TYPE(seg_hdr) \
+	(le16_to_cpu(SSDFS_SEG_HDR(seg_hdr)->seg_type))
+
+#define SSDFS_MAIN_SB_PEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_MAIN_SB_SEG].peb_id))
+#define SSDFS_COPY_SB_PEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_COPY_SB_SEG].peb_id))
+#define SSDFS_MAIN_SB_LEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_MAIN_SB_SEG].leb_id))
+#define SSDFS_COPY_SB_LEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_COPY_SB_SEG].leb_id))
+
+#define SSDFS_SEG_CNO(seg_hdr) \
+	(le64_to_cpu(SSDFS_SEG_HDR(seg_hdr)->cno))
+
+static inline
+u64 ssdfs_current_timestamp(void)
+{
+	struct timespec64 cur_time;
+
+	ktime_get_coarse_real_ts64(&cur_time);
+
+	return (u64)timespec64_to_ns(&cur_time);
+}
+
+static inline
+void ssdfs_init_boot_vs_mount_timediff(struct ssdfs_fs_info *fsi)
+{
+	struct timespec64 uptime;
+
+	ktime_get_boottime_ts64(&uptime);
+	fsi->boot_vs_mount_timediff = timespec64_to_ns(&uptime);
+}
+
+static inline
+u64 ssdfs_current_cno(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct timespec64 uptime;
+	u64 boot_vs_mount_timediff;
+	u64 fs_mount_cno;
+
+	spin_lock(&fsi->volume_state_lock);
+	boot_vs_mount_timediff = fsi->boot_vs_mount_timediff;
+	fs_mount_cno = fsi->fs_mount_cno;
+	spin_unlock(&fsi->volume_state_lock);
+
+	ktime_get_boottime_ts64(&uptime);
+	return fs_mount_cno +
+		timespec64_to_ns(&uptime) -
+		boot_vs_mount_timediff;
+}
+
+#define SSDFS_MAPTBL_CACHE_HDR(ptr) \
+	((struct ssdfs_maptbl_cache_header *)(ptr))
+
+#define SSDFS_SEG_HDR_MAGIC(vh) \
+	(le16_to_cpu(SSDFS_VH(vh)->magic.key))
+#define SSDFS_SEG_TIME(seg_hdr) \
+	(le64_to_cpu(SSDFS_SEG_HDR(seg_hdr)->timestamp))
+
+#define SSDFS_VH_CNO(vh) \
+	(le64_to_cpu(SSDFS_VH(vh)->create_cno))
+#define SSDFS_VH_TIME(vh) \
+	(le64_to_cpu(SSDFS_VH(vh)->create_timestamp))
+
+#define SSDFS_VS_CNO(vs) \
+	(le64_to_cpu(SSDFS_VS(vs)->cno))
+#define SSDFS_VS_TIME(vs) \
+	(le64_to_cpu(SSDFS_VS(vs)->timestamp))
+
+#define SSDFS_POFFTH(ptr) \
+	((struct ssdfs_phys_offset_table_header *)(ptr))
+#define SSDFS_PHYSOFFD(ptr) \
+	((struct ssdfs_phys_offset_descriptor *)(ptr))
+
+static inline
+pgoff_t ssdfs_phys_page_to_mem_page(struct ssdfs_fs_info *fsi,
+				    pgoff_t index)
+{
+	if (fsi->log_pagesize == PAGE_SHIFT)
+		return index;
+	else if (fsi->log_pagesize > PAGE_SHIFT)
+		return index << (fsi->log_pagesize - PAGE_SHIFT);
+	else
+		return index >> (PAGE_SHIFT - fsi->log_pagesize);
+}
+
+static inline
+pgoff_t ssdfs_mem_page_to_phys_page(struct ssdfs_fs_info *fsi,
+				    pgoff_t index)
+{
+	if (fsi->log_pagesize == PAGE_SHIFT)
+		return index;
+	else if (fsi->log_pagesize > PAGE_SHIFT)
+		return index
>> (fsi->log_pagesize - PAGE_SHIFT); + else + return index << (PAGE_SHIFT - fsi->log_pagesize); +} + +#define SSDFS_MEMPAGE2BYTES(index) \ + ((pgoff_t)index << PAGE_SHIFT) +#define SSDFS_BYTES2MEMPAGE(offset) \ + ((pgoff_t)offset >> PAGE_SHIFT) + +/* + * ssdfs_write_offset_to_mem_page_index() - convert write offset into mem page + * @fsi: pointer on shared file system object + * @start_page: index of log's start physical page + * @write_offset: offset in bytes from log's beginning + */ +static inline +pgoff_t ssdfs_write_offset_to_mem_page_index(struct ssdfs_fs_info *fsi, + u16 start_page, + u32 write_offset) +{ + u32 page_off; + + page_off = ssdfs_phys_page_to_mem_page(fsi, start_page); + page_off = SSDFS_MEMPAGE2BYTES(page_off) + write_offset; + return SSDFS_BYTES2MEMPAGE(page_off); +} + +#define SSDFS_BLKBMP_HDR(ptr) \ + ((struct ssdfs_block_bitmap_header *)(ptr)) +#define SSDFS_SBMP_FRAG_HDR(ptr) \ + ((struct ssdfs_segbmap_fragment_header *)(ptr)) +#define SSDFS_BTN(ptr) \ + ((struct ssdfs_btree_node *)(ptr)) + +static inline +bool need_add_block(struct page *page) +{ + return PageChecked(page); +} + +static inline +bool is_diff_page(struct page *page) +{ + return PageChecked(page); +} + +static inline +void set_page_new(struct page *page) +{ + SetPageChecked(page); +} + +static inline +void clear_page_new(struct page *page) +{ + ClearPageChecked(page); +} + +static +inline void ssdfs_set_page_private(struct page *page, + unsigned long private) +{ + set_page_private(page, private); + SetPagePrivate(page); +} + +static +inline void ssdfs_clear_page_private(struct page *page, + unsigned long private) +{ + set_page_private(page, private); + ClearPagePrivate(page); +} + +static inline +bool can_be_merged_into_extent(struct page *page1, struct page *page2) +{ + ino_t ino1 = page1->mapping->host->i_ino; + ino_t ino2 = page2->mapping->host->i_ino; + pgoff_t index1 = page_index(page1); + pgoff_t index2 = page_index(page2); + pgoff_t diff_index; + bool has_identical_type; + bool has_identical_ino; + + has_identical_type = (PageChecked(page1) && PageChecked(page2)) || + (!PageChecked(page1) && !PageChecked(page2)); + has_identical_ino = ino1 == ino2; + + if (index1 >= index2) + diff_index = index1 - index2; + else + diff_index = index2 - index1; + + return has_identical_type && has_identical_ino && (diff_index == 1); +} + +static inline +int ssdfs_memcpy(void *dst, u32 dst_off, u32 dst_size, + const void *src, u32 src_off, u32 src_size, + u32 copy_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + if ((src_off + copy_size) > src_size) { + SSDFS_ERR("fail to copy: " + "src_off %u, copy_size %u, src_size %u\n", + src_off, copy_size, src_size); + return -ERANGE; + } + + if ((dst_off + copy_size) > dst_size) { + SSDFS_ERR("fail to copy: " + "dst_off %u, copy_size %u, dst_size %u\n", + dst_off, copy_size, dst_size); + return -ERANGE; + } + + SSDFS_DBG("dst %p, dst_off %u, dst_size %u, " + "src %p, src_off %u, src_size %u, " + "copy_size %u\n", + dst, dst_off, dst_size, + src, src_off, src_size, + copy_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + memcpy((u8 *)dst + dst_off, (u8 *)src + src_off, copy_size); + return 0; +} + +static inline +int ssdfs_memcpy_page(struct page *dst_page, u32 dst_off, u32 dst_size, + struct page *src_page, u32 src_off, u32 src_size, + u32 copy_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + if ((src_off + copy_size) > src_size) { + SSDFS_ERR("fail to copy: " + "src_off %u, copy_size %u, src_size %u\n", + src_off, copy_size, src_size); + return -ERANGE; + } + + if ((dst_off + copy_size) > dst_size) 
{ + SSDFS_ERR("fail to copy: " + "dst_off %u, copy_size %u, dst_size %u\n", + dst_off, copy_size, dst_size); + return -ERANGE; + } + + SSDFS_DBG("dst_page %p, dst_off %u, dst_size %u, " + "src_page %p, src_off %u, src_size %u, " + "copy_size %u\n", + dst_page, dst_off, dst_size, + src_page, src_off, src_size, + copy_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + memcpy_page(dst_page, dst_off, src_page, src_off, copy_size); + return 0; +} + +static inline +int ssdfs_memcpy_from_page(void *dst, u32 dst_off, u32 dst_size, + struct page *page, u32 src_off, u32 src_size, + u32 copy_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + if ((src_off + copy_size) > src_size) { + SSDFS_ERR("fail to copy: " + "src_off %u, copy_size %u, src_size %u\n", + src_off, copy_size, src_size); + return -ERANGE; + } + + if ((dst_off + copy_size) > dst_size) { + SSDFS_ERR("fail to copy: " + "dst_off %u, copy_size %u, dst_size %u\n", + dst_off, copy_size, dst_size); + return -ERANGE; + } + + SSDFS_DBG("dst %p, dst_off %u, dst_size %u, " + "page %p, src_off %u, src_size %u, " + "copy_size %u\n", + dst, dst_off, dst_size, + page, src_off, src_size, + copy_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + memcpy_from_page((u8 *)dst + dst_off, page, src_off, copy_size); + return 0; +} + +static inline +int ssdfs_memcpy_to_page(struct page *page, u32 dst_off, u32 dst_size, + void *src, u32 src_off, u32 src_size, + u32 copy_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + if ((src_off + copy_size) > src_size) { + SSDFS_ERR("fail to copy: " + "src_off %u, copy_size %u, src_size %u\n", + src_off, copy_size, src_size); + return -ERANGE; + } + + if ((dst_off + copy_size) > dst_size) { + SSDFS_ERR("fail to copy: " + "dst_off %u, copy_size %u, dst_size %u\n", + dst_off, copy_size, dst_size); + return -ERANGE; + } + + SSDFS_DBG("page %p, dst_off %u, dst_size %u, " + "src %p, src_off %u, src_size %u, " + "copy_size %u\n", + page, dst_off, dst_size, + src, src_off, src_size, + copy_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + memcpy_to_page(page, dst_off, (u8 *)src + src_off, copy_size); + return 0; +} + +static inline +int ssdfs_memmove(void *dst, u32 dst_off, u32 dst_size, + const void *src, u32 src_off, u32 src_size, + u32 move_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + if ((src_off + move_size) > src_size) { + SSDFS_ERR("fail to move: " + "src_off %u, move_size %u, src_size %u\n", + src_off, move_size, src_size); + return -ERANGE; + } + + if ((dst_off + move_size) > dst_size) { + SSDFS_ERR("fail to move: " + "dst_off %u, move_size %u, dst_size %u\n", + dst_off, move_size, dst_size); + return -ERANGE; + } + + SSDFS_DBG("dst %p, dst_off %u, dst_size %u, " + "src %p, src_off %u, src_size %u, " + "move_size %u\n", + dst, dst_off, dst_size, + src, src_off, src_size, + move_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + memmove((u8 *)dst + dst_off, (u8 *)src + src_off, move_size); + return 0; +} + +static inline +int ssdfs_memmove_page(struct page *dst_page, u32 dst_off, u32 dst_size, + struct page *src_page, u32 src_off, u32 src_size, + u32 move_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + if ((src_off + move_size) > src_size) { + SSDFS_ERR("fail to move: " + "src_off %u, move_size %u, src_size %u\n", + src_off, move_size, src_size); + return -ERANGE; + } + + if ((dst_off + move_size) > dst_size) { + SSDFS_ERR("fail to move: " + "dst_off %u, move_size %u, dst_size %u\n", + dst_off, move_size, dst_size); + return -ERANGE; + } + + SSDFS_DBG("dst_page %p, dst_off %u, dst_size %u, " + "src_page %p, src_off %u, src_size %u, " + "move_size %u\n", + dst_page, dst_off, dst_size, + 
src_page, src_off, src_size,
+		  move_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy_page(dst_page, dst_off, src_page, src_off, move_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memset_page(struct page *page, u32 dst_off, u32 dst_size,
+		      int value, u32 set_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((dst_off + set_size) > dst_size) {
+		SSDFS_ERR("fail to set: "
+			  "dst_off %u, set_size %u, dst_size %u\n",
+			  dst_off, set_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("page %p, dst_off %u, dst_size %u, "
+		  "value %#x, set_size %u\n",
+		  page, dst_off, dst_size,
+		  value, set_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset_page(page, dst_off, value, set_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memzero_page(struct page *page, u32 dst_off, u32 dst_size,
+		       u32 set_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((dst_off + set_size) > dst_size) {
+		SSDFS_ERR("fail to zero: "
+			  "dst_off %u, set_size %u, dst_size %u\n",
+			  dst_off, set_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("page %p, dst_off %u, dst_size %u, "
+		  "set_size %u\n",
+		  page, dst_off, dst_size, set_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memzero_page(page, dst_off, set_size);
+	return 0;
+}
+
+static inline
+bool is_ssdfs_file_inline(struct ssdfs_inode_info *ii)
+{
+	return atomic_read(&ii->private_flags) & SSDFS_INODE_HAS_INLINE_FILE;
+}
+
+static inline
+size_t ssdfs_inode_inline_file_capacity(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	size_t raw_inode_size;
+	size_t metadata_len;
+
+	raw_inode_size = ii->raw_inode_size;
+	metadata_len = offsetof(struct ssdfs_inode, internal);
+
+	if (raw_inode_size <= metadata_len) {
+		SSDFS_ERR("corrupted raw inode: "
+			  "raw_inode_size %zu, metadata_len %zu\n",
+			  raw_inode_size, metadata_len);
+		return 0;
+	}
+
+	return raw_inode_size - metadata_len;
+}
+
+/*
+ * __ssdfs_generate_name_hash() - generate a name's hash
+ * @name: pointer to the name's string
+ * @len: length of the name
+ * @inline_name_max_len: max length of inline name
+ */
+static inline
+u64 __ssdfs_generate_name_hash(const char *name, size_t len,
+			       size_t inline_name_max_len)
+{
+	u32 hash32_lo, hash32_hi;
+	size_t copy_len;
+	u64 name_hash;
+	u32 diff = 0;
+	u8 symbol1, symbol2;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!name);
+
+	SSDFS_DBG("name %s, len %zu, inline_name_max_len %zu\n",
+		  name, len, inline_name_max_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (len == 0) {
+		SSDFS_ERR("invalid len %zu\n", len);
+		return U64_MAX;
+	}
+
+	copy_len = min_t(size_t, len, inline_name_max_len);
+	hash32_lo = full_name_hash(NULL, name, copy_len);
+
+	if (len <= inline_name_max_len) {
+		hash32_hi = len;
+
+		for (i = 1; i < len; i++) {
+			symbol1 = (u8)name[i - 1];
+			symbol2 = (u8)name[i];
+			diff = 0;
+
+			if (symbol1 > symbol2)
+				diff = symbol1 - symbol2;
+			else
+				diff = symbol2 - symbol1;
+
+			hash32_hi += diff * symbol1;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("hash32_hi %x, symbol1 %x, "
+				  "symbol2 %x, index %d, diff %u\n",
+				  hash32_hi, symbol1, symbol2,
+				  i, diff);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+	} else {
+		hash32_hi = full_name_hash(NULL,
+					   name + inline_name_max_len,
+					   len - copy_len);
+	}
+
+	name_hash = SSDFS_NAME_HASH(hash32_lo, hash32_hi);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("name %s, len %zu, name_hash %llx\n",
+		  name, len, name_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return name_hash;
+}
+
+#define SSDFS_WAITED_TOO_LONG_MSECS	(1000)
+
+static inline
+void ssdfs_check_jiffies_left_till_timeout(unsigned long value)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	unsigned int msecs;
+
+	msecs = jiffies_to_msecs(SSDFS_DEFAULT_TIMEOUT - value);
+	if (msecs >= SSDFS_WAITED_TOO_LONG_MSECS)
+		SSDFS_ERR("function waited %u msecs\n", msecs);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+#define SSDFS_WAIT_COMPLETION(end)({ \
+	unsigned long res; \
+	int err = 0; \
+	res = wait_for_completion_timeout(end, SSDFS_DEFAULT_TIMEOUT); \
+	if (res == 0) { \
+		err = -ERANGE; \
+	} else { \
+		ssdfs_check_jiffies_left_till_timeout(res); \
+	} \
+	err; \
+})
+
+#define SSDFS_FSI(ptr) \
+	((struct ssdfs_fs_info *)(ptr))
+#define SSDFS_BLKT(ptr) \
+	((struct ssdfs_area_block_table *)(ptr))
+#define SSDFS_FRAGD(ptr) \
+	((struct ssdfs_fragment_desc *)(ptr))
+#define SSDFS_BLKD(ptr) \
+	((struct ssdfs_block_descriptor *)(ptr))
+#define SSDFS_BLKSTOFF(ptr) \
+	((struct ssdfs_blk_state_offset *)(ptr))
+#define SSDFS_STNODE_HDR(ptr) \
+	((struct ssdfs_segment_tree_node_header *)(ptr))
+#define SSDFS_SNRU_HDR(ptr) \
+	((struct ssdfs_snapshot_rules_header *)(ptr))
+#define SSDFS_SNRU_INFO(ptr) \
+	((struct ssdfs_snapshot_rule_info *)(ptr))
+
+#define SSDFS_LEB2SEG(fsi, leb) \
+	((u64)ssdfs_get_seg_id_for_leb_id(fsi, leb))
+
+#endif /* _SSDFS_INLINE_H */
diff --git a/fs/ssdfs/ssdfs_inode_info.h b/fs/ssdfs/ssdfs_inode_info.h
new file mode 100644
index 000000000000..5e98f4fa3672
--- /dev/null
+++ b/fs/ssdfs/ssdfs_inode_info.h
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_inode_info.h - SSDFS in-core inode.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko
+ * http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko
+ */
+
+#ifndef _SSDFS_INODE_INFO_H
+#define _SSDFS_INODE_INFO_H
+
+/*
+ * Inode flags (GETFLAGS/SETFLAGS)
+ */
+#define SSDFS_SECRM_FL		FS_SECRM_FL	/* Secure deletion */
+#define SSDFS_UNRM_FL		FS_UNRM_FL	/* Undelete */
+#define SSDFS_COMPR_FL		FS_COMPR_FL	/* Compress file */
+#define SSDFS_SYNC_FL		FS_SYNC_FL	/* Synchronous updates */
+#define SSDFS_IMMUTABLE_FL	FS_IMMUTABLE_FL	/* Immutable file */
+#define SSDFS_APPEND_FL	FS_APPEND_FL	/* writes to file may only append */
+#define SSDFS_NODUMP_FL	FS_NODUMP_FL	/* do not dump file */
+#define SSDFS_NOATIME_FL	FS_NOATIME_FL	/* do not update atime */
+/* Reserved for compression usage...
*/ +#define SSDFS_DIRTY_FL FS_DIRTY_FL +#define SSDFS_COMPRBLK_FL FS_COMPRBLK_FL /* One or more compressed clusters */ +#define SSDFS_NOCOMP_FL FS_NOCOMP_FL /* Don't compress */ +#define SSDFS_ECOMPR_FL FS_ECOMPR_FL /* Compression error */ +/* End compression flags --- maybe not all used */ +#define SSDFS_BTREE_FL FS_BTREE_FL /* btree format dir */ +#define SSDFS_INDEX_FL FS_INDEX_FL /* hash-indexed directory */ +#define SSDFS_IMAGIC_FL FS_IMAGIC_FL /* AFS directory */ +#define SSDFS_JOURNAL_DATA_FL FS_JOURNAL_DATA_FL /* Reserved for ext3 */ +#define SSDFS_NOTAIL_FL FS_NOTAIL_FL /* file tail should not be merged */ +#define SSDFS_DIRSYNC_FL FS_DIRSYNC_FL /* dirsync behaviour (directories only) */ +#define SSDFS_TOPDIR_FL FS_TOPDIR_FL /* Top of directory hierarchies*/ +#define SSDFS_RESERVED_FL FS_RESERVED_FL /* reserved for ext2 lib */ + +#define SSDFS_FL_USER_VISIBLE FS_FL_USER_VISIBLE /* User visible flags */ +#define SSDFS_FL_USER_MODIFIABLE FS_FL_USER_MODIFIABLE /* User modifiable flags */ + +/* Flags that should be inherited by new inodes from their parent. */ +#define SSDFS_FL_INHERITED (SSDFS_SECRM_FL | SSDFS_UNRM_FL | SSDFS_COMPR_FL |\ + SSDFS_SYNC_FL | SSDFS_NODUMP_FL |\ + SSDFS_NOATIME_FL | SSDFS_COMPRBLK_FL |\ + SSDFS_NOCOMP_FL | SSDFS_JOURNAL_DATA_FL |\ + SSDFS_NOTAIL_FL | SSDFS_DIRSYNC_FL) + +/* Flags that are appropriate for regular files (all but dir-specific ones). */ +#define SSDFS_REG_FLMASK (~(SSDFS_DIRSYNC_FL | SSDFS_TOPDIR_FL)) + +/* Flags that are appropriate for non-directories/regular files. */ +#define SSDFS_OTHER_FLMASK (SSDFS_NODUMP_FL | SSDFS_NOATIME_FL) + +/* Mask out flags that are inappropriate for the given type of inode. */ +static inline __u32 ssdfs_mask_flags(umode_t mode, __u32 flags) +{ + if (S_ISDIR(mode)) + return flags; + else if (S_ISREG(mode)) + return flags & SSDFS_REG_FLMASK; + else + return flags & SSDFS_OTHER_FLMASK; +} + +/* + * struct ssdfs_inode_info - in-core inode + * @vfs_inode: VFS inode object + * @birthtime: creation time + * @raw_inode_size: raw inode size in bytes + * @private_flags: inode's private flags + * @lock: inode lock + * @parent_ino: parent inode ID + * @flags: inode flags + * @name_hash: name's hash code + * @name_len: name length + * @extents_tree: extents btree + * @dentries_tree: dentries btree + * @xattrs_tree: extended attributes tree + * @inline_file: inline file buffer + * @raw_inode: raw inode + */ +struct ssdfs_inode_info { + struct inode vfs_inode; + struct timespec64 birthtime; + u16 raw_inode_size; + + atomic_t private_flags; + + struct rw_semaphore lock; + u64 parent_ino; + u32 flags; + u64 name_hash; + u16 name_len; + struct ssdfs_extents_btree_info *extents_tree; + struct ssdfs_dentries_btree_info *dentries_tree; + struct ssdfs_xattrs_btree_info *xattrs_tree; + void *inline_file; + struct ssdfs_inode raw_inode; +}; + +static inline struct ssdfs_inode_info *SSDFS_I(struct inode *inode) +{ + return container_of(inode, struct ssdfs_inode_info, vfs_inode); +} + +static inline +struct ssdfs_extents_btree_info *SSDFS_EXTREE(struct ssdfs_inode_info *ii) +{ + if (S_ISDIR(ii->vfs_inode.i_mode)) + return NULL; + else + return ii->extents_tree; +} + +static inline +struct ssdfs_dentries_btree_info *SSDFS_DTREE(struct ssdfs_inode_info *ii) +{ + if (S_ISDIR(ii->vfs_inode.i_mode)) + return ii->dentries_tree; + else + return NULL; +} + +static inline +struct ssdfs_xattrs_btree_info *SSDFS_XATTREE(struct ssdfs_inode_info *ii) +{ + return ii->xattrs_tree; +} + +extern const struct file_operations ssdfs_dir_operations; 
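+
+/*
+ * Usage sketch for the accessors above (illustrative only, not part
+ * of the original patch). SSDFS_DTREE() and SSDFS_EXTREE() return
+ * NULL for the "wrong" inode type, so callers must check the result
+ * before dereferencing it:
+ *
+ *	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+ *	struct ssdfs_dentries_btree_info *dtree = SSDFS_DTREE(ii);
+ *
+ *	if (!dtree)
+ *		return -ENOTDIR;
+ */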
+extern const struct inode_operations ssdfs_dir_inode_operations; +extern const struct file_operations ssdfs_file_operations; +extern const struct inode_operations ssdfs_file_inode_operations; +extern const struct address_space_operations ssdfs_aops; +extern const struct inode_operations ssdfs_special_inode_operations; +extern const struct inode_operations ssdfs_symlink_inode_operations; + +#endif /* _SSDFS_INODE_INFO_H */ diff --git a/fs/ssdfs/ssdfs_thread_info.h b/fs/ssdfs/ssdfs_thread_info.h new file mode 100644 index 000000000000..2816a50e18e4 --- /dev/null +++ b/fs/ssdfs/ssdfs_thread_info.h @@ -0,0 +1,42 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/ssdfs_thread_info.h - thread declarations. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#ifndef _SSDFS_THREAD_INFO_H +#define _SSDFS_THREAD_INFO_H + +/* + * struct ssdfs_thread_info - thread info + * @task: task descriptor + * @wait: wait queue + * @full_stop: ending of thread's activity + */ +struct ssdfs_thread_info { + struct task_struct *task; + struct wait_queue_entry wait; + struct completion full_stop; +}; + +/* function prototype */ +typedef int (*ssdfs_threadfn)(void *data); + +/* + * struct ssdfs_thread_descriptor - thread descriptor + * @threadfn: thread's function + * @fmt: thread's name format + */ +struct ssdfs_thread_descriptor { + ssdfs_threadfn threadfn; + const char *fmt; +}; + +#endif /* _SSDFS_THREAD_INFO_H */ diff --git a/fs/ssdfs/version.h b/fs/ssdfs/version.h new file mode 100644 index 000000000000..5231f8a1f575 --- /dev/null +++ b/fs/ssdfs/version.h @@ -0,0 +1,7 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +#ifndef _SSDFS_VERSION_H +#define _SSDFS_VERSION_H + +#define SSDFS_VERSION "SSDFS v.4.42" + +#endif /* _SSDFS_VERSION_H */ diff --git a/include/trace/events/ssdfs.h b/include/trace/events/ssdfs.h new file mode 100644 index 000000000000..dbf117dccd28 --- /dev/null +++ b/include/trace/events/ssdfs.h @@ -0,0 +1,255 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * include/trace/events/ssdfs.h - definition of tracepoints. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM ssdfs + +#if !defined(_TRACE_SSDFS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_SSDFS_H + +#include + +DECLARE_EVENT_CLASS(ssdfs__inode, + + TP_PROTO(struct inode *inode), + + TP_ARGS(inode), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(umode_t, mode) + __field(loff_t, size) + __field(unsigned int, nlink) + __field(blkcnt_t, blocks) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->mode = inode->i_mode; + __entry->nlink = inode->i_nlink; + __entry->size = inode->i_size; + __entry->blocks = inode->i_blocks; + ), + + TP_printk("dev = (%d,%d), ino = %lu, i_mode = 0x%hx, " + "i_size = %lld, i_nlink = %u, i_blocks = %llu", + MAJOR(__entry->dev), + MINOR(__entry->dev), + (unsigned long)__entry->ino, + __entry->mode, + __entry->size, + (unsigned int)__entry->nlink, + (unsigned long long)__entry->blocks) +); + +DECLARE_EVENT_CLASS(ssdfs__inode_exit, + + TP_PROTO(struct inode *inode, int ret), + + TP_ARGS(inode, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(int, ret) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->ret = ret; + ), + + TP_printk("dev = (%d,%d), ino = %lu, ret = %d", + MAJOR(__entry->dev), + MINOR(__entry->dev), + (unsigned long)__entry->ino, + __entry->ret) +); + +DEFINE_EVENT(ssdfs__inode, ssdfs_inode_new, + + TP_PROTO(struct inode *inode), + + TP_ARGS(inode) +); + +DEFINE_EVENT(ssdfs__inode_exit, ssdfs_inode_new_exit, + + TP_PROTO(struct inode *inode, int ret), + + TP_ARGS(inode, ret) +); + +DEFINE_EVENT(ssdfs__inode, ssdfs_inode_request, + + TP_PROTO(struct inode *inode), + + TP_ARGS(inode) +); + +DEFINE_EVENT(ssdfs__inode, ssdfs_inode_evict, + + TP_PROTO(struct inode *inode), + + TP_ARGS(inode) +); + +DEFINE_EVENT(ssdfs__inode, ssdfs_iget, + + TP_PROTO(struct inode *inode), + + TP_ARGS(inode) +); + +DEFINE_EVENT(ssdfs__inode_exit, ssdfs_iget_exit, + + TP_PROTO(struct inode *inode, int ret), + + TP_ARGS(inode, ret) +); + +TRACE_EVENT(ssdfs_sync_fs, + + TP_PROTO(struct super_block *sb, int wait), + + TP_ARGS(sb, wait), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, wait) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->wait = wait; + ), + + TP_printk("dev = (%d,%d), wait = %d", + MAJOR(__entry->dev), + MINOR(__entry->dev), + __entry->wait) +); + +TRACE_EVENT(ssdfs_sync_fs_exit, + + TP_PROTO(struct super_block *sb, int wait, int ret), + + TP_ARGS(sb, wait, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, wait) + __field(int, ret) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->wait = wait; + __entry->ret = ret; + ), + + TP_printk("dev = (%d,%d), wait = %d, ret = %d", + MAJOR(__entry->dev), + MINOR(__entry->dev), + __entry->wait, + __entry->ret) +); + +DEFINE_EVENT(ssdfs__inode, ssdfs_sync_file_enter, + + TP_PROTO(struct inode *inode), + + TP_ARGS(inode) +); + +TRACE_EVENT(ssdfs_sync_file_exit, + + TP_PROTO(struct file *file, int datasync, int ret), + + TP_ARGS(file, datasync, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(ino_t, parent) + __field(int, datasync) + __field(int, ret) + ), + + TP_fast_assign( + struct dentry *dentry = file->f_path.dentry; + struct inode *inode = dentry->d_inode; + + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->parent = dentry->d_parent->d_inode->i_ino; + 
__entry->datasync = datasync; + __entry->ret = ret; + ), + + TP_printk("dev = (%d,%d), ino = %lu, parent = %ld, " + "datasync = %d, ret = %d", + MAJOR(__entry->dev), + MINOR(__entry->dev), + (unsigned long)__entry->ino, + (unsigned long)__entry->parent, + __entry->datasync, + __entry->ret) +); + +TRACE_EVENT(ssdfs_unlink_enter, + + TP_PROTO(struct inode *dir, struct dentry *dentry), + + TP_ARGS(dir, dentry), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(loff_t, size) + __field(blkcnt_t, blocks) + __field(const char *, name) + ), + + TP_fast_assign( + __entry->dev = dir->i_sb->s_dev; + __entry->ino = dir->i_ino; + __entry->size = dir->i_size; + __entry->blocks = dir->i_blocks; + __entry->name = dentry->d_name.name; + ), + + TP_printk("dev = (%d,%d), dir ino = %lu, i_size = %lld, " + "i_blocks = %llu, name = %s", + MAJOR(__entry->dev), + MINOR(__entry->dev), + (unsigned long)__entry->ino, + __entry->size, + (unsigned long long)__entry->blocks, + __entry->name) +); + +DEFINE_EVENT(ssdfs__inode_exit, ssdfs_unlink_exit, + + TP_PROTO(struct inode *inode, int ret), + + TP_ARGS(inode, ret) +); + +#endif /* _TRACE_SSDFS_H */ + +/* This part must be outside protection */ +#include diff --git a/include/uapi/linux/ssdfs_fs.h b/include/uapi/linux/ssdfs_fs.h new file mode 100644 index 000000000000..50c81751afc9 --- /dev/null +++ b/include/uapi/linux/ssdfs_fs.h @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * include/uapi/linux/ssdfs_fs.h - SSDFS common declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _UAPI_LINUX_SSDFS_H +#define _UAPI_LINUX_SSDFS_H + +#include +#include + +/* SSDFS magic signatures */ +#define SSDFS_SUPER_MAGIC 0x53734466 /* SsDf */ +#define SSDFS_SEGMENT_HDR_MAGIC 0x5348 /* SH */ +#define SSDFS_LOG_FOOTER_MAGIC 0x4C46 /* LF */ +#define SSDFS_PARTIAL_LOG_HDR_MAGIC 0x5048 /* PH */ +#define SSDFS_BLK_BMAP_MAGIC 0x424D /* BM */ +#define SSDFS_FRAGMENT_DESC_MAGIC 0x66 /* f */ +#define SSDFS_CHAIN_HDR_MAGIC 0x63 /* c */ +#define SSDFS_PHYS_OFF_TABLE_MAGIC 0x504F5448 /* POTH */ +#define SSDFS_BLK2OFF_TABLE_HDR_MAGIC 0x5474 /* Tt */ +#define SSDFS_SEGBMAP_HDR_MAGIC 0x534D /* SM */ +#define SSDFS_INODE_MAGIC 0x6469 /* di */ +#define SSDFS_PEB_TABLE_MAGIC 0x5074 /* Pt */ +#define SSDFS_LEB_TABLE_MAGIC 0x4C74 /* Lt */ +#define SSDFS_MAPTBL_CACHE_MAGIC 0x4D63 /* Mc */ +#define SSDFS_MAPTBL_CACHE_PEB_STATE_MAGIC 0x4D635053 /* McPS */ +#define SSDFS_INODES_BTREE_MAGIC 0x496E4274 /* InBt */ +#define SSDFS_INODES_BNODE_MAGIC 0x494E /* IN */ +#define SSDFS_DENTRIES_BTREE_MAGIC 0x44654274 /* DeBt */ +#define SSDFS_DENTRIES_BNODE_MAGIC 0x444E /* DN */ +#define SSDFS_EXTENTS_BTREE_MAGIC 0x45784274 /* ExBt */ +#define SSDFS_SHARED_EXTENTS_BTREE_MAGIC 0x53454274 /* SEBt */ +#define SSDFS_EXTENTS_BNODE_MAGIC 0x454E /* EN */ +#define SSDFS_XATTR_BTREE_MAGIC 0x45414274 /* EABt */ +#define SSDFS_SHARED_XATTR_BTREE_MAGIC 0x53454174 /* SEAt */ +#define SSDFS_XATTR_BNODE_MAGIC 0x414E /* AN */ +#define SSDFS_SHARED_DICT_BTREE_MAGIC 0x53446963 /* SDic */ +#define SSDFS_DICTIONARY_BNODE_MAGIC 0x534E /* SN */ +#define SSDFS_SNAPSHOTS_BTREE_MAGIC 0x536E4274 /* SnBt */ +#define SSDFS_SNAPSHOTS_BNODE_MAGIC 0x736E /* sn */ +#define SSDFS_SNAPSHOT_RULES_MAGIC 0x536E5275 /* SnRu */ +#define SSDFS_SNAPSHOT_RECORD_MAGIC 0x5372 /* Sr */ +#define SSDFS_PEB2TIME_RECORD_MAGIC 0x5072 /* Pr */ +#define SSDFS_DIFF_BLOB_MAGIC 0x4466 /* Df */ +#define SSDFS_INVEXT_BTREE_MAGIC 0x49784274 /* IxBt */ +#define SSDFS_INVEXT_BNODE_MAGIC 0x4958 /* IX */ + +/* SSDFS revision */ +#define SSDFS_MAJOR_REVISION 1 +#define SSDFS_MINOR_REVISION 15 + +/* SSDFS constants */ +#define SSDFS_MAX_NAME_LEN 255 +#define SSDFS_UUID_SIZE 16 +#define SSDFS_VOLUME_LABEL_MAX 16 +#define SSDFS_MAX_SNAP_RULE_NAME_LEN 16 +#define SSDFS_MAX_SNAPSHOT_NAME_LEN 12 + +#define SSDFS_RESERVED_VBR_SIZE 1024 /* Volume Boot Record size*/ +#define SSDFS_DEFAULT_SEG_SIZE 8388608 + +/* + * File system states + */ +#define SSDFS_MOUNTED_FS 0x0000 /* Mounted FS state */ +#define SSDFS_VALID_FS 0x0001 /* Unmounted cleanly */ +#define SSDFS_ERROR_FS 0x0002 /* Errors detected */ +#define SSDFS_RESIZE_FS 0x0004 /* Resize required */ +#define SSDFS_LAST_KNOWN_FS_STATE SSDFS_RESIZE_FS + +/* + * Behaviour when detecting errors + */ +#define SSDFS_ERRORS_CONTINUE 1 /* Continue execution */ +#define SSDFS_ERRORS_RO 2 /* Remount fs read-only */ +#define SSDFS_ERRORS_PANIC 3 /* Panic */ +#define SSDFS_ERRORS_DEFAULT SSDFS_ERRORS_CONTINUE +#define SSDFS_LAST_KNOWN_FS_ERROR SSDFS_ERRORS_PANIC + +/* Reserved inode id */ +#define SSDFS_INVALID_EXTENTS_BTREE_INO 5 +#define SSDFS_SNAPSHOTS_BTREE_INO 6 +#define SSDFS_TESTING_INO 7 +#define SSDFS_SHARED_DICT_BTREE_INO 8 +#define SSDFS_INODES_BTREE_INO 9 +#define SSDFS_SHARED_EXTENTS_BTREE_INO 10 +#define SSDFS_SHARED_XATTR_BTREE_INO 11 +#define SSDFS_MAPTBL_INO 12 +#define SSDFS_SEG_TREE_INO 13 +#define SSDFS_SEG_BMAP_INO 14 +#define 
SSDFS_PEB_CACHE_INO 15
+#define SSDFS_ROOT_INO 16
+
+#define SSDFS_LINK_MAX INT_MAX
+
+#define SSDFS_CUR_SEG_DEFAULT_ID 3
+#define SSDFS_LOG_PAGES_DEFAULT 32
+#define SSDFS_CREATE_THREADS_DEFAULT 1
+
+#endif /* _UAPI_LINUX_SSDFS_H */

From patchwork Sat Feb 25 01:08:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151909
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 03/76] ssdfs: implement raw device operations
Date: Fri, 24 Feb 2023 17:08:14 -0800
Message-Id: <20230225010927.813929-4-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

Implement raw device operations:
(1) device_name: get device name
(2) device_size: get device size in bytes
(3) open_zone: open zone
(4) reopen_zone: reopen closed zone
(5) close_zone: close zone
(6) read: read from device
(7) readpage: read page
(8) readpages: read sequence of pages
(9) can_write_page: can we write into page?
(10) writepage: write page to device
(11) writepages: write sequence of pages to device
(12) erase: erase block
(13) trim: support of background erase operation
(14) sync: synchronize page cache with device

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/dev_bdev.c | 1187 +++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/dev_mtd.c  |  641 ++++++++++++++++++++++
 fs/ssdfs/dev_zns.c  | 1281 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 3109 insertions(+)
 create mode 100644 fs/ssdfs/dev_bdev.c
 create mode 100644 fs/ssdfs/dev_mtd.c
 create mode 100644 fs/ssdfs/dev_zns.c

diff --git a/fs/ssdfs/dev_bdev.c b/fs/ssdfs/dev_bdev.c
new file mode 100644
index 000000000000..b6cfb7d79c8c
--- /dev/null
+++ b/fs/ssdfs/dev_bdev.c
@@ -0,0 +1,1187 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_bdev.c - Block device access code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_dev_bdev_page_leaks; +atomic64_t ssdfs_dev_bdev_memory_leaks; +atomic64_t ssdfs_dev_bdev_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_dev_bdev_cache_leaks_increment(void *kaddr) + * void ssdfs_dev_bdev_cache_leaks_decrement(void *kaddr) + * void *ssdfs_dev_bdev_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_dev_bdev_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_dev_bdev_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_dev_bdev_kfree(void *kaddr) + * struct page *ssdfs_dev_bdev_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_dev_bdev_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_dev_bdev_free_page(struct page *page) + * void ssdfs_dev_bdev_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_bdev) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(dev_bdev) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_dev_bdev_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_dev_bdev_page_leaks, 0); + atomic64_set(&ssdfs_dev_bdev_memory_leaks, 0); + atomic64_set(&ssdfs_dev_bdev_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_dev_bdev_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_dev_bdev_page_leaks) != 0) { + SSDFS_ERR("BLOCK DEV: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_dev_bdev_page_leaks)); + } + + if (atomic64_read(&ssdfs_dev_bdev_memory_leaks) != 0) { + SSDFS_ERR("BLOCK DEV: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_dev_bdev_memory_leaks)); + } + + if (atomic64_read(&ssdfs_dev_bdev_cache_leaks) != 0) { + SSDFS_ERR("BLOCK DEV: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_dev_bdev_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static DECLARE_WAIT_QUEUE_HEAD(wq); + +/* + * ssdfs_bdev_device_name() - get device name + * @sb: superblock object + */ +static const char *ssdfs_bdev_device_name(struct super_block *sb) +{ + return sb->s_id; +} + +/* + * ssdfs_bdev_device_size() - get partition size in bytes + * @sb: superblock object + */ +static __u64 ssdfs_bdev_device_size(struct super_block *sb) +{ + return i_size_read(sb->s_bdev->bd_inode); +} + +static int ssdfs_bdev_open_zone(struct super_block *sb, loff_t offset) +{ + return -EOPNOTSUPP; +} + +static int ssdfs_bdev_reopen_zone(struct super_block *sb, loff_t offset) +{ + return -EOPNOTSUPP; +} + +static int ssdfs_bdev_close_zone(struct super_block *sb, loff_t offset) +{ + return -EOPNOTSUPP; +} + +/* + * ssdfs_bdev_bio_alloc() - allocate bio object + * @bdev: block device + * @nr_iovecs: number of items in biovec + * @op: direction of I/O + * @gfp_mask: mask of creation flags + */ +struct bio *ssdfs_bdev_bio_alloc(struct block_device *bdev, + unsigned int nr_iovecs, + unsigned int op, + gfp_t gfp_mask) +{ + struct bio *bio; + + bio = bio_alloc(bdev, nr_iovecs, op, gfp_mask); + if (!bio) { + SSDFS_ERR("fail to allocate bio\n"); + return ERR_PTR(-ENOMEM); + } + + 
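+	/*
+	 * Note: the caller owns the returned bio and is expected to
+	 * release it with ssdfs_bdev_bio_put() once the request has
+	 * been processed.
+	 */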
return bio; +} + +/* + * ssdfs_bdev_bio_put() - free bio object + */ +void ssdfs_bdev_bio_put(struct bio *bio) +{ + if (!bio) + return; + + bio_put(bio); +} + +/* + * ssdfs_bdev_bio_add_page() - add page into bio + * @bio: pointer on bio object + * @page: memory page + * @len: size of data into memory page + * @offset: vec entry offset + */ +int ssdfs_bdev_bio_add_page(struct bio *bio, struct page *page, + unsigned int len, unsigned int offset) +{ + int res; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bio || !page); + + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + res = bio_add_page(bio, page, len, offset); + if (res != len) { + SSDFS_ERR("res %d != len %u\n", + res, len); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_bdev_sync_page_request() - submit page request + * @sb: superblock object + * @page: memory page + * @offset: offset in bytes from partition's begin + * @op: direction of I/O + * @op_flags: request op flags + */ +static int ssdfs_bdev_sync_page_request(struct super_block *sb, + struct page *page, + loff_t offset, + unsigned int op, int op_flags) +{ + struct bio *bio; + pgoff_t index = (pgoff_t)(offset >> PAGE_SHIFT); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + bio = ssdfs_bdev_bio_alloc(sb->s_bdev, 1, op, GFP_NOIO); + if (IS_ERR_OR_NULL(bio)) { + err = !bio ? -ERANGE : PTR_ERR(bio); + SSDFS_ERR("fail to allocate bio: err %d\n", + err); + return err; + } + + bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9); + bio_set_dev(bio, sb->s_bdev); + bio->bi_opf = op | op_flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_bio_add_page(bio, page, PAGE_SIZE, 0); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into bio: " + "err %d\n", + err); + goto finish_sync_page_request; + } + + err = submit_bio_wait(bio); + if (unlikely(err)) { + SSDFS_ERR("fail to process request: " + "err %d\n", + err); + goto finish_sync_page_request; + } + +finish_sync_page_request: + ssdfs_bdev_bio_put(bio); + + return err; +} + +/* + * ssdfs_bdev_sync_pvec_request() - submit pagevec request + * @sb: superblock object + * @pvec: pagevec + * @offset: offset in bytes from partition's begin + * @op: direction of I/O + * @op_flags: request op flags + */ +static int ssdfs_bdev_sync_pvec_request(struct super_block *sb, + struct pagevec *pvec, + loff_t offset, + unsigned int op, int op_flags) +{ + struct bio *bio; + pgoff_t index = (pgoff_t)(offset >> PAGE_SHIFT); + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); + + SSDFS_DBG("offset %llu, op %#x, op_flags %#x\n", + offset, op, op_flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(pvec) == 0) { + SSDFS_WARN("empty page vector\n"); + return 0; + } + + bio = ssdfs_bdev_bio_alloc(sb->s_bdev, pagevec_count(pvec), + op, GFP_NOIO); + if (IS_ERR_OR_NULL(bio)) { + err = !bio ? 
-ERANGE : PTR_ERR(bio); + SSDFS_ERR("fail to allocate bio: err %d\n", + err); + return err; + } + + bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9); + bio_set_dev(bio, sb->s_bdev); + bio->bi_opf = op | op_flags; + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_bio_add_page(bio, page, + PAGE_SIZE, + 0); + if (unlikely(err)) { + SSDFS_ERR("fail to add page %d into bio: " + "err %d\n", + i, err); + goto finish_sync_pvec_request; + } + } + + err = submit_bio_wait(bio); + if (unlikely(err)) { + SSDFS_ERR("fail to process request: " + "err %d\n", + err); + goto finish_sync_pvec_request; + } + +finish_sync_pvec_request: + ssdfs_bdev_bio_put(bio); + + return err; +} + +/* + * ssdfs_bdev_readpage() - read page from the volume + * @sb: superblock object + * @page: memory page + * @offset: offset in bytes from partition's begin + * + * This function tries to read data on @offset + * from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + */ +int ssdfs_bdev_readpage(struct super_block *sb, struct page *page, + loff_t offset) +{ + int err; + + err = ssdfs_bdev_sync_page_request(sb, page, offset, + REQ_OP_READ, REQ_SYNC); + if (err) { + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + } else { + SetPageUptodate(page); + ClearPageError(page); + flush_dcache_page(page); + } + + ssdfs_unlock_page(page); + + return err; +} + +/* + * ssdfs_bdev_readpages() - read pages from the volume + * @sb: superblock object + * @pvec: pagevec + * @offset: offset in bytes from partition's begin + * + * This function tries to read data on @offset + * from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + */ +int ssdfs_bdev_readpages(struct super_block *sb, struct pagevec *pvec, + loff_t offset) +{ + int i; + int err = 0; + + err = ssdfs_bdev_sync_pvec_request(sb, pvec, offset, + REQ_OP_READ, REQ_RAHEAD); + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err) { + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + } else { + SetPageUptodate(page); + ClearPageError(page); + flush_dcache_page(page); + } + + ssdfs_unlock_page(page); + } + + return err; +} + +/* + * ssdfs_bdev_read_pvec() - read from volume into buffer + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @len: size of buffer in bytes + * @buf: buffer + * @read_bytes: pointer on read bytes [out] + * + * This function tries to read data on @offset + * from partition's begin with @len bytes in size + * from the volume into @buf. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. 
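+ *
+ * Note: a single call processes at most PAGEVEC_SIZE pages;
+ * larger requests are clamped and completed iteratively by
+ * ssdfs_bdev_read().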
+ */
+static int ssdfs_bdev_read_pvec(struct super_block *sb,
+				loff_t offset, size_t len,
+				void *buf, size_t *read_bytes)
+{
+	struct pagevec pvec;
+	struct page *page;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 read_len;
+	loff_t cur_offset = offset;
+	u32 page_off;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n",
+		  sb, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*read_bytes = 0;
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count > PAGEVEC_SIZE) {
+		/*
+		 * Clamp the request to the pagevec capacity;
+		 * ssdfs_bdev_read() iterates until the whole
+		 * buffer has been read.
+		 */
+		pages_count = PAGEVEC_SIZE;
+		div_u64_rem(offset, PAGE_SIZE, &page_off);
+		len = min_t(size_t, len,
+			    ((size_t)PAGEVEC_SIZE << PAGE_SHIFT) - page_off);
+	}
+
+	pagevec_init(&pvec);
+
+	for (i = 0; i < pages_count; i++) {
+		page = ssdfs_dev_bdev_alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (IS_ERR_OR_NULL(page)) {
+			err = (page == NULL ? -ENOMEM : PTR_ERR(page));
+			SSDFS_ERR("unable to allocate memory page\n");
+			goto finish_bdev_read_pvec;
+		}
+
+		ssdfs_get_page(page);
+		ssdfs_lock_page(page);
+		pagevec_add(&pvec, page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	err = ssdfs_bdev_sync_pvec_request(sb, &pvec, offset,
+					   REQ_OP_READ, REQ_SYNC);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read pagevec: err %d\n",
+			  err);
+		goto finish_bdev_read_pvec;
+	}
+
+	for (i = 0; i < pagevec_count(&pvec); i++) {
+		page = pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (*read_bytes >= len) {
+			err = -ERANGE;
+			SSDFS_ERR("read_bytes %zu >= len %zu\n",
+				  *read_bytes, len);
+			goto finish_bdev_read_pvec;
+		}
+
+		div_u64_rem(cur_offset, PAGE_SIZE, &page_off);
+		read_len = min_t(size_t, (size_t)(PAGE_SIZE - page_off),
+				 (size_t)(len - *read_bytes));
+
+		err = ssdfs_memcpy_from_page(buf, *read_bytes, len,
+					     page, page_off, PAGE_SIZE,
+					     read_len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			goto finish_bdev_read_pvec;
+		}
+
+		*read_bytes += read_len;
+		cur_offset += read_len;
+	}
+
+finish_bdev_read_pvec:
+	for (i = pagevec_count(&pvec) - 1; i >= 0; i--) {
+		page = pvec.pages[i];
+
+		if (page) {
+			ssdfs_unlock_page(page);
+			ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_dev_bdev_free_page(page);
+			pvec.pages[i] = NULL;
+		}
+	}
+
+	pagevec_reinit(&pvec);
+
+	if (*read_bytes != len) {
+		err = -EIO;
+		SSDFS_ERR("read_bytes (%zu) != len (%zu)\n",
+			  *read_bytes, len);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_read() - read from volume into buffer
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read data on @offset
+ * from partition's begin with @len bytes in size
+ * from the volume into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO - I/O error.
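+ *
+ * Note: the buffer is filled iteratively: every pass reads up to
+ * PAGEVEC_SIZE pages through ssdfs_bdev_read_pvec() and advances
+ * the offset until @len bytes have been copied.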
+ */ +int ssdfs_bdev_read(struct super_block *sb, loff_t offset, + size_t len, void *buf) +{ + size_t read_bytes = 0; + loff_t cur_offset = offset; + u8 *ptr = (u8 *)buf; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n", + sb, (unsigned long long)offset, len, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (len == 0) { + SSDFS_WARN("len is zero\n"); + return 0; + } + + while (read_bytes < len) { + size_t iter_read; + + err = ssdfs_bdev_read_pvec(sb, cur_offset, + len - read_bytes, + ptr, + &iter_read); + if (unlikely(err)) { + SSDFS_ERR("fail to read pvec: " + "cur_offset %llu, read_bytes %zu, " + "err %d\n", + cur_offset, read_bytes, err); + return err; + } + + cur_offset += iter_read; + ptr += iter_read; + read_bytes += iter_read; + } + + return 0; +} + +/* + * ssdfs_bdev_can_write_page() - check that page can be written + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @need_check: make check or not? + * + * This function checks that page can be written. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +int ssdfs_bdev_can_write_page(struct super_block *sb, loff_t offset, + bool need_check) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + void *buf; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, need_check %d\n", + sb, (unsigned long long)offset, (int)need_check); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_check) + return 0; + + buf = ssdfs_dev_bdev_kzalloc(fsi->pagesize, GFP_KERNEL); + if (!buf) { + SSDFS_ERR("unable to allocate %d bytes\n", fsi->pagesize); + return -ENOMEM; + } + + err = ssdfs_bdev_read(sb, offset, fsi->pagesize, buf); + if (err) + goto free_buf; + + if (memchr_inv(buf, 0xff, fsi->pagesize)) { + if (memchr_inv(buf, 0x00, fsi->pagesize)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area with offset %llu contains data\n", + (unsigned long long)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -EIO; + } + } + +free_buf: + ssdfs_dev_bdev_kfree(buf); + return err; +} + +/* + * ssdfs_bdev_writepage() - write memory page on volume + * @sb: superblock object + * @to_off: offset in bytes from partition's begin + * @page: memory page + * @from_off: offset in bytes from page's begin + * @len: size of data in bytes + * + * This function tries to write from @page data of @len size + * on @offset from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EIO - I/O error. 
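+ *
+ * Note: @page is expected to be dirty and unlocked on entry; it is
+ * locked around the synchronous write and its dirty state is
+ * cleared on success.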
+ */ +int ssdfs_bdev_writepage(struct super_block *sb, loff_t to_off, + struct page *page, u32 from_off, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); +#ifdef CONFIG_SSDFS_DEBUG + u32 remainder; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, to_off %llu, page %p, from_off %u, len %zu\n", + sb, to_off, page, from_off, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) { + SSDFS_WARN("unable to write on RO file system\n"); + return -EROFS; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + BUG_ON((to_off >= ssdfs_bdev_device_size(sb)) || + (len > (ssdfs_bdev_device_size(sb) - to_off))); + BUG_ON(len == 0); + div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder); + BUG_ON(remainder); + BUG_ON((from_off + len) > PAGE_SIZE); + BUG_ON(!PageDirty(page)); + BUG_ON(PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + atomic_inc(&fsi->pending_bios); + + err = ssdfs_bdev_sync_page_request(sb, page, to_off, + REQ_OP_WRITE, REQ_SYNC); + if (err) { + SetPageError(page); + SSDFS_ERR("failed to write (err %d): offset %llu\n", + err, (unsigned long long)to_off); + } else { + ssdfs_clear_dirty_page(page); + SetPageUptodate(page); + ClearPageError(page); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_dec_and_test(&fsi->pending_bios)) + wake_up_all(&wq); + + return err; +} + +/* + * ssdfs_bdev_writepages() - write pagevec on volume + * @sb: superblock object + * @to_off: offset in bytes from partition's begin + * @pvec: memory pages vector + * @from_off: offset in bytes from page's begin + * @len: size of data in bytes + * + * This function tries to write from @pvec data of @len size + * on @offset from partition's begin. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EIO - I/O error. 
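+ *
+ * Note: the whole pagevec is submitted as a single synchronous bio;
+ * every page is locked before submission and unlocked afterwards.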
+ */ +int ssdfs_bdev_writepages(struct super_block *sb, loff_t to_off, + struct pagevec *pvec, + u32 from_off, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct page *page; + int i; +#ifdef CONFIG_SSDFS_DEBUG + u32 remainder; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, to_off %llu, pvec %p, from_off %u, len %zu\n", + sb, to_off, pvec, from_off, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) { + SSDFS_WARN("unable to write on RO file system\n"); + return -EROFS; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); + BUG_ON((to_off >= ssdfs_bdev_device_size(sb)) || + (len > (ssdfs_bdev_device_size(sb) - to_off))); + BUG_ON(len == 0); + div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder); + BUG_ON(remainder); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(pvec) == 0) { + SSDFS_WARN("empty pagevec\n"); + return 0; + } + + for (i = 0; i < pagevec_count(pvec); i++) { + page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + BUG_ON(!PageDirty(page)); + BUG_ON(PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + } + + atomic_inc(&fsi->pending_bios); + + err = ssdfs_bdev_sync_pvec_request(sb, pvec, to_off, + REQ_OP_WRITE, REQ_SYNC); + + for (i = 0; i < pagevec_count(pvec); i++) { + page = pvec->pages[i]; + + if (err) { + SetPageError(page); + SSDFS_ERR("failed to write (err %d): " + "page_index %llu\n", + err, + (unsigned long long)page_index(page)); + } else { + ssdfs_clear_dirty_page(page); + SetPageUptodate(page); + ClearPageError(page); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (atomic_dec_and_test(&fsi->pending_bios)) + wake_up_all(&wq); + + return err; +} + +/* + * ssdfs_bdev_erase_end_io() - callback for erase operation end + */ +static void ssdfs_bdev_erase_end_io(struct bio *bio) +{ + struct super_block *sb = bio->bi_private; + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + + BUG_ON(bio->bi_vcnt == 0); + + ssdfs_bdev_bio_put(bio); + if (atomic_dec_and_test(&fsi->pending_bios)) + wake_up_all(&wq); +} + +/* + * ssdfs_bdev_support_discard() - check that block device supports discard + */ +static inline bool ssdfs_bdev_support_discard(struct block_device *bdev) +{ + return bdev_max_discard_sectors(bdev) || + bdev_is_zoned(bdev); +} + +/* + * ssdfs_bdev_erase_request() - initiate erase request + * @sb: superblock object + * @nr_iovecs: number of pages for erase + * @offset: offset in bytes from partition's begin + * + * This function tries to make erase operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - erase operation error. + */ +static int ssdfs_bdev_erase_request(struct super_block *sb, + unsigned int nr_iovecs, + loff_t offset) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct page *erase_page = fsi->erase_page; + struct bio *bio; + unsigned int max_pages; + pgoff_t index = (pgoff_t)(offset >> PAGE_SHIFT); + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!erase_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (nr_iovecs == 0) { + SSDFS_WARN("empty vector\n"); + return 0; + } + + max_pages = min_t(unsigned int, nr_iovecs, BIO_MAX_VECS); + + bio = ssdfs_bdev_bio_alloc(sb->s_bdev, max_pages, + REQ_OP_DISCARD, GFP_NOFS); + if (IS_ERR_OR_NULL(bio)) { + err = !bio ? 
-ERANGE : PTR_ERR(bio); + SSDFS_ERR("fail to allocate bio: err %d\n", + err); + return err; + } + + for (i = 0; i < nr_iovecs; i++) { + if (i >= max_pages) { + bio_set_dev(bio, sb->s_bdev); + bio->bi_opf = REQ_OP_DISCARD | REQ_BACKGROUND; + bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9); + bio->bi_private = sb; + bio->bi_end_io = ssdfs_bdev_erase_end_io; + atomic_inc(&fsi->pending_bios); + submit_bio(bio); + + index += i; + nr_iovecs -= i; + i = 0; + + bio = ssdfs_bdev_bio_alloc(sb->s_bdev, max_pages, + REQ_OP_DISCARD, GFP_NOFS); + if (IS_ERR_OR_NULL(bio)) { + err = !bio ? -ERANGE : PTR_ERR(bio); + SSDFS_ERR("fail to allocate bio: err %d\n", + err); + return err; + } + } + + err = ssdfs_bdev_bio_add_page(bio, erase_page, + PAGE_SIZE, + 0); + if (unlikely(err)) { + SSDFS_ERR("fail to add page %d into bio: " + "err %d\n", + i, err); + goto finish_erase_request; + } + } + + bio_set_dev(bio, sb->s_bdev); + bio->bi_opf = REQ_OP_DISCARD | REQ_BACKGROUND; + bio->bi_iter.bi_sector = index * (PAGE_SIZE >> 9); + bio->bi_private = sb; + bio->bi_end_io = ssdfs_bdev_erase_end_io; + atomic_inc(&fsi->pending_bios); + submit_bio(bio); + + return 0; + +finish_erase_request: + ssdfs_bdev_bio_put(bio); + + return err; +} + +/* + * ssdfs_bdev_erase() - make erase operation + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @len: size in bytes + * + * This function tries to make erase operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EFAULT - erase operation error. + */ +static int ssdfs_bdev_erase(struct super_block *sb, loff_t offset, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + u32 erase_size = fsi->erasesize; + loff_t page_start, page_end; + u32 pages_count; + sector_t start_sector; + sector_t sectors_count; + u32 remainder; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, len %zu\n", + sb, (unsigned long long)offset, len); + + div_u64_rem((u64)len, (u64)erase_size, &remainder); + BUG_ON(remainder); + div_u64_rem((u64)offset, (u64)erase_size, &remainder); + BUG_ON(remainder); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) + return -EROFS; + + div_u64_rem((u64)len, (u64)erase_size, &remainder); + if (remainder) { + SSDFS_WARN("len %llu, erase_size %u, remainder %u\n", + (unsigned long long)len, + erase_size, remainder); + return -ERANGE; + } + + page_start = offset >> PAGE_SHIFT; + page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT; + pages_count = (u32)(page_end - page_start); + + if (pages_count == 0) { + SSDFS_WARN("pages_count equals to zero\n"); + return -ERANGE; + } + + if (ssdfs_bdev_support_discard(sb->s_bdev)) { + err = ssdfs_bdev_erase_request(sb, pages_count, offset); + if (unlikely(err)) + goto try_zeroout; + } else { +try_zeroout: + start_sector = page_start << + (PAGE_SHIFT - SSDFS_SECTOR_SHIFT); + sectors_count = pages_count << + (PAGE_SHIFT - SSDFS_SECTOR_SHIFT); + + err = blkdev_issue_zeroout(sb->s_bdev, + start_sector, sectors_count, + GFP_NOFS, 0); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to erase: " + "offset %llu, len %zu, err %d\n", + (unsigned long long)offset, + len, err); + return err; + } + + return 0; +} + +/* + * ssdfs_bdev_trim() - initiate background erase operation + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @len: size in bytes + * + * This function tries to initiate background erase operation. 
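+ * If the device supports discard, the range is discarded;
+ * otherwise it falls back to zeroing the range out.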
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EFAULT - erase operation error. + */ +static int ssdfs_bdev_trim(struct super_block *sb, loff_t offset, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + u32 erase_size = fsi->erasesize; + loff_t page_start, page_end; + u32 pages_count; + u32 remainder; + sector_t start_sector; + sector_t sectors_count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, len %zu\n", + sb, (unsigned long long)offset, len); + + div_u64_rem((u64)len, (u64)erase_size, &remainder); + BUG_ON(remainder); + div_u64_rem((u64)offset, (u64)erase_size, &remainder); + BUG_ON(remainder); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) + return -EROFS; + + div_u64_rem((u64)len, (u64)erase_size, &remainder); + if (remainder) { + SSDFS_WARN("len %llu, erase_size %u, remainder %u\n", + (unsigned long long)len, + erase_size, remainder); + return -ERANGE; + } + + page_start = offset >> PAGE_SHIFT; + page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT; + pages_count = (u32)(page_end - page_start); + + if (pages_count == 0) { + SSDFS_WARN("pages_count equals to zero\n"); + return -ERANGE; + } + + start_sector = page_start << (PAGE_SHIFT - SSDFS_SECTOR_SHIFT); + sectors_count = pages_count << (PAGE_SHIFT - SSDFS_SECTOR_SHIFT); + + if (ssdfs_bdev_support_discard(sb->s_bdev)) { + err = blkdev_issue_discard(sb->s_bdev, + start_sector, sectors_count, + GFP_NOFS); + if (unlikely(err)) + goto try_zeroout; + } else { +try_zeroout: + err = blkdev_issue_zeroout(sb->s_bdev, + start_sector, sectors_count, + GFP_NOFS, 0); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to discard: " + "start_sector %llu, sectors_count %llu, " + "err %d\n", + start_sector, sectors_count, err); + return err; + } + + return 0; +} + +/* + * ssdfs_bdev_peb_isbad() - check that PEB is bad + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * + * This function tries to detect that PEB is bad or not. + */ +static int ssdfs_bdev_peb_isbad(struct super_block *sb, loff_t offset) +{ + /* do nothing */ + return 0; +} + +/* + * ssdfs_bdev_mark_peb_bad() - mark PEB as bad + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * + * This function tries to mark PEB as bad. 
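+ * Conventional block devices expose no bad block management,
+ * so this method is a no-op here.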
+ */ +int ssdfs_bdev_mark_peb_bad(struct super_block *sb, loff_t offset) +{ + /* do nothing */ + return 0; +} + +/* + * ssdfs_bdev_sync() - make sync operation + * @sb: superblock object + */ +static void ssdfs_bdev_sync(struct super_block *sb) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("device %s\n", sb->s_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + wait_event(wq, atomic_read(&fsi->pending_bios) == 0); +} + +const struct ssdfs_device_ops ssdfs_bdev_devops = { + .device_name = ssdfs_bdev_device_name, + .device_size = ssdfs_bdev_device_size, + .open_zone = ssdfs_bdev_open_zone, + .reopen_zone = ssdfs_bdev_reopen_zone, + .close_zone = ssdfs_bdev_close_zone, + .read = ssdfs_bdev_read, + .readpage = ssdfs_bdev_readpage, + .readpages = ssdfs_bdev_readpages, + .can_write_page = ssdfs_bdev_can_write_page, + .writepage = ssdfs_bdev_writepage, + .writepages = ssdfs_bdev_writepages, + .erase = ssdfs_bdev_erase, + .trim = ssdfs_bdev_trim, + .peb_isbad = ssdfs_bdev_peb_isbad, + .mark_peb_bad = ssdfs_bdev_mark_peb_bad, + .sync = ssdfs_bdev_sync, +}; diff --git a/fs/ssdfs/dev_mtd.c b/fs/ssdfs/dev_mtd.c new file mode 100644 index 000000000000..6c092ea863bd --- /dev/null +++ b/fs/ssdfs/dev_mtd.c @@ -0,0 +1,641 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/dev_mtd.c - MTD device access code. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_dev_mtd_page_leaks; +atomic64_t ssdfs_dev_mtd_memory_leaks; +atomic64_t ssdfs_dev_mtd_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_dev_mtd_cache_leaks_increment(void *kaddr) + * void ssdfs_dev_mtd_cache_leaks_decrement(void *kaddr) + * void *ssdfs_dev_mtd_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_dev_mtd_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_dev_mtd_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_dev_mtd_kfree(void *kaddr) + * struct page *ssdfs_dev_mtd_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_dev_mtd_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_dev_mtd_free_page(struct page *page) + * void ssdfs_dev_mtd_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_mtd) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(dev_mtd) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_dev_mtd_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_dev_mtd_page_leaks, 0); + atomic64_set(&ssdfs_dev_mtd_memory_leaks, 0); + atomic64_set(&ssdfs_dev_mtd_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_dev_mtd_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_dev_mtd_page_leaks) != 0) { + SSDFS_ERR("MTD DEV: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_dev_mtd_page_leaks)); + } + + if 
(atomic64_read(&ssdfs_dev_mtd_memory_leaks) != 0) { + SSDFS_ERR("MTD DEV: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_dev_mtd_memory_leaks)); + } + + if (atomic64_read(&ssdfs_dev_mtd_cache_leaks) != 0) { + SSDFS_ERR("MTD DEV: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_dev_mtd_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_mtd_device_name() - get device name + * @sb: superblock object + */ +static const char *ssdfs_mtd_device_name(struct super_block *sb) +{ + return sb->s_mtd->name; +} + +/* + * ssdfs_mtd_device_size() - get partition size in bytes + * @sb: superblock object + */ +static __u64 ssdfs_mtd_device_size(struct super_block *sb) +{ + return SSDFS_FS_I(sb)->mtd->size; +} + +static int ssdfs_mtd_open_zone(struct super_block *sb, loff_t offset) +{ + return -EOPNOTSUPP; +} + +static int ssdfs_mtd_reopen_zone(struct super_block *sb, loff_t offset) +{ + return -EOPNOTSUPP; +} + +static int ssdfs_mtd_close_zone(struct super_block *sb, loff_t offset) +{ + return -EOPNOTSUPP; +} + +/* + * ssdfs_mtd_read() - read from volume into buffer + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @len: size of buffer in bytes + * @buf: buffer + * + * This function tries to read data on @offset + * from partition's begin with @len bytes in size + * from the volume into @buf. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + */ +static int ssdfs_mtd_read(struct super_block *sb, loff_t offset, size_t len, + void *buf) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct mtd_info *mtd = fsi->mtd; + size_t retlen; + int ret; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n", + sb, (unsigned long long)offset, len, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + ret = mtd_read(mtd, offset, len, &retlen, buf); + if (ret) { + SSDFS_ERR("failed to read (err %d): offset %llu, len %zu\n", + ret, (unsigned long long)offset, len); + return ret; + } + + if (retlen != len) { + SSDFS_ERR("retlen (%zu) != len (%zu)\n", retlen, len); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_mtd_readpage() - read page from the volume + * @sb: superblock object + * @page: memory page + * @offset: offset in bytes from partition's begin + * + * This function tries to read data on @offset + * from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + */ +static int ssdfs_mtd_readpage(struct super_block *sb, struct page *page, + loff_t offset) +{ + void *kaddr; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, page %p, page_index %llu\n", + sb, (unsigned long long)offset, page, + (unsigned long long)page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + err = ssdfs_mtd_read(sb, offset, PAGE_SIZE, kaddr); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (err) { + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + } else { + SetPageUptodate(page); + ClearPageError(page); + flush_dcache_page(page); + } + + ssdfs_unlock_page(page); + + return err; +} + +/* + * ssdfs_mtd_readpages() - read pages from the volume + * @sb: superblock object + * @pvec: vector of memory pages + * @offset: offset in bytes from partition's begin + * + * This function tries to read data on @offset + * from partition's begin in memory pages. 
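+ * Unlike the block device variant, the pages are read
+ * sequentially, one at a time, via ssdfs_mtd_readpage().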
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + */ +static int ssdfs_mtd_readpages(struct super_block *sb, struct pagevec *pvec, + loff_t offset) +{ + struct page *page; + loff_t cur_offset = offset; + u32 page_off; + u32 read_bytes = 0; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, pvec %p\n", + sb, (unsigned long long)offset, pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(pvec) == 0) { + SSDFS_WARN("empty page vector\n"); + return 0; + } + + for (i = 0; i < pagevec_count(pvec); i++) { + page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_mtd_readpage(sb, page, cur_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "cur_offset %llu, err %d\n", + cur_offset, err); + return err; + } + + div_u64_rem(cur_offset, PAGE_SIZE, &page_off); + read_bytes = PAGE_SIZE - page_off; + cur_offset += read_bytes; + } + + return 0; +} + +/* + * ssdfs_mtd_can_write_page() - check that page can be written + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @need_check: make check or not? + * + * This function checks that page can be written. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +static int ssdfs_mtd_can_write_page(struct super_block *sb, loff_t offset, + bool need_check) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + void *buf; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, need_check %d\n", + sb, (unsigned long long)offset, (int)need_check); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_check) + return 0; + + buf = ssdfs_dev_mtd_kzalloc(fsi->pagesize, GFP_KERNEL); + if (!buf) { + SSDFS_ERR("unable to allocate %d bytes\n", fsi->pagesize); + return -ENOMEM; + } + + err = ssdfs_mtd_read(sb, offset, fsi->pagesize, buf); + if (err) + goto free_buf; + + if (memchr_inv(buf, 0xff, fsi->pagesize)) { + SSDFS_ERR("area with offset %llu contains unmatching char\n", + (unsigned long long)offset); + err = -EIO; + } + +free_buf: + ssdfs_dev_mtd_kfree(buf); + return err; +} + +/* + * ssdfs_mtd_writepage() - write memory page on volume + * @sb: superblock object + * @to_off: offset in bytes from partition's begin + * @page: memory page + * @from_off: offset in bytes from page's begin + * @len: size of data in bytes + * + * This function tries to write from @page data of @len size + * on @offset from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EIO - I/O error. 
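+ *
+ * Note: the page is mapped with kmap_local_page() and the data
+ * is written synchronously through mtd_write().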
+ */ +static int ssdfs_mtd_writepage(struct super_block *sb, loff_t to_off, + struct page *page, u32 from_off, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct mtd_info *mtd = fsi->mtd; + size_t retlen; + unsigned char *kaddr; + int ret; +#ifdef CONFIG_SSDFS_DEBUG + u32 remainder; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, to_off %llu, page %p, from_off %u, len %zu\n", + sb, to_off, page, from_off, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) { + SSDFS_WARN("unable to write on RO file system\n"); + return -EROFS; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + BUG_ON((to_off >= mtd->size) || (len > (mtd->size - to_off))); + BUG_ON(len == 0); + div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder); + BUG_ON(remainder); + BUG_ON((from_off + len) > PAGE_SIZE); + BUG_ON(!PageDirty(page)); + BUG_ON(PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + ret = mtd_write(mtd, to_off, len, &retlen, kaddr + from_off); + kunmap_local(kaddr); + + if (ret || (retlen != len)) { + SetPageError(page); + SSDFS_ERR("failed to write (err %d): offset %llu, " + "len %zu, retlen %zu\n", + ret, (unsigned long long)to_off, len, retlen); + err = -EIO; + } else { + ssdfs_clear_dirty_page(page); + SetPageUptodate(page); + ClearPageError(page); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_mtd_writepages() - write memory pages on volume + * @sb: superblock object + * @to_off: offset in bytes from partition's begin + * @pvec: vector of memory pages + * @from_off: offset in bytes from page's begin + * @len: size of data in bytes + * + * This function tries to write from @pvec data of @len size + * on @offset from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EIO - I/O error. 
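+ *
+ * Note: the pagevec is written out page by page through
+ * ssdfs_mtd_writepage().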
+ */
+static int ssdfs_mtd_writepages(struct super_block *sb, loff_t to_off,
+				struct pagevec *pvec, u32 from_off, size_t len)
+{
+	struct page *page;
+	loff_t cur_to_off = to_off;
+	u32 page_off = from_off;
+	u32 written_bytes = 0;
+	size_t write_len;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, to_off %llu, pvec %p, from_off %u, len %zu\n",
+		  sb, to_off, pvec, from_off, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+	if (pagevec_count(pvec) == 0) {
+		SSDFS_WARN("empty page vector\n");
+		return 0;
+	}
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		page = pvec->pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (written_bytes >= len) {
+			SSDFS_ERR("written_bytes %u >= len %zu\n",
+				  written_bytes, len);
+			return -ERANGE;
+		}
+
+		write_len = min_t(size_t, (size_t)(PAGE_SIZE - page_off),
+				  (size_t)(len - written_bytes));
+
+		err = ssdfs_mtd_writepage(sb, cur_to_off, page,
+					  page_off, write_len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to write page: "
+				  "cur_to_off %llu, page_off %u, "
+				  "write_len %zu, err %d\n",
+				  cur_to_off, page_off, write_len, err);
+			return err;
+		}
+
+		div_u64_rem(cur_to_off, PAGE_SIZE, &page_off);
+		written_bytes += write_len;
+		cur_to_off += write_len;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_erase() - make erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size in bytes
+ *
+ * This function tries to make erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS - file system in RO mode.
+ * %-EFAULT - erase operation error.
+ */
+static int ssdfs_mtd_erase(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct mtd_info *mtd = SSDFS_FS_I(sb)->mtd;
+	struct erase_info ei;
+	u32 remainder;
+	int ret;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)mtd->erasesize, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)mtd->erasesize, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)mtd->erasesize, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   mtd->erasesize, remainder);
+		return -ERANGE;
+	}
+
+	memset(&ei, 0, sizeof(ei));
+	ei.addr = offset;
+	ei.len = len;
+
+	/*
+	 * mtd_erase() is synchronous: it returns only after the erase
+	 * operation has completed or failed, so no completion callback
+	 * is needed here.
+	 */
+	ret = mtd_erase(mtd, &ei);
+	if (ret) {
+		SSDFS_ERR("failed to erase (err %d): offset %llu, len %zu\n",
+			  ret, (unsigned long long)offset, len);
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ * @len: size in bytes
+ *
+ * This function tries to initiate background erase operation.
+ * Currently, it is the same operation as foreground erase. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EFAULT - erase operation error. + */ +static int ssdfs_mtd_trim(struct super_block *sb, loff_t offset, size_t len) +{ + return ssdfs_mtd_erase(sb, offset, len); +} + +/* + * ssdfs_mtd_peb_isbad() - check that PEB is bad + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * + * This function tries to detect that PEB is bad or not. + */ +static int ssdfs_mtd_peb_isbad(struct super_block *sb, loff_t offset) +{ + return mtd_block_isbad(SSDFS_FS_I(sb)->mtd, offset); +} + +/* + * ssdfs_mtd_mark_peb_bad() - mark PEB as bad + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * + * This function tries to mark PEB as bad. + */ +int ssdfs_mtd_mark_peb_bad(struct super_block *sb, loff_t offset) +{ + return mtd_block_markbad(SSDFS_FS_I(sb)->mtd, offset); +} + +/* + * ssdfs_mtd_sync() - make sync operation + * @sb: superblock object + */ +static void ssdfs_mtd_sync(struct super_block *sb) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("device %d (\"%s\")\n", + fsi->mtd->index, fsi->mtd->name); +#endif /* CONFIG_SSDFS_DEBUG */ + + mtd_sync(fsi->mtd); +} + +const struct ssdfs_device_ops ssdfs_mtd_devops = { + .device_name = ssdfs_mtd_device_name, + .device_size = ssdfs_mtd_device_size, + .open_zone = ssdfs_mtd_open_zone, + .reopen_zone = ssdfs_mtd_reopen_zone, + .close_zone = ssdfs_mtd_close_zone, + .read = ssdfs_mtd_read, + .readpage = ssdfs_mtd_readpage, + .readpages = ssdfs_mtd_readpages, + .can_write_page = ssdfs_mtd_can_write_page, + .writepage = ssdfs_mtd_writepage, + .writepages = ssdfs_mtd_writepages, + .erase = ssdfs_mtd_erase, + .trim = ssdfs_mtd_trim, + .peb_isbad = ssdfs_mtd_peb_isbad, + .mark_peb_bad = ssdfs_mtd_mark_peb_bad, + .sync = ssdfs_mtd_sync, +}; diff --git a/fs/ssdfs/dev_zns.c b/fs/ssdfs/dev_zns.c new file mode 100644 index 000000000000..2b45f3b1632c --- /dev/null +++ b/fs/ssdfs/dev_zns.c @@ -0,0 +1,1281 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/dev_zns.c - ZNS SSD support. + * + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * Copyright (c) 2022-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_dev_zns_page_leaks; +atomic64_t ssdfs_dev_zns_memory_leaks; +atomic64_t ssdfs_dev_zns_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_dev_zns_cache_leaks_increment(void *kaddr) + * void ssdfs_dev_zns_cache_leaks_decrement(void *kaddr) + * void *ssdfs_dev_zns_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_dev_zns_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_dev_zns_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_dev_zns_kfree(void *kaddr) + * struct page *ssdfs_dev_zns_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_dev_zns_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_dev_zns_free_page(struct page *page) + * void ssdfs_dev_zns_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_zns) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(dev_zns) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_dev_zns_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_dev_zns_page_leaks, 0); + atomic64_set(&ssdfs_dev_zns_memory_leaks, 0); + atomic64_set(&ssdfs_dev_zns_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_dev_zns_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_dev_zns_page_leaks) != 0) { + SSDFS_ERR("ZNS DEV: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_dev_zns_page_leaks)); + } + + if (atomic64_read(&ssdfs_dev_zns_memory_leaks) != 0) { + SSDFS_ERR("ZNS DEV: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_dev_zns_memory_leaks)); + } + + if (atomic64_read(&ssdfs_dev_zns_cache_leaks) != 0) { + SSDFS_ERR("ZNS DEV: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_dev_zns_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static DECLARE_WAIT_QUEUE_HEAD(zns_wq); + +/* + * ssdfs_zns_device_name() - get device name + * @sb: superblock object + */ +static const char *ssdfs_zns_device_name(struct super_block *sb) +{ + return sb->s_id; +} + +/* + * ssdfs_zns_device_size() - get partition size in bytes + * @sb: superblock object + */ +static __u64 ssdfs_zns_device_size(struct super_block *sb) +{ + return i_size_read(sb->s_bdev->bd_inode); +} + +static int ssdfs_report_zone(struct blk_zone *zone, + unsigned int index, void *data) +{ + ssdfs_memcpy(data, 0, sizeof(struct blk_zone), + zone, 0, sizeof(struct blk_zone), + sizeof(struct blk_zone)); + return 0; +} + +/* + * ssdfs_zns_open_zone() - open zone + * @sb: superblock object + * @offset: offset in bytes from partition's begin + */ +static int ssdfs_zns_open_zone(struct super_block *sb, loff_t offset) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + sector_t zone_sector = offset >> SECTOR_SHIFT; + sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT; + u32 open_zones; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu\n", + sb, (unsigned long long)offset); + SSDFS_DBG("BEFORE: open_zones %d\n", + atomic_read(&fsi->open_zones)); +#endif /* CONFIG_SSDFS_DEBUG */ + + open_zones = atomic_inc_return(&fsi->open_zones); + if (open_zones > fsi->max_open_zones) { + 
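+		/* undo the speculative increment before bailing out */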
atomic_dec(&fsi->open_zones); + + SSDFS_WARN("open zones limit achieved: " + "open_zones %u\n", open_zones); + return -EBUSY; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("AFTER: open_zones %d\n", + atomic_read(&fsi->open_zones)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_OPEN, + zone_sector, zone_size, GFP_NOFS); + if (unlikely(err)) { + SSDFS_ERR("fail to open zone: " + "zone_sector %llu, zone_size %llu, " + "open_zones %u, max_open_zones %u, " + "err %d\n", + zone_sector, zone_size, + open_zones, fsi->max_open_zones, + err); + return err; + } + + return 0; +} + +/* + * ssdfs_zns_reopen_zone() - reopen closed zone + * @sb: superblock object + * @offset: offset in bytes from partition's begin + */ +static int ssdfs_zns_reopen_zone(struct super_block *sb, loff_t offset) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct blk_zone zone; + sector_t zone_sector = offset >> SECTOR_SHIFT; + sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu\n", + sb, (unsigned long long)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (err != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone before: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (zone.cond) { + case BLK_ZONE_COND_CLOSED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is closed: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + /* continue logic */ + break; + + case BLK_ZONE_COND_READONLY: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is READ-ONLY: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + case BLK_ZONE_COND_FULL: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is full: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + case BLK_ZONE_COND_OFFLINE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is offline: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + default: + /* continue logic */ + break; + } + + err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_OPEN, + zone_sector, zone_size, GFP_NOFS); + if (unlikely(err)) { + SSDFS_ERR("fail to open zone: " + "zone_sector %llu, zone_size %llu, " + "err %d\n", + zone_sector, zone_size, + err); + return err; + } + + err = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (err != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone after: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (zone.cond) { + case BLK_ZONE_COND_CLOSED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is closed: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + case BLK_ZONE_COND_READONLY: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is READ-ONLY: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + case BLK_ZONE_COND_FULL: +#ifdef 
CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_close_zone() - close zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ */
+static int ssdfs_zns_close_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u32 open_zones;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_FINISH,
+				zone_sector, zone_size, GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to finish zone: "
+			  "zone_sector %llu, zone_size %llu, err %d\n",
+			  zone_sector, zone_size, err);
+		return err;
+	}
+
+	open_zones = atomic_dec_return(&fsi->open_zones);
+	if (open_zones > fsi->max_open_zones) {
+		SSDFS_WARN("open zones count is still above limit: "
+			   "open_zones %u\n", open_zones);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_zone_size() - retrieve zone size
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to retrieve zone size.
+ */
+u64 ssdfs_zns_zone_size(struct super_block *sb, loff_t offset)
+{
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u64)zone.len << SECTOR_SHIFT;
+}
+
+/*
+ * ssdfs_zns_zone_capacity() - retrieve zone capacity
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to retrieve zone capacity.
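+ *
+ * Note: on zoned devices the usable capacity of a zone can be smaller
+ * than the zone size reported by ssdfs_zns_zone_size(), so callers
+ * should rely on this value when estimating writable space.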
+ */ +u64 ssdfs_zns_zone_capacity(struct super_block *sb, loff_t offset) +{ + struct blk_zone zone; + sector_t zone_sector = offset >> SECTOR_SHIFT; + int res; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu\n", + sb, (unsigned long long)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + return U64_MAX; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u64)zone.capacity << SECTOR_SHIFT; +} + +/* + * ssdfs_zns_sync_page_request() - submit page request + * @sb: superblock object + * @page: memory page + * @zone_start: first sector of zone + * @offset: offset in bytes from partition's begin + * @op: direction of I/O + * @op_flags: request op flags + */ +static int ssdfs_zns_sync_page_request(struct super_block *sb, + struct page *page, + sector_t zone_start, + loff_t offset, + unsigned int op, int op_flags) +{ + struct bio *bio; +#ifdef CONFIG_SSDFS_DEBUG + sector_t zone_sector = offset >> SECTOR_SHIFT; + struct blk_zone zone; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + + op |= REQ_OP_ZONE_APPEND | REQ_IDLE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + + SSDFS_DBG("offset %llu, zone_start %llu, " + "op %#x, op_flags %#x\n", + offset, zone_start, op, op_flags); + + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } + + BUG_ON(zone_start != zone.start); +#endif /* CONFIG_SSDFS_DEBUG */ + + bio = ssdfs_bdev_bio_alloc(sb->s_bdev, 1, op, GFP_NOFS); + if (IS_ERR_OR_NULL(bio)) { + err = !bio ? 
-ERANGE : PTR_ERR(bio); + SSDFS_ERR("fail to allocate bio: err %d\n", + err); + return err; + } + + bio->bi_iter.bi_sector = zone_start; + bio_set_dev(bio, sb->s_bdev); + bio->bi_opf = op | op_flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_bio_add_page(bio, page, PAGE_SIZE, 0); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into bio: " + "err %d\n", + err); + goto finish_sync_page_request; + } + + err = submit_bio_wait(bio); + if (unlikely(err)) { + SSDFS_ERR("fail to process request: " + "err %d\n", + err); + goto finish_sync_page_request; + } + +finish_sync_page_request: + ssdfs_bdev_bio_put(bio); + + return err; +} + +/* + * ssdfs_zns_sync_pvec_request() - submit pagevec request + * @sb: superblock object + * @pvec: pagevec + * @zone_start: first sector of zone + * @offset: offset in bytes from partition's begin + * @op: direction of I/O + * @op_flags: request op flags + */ +static int ssdfs_zns_sync_pvec_request(struct super_block *sb, + struct pagevec *pvec, + sector_t zone_start, + loff_t offset, + unsigned int op, int op_flags) +{ + struct bio *bio; + int i; +#ifdef CONFIG_SSDFS_DEBUG + sector_t zone_sector = offset >> SECTOR_SHIFT; + struct blk_zone zone; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + + op |= REQ_OP_ZONE_APPEND | REQ_IDLE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); + + SSDFS_DBG("offset %llu, zone_start %llu, " + "op %#x, op_flags %#x\n", + offset, zone_start, op, op_flags); + + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } + + BUG_ON(zone_start != zone.start); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(pvec) == 0) { + SSDFS_WARN("empty page vector\n"); + return 0; + } + + bio = ssdfs_bdev_bio_alloc(sb->s_bdev, pagevec_count(pvec), + op, GFP_NOFS); + if (IS_ERR_OR_NULL(bio)) { + err = !bio ? -ERANGE : PTR_ERR(bio); + SSDFS_ERR("fail to allocate bio: err %d\n", + err); + return err; + } + + bio->bi_iter.bi_sector = zone_start; + bio_set_dev(bio, sb->s_bdev); + bio->bi_opf = op | op_flags; + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_bio_add_page(bio, page, + PAGE_SIZE, + 0); + if (unlikely(err)) { + SSDFS_ERR("fail to add page %d into bio: " + "err %d\n", + i, err); + goto finish_sync_pvec_request; + } + } + + err = submit_bio_wait(bio); + if (unlikely(err)) { + SSDFS_ERR("fail to process request: " + "err %d\n", + err); + goto finish_sync_pvec_request; + } + +finish_sync_pvec_request: + ssdfs_bdev_bio_put(bio); + + return err; +} + +/* + * ssdfs_zns_readpage() - read page from the volume + * @sb: superblock object + * @page: memory page + * @offset: offset in bytes from partition's begin + * + * This function tries to read data on @offset + * from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. 
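+ *
+ * Note: the read itself is delegated to ssdfs_bdev_readpage();
+ * the zone report below is fetched only for debug logging.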
+ */ +int ssdfs_zns_readpage(struct super_block *sb, struct page *page, + loff_t offset) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct blk_zone zone; + sector_t zone_sector = offset >> SECTOR_SHIFT; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu\n", + sb, (unsigned long long)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_readpage(sb, page, offset); + +#ifdef CONFIG_SSDFS_DEBUG + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_zns_readpages() - read pages from the volume + * @sb: superblock object + * @pvec: pagevec + * @offset: offset in bytes from partition's begin + * + * This function tries to read data on @offset + * from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + */ +int ssdfs_zns_readpages(struct super_block *sb, struct pagevec *pvec, + loff_t offset) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct blk_zone zone; + sector_t zone_sector = offset >> SECTOR_SHIFT; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu\n", + sb, (unsigned long long)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_readpages(sb, pvec, offset); + +#ifdef CONFIG_SSDFS_DEBUG + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_zns_read() - read from volume into buffer + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @len: size of buffer in bytes + * @buf: buffer + * + * This function tries to read data on @offset + * from partition's begin with @len bytes in size + * from the volume into @buf. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. 
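+ *
+ * Note: reads need no zone-specific handling, so this call simply
+ * wraps ssdfs_bdev_read(); only debug builds query the zone state.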
+ */ +int ssdfs_zns_read(struct super_block *sb, loff_t offset, + size_t len, void *buf) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct blk_zone zone; + sector_t zone_sector = offset >> SECTOR_SHIFT; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, len %zu, buf %p\n", + sb, (unsigned long long)offset, len, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_bdev_read(sb, offset, len, buf); + +#ifdef CONFIG_SSDFS_DEBUG + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_zns_can_write_page() - check that page can be written + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @need_check: make check or not? + * + * This function checks that page can be written. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +static int ssdfs_zns_can_write_page(struct super_block *sb, loff_t offset, + bool need_check) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct blk_zone zone; + sector_t zone_sector = offset >> SECTOR_SHIFT; + sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT; + u64 peb_id; + loff_t zone_offset; + int res; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, offset %llu, need_check %d\n", + sb, (unsigned long long)offset, (int)need_check); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_check) + return 0; + + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + return res; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone before: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (zone.type) { + case BLK_ZONE_TYPE_CONVENTIONAL: + return ssdfs_bdev_can_write_page(sb, offset, need_check); + + default: + /* + * BLK_ZONE_TYPE_SEQWRITE_REQ + * BLK_ZONE_TYPE_SEQWRITE_PREF + * + * continue logic + */ + break; + } + + switch (zone.cond) { + case BLK_ZONE_COND_NOT_WP: + return ssdfs_bdev_can_write_page(sb, offset, need_check); + + case BLK_ZONE_COND_EMPTY: + /* can write */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is empty: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + case BLK_ZONE_COND_CLOSED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is closed: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_id = offset / fsi->erasesize; + zone_offset = peb_id * fsi->erasesize; + + err = ssdfs_zns_reopen_zone(sb, zone_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to reopen zone: " + "zone_offset %llu, zone_size %llu, " + "err %d\n", + zone_offset, zone_size, err); + return err; + } + + return 0; + + case BLK_ZONE_COND_READONLY: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is READ-ONLY: offset %llu\n", + offset); +#endif /* 
CONFIG_SSDFS_DEBUG */ + return -EIO; + + case BLK_ZONE_COND_FULL: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is full: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + case BLK_ZONE_COND_OFFLINE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("zone is offline: offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EIO; + + default: + /* continue logic */ + break; + } + + if (zone_sector < zone.wp) { + err = -EIO; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cannot be written: " + "zone_sector %llu, zone.wp %llu\n", + zone_sector, zone.wp); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone after: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_zns_writepage() - write memory page on volume + * @sb: superblock object + * @to_off: offset in bytes from partition's begin + * @page: memory page + * @from_off: offset in bytes from page's begin + * @len: size of data in bytes + * + * This function tries to write from @page data of @len size + * on @offset from partition's begin in memory page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EIO - I/O error. + */ +int ssdfs_zns_writepage(struct super_block *sb, loff_t to_off, + struct page *page, u32 from_off, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + loff_t zone_start; +#ifdef CONFIG_SSDFS_DEBUG + struct blk_zone zone; + sector_t zone_sector = to_off >> SECTOR_SHIFT; + u32 remainder; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, to_off %llu, page %p, from_off %u, len %zu\n", + sb, to_off, page, from_off, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) { + SSDFS_WARN("unable to write on RO file system\n"); + return -EROFS; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + BUG_ON((to_off >= ssdfs_zns_device_size(sb)) || + (len > (ssdfs_zns_device_size(sb) - to_off))); + BUG_ON(len == 0); + div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder); + BUG_ON(remainder); + BUG_ON((from_off + len) > PAGE_SIZE); + BUG_ON(!PageDirty(page)); + BUG_ON(PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + atomic_inc(&fsi->pending_bios); + + zone_start = (to_off / fsi->erasesize) * fsi->erasesize; + zone_start >>= SECTOR_SHIFT; + + err = ssdfs_zns_sync_page_request(sb, page, zone_start, to_off, + REQ_OP_WRITE, REQ_SYNC); + if (err) { + SetPageError(page); + SSDFS_ERR("failed to write (err %d): offset %llu\n", + err, (unsigned long long)to_off); + } else { + ssdfs_clear_dirty_page(page); + SetPageUptodate(page); + ClearPageError(page); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_dec_and_test(&fsi->pending_bios)) + wake_up_all(&zns_wq); + +#ifdef CONFIG_SSDFS_DEBUG + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, 
err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_zns_writepages() - write pagevec on volume + * @sb: superblock object + * @to_off: offset in bytes from partition's begin + * @pvec: memory pages vector + * @from_off: offset in bytes from page's begin + * @len: size of data in bytes + * + * This function tries to write from @pvec data of @len size + * on @offset from partition's begin. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO mode. + * %-EIO - I/O error. + */ +int ssdfs_zns_writepages(struct super_block *sb, loff_t to_off, + struct pagevec *pvec, + u32 from_off, size_t len) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct page *page; + loff_t zone_start; + int i; +#ifdef CONFIG_SSDFS_DEBUG + struct blk_zone zone; + sector_t zone_sector = to_off >> SECTOR_SHIFT; + u32 remainder; + int res; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p, to_off %llu, pvec %p, from_off %u, len %zu\n", + sb, to_off, pvec, from_off, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sb->s_flags & SB_RDONLY) { + SSDFS_WARN("unable to write on RO file system\n"); + return -EROFS; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); + BUG_ON((to_off >= ssdfs_zns_device_size(sb)) || + (len > (ssdfs_zns_device_size(sb) - to_off))); + BUG_ON(len == 0); + div_u64_rem((u64)to_off, (u64)fsi->pagesize, &remainder); + BUG_ON(remainder); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(pvec) == 0) { + SSDFS_WARN("empty pagevec\n"); + return 0; + } + + for (i = 0; i < pagevec_count(pvec); i++) { + page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + BUG_ON(!PageDirty(page)); + BUG_ON(PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + } + + atomic_inc(&fsi->pending_bios); + + zone_start = (to_off / fsi->erasesize) * fsi->erasesize; + zone_start >>= SECTOR_SHIFT; + + err = ssdfs_zns_sync_pvec_request(sb, pvec, zone_start, to_off, + REQ_OP_WRITE, REQ_SYNC); + + for (i = 0; i < pagevec_count(pvec); i++) { + page = pvec->pages[i]; + + if (err) { + SetPageError(page); + SSDFS_ERR("failed to write (err %d): " + "page_index %llu\n", + err, + (unsigned long long)page_index(page)); + } else { + ssdfs_clear_dirty_page(page); + SetPageUptodate(page); + ClearPageError(page); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (atomic_dec_and_test(&fsi->pending_bios)) + wake_up_all(&zns_wq); + +#ifdef CONFIG_SSDFS_DEBUG + res = blkdev_report_zones(sb->s_bdev, zone_sector, 1, + ssdfs_report_zone, &zone); + if (res != 1) { + SSDFS_ERR("fail to take report zone: " + "zone_sector %llu, err %d\n", + zone_sector, res); + } else { + SSDFS_DBG("zone: start %llu, len %llu, wp %llu, " + "type %#x, cond %#x, non_seq %#x, " + "reset %#x, capacity %llu\n", + zone.start, zone.len, zone.wp, + zone.type, zone.cond, zone.non_seq, + zone.reset, zone.capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_zns_trim() - initiate background erase operation + * @sb: superblock object + * @offset: offset in bytes from partition's begin + * @len: size in 
bytes
+ *
+ * This function tries to initiate background erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS - file system in RO mode.
+ * %-EFAULT - erase operation error.
+ */
+static int ssdfs_zns_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u32 erase_size = fsi->erasesize;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 remainder;
+	sector_t start_sector;
+	sector_t sectors_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   erase_size, remainder);
+		return -ERANGE;
+	}
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count == 0) {
+		SSDFS_WARN("pages_count equals zero\n");
+		return -ERANGE;
+	}
+
+	start_sector = offset >> SECTOR_SHIFT;
+	sectors_count = fsi->erasesize >> SECTOR_SHIFT;
+
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_RESET,
+				start_sector, sectors_count, GFP_NOFS);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to reset zone: "
+			  "zone_sector %llu, zone_size %llu, err %d\n",
+			  start_sector, sectors_count, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_peb_isbad() - check that PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to detect whether the PEB is bad.
+ */
+static int ssdfs_zns_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_zns_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's begin
+ *
+ * This function tries to mark PEB as bad.
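+ *
+ * Note: a zoned device manages media defects internally, so there is
+ * no bad block concept here and the function is a no-op stub that
+ * only satisfies the device operations table.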
+ */
+int ssdfs_zns_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_zns_sync() - synchronize device state
+ * @sb: superblock object
+ */
+static void ssdfs_zns_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %s\n", sb->s_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_event(zns_wq, atomic_read(&fsi->pending_bios) == 0);
+}
+
+const struct ssdfs_device_ops ssdfs_zns_devops = {
+	.device_name = ssdfs_zns_device_name,
+	.device_size = ssdfs_zns_device_size,
+	.open_zone = ssdfs_zns_open_zone,
+	.reopen_zone = ssdfs_zns_reopen_zone,
+	.close_zone = ssdfs_zns_close_zone,
+	.read = ssdfs_zns_read,
+	.readpage = ssdfs_zns_readpage,
+	.readpages = ssdfs_zns_readpages,
+	.can_write_page = ssdfs_zns_can_write_page,
+	.writepage = ssdfs_zns_writepage,
+	.writepages = ssdfs_zns_writepages,
+	.erase = ssdfs_zns_trim,
+	.trim = ssdfs_zns_trim,
+	.peb_isbad = ssdfs_zns_peb_isbad,
+	.mark_peb_bad = ssdfs_zns_mark_peb_bad,
+	.sync = ssdfs_zns_sync,
+};

From patchwork Sat Feb 25 01:08:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151908
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 04/76] ssdfs: implement super operations
Date: Fri, 24 Feb 2023 17:08:15 -0800
Message-Id: <20230225010927.813929-5-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

The patch implements the register/unregister file system logic. The FS
registration logic includes cache creation/initialization, compression
support initialization, and sysfs subsystem initialization. Conversely,
the FS unregistration logic destroys the caches, the compression
subsystem, and the sysfs entries.

The patch also implements basic mount/unmount logic. The
ssdfs_fill_super() implements the mount logic that includes:
(1) parsing mount options,
(2) extracting superblock info,
(3) creating key in-core metadata structures (mapping table, segment
bitmap, b-trees),
(4) creating the root inode,
(5) starting the metadata structures' threads,
(6) committing the superblock on finish of the mount operation.
The ssdfs_put_super() implements the unmount logic:
(1) stopping the metadata threads,
(2) waiting for unfinished user data requests,
(3) flushing dirty metadata structures,
(4) committing the superblock,
(5) destroying the in-core metadata structures.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/fs_error.c  |  257 ++++++
 fs/ssdfs/options.c   |  190 +++++
 fs/ssdfs/readwrite.c |  651 +++++++++++++++
 fs/ssdfs/super.c     | 1844 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 2942 insertions(+)
 create mode 100644 fs/ssdfs/fs_error.c
 create mode 100644 fs/ssdfs/options.c
 create mode 100644 fs/ssdfs/readwrite.c
 create mode 100644 fs/ssdfs/super.c

diff --git a/fs/ssdfs/fs_error.c b/fs/ssdfs/fs_error.c
new file mode 100644
index 000000000000..452ace18272d
--- /dev/null
+++ b/fs/ssdfs/fs_error.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/fs_error.c - handling of detected file system errors.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ * http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ * http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_fs_error_page_leaks; +atomic64_t ssdfs_fs_error_memory_leaks; +atomic64_t ssdfs_fs_error_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_fs_error_cache_leaks_increment(void *kaddr) + * void ssdfs_fs_error_cache_leaks_decrement(void *kaddr) + * void *ssdfs_fs_error_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_fs_error_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_fs_error_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_fs_error_kfree(void *kaddr) + * struct page *ssdfs_fs_error_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_fs_error_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_fs_error_free_page(struct page *page) + * void ssdfs_fs_error_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(fs_error) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(fs_error) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_fs_error_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_fs_error_page_leaks, 0); + atomic64_set(&ssdfs_fs_error_memory_leaks, 0); + atomic64_set(&ssdfs_fs_error_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_fs_error_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_fs_error_page_leaks) != 0) { + SSDFS_ERR("FS ERROR: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_fs_error_page_leaks)); + } + + if (atomic64_read(&ssdfs_fs_error_memory_leaks) != 0) { + SSDFS_ERR("FS ERROR: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_fs_error_memory_leaks)); + } + + if (atomic64_read(&ssdfs_fs_error_cache_leaks) != 0) { + SSDFS_ERR("FS ERROR: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_fs_error_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static void ssdfs_handle_error(struct super_block *sb) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + + if (sb->s_flags & SB_RDONLY) + return; + + spin_lock(&fsi->volume_state_lock); + fsi->fs_state = SSDFS_ERROR_FS; + spin_unlock(&fsi->volume_state_lock); + + if (ssdfs_test_opt(fsi->mount_opts, ERRORS_PANIC)) { + panic("SSDFS (device %s): panic forced after error\n", + fsi->devops->device_name(sb)); + } else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_RO)) { + SSDFS_CRIT("Remounting filesystem read-only\n"); + /* + * Make sure updated value of ->s_mount_flags will be visible + * before ->s_flags update + */ + smp_wmb(); + sb->s_flags |= SB_RDONLY; + } +} + +void ssdfs_fs_error(struct super_block *sb, const char *file, + const char *function, unsigned int line, + const char *fmt, ...) 
+{ + struct va_format vaf; + va_list args; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; + pr_crit("SSDFS error (device %s): pid %d:%s:%d %s(): comm %s: %pV", + SSDFS_FS_I(sb)->devops->device_name(sb), current->pid, + file, line, function, current->comm, &vaf); + va_end(args); + + ssdfs_handle_error(sb); +} + +int ssdfs_set_page_dirty(struct page *page) +{ + struct address_space *mapping = page->mapping; + unsigned long flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index: %llu, mapping %p\n", + (u64)page_index(page), mapping); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!PageLocked(page)) { + SSDFS_WARN("page isn't locked: " + "page_index %llu, mapping %p\n", + (u64)page_index(page), mapping); + return -ERANGE; + } + + SetPageDirty(page); + + if (mapping) { + xa_lock_irqsave(&mapping->i_pages, flags); + __xa_set_mark(&mapping->i_pages, page_index(page), + PAGECACHE_TAG_DIRTY); + xa_unlock_irqrestore(&mapping->i_pages, flags); + } + + return 0; +} + +int __ssdfs_clear_dirty_page(struct page *page) +{ + struct address_space *mapping = page->mapping; + unsigned long flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index: %llu, mapping %p\n", + (u64)page_index(page), mapping); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!PageLocked(page)) { + SSDFS_WARN("page isn't locked: " + "page_index %llu, mapping %p\n", + (u64)page_index(page), mapping); + return -ERANGE; + } + + if (mapping) { + xa_lock_irqsave(&mapping->i_pages, flags); + if (test_bit(PG_dirty, &page->flags)) { + __xa_clear_mark(&mapping->i_pages, + page_index(page), + PAGECACHE_TAG_DIRTY); + } + xa_unlock_irqrestore(&mapping->i_pages, flags); + } + + TestClearPageDirty(page); + + return 0; +} + +int ssdfs_clear_dirty_page(struct page *page) +{ + struct address_space *mapping = page->mapping; + unsigned long flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index: %llu, mapping %p\n", + (u64)page_index(page), mapping); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!PageLocked(page)) { + SSDFS_WARN("page isn't locked: " + "page_index %llu, mapping %p\n", + (u64)page_index(page), mapping); + return -ERANGE; + } + + if (mapping) { + xa_lock_irqsave(&mapping->i_pages, flags); + if (test_bit(PG_dirty, &page->flags)) { + __xa_clear_mark(&mapping->i_pages, + page_index(page), + PAGECACHE_TAG_DIRTY); + xa_unlock_irqrestore(&mapping->i_pages, flags); + return clear_page_dirty_for_io(page); + } + xa_unlock_irqrestore(&mapping->i_pages, flags); + return 0; + } + + TestClearPageDirty(page); + + return 0; +} + +/* + * ssdfs_clear_dirty_pages - discard dirty pages in address space + * @mapping: address space with dirty pages for discarding + */ +void ssdfs_clear_dirty_pages(struct address_space *mapping) +{ + struct pagevec pvec; + unsigned int i; + pgoff_t index = 0; + int err; + + pagevec_init(&pvec); + + while (pagevec_lookup_tag(&pvec, mapping, &index, + PAGECACHE_TAG_DIRTY)) { + for (i = 0; i < pagevec_count(&pvec); i++) { + struct page *page = pvec.pages[i]; + + ssdfs_lock_page(page); + err = ssdfs_clear_dirty_page(page); + ssdfs_unlock_page(page); + + if (unlikely(err)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail clear page dirty: " + "page_index %llu\n", + (u64)page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + ssdfs_fs_error_pagevec_release(&pvec); + cond_resched(); + } +} diff --git a/fs/ssdfs/options.c b/fs/ssdfs/options.c new file mode 100644 index 000000000000..e36870868c08 --- /dev/null +++ b/fs/ssdfs/options.c @@ -0,0 +1,190 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- 
SSD-oriented File System. + * + * fs/ssdfs/options.c - mount options parsing. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "segment_bitmap.h" + +/* + * SSDFS mount options. + * + * Opt_compr: change default compressor + * Opt_fs_err_panic: panic if fs error is detected + * Opt_fs_err_ro: remount in RO state if fs error is detected + * Opt_fs_err_cont: continue execution if fs error is detected + * Opt_ignore_fs_state: ignore on-disk file system state during mount + * Opt_err: just end of array marker + */ +enum { + Opt_compr, + Opt_fs_err_panic, + Opt_fs_err_ro, + Opt_fs_err_cont, + Opt_ignore_fs_state, + Opt_err, +}; + +static const match_table_t tokens = { + {Opt_compr, "compr=%s"}, + {Opt_fs_err_panic, "errors=panic"}, + {Opt_fs_err_ro, "errors=remount-ro"}, + {Opt_fs_err_cont, "errors=continue"}, + {Opt_ignore_fs_state, "fs_state=ignore"}, + {Opt_err, NULL}, +}; + +int ssdfs_parse_options(struct ssdfs_fs_info *fs_info, char *data) +{ + substring_t args[MAX_OPT_ARGS]; + char *p, *name; + + if (!data) + return 0; + + while ((p = strsep(&data, ","))) { + int token; + + if (!*p) + continue; + + token = match_token(p, tokens, args); + switch (token) { + case Opt_compr: + name = match_strdup(&args[0]); + + if (!name) + return -ENOMEM; + if (!strcmp(name, "none")) + ssdfs_set_opt(fs_info->mount_opts, + COMPR_MODE_NONE); +#ifdef CONFIG_SSDFS_ZLIB + else if (!strcmp(name, "zlib")) + ssdfs_set_opt(fs_info->mount_opts, + COMPR_MODE_ZLIB); +#endif +#ifdef CONFIG_SSDFS_LZO + else if (!strcmp(name, "lzo")) + ssdfs_set_opt(fs_info->mount_opts, + COMPR_MODE_LZO); +#endif + else { + SSDFS_ERR("unknown compressor %s\n", name); + ssdfs_kfree(name); + return -EINVAL; + } + ssdfs_kfree(name); + break; + + case Opt_fs_err_panic: + /* Clear possible default initialization */ + ssdfs_clear_opt(fs_info->mount_opts, ERRORS_RO); + ssdfs_clear_opt(fs_info->mount_opts, ERRORS_CONT); + ssdfs_set_opt(fs_info->mount_opts, ERRORS_PANIC); + break; + + case Opt_fs_err_ro: + /* Clear possible default initialization */ + ssdfs_clear_opt(fs_info->mount_opts, ERRORS_PANIC); + ssdfs_clear_opt(fs_info->mount_opts, ERRORS_CONT); + ssdfs_set_opt(fs_info->mount_opts, ERRORS_RO); + break; + + case Opt_fs_err_cont: + /* Clear possible default initialization */ + ssdfs_clear_opt(fs_info->mount_opts, ERRORS_PANIC); + ssdfs_clear_opt(fs_info->mount_opts, ERRORS_RO); + ssdfs_set_opt(fs_info->mount_opts, ERRORS_CONT); + break; + + case Opt_ignore_fs_state: + ssdfs_set_opt(fs_info->mount_opts, IGNORE_FS_STATE); + break; + + default: + SSDFS_ERR("unrecognized mount option '%s'\n", p); + return -EINVAL; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("DONE: parse options\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +void ssdfs_initialize_fs_errors_option(struct ssdfs_fs_info *fsi) +{ + if (fsi->fs_errors == SSDFS_ERRORS_PANIC) + ssdfs_set_opt(fsi->mount_opts, ERRORS_PANIC); + else if (fsi->fs_errors == SSDFS_ERRORS_RO) + ssdfs_set_opt(fsi->mount_opts, ERRORS_RO); + else if (fsi->fs_errors == 
SSDFS_ERRORS_CONTINUE)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_CONT);
+	else {
+		u16 def_behaviour = SSDFS_ERRORS_DEFAULT;
+
+		switch (def_behaviour) {
+		case SSDFS_ERRORS_PANIC:
+			ssdfs_set_opt(fsi->mount_opts, ERRORS_PANIC);
+			break;
+
+		case SSDFS_ERRORS_RO:
+			ssdfs_set_opt(fsi->mount_opts, ERRORS_RO);
+			break;
+		}
+	}
+}
+
+int ssdfs_show_options(struct seq_file *seq, struct dentry *root)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(root->d_sb);
+	char *compress_type;
+
+	if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_ZLIB)) {
+		compress_type = "zlib";
+		seq_printf(seq, ",compress=%s", compress_type);
+	} else if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_LZO)) {
+		compress_type = "lzo";
+		seq_printf(seq, ",compress=%s", compress_type);
+	}
+
+	if (ssdfs_test_opt(fsi->mount_opts, ERRORS_PANIC))
+		seq_puts(seq, ",errors=panic");
+	else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_RO))
+		seq_puts(seq, ",errors=remount-ro");
+	else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_CONT))
+		seq_puts(seq, ",errors=continue");
+
+	if (ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE))
+		seq_puts(seq, ",fs_state=ignore");
+
+	return 0;
+}
diff --git a/fs/ssdfs/readwrite.c b/fs/ssdfs/readwrite.c
new file mode 100644
index 000000000000..b47cef995e4b
--- /dev/null
+++ b/fs/ssdfs/readwrite.c
@@ -0,0 +1,651 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/readwrite.c - read/write primitive operations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ * http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ * http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#include
+
+/*
+ * ssdfs_read_page_from_volume() - read page from volume
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @page: memory page
+ *
+ * This function tries to read page from the volume.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EIO - I/O error.
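+ *
+ * Example (hypothetical geometry): with 4096-byte logical blocks and
+ * 128 pages per PEB, peb_size is 524288 bytes; peb_id 2 with
+ * bytes_off 8192 resolves to volume offset 2 * 524288 + 8192 = 1056768.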
+ */
+int ssdfs_read_page_from_volume(struct ssdfs_fs_info *fsi,
+				u64 peb_id, u32 bytes_off,
+				struct page *page)
+{
+	struct super_block *sb;
+	loff_t offset;
+	u32 peb_size;
+	u32 pagesize;
+	u32 pages_per_peb;
+	u32 pages_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !page);
+	BUG_ON(!fsi->devops->readpage);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, page %p\n",
+		  fsi, peb_id, bytes_off, page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	pagesize = fsi->pagesize;
+	pages_per_peb = fsi->pages_per_peb;
+	pages_off = bytes_off / pagesize;
+
+	if (pages_per_peb >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_per_peb %u >= U32_MAX / pagesize %u\n",
+			  pages_per_peb, pagesize);
+		return -EINVAL;
+	}
+
+	peb_size = pages_per_peb * pagesize;
+
+	if (peb_id >= div_u64(ULLONG_MAX, peb_size)) {
+		SSDFS_ERR("peb_id %llu >= ULLONG_MAX / peb_size %u\n",
+			  peb_id, peb_size);
+		return -EINVAL;
+	}
+
+	offset = peb_id * peb_size;
+
+	if (pages_off >= pages_per_peb) {
+		SSDFS_ERR("pages_off %u >= pages_per_peb %u\n",
+			  pages_off, pages_per_peb);
+		return -EINVAL;
+	}
+
+	if (pages_off >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_off %u >= U32_MAX / pagesize %u\n",
+			  pages_off, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	offset += bytes_off;
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB: err %d\n",
+				  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EIO;
+		}
+	}
+
+	err = fsi->devops->readpage(sb, page, offset);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fail to read page: offset %llu, err %d\n",
+			  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_pagevec_from_volume() - read pagevec from volume
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @pvec: pagevec [in|out]
+ *
+ * This function tries to read pages from the volume.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EIO - I/O error.
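+ *
+ * Note: the offset math and the bad PEB check mirror
+ * ssdfs_read_page_from_volume(); only the final step differs by
+ * calling the ->readpages() device operation for the whole pagevec.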
+ */
+int ssdfs_read_pagevec_from_volume(struct ssdfs_fs_info *fsi,
+				   u64 peb_id, u32 bytes_off,
+				   struct pagevec *pvec)
+{
+	struct super_block *sb;
+	loff_t offset;
+	u32 peb_size;
+	u32 pagesize;
+	u32 pages_per_peb;
+	u32 pages_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pvec);
+	BUG_ON(!fsi->devops->readpages);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, pvec %p\n",
+		  fsi, peb_id, bytes_off, pvec);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	pagesize = fsi->pagesize;
+	pages_per_peb = fsi->pages_per_peb;
+	pages_off = bytes_off / pagesize;
+
+	if (pages_per_peb >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_per_peb %u >= U32_MAX / pagesize %u\n",
+			  pages_per_peb, pagesize);
+		return -EINVAL;
+	}
+
+	peb_size = pages_per_peb * pagesize;
+
+	if (peb_id >= div_u64(ULLONG_MAX, peb_size)) {
+		SSDFS_ERR("peb_id %llu >= ULLONG_MAX / peb_size %u\n",
+			  peb_id, peb_size);
+		return -EINVAL;
+	}
+
+	offset = peb_id * peb_size;
+
+	if (pages_off >= pages_per_peb) {
+		SSDFS_ERR("pages_off %u >= pages_per_peb %u\n",
+			  pages_off, pages_per_peb);
+		return -EINVAL;
+	}
+
+	if (pages_off >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_off %u >= U32_MAX / pagesize %u\n",
+			  pages_off, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	offset += bytes_off;
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB: err %d\n",
+				  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EIO;
+		}
+	}
+
+	err = fsi->devops->readpages(sb, pvec, offset);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fail to read pvec: offset %llu, err %d\n",
+			  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_aligned_read_buffer() - aligned read from volume into buffer
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @buf: buffer
+ * @size: buffer size
+ * @read_bytes: number of bytes actually read
+ *
+ * This function tries to read in buffer by means of page aligned
+ * request. It reads part of requested data in the case of unaligned
+ * request. The @read_bytes returns the number of bytes actually read.
+ *
+ * RETURN:
+ * [success] - buffer contains data of @read_bytes in size.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EIO - I/O error.
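+ *
+ * Example (assuming 4096-byte pages): a request at bytes_off 1000
+ * with size 4096 is clipped to read_bytes = 4096 - 1000 = 3096,
+ * i.e. the read never crosses the current page boundary.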
+ */
+int ssdfs_aligned_read_buffer(struct ssdfs_fs_info *fsi,
+			      u64 peb_id, u32 bytes_off,
+			      void *buf, size_t size,
+			      size_t *read_bytes)
+{
+	struct super_block *sb;
+	loff_t offset;
+	u32 peb_size;
+	u32 pagesize;
+	u32 pages_per_peb;
+	u32 pages_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !buf);
+	BUG_ON(!fsi->devops->read);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, buf %p, size %zu\n",
+		  fsi, peb_id, bytes_off, buf, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb = fsi->sb;
+	pagesize = fsi->pagesize;
+	pages_per_peb = fsi->pages_per_peb;
+	pages_off = bytes_off / pagesize;
+
+	if (pages_per_peb >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_per_peb %u >= U32_MAX / pagesize %u\n",
+			  pages_per_peb, pagesize);
+		return -EINVAL;
+	}
+
+	peb_size = pages_per_peb * pagesize;
+
+	if (peb_id >= div_u64(ULLONG_MAX, peb_size)) {
+		SSDFS_ERR("peb_id %llu >= ULLONG_MAX / peb_size %u\n",
+			  peb_id, peb_size);
+		return -EINVAL;
+	}
+
+	offset = peb_id * peb_size;
+
+	if (pages_off >= pages_per_peb) {
+		SSDFS_ERR("pages_off %u >= pages_per_peb %u\n",
+			  pages_off, pages_per_peb);
+		return -EINVAL;
+	}
+
+	if (pages_off >= (U32_MAX / pagesize)) {
+		SSDFS_ERR("pages_off %u >= U32_MAX / pagesize %u\n",
+			  pages_off, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	if (size > pagesize) {
+		SSDFS_ERR("size %zu > pagesize %u\n",
+			  size, fsi->pagesize);
+		return -EINVAL;
+	}
+
+	offset += bytes_off;
+
+	*read_bytes = ((pages_off + 1) * pagesize) - bytes_off;
+	*read_bytes = min_t(size_t, *read_bytes, size);
+
+	if (fsi->devops->peb_isbad) {
+		err = fsi->devops->peb_isbad(sb, offset);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("offset %llu is in bad PEB: err %d\n",
+				  (unsigned long long)offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EIO;
+		}
+	}
+
+	err = fsi->devops->read(sb, offset, *read_bytes, buf);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fail to read from offset %llu, size %zu, err %d\n",
+			  (unsigned long long)offset, *read_bytes, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_unaligned_read_buffer() - unaligned read from volume into buffer
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset from PEB's beginning in bytes
+ * @buf: buffer
+ * @size: buffer size
+ *
+ * This function tries to read in buffer by means of page unaligned
+ * request.
+ *
+ * RETURN:
+ * [success] - buffer contains data of @size in bytes.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EIO - I/O error.
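+ *
+ * Note: the function loops over ssdfs_aligned_read_buffer(),
+ * advancing by the bytes each aligned iteration actually read,
+ * until the whole @size bytes have been copied into @buf.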
+ */ +int ssdfs_unaligned_read_buffer(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + void *buf, size_t size) +{ + size_t read_bytes = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !buf); + BUG_ON(!fsi->devops->read); + + SSDFS_DBG("fsi %p, peb_id %llu, bytes_off %u, buf %p, size %zu\n", + fsi, peb_id, bytes_off, buf, size); +#endif /* CONFIG_SSDFS_DEBUG */ + + do { + size_t iter_size = size - read_bytes; + size_t iter_read_bytes; + + err = ssdfs_aligned_read_buffer(fsi, peb_id, + bytes_off + read_bytes, + buf + read_bytes, + iter_size, + &iter_read_bytes); + if (err) { + SSDFS_ERR("fail to read from peb_id %llu, offset %zu, " + "size %zu, err %d\n", + peb_id, (size_t)(bytes_off + read_bytes), + iter_size, err); + return err; + } + + read_bytes += iter_read_bytes; + } while (read_bytes < size); + + return 0; +} + +/* + * ssdfs_can_write_sb_log() - check that superblock log can be written + * @sb: pointer on superblock object + * @sb_log: superblock log's extent + * + * This function checks that superblock log can be written + * successfully. + * + * RETURN: + * [success] - superblock log can be written successfully. + * [failure] - error code: + * + * %-ERANGE - invalid extent. + */ +int ssdfs_can_write_sb_log(struct super_block *sb, + struct ssdfs_peb_extent *sb_log) +{ + struct ssdfs_fs_info *fsi; + u64 cur_peb; + u32 page_offset; + u32 log_size; + loff_t byte_off; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !sb_log); + + SSDFS_DBG("leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u\n", + sb_log->leb_id, sb_log->peb_id, + sb_log->page_offset, sb_log->pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = SSDFS_FS_I(sb); + + if (!fsi->devops->can_write_page) + return 0; + + cur_peb = sb_log->peb_id; + page_offset = sb_log->page_offset; + log_size = sb_log->pages_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_peb %llu, page_offset %u, " + "log_size %u, pages_per_peb %u\n", + cur_peb, page_offset, + log_size, fsi->pages_per_peb); + + if (log_size > fsi->pages_per_seg) { + SSDFS_ERR("log_size value %u is too big\n", + log_size); + return -ERANGE; + } + + if (cur_peb > div_u64(ULLONG_MAX, fsi->pages_per_seg)) { + SSDFS_ERR("cur_peb value %llu is too big\n", + cur_peb); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + byte_off = cur_peb * fsi->pages_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + if (byte_off > div_u64(ULLONG_MAX, fsi->pagesize)) { + SSDFS_ERR("byte_off value %llu is too big\n", + byte_off); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + byte_off *= fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + if ((u64)page_offset > div_u64(ULLONG_MAX, fsi->pagesize)) { + SSDFS_ERR("page_offset value %u is too big\n", + page_offset); + return -ERANGE; + } + + if (byte_off > (ULLONG_MAX - ((u64)page_offset * fsi->pagesize))) { + SSDFS_ERR("byte_off value %llu is too big\n", + byte_off); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + byte_off += (u64)page_offset * fsi->pagesize; + + for (i = 0; i < log_size; i++) { +#ifdef CONFIG_SSDFS_DEBUG + if (byte_off > (ULLONG_MAX - (i * fsi->pagesize))) { + SSDFS_ERR("offset value %llu is too big\n", + byte_off); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + err = fsi->devops->can_write_page(sb, byte_off, true); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page can't be written: err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + byte_off += fsi->pagesize; + } + + return 0; +} + +int ssdfs_unaligned_read_pagevec(struct 
pagevec *pvec, + u32 offset, u32 size, + void *buf) +{ + struct page *page; + u32 page_off; + u32 bytes_off; + size_t read_bytes = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec || !buf); + + SSDFS_DBG("pvec %p, offset %u, size %u, buf %p\n", + pvec, offset, size, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + do { + size_t iter_read_bytes; + size_t cur_off; + + bytes_off = offset + read_bytes; + page_off = bytes_off / PAGE_SIZE; + cur_off = bytes_off % PAGE_SIZE; + + iter_read_bytes = min_t(size_t, + (size_t)(size - read_bytes), + (size_t)(PAGE_SIZE - cur_off)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_off %u, cur_off %zu, " + "iter_read_bytes %zu\n", + page_off, cur_off, + iter_read_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_off >= pagevec_count(pvec)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page out of range: index %u: " + "offset %zu, pagevec_count %u\n", + page_off, cur_off, + pagevec_count(pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + return -E2BIG; + } + + page = pvec->pages[page_off]; + + ssdfs_lock_page(page); + err = ssdfs_memcpy_from_page(buf, read_bytes, size, + page, cur_off, PAGE_SIZE, + iter_read_bytes); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "read_bytes %zu, offset %zu, " + "iter_read_bytes %zu, err %d\n", + read_bytes, cur_off, + iter_read_bytes, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + read_bytes += iter_read_bytes; + } while (read_bytes < size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BUF DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + buf, size); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +int ssdfs_unaligned_write_pagevec(struct pagevec *pvec, + u32 offset, u32 size, + void *buf) +{ + struct page *page; + u32 page_off; + u32 bytes_off; + size_t written_bytes = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec || !buf); + + SSDFS_DBG("pvec %p, offset %u, size %u, buf %p\n", + pvec, offset, size, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + do { + size_t iter_write_bytes; + size_t cur_off; + + bytes_off = offset + written_bytes; + page_off = bytes_off / PAGE_SIZE; + cur_off = bytes_off % PAGE_SIZE; + + iter_write_bytes = min_t(size_t, + (size_t)(size - written_bytes), + (size_t)(PAGE_SIZE - cur_off)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("bytes_off %u, page_off %u, " + "cur_off %zu, written_bytes %zu, " + "iter_write_bytes %zu\n", + bytes_off, page_off, cur_off, + written_bytes, iter_write_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_off >= pagevec_count(pvec)) { + SSDFS_ERR("invalid page index %u: " + "offset %zu, pagevec_count %u\n", + page_off, cur_off, + pagevec_count(pvec)); + return -EINVAL; + } + + page = pvec->pages[page_off]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + WARN_ON(!PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_memcpy_to_page(page, cur_off, PAGE_SIZE, + buf, written_bytes, size, + iter_write_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "written_bytes %zu, offset %zu, " + "iter_write_bytes %zu, err %d\n", + written_bytes, cur_off, + iter_write_bytes, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + written_bytes += iter_write_bytes; + } while (written_bytes < size); + + return 0; +} diff --git a/fs/ssdfs/super.c b/fs/ssdfs/super.c new file mode 100644 index 
000000000000..a3b144e6eafb --- /dev/null +++ b/fs/ssdfs/super.c @@ -0,0 +1,1844 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/super.c - module and superblock management. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "version.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment.h" +#include "segment_tree.h" +#include "current_segment.h" +#include "peb_mapping_table.h" +#include "extents_queue.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "inodes_tree.h" +#include "shared_extents_tree.h" +#include "shared_dictionary.h" +#include "extents_tree.h" +#include "dentries_tree.h" +#include "xattr_tree.h" +#include "xattr.h" +#include "acl.h" +#include "snapshots_tree.h" +#include "invalidated_extents_tree.h" + +#define CREATE_TRACE_POINTS +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_allocated_pages; +atomic64_t ssdfs_memory_leaks; +atomic64_t ssdfs_super_page_leaks; +atomic64_t ssdfs_super_memory_leaks; +atomic64_t ssdfs_super_cache_leaks; + +atomic64_t ssdfs_locked_pages; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_super_cache_leaks_increment(void *kaddr) + * void ssdfs_super_cache_leaks_decrement(void *kaddr) + * void *ssdfs_super_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_super_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_super_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_super_kfree(void *kaddr) + * struct page *ssdfs_super_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_super_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_super_free_page(struct page *page) + * void ssdfs_super_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(super) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(super) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_super_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_super_page_leaks, 0); + atomic64_set(&ssdfs_super_memory_leaks, 0); + atomic64_set(&ssdfs_super_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_super_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_super_page_leaks) != 0) { + SSDFS_ERR("SUPER: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_super_page_leaks)); + } + + if (atomic64_read(&ssdfs_super_memory_leaks) != 0) { + SSDFS_ERR("SUPER: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_super_memory_leaks)); + } + + if (atomic64_read(&ssdfs_super_cache_leaks) != 0) { + SSDFS_ERR("SUPER: " + "caches suffer from %lld leaks\n", + atomic64_read(&ssdfs_super_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static void init_once(void
*foo) +{ + struct ssdfs_inode_info *ii = (struct ssdfs_inode_info *)foo; + + inode_init_once(&ii->vfs_inode); +} + +/* + * This method is called by inode_alloc() to allocate memory + * for struct inode and initialize it + */ +struct inode *ssdfs_alloc_inode(struct super_block *sb) +{ + struct ssdfs_inode_info *ii; + + ii = alloc_inode_sb(sb, ssdfs_inode_cachep, GFP_KERNEL); + if (!ii) + return NULL; + + ssdfs_super_cache_leaks_increment(ii); + + init_once((void *)ii); + + atomic_set(&ii->private_flags, 0); + init_rwsem(&ii->lock); + ii->parent_ino = U64_MAX; + ii->flags = 0; + ii->name_hash = 0; + ii->name_len = 0; + ii->extents_tree = NULL; + ii->dentries_tree = NULL; + ii->xattrs_tree = NULL; + ii->inline_file = NULL; + memset(&ii->raw_inode, 0, sizeof(struct ssdfs_inode)); + + return &ii->vfs_inode; +} + +static void ssdfs_i_callback(struct rcu_head *head) +{ + struct inode *inode = container_of(head, struct inode, i_rcu); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ii->extents_tree) + ssdfs_extents_tree_destroy(ii); + + if (ii->dentries_tree) + ssdfs_dentries_tree_destroy(ii); + + if (ii->xattrs_tree) + ssdfs_xattrs_tree_destroy(ii); + + if (ii->inline_file) + ssdfs_destroy_inline_file_buffer(inode); + + ssdfs_super_cache_leaks_decrement(ii); + kmem_cache_free(ssdfs_inode_cachep, ii); +} + +/* + * This method is called by destroy_inode() to release + * resources allocated for struct inode + */ +static void ssdfs_destroy_inode(struct inode *inode) +{ + call_rcu(&inode->i_rcu, ssdfs_i_callback); +} + +static void ssdfs_init_inode_once(void *obj) +{ + struct ssdfs_inode_info *ii = obj; + inode_init_once(&ii->vfs_inode); +} + +static int ssdfs_remount_fs(struct super_block *sb, int *flags, char *data) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct ssdfs_peb_extent last_sb_log = {0}; + struct ssdfs_sb_log_payload payload; + unsigned long old_sb_flags; + unsigned long old_mount_opts; + int err; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("sb %p, flags %#x, data %p\n", sb, *flags, data); +#else + SSDFS_DBG("sb %p, flags %#x, data %p\n", sb, *flags, data); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + old_sb_flags = sb->s_flags; + old_mount_opts = fsi->mount_opts; + + pagevec_init(&payload.maptbl_cache.pvec); + + err = ssdfs_parse_options(fsi, data); + if (err) + goto restore_opts; + + set_posix_acl_flag(sb); + + if ((*flags & SB_RDONLY) == (sb->s_flags & SB_RDONLY)) + goto out; + + if (*flags & SB_RDONLY) { + down_write(&fsi->volume_sem); + + err = ssdfs_prepare_sb_log(sb, &last_sb_log); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb log: err %d\n", + err); + } + + err = ssdfs_snapshot_sb_log_payload(sb, &payload); + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot sb log's payload: err %d\n", + err); + } + + if (!err) { + err = ssdfs_commit_super(sb, SSDFS_VALID_FS, + &last_sb_log, + &payload); + } else { + SSDFS_ERR("fail to prepare sb log payload: " + "err %d\n", err); + } + + up_write(&fsi->volume_sem); + + if (err) + SSDFS_ERR("fail to commit superblock info\n"); + + sb->s_flags |= SB_RDONLY; + SSDFS_DBG("remount in RO mode\n"); + } else { + down_write(&fsi->volume_sem); + + err = ssdfs_prepare_sb_log(sb, &last_sb_log); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb log: err %d\n", + err); + } + + err = ssdfs_snapshot_sb_log_payload(sb, &payload); + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot sb log's payload: err %d\n", + err); + } + 
+ if (!err) { + err = ssdfs_commit_super(sb, SSDFS_MOUNTED_FS, + &last_sb_log, + &payload); + } else { + SSDFS_ERR("fail to prepare sb log payload: " + "err %d\n", err); + } + + up_write(&fsi->volume_sem); + + if (err) { + SSDFS_NOTICE("fail to commit superblock info\n"); + goto restore_opts; + } + + sb->s_flags &= ~SB_RDONLY; + SSDFS_DBG("remount in RW mode\n"); + } +out: + ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +restore_opts: + sb->s_flags = old_sb_flags; + fsi->mount_opts = old_mount_opts; + ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec); + return err; +} + +static inline +bool unfinished_user_data_requests_exist(struct ssdfs_fs_info *fsi) +{ + u64 flush_requests = 0; + + spin_lock(&fsi->volume_state_lock); + flush_requests = fsi->flushing_user_data_requests; + spin_unlock(&fsi->volume_state_lock); + + return flush_requests > 0; +} + +static int ssdfs_sync_fs(struct super_block *sb, int wait) +{ + struct ssdfs_fs_info *fsi; + int err = 0; + + fsi = SSDFS_FS_I(sb); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("sb %p\n", sb); +#else + SSDFS_DBG("sb %p\n", sb); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + +#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY + SSDFS_ERR("SYNCFS is starting...\n"); + ssdfs_check_memory_leaks(); +#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */ + + atomic_set(&fsi->global_fs_state, SSDFS_METADATA_GOING_FLUSHING); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SSDFS_METADATA_GOING_FLUSHING\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + wake_up_all(&fsi->pending_wq); + + if (unfinished_user_data_requests_exist(fsi)) { + wait_queue_head_t *wq = &fsi->finish_user_data_flush_wq; + + err = wait_event_killable_timeout(*wq, + !unfinished_user_data_requests_exist(fsi), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + if (unfinished_user_data_requests_exist(fsi)) + BUG(); + } + + atomic_set(&fsi->global_fs_state, SSDFS_METADATA_UNDER_FLUSH); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SSDFS_METADATA_UNDER_FLUSH\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&fsi->volume_sem); + + if (fsi->fs_feature_compat & + SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) { + err = ssdfs_invextree_flush(fsi); + if (err) { + SSDFS_ERR("fail to flush invalidated extents btree: " + "err %d\n", err); + } + } + + if (fsi->fs_feature_compat & SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) { + err = ssdfs_shextree_flush(fsi); + if (err) { + SSDFS_ERR("fail to flush shared extents btree: " + "err %d\n", err); + } + } + + if (fsi->fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) { + err = ssdfs_inodes_btree_flush(fsi->inodes_tree); + if (err) { + SSDFS_ERR("fail to flush inodes btree: " + "err %d\n", err); + } + } + + if (fsi->fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) { + err = ssdfs_shared_dict_btree_flush(fsi->shdictree); + if (err) { + SSDFS_ERR("fail to flush shared dictionary: " + "err %d\n", err); + } + } + + err = ssdfs_execute_create_snapshots(fsi); + if (err) { + SSDFS_ERR("fail to process the snapshots creation\n"); + } + + if (fsi->fs_feature_compat & SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG) { + err = ssdfs_snapshots_btree_flush(fsi); + if (err) { + SSDFS_ERR("fail to flush snapshots btree: " + "err %d\n", err); + } + } + + if (fsi->fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) { + err = ssdfs_segbmap_flush(fsi->segbmap); + if (err) { + SSDFS_ERR("fail to flush segment 
bitmap: " + "err %d\n", err); + } + } + + if (fsi->fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) { + err = ssdfs_maptbl_flush(fsi->maptbl); + if (err) { + SSDFS_ERR("fail to flush mapping table: " + "err %d\n", err); + } + } + + up_write(&fsi->volume_sem); + + atomic_set(&fsi->global_fs_state, SSDFS_REGULAR_FS_OPERATIONS); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SSDFS_REGULAR_FS_OPERATIONS\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY + SSDFS_ERR("SYNCFS has been finished...\n"); + ssdfs_check_memory_leaks(); +#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (unlikely(err)) + goto fail_sync_fs; + + trace_ssdfs_sync_fs(sb, wait); + + return 0; + +fail_sync_fs: + trace_ssdfs_sync_fs_exit(sb, wait, err); + return err; +} + +static struct inode *ssdfs_nfs_get_inode(struct super_block *sb, + u64 ino, u32 generation) +{ + struct inode *inode; + + if (ino < SSDFS_ROOT_INO) + return ERR_PTR(-ESTALE); + + inode = ssdfs_iget(sb, ino); + if (IS_ERR(inode)) + return ERR_CAST(inode); + if (generation && inode->i_generation != generation) { + iput(inode); + return ERR_PTR(-ESTALE); + } + return inode; +} + +static struct dentry *ssdfs_fh_to_dentry(struct super_block *sb, + struct fid *fid, + int fh_len, int fh_type) +{ + return generic_fh_to_dentry(sb, fid, fh_len, fh_type, + ssdfs_nfs_get_inode); +} + +static struct dentry *ssdfs_fh_to_parent(struct super_block *sb, + struct fid *fid, + int fh_len, int fh_type) +{ + return generic_fh_to_parent(sb, fid, fh_len, fh_type, + ssdfs_nfs_get_inode); +} + +static struct dentry *ssdfs_get_parent(struct dentry *child) +{ + struct qstr dotdot = QSTR_INIT("..", 2); + ino_t ino; + int err; + + err = ssdfs_inode_by_name(d_inode(child), &dotdot, &ino); + if (unlikely(err)) + return ERR_PTR(err); + + return d_obtain_alias(ssdfs_iget(child->d_sb, ino)); +} + +static const struct export_operations ssdfs_export_ops = { + .get_parent = ssdfs_get_parent, + .fh_to_dentry = ssdfs_fh_to_dentry, + .fh_to_parent = ssdfs_fh_to_parent, +}; + +static const struct super_operations ssdfs_super_operations = { + .alloc_inode = ssdfs_alloc_inode, + .destroy_inode = ssdfs_destroy_inode, + .evict_inode = ssdfs_evict_inode, + .write_inode = ssdfs_write_inode, + .statfs = ssdfs_statfs, + .show_options = ssdfs_show_options, + .put_super = ssdfs_put_super, + .remount_fs = ssdfs_remount_fs, + .sync_fs = ssdfs_sync_fs, +}; + +static void ssdfs_memory_page_locks_checker_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_locked_pages, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static void ssdfs_check_memory_page_locks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_locked_pages) != 0) { + SSDFS_WARN("Lock keeps %lld memory pages\n", + atomic64_read(&ssdfs_locked_pages)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static void ssdfs_memory_leaks_checker_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_allocated_pages, 0); + atomic64_set(&ssdfs_memory_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +#ifdef CONFIG_SSDFS_POSIX_ACL + ssdfs_acl_memory_leaks_init(); +#endif /* CONFIG_SSDFS_POSIX_ACL */ + + ssdfs_block_bmap_memory_leaks_init(); + ssdfs_btree_memory_leaks_init(); + ssdfs_btree_hierarchy_memory_leaks_init(); + ssdfs_btree_node_memory_leaks_init(); + ssdfs_btree_search_memory_leaks_init(); + 
+#ifdef CONFIG_SSDFS_ZLIB + ssdfs_zlib_memory_leaks_init(); +#endif /* CONFIG_SSDFS_ZLIB */ + +#ifdef CONFIG_SSDFS_LZO + ssdfs_lzo_memory_leaks_init(); +#endif /* CONFIG_SSDFS_LZO */ + + ssdfs_compr_memory_leaks_init(); + ssdfs_cur_seg_memory_leaks_init(); + ssdfs_dentries_memory_leaks_init(); + +#ifdef CONFIG_SSDFS_MTD_DEVICE + ssdfs_dev_mtd_memory_leaks_init(); +#elif defined(CONFIG_SSDFS_BLOCK_DEVICE) + ssdfs_dev_bdev_memory_leaks_init(); + ssdfs_dev_zns_memory_leaks_init(); +#else + BUILD_BUG(); +#endif + + ssdfs_dir_memory_leaks_init(); + +#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA + ssdfs_diff_memory_leaks_init(); +#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */ + + ssdfs_ext_queue_memory_leaks_init(); + ssdfs_ext_tree_memory_leaks_init(); + ssdfs_file_memory_leaks_init(); + ssdfs_fs_error_memory_leaks_init(); + ssdfs_inode_memory_leaks_init(); + ssdfs_ino_tree_memory_leaks_init(); + ssdfs_invext_tree_memory_leaks_init(); + ssdfs_blk2off_memory_leaks_init(); + ssdfs_parray_memory_leaks_init(); + ssdfs_page_vector_memory_leaks_init(); + ssdfs_flush_memory_leaks_init(); + ssdfs_gc_memory_leaks_init(); + ssdfs_map_queue_memory_leaks_init(); + ssdfs_map_tbl_memory_leaks_init(); + ssdfs_map_cache_memory_leaks_init(); + ssdfs_map_thread_memory_leaks_init(); + ssdfs_migration_memory_leaks_init(); + ssdfs_peb_memory_leaks_init(); + ssdfs_read_memory_leaks_init(); + ssdfs_recovery_memory_leaks_init(); + ssdfs_req_queue_memory_leaks_init(); + ssdfs_seg_obj_memory_leaks_init(); + ssdfs_seg_bmap_memory_leaks_init(); + ssdfs_seg_blk_memory_leaks_init(); + ssdfs_seg_tree_memory_leaks_init(); + ssdfs_seq_arr_memory_leaks_init(); + ssdfs_dict_memory_leaks_init(); + ssdfs_shextree_memory_leaks_init(); + ssdfs_super_memory_leaks_init(); + ssdfs_xattr_memory_leaks_init(); + ssdfs_snap_reqs_queue_memory_leaks_init(); + ssdfs_snap_rules_list_memory_leaks_init(); + ssdfs_snap_tree_memory_leaks_init(); +} + +static void ssdfs_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_POSIX_ACL + ssdfs_acl_check_memory_leaks(); +#endif /* CONFIG_SSDFS_POSIX_ACL */ + + ssdfs_block_bmap_check_memory_leaks(); + ssdfs_btree_check_memory_leaks(); + ssdfs_btree_hierarchy_check_memory_leaks(); + ssdfs_btree_node_check_memory_leaks(); + ssdfs_btree_search_check_memory_leaks(); + +#ifdef CONFIG_SSDFS_ZLIB + ssdfs_zlib_check_memory_leaks(); +#endif /* CONFIG_SSDFS_ZLIB */ + +#ifdef CONFIG_SSDFS_LZO + ssdfs_lzo_check_memory_leaks(); +#endif /* CONFIG_SSDFS_LZO */ + + ssdfs_compr_check_memory_leaks(); + ssdfs_cur_seg_check_memory_leaks(); + ssdfs_dentries_check_memory_leaks(); + +#ifdef CONFIG_SSDFS_MTD_DEVICE + ssdfs_dev_mtd_check_memory_leaks(); +#elif defined(CONFIG_SSDFS_BLOCK_DEVICE) + ssdfs_dev_bdev_check_memory_leaks(); + ssdfs_dev_zns_check_memory_leaks(); +#else + BUILD_BUG(); +#endif + + ssdfs_dir_check_memory_leaks(); + +#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA + ssdfs_diff_check_memory_leaks(); +#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */ + + ssdfs_ext_queue_check_memory_leaks(); + ssdfs_ext_tree_check_memory_leaks(); + ssdfs_file_check_memory_leaks(); + ssdfs_fs_error_check_memory_leaks(); + ssdfs_inode_check_memory_leaks(); + ssdfs_ino_tree_check_memory_leaks(); + ssdfs_invext_tree_check_memory_leaks(); + ssdfs_blk2off_check_memory_leaks(); + ssdfs_parray_check_memory_leaks(); + ssdfs_page_vector_check_memory_leaks(); + ssdfs_flush_check_memory_leaks(); + ssdfs_gc_check_memory_leaks(); + ssdfs_map_queue_check_memory_leaks(); + ssdfs_map_tbl_check_memory_leaks(); + ssdfs_map_cache_check_memory_leaks(); + 
ssdfs_map_thread_check_memory_leaks(); + ssdfs_migration_check_memory_leaks(); + ssdfs_peb_check_memory_leaks(); + ssdfs_read_check_memory_leaks(); + ssdfs_recovery_check_memory_leaks(); + ssdfs_req_queue_check_memory_leaks(); + ssdfs_seg_obj_check_memory_leaks(); + ssdfs_seg_bmap_check_memory_leaks(); + ssdfs_seg_blk_check_memory_leaks(); + ssdfs_seg_tree_check_memory_leaks(); + ssdfs_seq_arr_check_memory_leaks(); + ssdfs_dict_check_memory_leaks(); + ssdfs_shextree_check_memory_leaks(); + ssdfs_super_check_memory_leaks(); + ssdfs_xattr_check_memory_leaks(); + ssdfs_snap_reqs_queue_check_memory_leaks(); + ssdfs_snap_rules_list_check_memory_leaks(); + ssdfs_snap_tree_check_memory_leaks(); + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY + if (atomic64_read(&ssdfs_allocated_pages) != 0) { + SSDFS_ERR("Memory leaks include %lld pages\n", + atomic64_read(&ssdfs_allocated_pages)); + } + + if (atomic64_read(&ssdfs_memory_leaks) != 0) { + SSDFS_ERR("Memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_memory_leaks)); + } +#else + if (atomic64_read(&ssdfs_allocated_pages) != 0) { + SSDFS_WARN("Memory leaks include %lld pages\n", + atomic64_read(&ssdfs_allocated_pages)); + } + + if (atomic64_read(&ssdfs_memory_leaks) != 0) { + SSDFS_WARN("Memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_memory_leaks)); + } +#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */ +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static int ssdfs_fill_super(struct super_block *sb, void *data, int silent) +{ + struct ssdfs_fs_info *fs_info; + struct ssdfs_peb_extent last_sb_log = {0}; + struct ssdfs_sb_log_payload payload; + struct inode *root_i; + u64 fs_feature_compat; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("sb %p, data %p, silent %#x\n", sb, data, silent); +#else + SSDFS_DBG("sb %p, data %p, silent %#x\n", sb, data, silent); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment header size %zu, " + "partial log header size %zu, " + "footer size %zu\n", + sizeof(struct ssdfs_segment_header), + sizeof(struct ssdfs_partial_log_header), + sizeof(struct ssdfs_log_footer)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memory_page_locks_checker_init(); + ssdfs_memory_leaks_checker_init(); + + fs_info = ssdfs_super_kzalloc(sizeof(*fs_info), GFP_KERNEL); + if (!fs_info) + return -ENOMEM; + +#ifdef CONFIG_SSDFS_TESTING + fs_info->do_fork_invalidation = true; +#endif /* CONFIG_SSDFS_TESTING */ + + fs_info->max_open_zones = 0; + fs_info->is_zns_device = false; + fs_info->zone_size = U64_MAX; + fs_info->zone_capacity = U64_MAX; + atomic_set(&fs_info->open_zones, 0); + +#ifdef CONFIG_SSDFS_MTD_DEVICE + fs_info->mtd = sb->s_mtd; + fs_info->devops = &ssdfs_mtd_devops; + sb->s_bdi = sb->s_mtd->backing_dev_info; +#elif defined(CONFIG_SSDFS_BLOCK_DEVICE) + if (bdev_is_zoned(sb->s_bdev)) { + fs_info->devops = &ssdfs_zns_devops; + fs_info->is_zns_device = true; + fs_info->max_open_zones = bdev_max_open_zones(sb->s_bdev); + + fs_info->zone_size = ssdfs_zns_zone_size(sb, + SSDFS_RESERVED_VBR_SIZE); + if (fs_info->zone_size >= U64_MAX) { + SSDFS_ERR("fail to get zone size\n"); + return -ERANGE; + } + + fs_info->zone_capacity = ssdfs_zns_zone_capacity(sb, + SSDFS_RESERVED_VBR_SIZE); + if (fs_info->zone_capacity >= U64_MAX) { + SSDFS_ERR("fail to get zone capacity\n"); + return -ERANGE; + } else if (fs_info->zone_capacity > fs_info->zone_size) { + SSDFS_ERR("invalid zone capacity: " + "capacity %llu, 
size %llu\n", + fs_info->zone_capacity, + fs_info->zone_size); + return -ERANGE; + } + } else + fs_info->devops = &ssdfs_bdev_devops; + + sb->s_bdi = bdi_get(sb->s_bdev->bd_disk->bdi); + atomic_set(&fs_info->pending_bios, 0); + fs_info->erase_page = ssdfs_super_alloc_page(GFP_KERNEL); + if (IS_ERR_OR_NULL(fs_info->erase_page)) { + err = (fs_info->erase_page == NULL ? + -ENOMEM : PTR_ERR(fs_info->erase_page)); + SSDFS_ERR("unable to allocate memory page\n"); + goto free_erase_page; + } + memset(page_address(fs_info->erase_page), 0xFF, PAGE_SIZE); +#else + BUILD_BUG(); +#endif + + fs_info->sb = sb; + sb->s_fs_info = fs_info; + atomic64_set(&fs_info->flush_reqs, 0); + init_waitqueue_head(&fs_info->pending_wq); + init_waitqueue_head(&fs_info->finish_user_data_flush_wq); + atomic_set(&fs_info->global_fs_state, SSDFS_UNKNOWN_GLOBAL_FS_STATE); + + for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) { + init_waitqueue_head(&fs_info->gc_wait_queue[i]); + atomic_set(&fs_info->gc_should_act[i], 1); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("parse options started...\n"); +#else + SSDFS_DBG("parse options started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_parse_options(fs_info, data); + if (err) + goto free_erase_page; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("gather superblock info started...\n"); +#else + SSDFS_DBG("gather superblock info started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_gather_superblock_info(fs_info, silent); + if (err) + goto free_erase_page; + + spin_lock(&fs_info->volume_state_lock); + fs_feature_compat = fs_info->fs_feature_compat; + spin_unlock(&fs_info->volume_state_lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create device group started...\n"); +#else + SSDFS_DBG("create device group started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_sysfs_create_device_group(sb); + if (err) + goto release_maptbl_cache; + + sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_magic = SSDFS_SUPER_MAGIC; + sb->s_op = &ssdfs_super_operations; + sb->s_export_op = &ssdfs_export_ops; + + sb->s_xattr = ssdfs_xattr_handlers; + set_posix_acl_flag(sb); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create snapshots subsystem started...\n"); +#else + SSDFS_DBG("create snapshots subsystem started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_snapshot_subsystem_init(fs_info); + if (err == -EINTR) { + /* + * Ignore this error. + */ + err = 0; + goto destroy_sysfs_device_group; + } else if (err) + goto destroy_sysfs_device_group; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create segment tree started...\n"); +#else + SSDFS_DBG("create segment tree started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + down_write(&fs_info->volume_sem); + err = ssdfs_segment_tree_create(fs_info); + up_write(&fs_info->volume_sem); + if (err) + goto destroy_snapshot_subsystem; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create mapping table started...\n"); +#else + SSDFS_DBG("create mapping table started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) { + down_write(&fs_info->volume_sem); + err = ssdfs_maptbl_create(fs_info); + up_write(&fs_info->volume_sem); + + if (err == -EINTR) { + /* + * Ignore this error. 
+ */ + err = 0; + goto destroy_segments_tree; + } else if (err) + goto destroy_segments_tree; + } else { + err = -EIO; + SSDFS_WARN("volume hasn't mapping table\n"); + goto destroy_segments_tree; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create segment bitmap started...\n"); +#else + SSDFS_DBG("create segment bitmap started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) { + down_write(&fs_info->volume_sem); + err = ssdfs_segbmap_create(fs_info); + up_write(&fs_info->volume_sem); + + if (err == -EINTR) { + /* + * Ignore this error. + */ + err = 0; + goto destroy_maptbl; + } else if (err) + goto destroy_maptbl; + } else { + err = -EIO; + SSDFS_WARN("volume hasn't segment bitmap\n"); + goto destroy_maptbl; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create shared extents tree started...\n"); +#else + SSDFS_DBG("create shared extents tree started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_info->fs_feature_compat & SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) { + down_write(&fs_info->volume_sem); + err = ssdfs_shextree_create(fs_info); + up_write(&fs_info->volume_sem); + if (err) + goto destroy_segbmap; + } else { + err = -EIO; + SSDFS_WARN("volume hasn't shared extents tree\n"); + goto destroy_segbmap; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create invalidated extents btree started...\n"); +#else + SSDFS_DBG("create invalidated extents btree started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) { + down_write(&fs_info->volume_sem); + err = ssdfs_invextree_create(fs_info); + up_write(&fs_info->volume_sem); + if (err) + goto destroy_shextree; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create current segment array started...\n"); +#else + SSDFS_DBG("create current segment array started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + down_write(&fs_info->volume_sem); + err = ssdfs_current_segment_array_create(fs_info); + up_write(&fs_info->volume_sem); + if (err) + goto destroy_invext_btree; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create shared dictionary started...\n"); +#else + SSDFS_DBG("create shared dictionary started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) { + down_write(&fs_info->volume_sem); + + err = ssdfs_shared_dict_btree_create(fs_info); + if (err) { + up_write(&fs_info->volume_sem); + goto destroy_current_segment_array; + } + + err = ssdfs_shared_dict_btree_init(fs_info); + if (err) { + up_write(&fs_info->volume_sem); + goto destroy_shdictree; + } + + up_write(&fs_info->volume_sem); + } else { + err = -EIO; + SSDFS_WARN("volume hasn't shared dictionary\n"); + goto destroy_current_segment_array; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create inodes btree started...\n"); +#else + SSDFS_DBG("create inodes btree started...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) { + down_write(&fs_info->volume_sem); + err = ssdfs_inodes_btree_create(fs_info); + up_write(&fs_info->volume_sem); + if (err) + goto destroy_shdictree; + } else { + err = -EIO; + SSDFS_WARN("volume hasn't inodes btree\n"); + goto destroy_shdictree; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("getting root inode...\n"); +#else + SSDFS_DBG("getting root inode...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + root_i = ssdfs_iget(sb, SSDFS_ROOT_INO); + 
if (IS_ERR(root_i)) { + SSDFS_DBG("getting root inode failed\n"); + err = PTR_ERR(root_i); + goto destroy_inodes_btree; + } + + if (!S_ISDIR(root_i->i_mode) || !root_i->i_blocks || !root_i->i_size) { + err = -ERANGE; + iput(root_i); + SSDFS_ERR("corrupted root inode\n"); + goto destroy_inodes_btree; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("d_make_root()\n"); +#else + SSDFS_DBG("d_make_root()\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + sb->s_root = d_make_root(root_i); + if (!sb->s_root) { + err = -ENOMEM; + goto put_root_inode; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("starting GC threads...\n"); +#else + SSDFS_DBG("starting GC threads...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_USING_GC_THREAD); + if (err == -EINTR) { + /* + * Ignore this error. + */ + err = 0; + goto put_root_inode; + } else if (unlikely(err)) { + SSDFS_ERR("fail to start GC-using-seg thread: " + "err %d\n", err); + goto put_root_inode; + } + + err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_USED_GC_THREAD); + if (err == -EINTR) { + /* + * Ignore this error. + */ + err = 0; + goto stop_gc_using_seg_thread; + } else if (unlikely(err)) { + SSDFS_ERR("fail to start GC-used-seg thread: " + "err %d\n", err); + goto stop_gc_using_seg_thread; + } + + err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_PRE_DIRTY_GC_THREAD); + if (err == -EINTR) { + /* + * Ignore this error. + */ + err = 0; + goto stop_gc_used_seg_thread; + } else if (unlikely(err)) { + SSDFS_ERR("fail to start GC-pre-dirty-seg thread: " + "err %d\n", err); + goto stop_gc_used_seg_thread; + } + + err = ssdfs_start_gc_thread(fs_info, SSDFS_SEG_DIRTY_GC_THREAD); + if (err == -EINTR) { + /* + * Ignore this error. + */ + err = 0; + goto stop_gc_pre_dirty_seg_thread; + } else if (unlikely(err)) { + SSDFS_ERR("fail to start GC-dirty-seg thread: " + "err %d\n", err); + goto stop_gc_pre_dirty_seg_thread; + } + + if (!(sb->s_flags & SB_RDONLY)) { + pagevec_init(&payload.maptbl_cache.pvec); + + down_write(&fs_info->volume_sem); + + err = ssdfs_prepare_sb_log(sb, &last_sb_log); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb log: err %d\n", + err); + } + + err = ssdfs_snapshot_sb_log_payload(sb, &payload); + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot sb log's payload: err %d\n", + err); + } + + if (!err) { + err = ssdfs_commit_super(sb, SSDFS_MOUNTED_FS, + &last_sb_log, + &payload); + } else { + SSDFS_ERR("fail to prepare sb log payload: " + "err %d\n", err); + } + + up_write(&fs_info->volume_sem); + + ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec); + + if (err) { + SSDFS_NOTICE("fail to commit superblock info: " + "remount filesystem in RO mode\n"); + sb->s_flags |= SB_RDONLY; + } + } + + atomic_set(&fs_info->global_fs_state, SSDFS_REGULAR_FS_OPERATIONS); + + SSDFS_INFO("%s has been mounted on device %s\n", + SSDFS_VERSION, fs_info->devops->device_name(sb)); + + return 0; + +stop_gc_pre_dirty_seg_thread: + ssdfs_stop_gc_thread(fs_info, SSDFS_SEG_PRE_DIRTY_GC_THREAD); + +stop_gc_used_seg_thread: + ssdfs_stop_gc_thread(fs_info, SSDFS_SEG_USED_GC_THREAD); + +stop_gc_using_seg_thread: + ssdfs_stop_gc_thread(fs_info, SSDFS_SEG_USING_GC_THREAD); + +put_root_inode: + iput(root_i); + +destroy_inodes_btree: + ssdfs_inodes_btree_destroy(fs_info); + +destroy_shdictree: + ssdfs_shared_dict_btree_destroy(fs_info); + +destroy_current_segment_array: + ssdfs_destroy_all_curent_segments(fs_info); + +destroy_invext_btree: + ssdfs_invextree_destroy(fs_info); + +destroy_shextree: + 
ssdfs_shextree_destroy(fs_info); + +destroy_segbmap: + ssdfs_segbmap_destroy(fs_info); + +destroy_maptbl: + ssdfs_maptbl_stop_thread(fs_info->maptbl); + ssdfs_maptbl_destroy(fs_info); + +destroy_segments_tree: + ssdfs_segment_tree_destroy(fs_info); + ssdfs_current_segment_array_destroy(fs_info); + +destroy_snapshot_subsystem: + ssdfs_snapshot_subsystem_destroy(fs_info); + +destroy_sysfs_device_group: + ssdfs_sysfs_delete_device_group(fs_info); + +release_maptbl_cache: + ssdfs_maptbl_cache_destroy(&fs_info->maptbl_cache); + +free_erase_page: + if (fs_info->erase_page) + ssdfs_super_free_page(fs_info->erase_page); + + ssdfs_destruct_sb_info(&fs_info->sbi); + ssdfs_destruct_sb_info(&fs_info->sbi_backup); + + ssdfs_free_workspaces(); + + ssdfs_super_kfree(fs_info); + + rcu_barrier(); + + ssdfs_check_memory_page_locks(); + ssdfs_check_memory_leaks(); + return err; +} + +static void ssdfs_put_super(struct super_block *sb) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct ssdfs_peb_extent last_sb_log = {0}; + struct ssdfs_sb_log_payload payload; + u64 fs_feature_compat; + u16 fs_state; + bool can_commit_super = true; + int i; + int err; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("sb %p\n", sb); +#else + SSDFS_DBG("sb %p\n", sb); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + atomic_set(&fsi->global_fs_state, SSDFS_METADATA_GOING_FLUSHING); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SSDFS_METADATA_GOING_FLUSHING\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + wake_up_all(&fsi->pending_wq); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("STOP THREADS...\n"); +#else + SSDFS_DBG("STOP THREADS...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_USING_GC_THREAD); + if (err) { + SSDFS_ERR("fail to stop GC using seg thread: " + "err %d\n", err); + } + + err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_USED_GC_THREAD); + if (err) { + SSDFS_ERR("fail to stop GC used seg thread: " + "err %d\n", err); + } + + err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_PRE_DIRTY_GC_THREAD); + if (err) { + SSDFS_ERR("fail to stop GC pre-dirty seg thread: " + "err %d\n", err); + } + + err = ssdfs_stop_gc_thread(fsi, SSDFS_SEG_DIRTY_GC_THREAD); + if (err) { + SSDFS_ERR("fail to stop GC dirty seg thread: " + "err %d\n", err); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("GC threads have been stopped\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_shared_dict_stop_thread(fsi->shdictree); + if (err == -EIO) { + ssdfs_fs_error(fsi->sb, + __FILE__, __func__, __LINE__, + "thread I/O issue\n"); + } else if (unlikely(err)) { + SSDFS_WARN("thread stopping issue: err %d\n", + err); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("shared dictionary thread has been stopped\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_INVALIDATION_QUEUE_NUMBER; i++) { + err = ssdfs_shextree_stop_thread(fsi->shextree, i); + if (err == -EIO) { + ssdfs_fs_error(fsi->sb, + __FILE__, __func__, __LINE__, + "thread I/O issue\n"); + } else if (unlikely(err)) { + SSDFS_WARN("thread stopping issue: ID %d, err %d\n", + i, err); + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("shared extents threads have been stopped\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_stop_snapshots_btree_thread(fsi); + if (err == -EIO) { + ssdfs_fs_error(fsi->sb, + __FILE__, __func__, __LINE__, + "thread I/O issue\n"); + } else if (unlikely(err)) { + SSDFS_WARN("thread stopping issue: err %d\n", + err); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("snapshots btree thread has been stopped\n"); +#endif /* 
CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_stop_thread(fsi->maptbl); + if (unlikely(err)) { + SSDFS_WARN("maptbl thread stopping issue: err %d\n", + err); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapping table thread has been stopped\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&fsi->volume_state_lock); + fs_feature_compat = fsi->fs_feature_compat; + fs_state = fsi->fs_state; + spin_unlock(&fsi->volume_state_lock); + + pagevec_init(&payload.maptbl_cache.pvec); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Wait for unfinished user data requests...\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unfinished_user_data_requests_exist(fsi)) { + wait_queue_head_t *wq = &fsi->finish_user_data_flush_wq; + + err = wait_event_killable_timeout(*wq, + !unfinished_user_data_requests_exist(fsi), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + if (unfinished_user_data_requests_exist(fsi)) + BUG(); + } + + atomic_set(&fsi->global_fs_state, SSDFS_METADATA_UNDER_FLUSH); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SSDFS_METADATA_UNDER_FLUSH\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(sb->s_flags & SB_RDONLY)) { + down_write(&fsi->volume_sem); + + err = ssdfs_prepare_sb_log(sb, &last_sb_log); + if (unlikely(err)) { + can_commit_super = false; + SSDFS_ERR("fail to prepare sb log: err %d\n", + err); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush invalidated extents b-tree...\n"); +#else + SSDFS_DBG("Flush invalidated extents b-tree...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fsi->fs_feature_compat & + SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) { + err = ssdfs_invextree_flush(fsi); + if (err) { + SSDFS_ERR("fail to flush invalidated extents btree: " + "err %d\n", err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush shared extents b-tree...\n"); +#else + SSDFS_DBG("Flush shared extents b-tree...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fsi->fs_feature_compat & + SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) { + err = ssdfs_shextree_flush(fsi); + if (err) { + SSDFS_ERR("fail to flush shared extents btree: " + "err %d\n", err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush inodes b-tree...\n"); +#else + SSDFS_DBG("Flush inodes b-tree...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) { + err = ssdfs_inodes_btree_flush(fsi->inodes_tree); + if (err) { + SSDFS_ERR("fail to flush inodes btree: " + "err %d\n", err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush shared dictionary b-tree...\n"); +#else + SSDFS_DBG("Flush shared dictionary b-tree...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) { + err = ssdfs_shared_dict_btree_flush(fsi->shdictree); + if (err) { + SSDFS_ERR("fail to flush shared dictionary: " + "err %d\n", err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Execute create snapshots...\n"); +#else + SSDFS_DBG("Execute create snapshots...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_execute_create_snapshots(fsi); + if (err) { + SSDFS_ERR("fail to process the snapshots creation\n"); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush snapshots b-tree...\n"); +#else + SSDFS_DBG("Flush snapshots b-tree...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fsi->fs_feature_compat & + SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG) { + err = ssdfs_snapshots_btree_flush(fsi); + if (err) { + SSDFS_ERR("fail to flush snapshots
btree: " + "err %d\n", err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush segment bitmap...\n"); +#else + SSDFS_DBG("Flush segment bitmap...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) { + err = ssdfs_segbmap_flush(fsi->segbmap); + if (err) { + SSDFS_ERR("fail to flush segbmap: " + "err %d\n", err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Flush PEB mapping table...\n"); +#else + SSDFS_DBG("Flush PEB mapping table...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) { + err = ssdfs_maptbl_flush(fsi->maptbl); + if (err) { + SSDFS_ERR("fail to flush maptbl: " + "err %d\n", err); + } + + set_maptbl_going_to_be_destroyed(fsi); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Commit superblock...\n"); +#else + SSDFS_DBG("Commit superblock...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (can_commit_super) { + err = ssdfs_snapshot_sb_log_payload(sb, &payload); + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot log's payload: " + "err %d\n", err); + } else { + err = ssdfs_commit_super(sb, SSDFS_VALID_FS, + &last_sb_log, + &payload); + } + } else { + /* prepare error code */ + err = -ERANGE; + } + + if (err) { + SSDFS_ERR("fail to commit superblock info: " + "err %d\n", err); + } + + up_write(&fsi->volume_sem); + } else { + if (fs_state == SSDFS_ERROR_FS) { + down_write(&fsi->volume_sem); + + err = ssdfs_prepare_sb_log(sb, &last_sb_log); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb log: err %d\n", + err); + } + + err = ssdfs_snapshot_sb_log_payload(sb, &payload); + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot log's payload: " + "err %d\n", err); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Commit superblock...\n"); +#else + SSDFS_DBG("Commit superblock...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!err) { + err = ssdfs_commit_super(sb, SSDFS_ERROR_FS, + &last_sb_log, + &payload); + } + + up_write(&fsi->volume_sem); + + if (err) { + SSDFS_ERR("fail to commit superblock info: " + "err %d\n", err); + } + } + } + + atomic_set(&fsi->global_fs_state, SSDFS_UNKNOWN_GLOBAL_FS_STATE); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SSDFS_UNKNOWN_GLOBAL_FS_STATE\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("Starting destroy the metadata structures...\n"); +#else + SSDFS_DBG("Starting destroy the metadata structures...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_super_pagevec_release(&payload.maptbl_cache.pvec); + fsi->devops->sync(sb); + ssdfs_snapshot_subsystem_destroy(fsi); + ssdfs_invextree_destroy(fsi); + ssdfs_shextree_destroy(fsi); + ssdfs_inodes_btree_destroy(fsi); + ssdfs_shared_dict_btree_destroy(fsi); + ssdfs_segbmap_destroy(fsi); + ssdfs_destroy_all_curent_segments(fsi); + ssdfs_segment_tree_destroy(fsi); + ssdfs_current_segment_array_destroy(fsi); + ssdfs_maptbl_destroy(fsi); + ssdfs_sysfs_delete_device_group(fsi); + + SSDFS_INFO("%s has been unmounted from device %s\n", + SSDFS_VERSION, fsi->devops->device_name(sb)); + + if (fsi->erase_page) + ssdfs_super_free_page(fsi->erase_page); + + ssdfs_maptbl_cache_destroy(&fsi->maptbl_cache); + ssdfs_destruct_sb_info(&fsi->sbi); + ssdfs_destruct_sb_info(&fsi->sbi_backup); + + ssdfs_free_workspaces(); + + ssdfs_super_kfree(fsi); + sb->s_fs_info = NULL; + + rcu_barrier(); + + ssdfs_check_memory_page_locks(); + ssdfs_check_memory_leaks(); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("All metadata 
structures have been destroyed...\n"); +#else + SSDFS_DBG("All metadata structures have been destroyed...\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +static struct dentry *ssdfs_mount(struct file_system_type *fs_type, + int flags, const char *dev_name, + void *data) +{ +#ifdef CONFIG_SSDFS_MTD_DEVICE + return mount_mtd(fs_type, flags, dev_name, data, ssdfs_fill_super); +#elif defined(CONFIG_SSDFS_BLOCK_DEVICE) + return mount_bdev(fs_type, flags, dev_name, data, ssdfs_fill_super); +#else + BUILD_BUG(); + return NULL; +#endif +} + +static void kill_ssdfs_sb(struct super_block *sb) +{ +#ifdef CONFIG_SSDFS_MTD_DEVICE + kill_mtd_super(sb); +#elif defined(CONFIG_SSDFS_BLOCK_DEVICE) + kill_block_super(sb); +#else + BUILD_BUG(); +#endif +} + +static struct file_system_type ssdfs_fs_type = { + .name = "ssdfs", + .owner = THIS_MODULE, + .mount = ssdfs_mount, + .kill_sb = kill_ssdfs_sb, +#ifdef CONFIG_SSDFS_BLOCK_DEVICE + .fs_flags = FS_REQUIRES_DEV, +#endif +}; +MODULE_ALIAS_FS(SSDFS_VERSION); + +static void ssdfs_destroy_caches(void) +{ + /* + * Make sure all delayed rcu free inodes are flushed before we + * destroy cache. + */ + rcu_barrier(); + + if (ssdfs_inode_cachep) + kmem_cache_destroy(ssdfs_inode_cachep); + + ssdfs_destroy_seg_req_obj_cache(); + ssdfs_destroy_btree_search_obj_cache(); + ssdfs_destroy_free_ino_desc_cache(); + ssdfs_destroy_btree_node_obj_cache(); + ssdfs_destroy_seg_obj_cache(); + ssdfs_destroy_extent_info_cache(); + ssdfs_destroy_peb_mapping_info_cache(); + ssdfs_destroy_blk2off_frag_obj_cache(); + ssdfs_destroy_name_info_cache(); +} + +static int ssdfs_init_caches(void) +{ + int err; + + ssdfs_zero_seg_obj_cache_ptr(); + ssdfs_zero_seg_req_obj_cache_ptr(); + ssdfs_zero_extent_info_cache_ptr(); + ssdfs_zero_btree_node_obj_cache_ptr(); + ssdfs_zero_btree_search_obj_cache_ptr(); + ssdfs_zero_free_ino_desc_cache_ptr(); + ssdfs_zero_peb_mapping_info_cache_ptr(); + ssdfs_zero_blk2off_frag_obj_cache_ptr(); + ssdfs_zero_name_info_cache_ptr(); + + ssdfs_inode_cachep = kmem_cache_create("ssdfs_inode_cache", + sizeof(struct ssdfs_inode_info), 0, + SLAB_RECLAIM_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, + ssdfs_init_inode_once); + if (!ssdfs_inode_cachep) { + SSDFS_ERR("unable to create inode cache\n"); + return -ENOMEM; + } + + err = ssdfs_init_seg_obj_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create segment object cache: err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_seg_req_obj_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create segment request object cache: " + "err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_extent_info_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create extent info object cache: " + "err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_btree_node_obj_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create btree node object cache: err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_btree_search_obj_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create btree search object cache: " + "err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_free_ino_desc_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create free inode descriptors cache: " + "err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_peb_mapping_info_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create PEB mapping descriptors cache: " + "err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_blk2off_frag_obj_cache(); + if 
(unlikely(err)) { + SSDFS_ERR("unable to create blk2off fragments cache: " + "err %d\n", + err); + goto destroy_caches; + } + + err = ssdfs_init_name_info_cache(); + if (unlikely(err)) { + SSDFS_ERR("unable to create name info cache: " + "err %d\n", + err); + goto destroy_caches; + } + + return 0; + +destroy_caches: + ssdfs_destroy_caches(); + return -ENOMEM; +} + +static inline void ssdfs_print_info(void) +{ + SSDFS_INFO("%s loaded\n", SSDFS_VERSION); +} + +static int __init ssdfs_init(void) +{ + int err; + + err = ssdfs_init_caches(); + if (err) { + SSDFS_ERR("failed to initialize caches\n"); + goto failed_init; + } + + err = ssdfs_compressors_init(); + if (err) { + SSDFS_ERR("failed to initialize compressors\n"); + goto free_caches; + } + + err = ssdfs_sysfs_init(); + if (err) { + SSDFS_ERR("failed to initialize sysfs subsystem\n"); + goto stop_compressors; + } + + err = register_filesystem(&ssdfs_fs_type); + if (err) { + SSDFS_ERR("failed to register filesystem\n"); + goto sysfs_exit; + } + + ssdfs_print_info(); + + return 0; + +sysfs_exit: + ssdfs_sysfs_exit(); + +stop_compressors: + ssdfs_compressors_exit(); + +free_caches: + ssdfs_destroy_caches(); + +failed_init: + return err; +} + +static void __exit ssdfs_exit(void) +{ + ssdfs_destroy_caches(); + unregister_filesystem(&ssdfs_fs_type); + ssdfs_sysfs_exit(); + ssdfs_compressors_exit(); +} + +module_init(ssdfs_init); +module_exit(ssdfs_exit); + +MODULE_DESCRIPTION("SSDFS -- SSD-oriented File System"); +MODULE_AUTHOR("HGST, San Jose Research Center, Storage Architecture Group"); +MODULE_AUTHOR("Viacheslav Dubeyko "); +MODULE_LICENSE("Dual BSD/GPL"); From patchwork Sat Feb 25 01:08:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4CE4C6FA8E for ; Sat, 25 Feb 2023 01:16:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229636AbjBYBQC (ORCPT ); Fri, 24 Feb 2023 20:16:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48402 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229663AbjBYBPx (ORCPT ); Fri, 24 Feb 2023 20:15:53 -0500 Received: from mail-oi1-x232.google.com (mail-oi1-x232.google.com [IPv6:2607:f8b0:4864:20::232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A18B1125AF for ; Fri, 24 Feb 2023 17:15:46 -0800 (PST) Received: by mail-oi1-x232.google.com with SMTP id bm20so798235oib.7 for ; Fri, 24 Feb 2023 17:15:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=YnSYdXj+DgGw+DePm+dEPQh+XPhA+sEd1HQBMfLlT00=; b=AcIwASjZC9CP4s/TKr+w4Z1LtEWsoPD2//AYOsXAsmdB2s+bel9odUxtOw4+y68/Kr Un6WtJb4tRwDVk30GsjSO4aUMiPEG+DH0Lsg6IE8XgA+SYWMvdRuLYkgtZeg13ir9Dzx Xjf3TJZdctp0VU74Ad6MoPMRP0MpJ828kOVSvLveUykK2A0mzfzV2qBRci2n9TH2jZ9E fqdwvi/wbW/RFSC5nTwmwiA4zhFIAiEb3JylGAZ6DC30/nZbgsgBI2eMbm0vFzYEO9bz FgqAfkVXRgwyOwHrQFmrGH9YlA4/sQWWAfBlw2fxlQ+XsbPr1h4P34Skm8axcKXz22+M HMpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YnSYdXj+DgGw+DePm+dEPQh+XPhA+sEd1HQBMfLlT00=; b=5iAR7vVyLIRT0oU3cO+WjfC9WKflYjvUEO+b5yQjhT+oPyFG3BhdGrT8MmATyRo+2u vg2ho84dmePKShCJgY9UTbs+7cb3+hWabrHiwlXscufpI72wstf3j3K8BXO9ZRETJLov su4t5NHPPI6McGu3/7+QVWdo4u4MMvBuo7tn9XL5p3o1eXUrg8m8NNcUjr98VnwTN0Gc ZSK3CxCV5LztbPrAJmyYHhFsgKAp3IGFontLlimGuHSV+3BgpHfDnoL/4ZdjGr5m9g+K WTMG5MvDx6jHrL79g7ygw3cR5V0u4T2ZB11YFYCOd4L05mwhP1aKZUxM5eU4q/HNNAcv SG8w== X-Gm-Message-State: AO0yUKW5vzrqoYYleV1LAjImia6maSsO03cdBkqC75ixkAUltMZOCz4B FwFGbjsYwllpltCzYJ+++BDGhipypaPK3Vzl X-Google-Smtp-Source: AK7set99w8qWDX/2lenyGf0caAcICfCdrtApuonzd69HHWApF1EO5Tmi8/M2wRhDpPPbU/NgzRKtww== X-Received: by 2002:a05:6808:6082:b0:383:e7c8:4000 with SMTP id de2-20020a056808608200b00383e7c84000mr3248774oib.13.1677287744610; Fri, 24 Feb 2023 17:15:44 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.15.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:15:43 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 05/76] ssdfs: implement commit superblock operation Date: Fri, 24 Feb 2023 17:08:16 -0800 Message-Id: <20230225010927.813929-6-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org SSDFS has a specialized superblock segment (erase block) whose goal is to keep the sequence of committed superblocks. A superblock instance is stored on every successful mount operation and during unmount. At first, the logic detects the state of the current superblock segment. If this segment (erase block) is completely full, then a new superblock segment is reserved and the new superblock instance is stored into the sequence. SSDFS maintains main and backup copies of the current superblock segment. Additionally, SSDFS keeps information about the previous, current, next, and reserved superblock segments. SSDFS can use two policies of superblock segment allocation: (1) reserve a new segment for every new allocation, or (2) use only the set of superblock segments that have been reserved by the mkfs tool. Every commit operation stores a log into the superblock segment. This log contains: (1) segment header, (2) payload (the mapping table cache, for example), (3) log footer. The segment header can be considered static superblock info. It contains metadata that never changes after volume creation (the logical block size, for example) or changes rarely (the number of segments in the volume, for example). The log footer can be considered the dynamic part of the superblock because it contains frequently updated metadata (for example, the root node of the inodes b-tree).
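To make the commit flow above concrete, here is a minimal user-space sketch of the sequence. It is not the kernel implementation: every type, field, and helper name in it is a hypothetical stand-in, and only the header/payload/footer layout, the main/backup duplication, and the segment rotation order mirror the description above.

    #include <stddef.h>

    /* Hypothetical stand-ins; only the layout and ordering follow the text. */
    struct segment_header { unsigned long long volume_create_time; }; /* "static" info */
    struct log_footer { unsigned long long inodes_btree_root; };      /* "dynamic" info */

    struct sb_log {
            struct segment_header hdr;
            const void *payload;        /* e.g. a snapshot of the maptbl cache */
            size_t payload_bytes;
            struct log_footer footer;
    };

    enum { SB_MAIN, SB_BACKUP, SB_COPY_MAX };
    enum { SB_PREV, SB_CUR, SB_NEXT, SB_RESERVED, SB_SEG_MAX };

    struct volume {
            unsigned long long sb_seg[SB_SEG_MAX][SB_COPY_MAX];
            unsigned int next_log_page; /* write position in the current sb segment */
            unsigned int pages_per_peb;
    };

    /* Stub: a real implementation would write header + payload + footer here. */
    static int write_log(struct volume *v, unsigned long long seg,
                         unsigned int page_off, const struct sb_log *log)
    {
            (void)v; (void)seg; (void)page_off; (void)log;
            return 0;
    }

    static void rotate_sb_segments(struct volume *v)
    {
            /* Policy (1) would reserve a brand new segment for SB_RESERVED;
             * policy (2) rotates inside the fixed set reserved by mkfs. */
            for (int copy = SB_MAIN; copy < SB_COPY_MAX; copy++) {
                    unsigned long long old_prev = v->sb_seg[SB_PREV][copy];

                    v->sb_seg[SB_PREV][copy] = v->sb_seg[SB_CUR][copy];
                    v->sb_seg[SB_CUR][copy] = v->sb_seg[SB_NEXT][copy];
                    v->sb_seg[SB_NEXT][copy] = v->sb_seg[SB_RESERVED][copy];
                    v->sb_seg[SB_RESERVED][copy] = old_prev; /* policy (2) reuse */
            }
    }

    static int commit_superblock(struct volume *v, const struct sb_log *log,
                                 unsigned int log_pages)
    {
            /* A full current sb segment forces a move to the next one. */
            if (v->next_log_page + log_pages > v->pages_per_peb) {
                    rotate_sb_segments(v);
                    v->next_log_page = 0;
            }

            /* The same log goes into both the main and the backup copy. */
            for (int copy = SB_MAIN; copy < SB_COPY_MAX; copy++) {
                    int err = write_log(v, v->sb_seg[SB_CUR][copy],
                                        v->next_log_page, log);
                    if (err)
                            return err;
            }

            v->next_log_page += log_pages;
            return 0;
    }

    int main(void)
    {
            struct volume v = { .pages_per_peb = 64 };
            struct sb_log log = { .payload = "maptbl cache", .payload_bytes = 12 };

            return commit_superblock(&v, &log, 2);
    }

In the patch itself, ssdfs_define_next_sb_log_place() plays roughly the role of the fullness check and rotation decision, while ssdfs_commit_super() plays roughly the role of write_log().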
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/super.c | 2200 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2200 insertions(+) diff --git a/fs/ssdfs/super.c b/fs/ssdfs/super.c index a3b144e6eafb..39df1e4d9152 100644 --- a/fs/ssdfs/super.c +++ b/fs/ssdfs/super.c @@ -121,6 +121,27 @@ void ssdfs_super_check_memory_leaks(void) #endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ } +struct ssdfs_payload_content { + struct pagevec pvec; + u32 bytes_count; +}; + +struct ssdfs_sb_log_payload { + struct ssdfs_payload_content maptbl_cache; +}; + +static struct kmem_cache *ssdfs_inode_cachep; + +static int ssdfs_prepare_sb_log(struct super_block *sb, + struct ssdfs_peb_extent *last_sb_log); +static int ssdfs_snapshot_sb_log_payload(struct super_block *sb, + struct ssdfs_sb_log_payload *payload); +static int ssdfs_commit_super(struct super_block *sb, u16 fs_state, + struct ssdfs_peb_extent *last_sb_log, + struct ssdfs_sb_log_payload *payload); +static void ssdfs_put_super(struct super_block *sb); +static void ssdfs_check_memory_leaks(void); + static void init_once(void *foo) { struct ssdfs_inode_info *ii = (struct ssdfs_inode_info *)foo; @@ -528,6 +549,2185 @@ static const struct super_operations ssdfs_super_operations = { .sync_fs = ssdfs_sync_fs, }; +static inline +u32 ssdfs_sb_payload_size(struct pagevec *pvec) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct page *page; + void *kaddr; + u16 fragment_bytes_count; + u32 bytes_count = 0; + int i; + + for (i = 0; i < pagevec_count(pvec); i++) { + page = pvec->pages[i]; + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + fragment_bytes_count = le16_to_cpu(hdr->bytes_count); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(fragment_bytes_count > PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + bytes_count += fragment_bytes_count; + } + + return bytes_count; +} + +static u32 ssdfs_define_sb_log_size(struct super_block *sb) +{ + struct ssdfs_fs_info *fsi; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + u32 inline_capacity; + u32 log_size = 0; + u32 payload_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb); + + SSDFS_DBG("sb %p\n", sb); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = SSDFS_FS_I(sb); + payload_size = ssdfs_sb_payload_size(&fsi->maptbl_cache.pvec); + inline_capacity = PAGE_SIZE - hdr_size; + + if (payload_size > inline_capacity) { + log_size += PAGE_SIZE; + log_size += atomic_read(&fsi->maptbl_cache.bytes_count); + log_size += PAGE_SIZE; + } else { + log_size += PAGE_SIZE; + log_size += PAGE_SIZE; + } + + log_size = (log_size + fsi->pagesize - 1) >> fsi->log_pagesize; + + return log_size; +} + +static int ssdfs_snapshot_sb_log_payload(struct super_block *sb, + struct ssdfs_sb_log_payload *payload) +{ + struct ssdfs_fs_info *fsi; + unsigned pages_count; + unsigned i; + struct page *spage, *dpage; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !payload); + BUG_ON(pagevec_count(&payload->maptbl_cache.pvec) != 0); + + SSDFS_DBG("sb %p, payload %p\n", + sb, payload); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = SSDFS_FS_I(sb); + + down_read(&fsi->maptbl_cache.lock); + + pages_count = pagevec_count(&fsi->maptbl_cache.pvec); + + for (i = 0; i < pages_count; i++) { + dpage = + ssdfs_super_add_pagevec_page(&payload->maptbl_cache.pvec); + if (unlikely(IS_ERR_OR_NULL(dpage))) { + err = !dpage ? 
-ENOMEM : PTR_ERR(dpage); + SSDFS_ERR("fail to add pagevec page: " + "index %u, err %d\n", + i, err); + goto finish_maptbl_snapshot; + } + + spage = fsi->maptbl_cache.pvec.pages[i]; + if (unlikely(!spage)) { + err = -ERANGE; + SSDFS_ERR("source page is absent: index %u\n", + i); + goto finish_maptbl_snapshot; + } + + ssdfs_lock_page(spage); + ssdfs_lock_page(dpage); + ssdfs_memcpy_page(dpage, 0, PAGE_SIZE, + spage, 0, PAGE_SIZE, + PAGE_SIZE); + ssdfs_unlock_page(dpage); + ssdfs_unlock_page(spage); + } + + payload->maptbl_cache.bytes_count = + atomic_read(&fsi->maptbl_cache.bytes_count); + +finish_maptbl_snapshot: + up_read(&fsi->maptbl_cache.lock); + + if (unlikely(err)) + ssdfs_super_pagevec_release(&payload->maptbl_cache.pvec); + + return err; +} + +static int ssdfs_define_next_sb_log_place(struct super_block *sb, + struct ssdfs_peb_extent *last_sb_log) +{ + struct ssdfs_fs_info *fsi; + u32 offset; + u32 log_size; + u64 cur_peb, prev_peb; + u64 cur_leb; + int i; + int err = 0; + + fsi = SSDFS_FS_I(sb); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !last_sb_log); + + SSDFS_DBG("sb %p, last_sb_log %p\n", + sb, last_sb_log); + SSDFS_DBG("fsi->sbi.last_log.leb_id %llu, " + "fsi->sbi.last_log.peb_id %llu, " + "fsi->sbi.last_log.page_offset %u, " + "fsi->sbi.last_log.pages_count %u\n", + fsi->sbi.last_log.leb_id, + fsi->sbi.last_log.peb_id, + fsi->sbi.last_log.page_offset, + fsi->sbi.last_log.pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + offset = fsi->sbi.last_log.page_offset; + + log_size = ssdfs_define_sb_log_size(sb); + if (log_size > fsi->pages_per_peb) { + SSDFS_ERR("log_size %u > fsi->pages_per_peb %u\n", + log_size, fsi->pages_per_peb); + return -ERANGE; + } + + log_size = max_t(u32, log_size, fsi->sbi.last_log.pages_count); + + if (offset > fsi->pages_per_peb || offset > (UINT_MAX - log_size)) { + SSDFS_ERR("inconsistent metadata state: " + "last_sb_log.page_offset %u, " + "pages_per_peb %u, log_size %u\n", + offset, fsi->pages_per_peb, log_size); + return -EINVAL; + } + + for (err = -EINVAL, i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) { + cur_peb = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i]; + prev_peb = fsi->sb_pebs[SSDFS_PREV_SB_SEG][i]; + cur_leb = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_peb %llu, prev_peb %llu, " + "last_sb_log.peb_id %llu, err %d\n", + cur_peb, prev_peb, fsi->sbi.last_log.peb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (fsi->sbi.last_log.peb_id == cur_peb) { + if ((offset + (2 * log_size)) > fsi->pages_per_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb PEB %llu is full: " + "(offset %u + (2 * log_size %u)) > " + "pages_per_peb %u\n", + cur_peb, offset, log_size, + fsi->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EFBIG; + } + + last_sb_log->leb_id = cur_leb; + last_sb_log->peb_id = cur_peb; + last_sb_log->page_offset = offset + log_size; + last_sb_log->pages_count = log_size; + + err = 0; + break; + } else if (fsi->sbi.last_log.peb_id != cur_peb && + fsi->sbi.last_log.peb_id == prev_peb) { + + last_sb_log->leb_id = cur_leb; + last_sb_log->peb_id = cur_peb; + last_sb_log->page_offset = 0; + last_sb_log->pages_count = log_size; + + err = 0; + break; + } else { + /* continue to check */ + err = -ERANGE; + } + } + + if (err) { + SSDFS_ERR("inconsistent metadata state: " + "cur_peb %llu, prev_peb %llu, " + "last_sb_log.peb_id %llu\n", + cur_peb, prev_peb, fsi->sbi.last_log.peb_id); + return err; + } + + for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) { + last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i]; + 
last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i]; + err = ssdfs_can_write_sb_log(sb, last_sb_log); + if (err) { + SSDFS_ERR("fail to write sb log into PEB %llu\n", + last_sb_log->peb_id); + return err; + } + } + + last_sb_log->leb_id = cur_leb; + last_sb_log->peb_id = cur_peb; + + return 0; +} + +static bool ssdfs_sb_seg_exhausted(struct ssdfs_fs_info *fsi, + u64 cur_leb, u64 next_leb) +{ + u64 cur_seg, next_seg; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cur_leb == U64_MAX || next_leb == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_seg = SSDFS_LEB2SEG(fsi, cur_leb); + next_seg = SSDFS_LEB2SEG(fsi, next_leb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_seg %llu, cur_leb %llu, " + "next_seg %llu, next_leb %llu\n", + cur_seg, cur_leb, next_seg, next_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cur_seg >= U64_MAX || next_seg >= U64_MAX) + return true; + + return cur_seg != next_seg; +} + +#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET +static u64 ssdfs_correct_start_leb_id(struct ssdfs_fs_info *fsi, + int seg_type, u64 leb_id) +{ + struct completion *init_end; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *ptr; + u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + u32 pebs_per_seg; + u64 seg_id; + u64 cur_leb; + u64 peb_id1, peb_id2; + u64 found_peb_id; + u64 peb_id_off; + u16 pebs_per_fragment; + u16 pebs_per_stripe; + u16 stripes_per_fragment; + u64 calculated_leb_id = leb_id; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p, seg_type %#x, leb_id %llu\n", + fsi, seg_type, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + found_peb_id = leb_id; + peb_type = SEG2PEB_TYPE(seg_type); + pebs_per_seg = fsi->pebs_per_seg; + + seg_id = ssdfs_get_seg_id_for_leb_id(fsi, leb_id); + if (unlikely(seg_id >= U64_MAX)) { + SSDFS_ERR("invalid seg_id: " + "leb_id %llu\n", leb_id); + return -ERANGE; + } + + err = ssdfs_maptbl_define_fragment_info(fsi, leb_id, + &pebs_per_fragment, + &pebs_per_stripe, + &stripes_per_fragment); + if (unlikely(err)) { + SSDFS_ERR("fail to define fragment info: " + "err %d\n", err); + return err; + } + + for (i = 0; i < pebs_per_seg; i++) { + cur_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, i); + if (cur_leb >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + seg_id, i); + return -ERANGE; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, cur_leb, + peb_type, &pebr, + &init_end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_leb_id_correction; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, cur_leb, + peb_type, &pebr, + &init_end); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_leb_id_correction; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + cur_leb, peb_type, err); + goto finish_leb_id_correction; + } + + ptr = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + peb_id1 = ptr->peb_id; + ptr = &pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX]; + peb_id2 = ptr->peb_id; + + if (peb_id1 < U64_MAX) + found_peb_id = max_t(u64, peb_id1, found_peb_id); + + if (peb_id2 < U64_MAX) + found_peb_id = max_t(u64, peb_id2, found_peb_id); + + peb_id_off = found_peb_id % pebs_per_stripe; + if (peb_id_off >= (pebs_per_stripe / 2)) { + calculated_leb_id = found_peb_id / pebs_per_stripe; + 
calculated_leb_id++; + calculated_leb_id *= pebs_per_stripe; + } else { + calculated_leb_id = found_peb_id; + calculated_leb_id++; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_peb_id %llu, pebs_per_stripe %u, " + "calculated_leb_id %llu\n", + found_peb_id, pebs_per_stripe, + calculated_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +finish_leb_id_correction: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, calculated_leb_id %llu\n", + leb_id, calculated_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + return calculated_leb_id; +} +#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */ + +#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET +static int __ssdfs_reserve_clean_segment(struct ssdfs_fs_info *fsi, + int sb_seg_type, + u64 start_search_id, + u64 *reserved_seg) +{ + struct ssdfs_segment_bmap *segbmap = fsi->segbmap; + u64 start_seg = start_search_id; + u64 end_seg = U64_MAX; + struct ssdfs_maptbl_peb_relation pebr; + struct completion *end; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!reserved_seg); + BUG_ON(sb_seg_type >= SSDFS_SB_SEG_COPY_MAX); + + SSDFS_DBG("fsi %p, sb_seg_type %#x, start_search_id %llu\n", + fsi, sb_seg_type, start_search_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (sb_seg_type) { + case SSDFS_MAIN_SB_SEG: + case SSDFS_COPY_SB_SEG: + err = ssdfs_segment_detect_search_range(fsi, + &start_seg, + &end_seg); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find fragment for search: " + "start_seg %llu, end_seg %llu\n", + start_seg, end_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define a search range: " + "start_seg %llu, err %d\n", + start_seg, err); + return err; + } + break; + + default: + BUG(); + }; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_seg %llu, end_seg %llu\n", + start_seg, end_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_segbmap_reserve_clean_segment(segbmap, + start_seg, end_seg, + reserved_seg, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + goto finish_search; + } + + err = ssdfs_segbmap_reserve_clean_segment(segbmap, + start_seg, end_seg, + reserved_seg, + &end); + } + + if (err == -ENODATA) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to reserve segment: " + "type %#x, start_seg %llu, end_seg %llu\n", + sb_seg_type, start_seg, end_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve segment: " + "type %#x, start_seg %llu, " + "end_seg %llu, err %d\n", + sb_seg_type, start_seg, end_seg, err); + goto finish_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_seg %llu\n", *reserved_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < fsi->pebs_per_seg; i++) { + u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE; + u64 leb_id; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, *reserved_seg, i); + if (leb_id >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + *reserved_seg, i); + goto finish_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu\n", leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_map_leb2peb(fsi, leb_id, peb_type, + &pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_search; + } + + err = 
ssdfs_maptbl_map_leb2peb(fsi, leb_id, + peb_type, + &pebr, &end); + } + + if (err == -EACCES || err == -ENOENT) { + if (i == 0) { + SSDFS_ERR("fail to map LEB to PEB: " + "reserved_seg %llu, leb_id %llu, " + "err %d\n", + *reserved_seg, leb_id, err); + goto finish_search; + } else + goto finish_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to map LEB to PEB: " + "reserved_seg %llu, leb_id %llu, " + "err %d\n", + *reserved_seg, leb_id, err); + goto finish_search; + } + } + +finish_search: + if (err == -ENOENT) + *reserved_seg = end_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_seg %llu\n", *reserved_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} +#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */ + +#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET +static int ssdfs_reserve_clean_segment(struct super_block *sb, + int sb_seg_type, u64 start_leb, + u64 *reserved_seg) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + u64 start_search_id; + u64 cur_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!reserved_seg); + BUG_ON(sb_seg_type >= SSDFS_SB_SEG_COPY_MAX); + + SSDFS_DBG("sb %p, sb_seg_type %#x, start_leb %llu\n", + sb, sb_seg_type, start_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + + *reserved_seg = U64_MAX; + + start_leb = ssdfs_correct_start_leb_id(fsi, + SSDFS_SB_SEG_TYPE, + start_leb); + + start_search_id = SSDFS_LEB2SEG(fsi, start_leb); + if (start_search_id >= fsi->nsegs) + start_search_id = 0; + + cur_id = start_search_id; + + while (cur_id < fsi->nsegs) { + err = __ssdfs_reserve_clean_segment(fsi, sb_seg_type, + cur_id, reserved_seg); + if (err == -ENOENT) { + err = 0; + cur_id = *reserved_seg; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_id %llu\n", cur_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find a new segment: " + "cur_id %llu, err %d\n", + cur_id, err); + return err; + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found seg_id %llu\n", *reserved_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + } + + cur_id = 0; + + while (cur_id < start_search_id) { + err = __ssdfs_reserve_clean_segment(fsi, sb_seg_type, + cur_id, reserved_seg); + if (err == -ENOENT) { + err = 0; + cur_id = *reserved_seg; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_id %llu\n", cur_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find a new segment: " + "cur_id %llu, err %d\n", + cur_id, err); + return err; + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found seg_id %llu\n", *reserved_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no free space for a new segment\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENOSPC; +} +#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */ + +typedef u64 sb_pebs_array[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX]; + +static int ssdfs_erase_dirty_prev_sb_segs(struct ssdfs_fs_info *fsi, + u64 prev_leb) +{ + struct completion *init_end; + u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + u32 pebs_per_seg; + u64 seg_id; + u64 cur_leb; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p, prev_leb %llu\n", + fsi, prev_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_type = SEG2PEB_TYPE(SSDFS_SB_SEG_TYPE); + pebs_per_seg = fsi->pebs_per_seg; + + seg_id = SSDFS_LEB2SEG(fsi, prev_leb); + if (seg_id >= U64_MAX) { + SSDFS_ERR("invalid seg_id for leb_id %llu\n", + prev_leb); + return -ERANGE; + } + + for (i = 0; i < pebs_per_seg; i++) { + cur_leb 
= ssdfs_get_leb_id_for_peb_index(fsi, seg_id, i); + if (cur_leb >= U64_MAX) { + SSDFS_ERR("invalid leb_id for seg_id %llu\n", + seg_id); + return -ERANGE; + } + + err = ssdfs_maptbl_erase_reserved_peb_now(fsi, + cur_leb, + peb_type, + &init_end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_erase_reserved_peb_now(fsi, + cur_leb, + peb_type, + &init_end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to erase reserved dirty PEB: " + "leb_id %llu, err %d\n", + cur_leb, err); + return err; + } + } + + return 0; +} + +static int ssdfs_move_on_next_peb_in_sb_seg(struct super_block *sb, + int sb_seg_type, + sb_pebs_array *sb_lebs, + sb_pebs_array *sb_pebs) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + u64 prev_leb, cur_leb, next_leb, reserved_leb; + u64 prev_peb, cur_peb, next_peb, reserved_peb; +#ifdef CONFIG_SSDFS_DEBUG + u64 new_leb = U64_MAX, new_peb = U64_MAX; +#endif /* CONFIG_SSDFS_DEBUG */ + struct ssdfs_maptbl_peb_relation pebr; + u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE; + struct completion *end = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !sb_lebs || !sb_pebs); + + if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) { + SSDFS_ERR("invalid sb_seg_type %#x\n", + sb_seg_type); + return -EINVAL; + } + + SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type]; + cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + next_leb = cur_leb + 1; + reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type]; + + prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type]; + cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + next_peb = U64_MAX; + reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type]; + + err = ssdfs_maptbl_convert_leb2peb(fsi, next_leb, + peb_type, + &pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, next_leb, + peb_type, + &pebr, &end); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu doesn't mapped\n", next_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_move_sb_seg; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB %llu to PEB: err %d\n", + next_leb, err); + goto finish_move_sb_seg; + } + + next_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(next_peb == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb; + (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb; + + (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb; + (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_leb %llu, cur_peb %llu, " + "next_leb %llu, next_peb %llu, " + "prev_leb %llu, prev_peb %llu, " + "reserved_leb %llu, reserved_peb %llu, " + "new_leb %llu, new_peb %llu\n", + cur_leb, cur_peb, + next_leb, next_peb, + prev_leb, prev_peb, + reserved_leb, reserved_peb, + new_leb, new_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (prev_leb == U64_MAX) + goto finish_move_sb_seg; + else { + err = ssdfs_erase_dirty_prev_sb_segs(fsi, prev_leb); + if (unlikely(err)) { + SSDFS_ERR("fail erase dirty PEBs: " + "prev_leb %llu, err %d\n", + prev_leb, err); + goto finish_move_sb_seg; + } + } + +finish_move_sb_seg: + 
return err; +} + +#ifdef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET +static int ssdfs_move_on_first_peb_next_sb_seg(struct super_block *sb, + int sb_seg_type, + sb_pebs_array *sb_lebs, + sb_pebs_array *sb_pebs) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + u64 prev_leb, cur_leb, next_leb, reserved_leb; + u64 prev_peb, cur_peb, next_peb, reserved_peb; + u64 seg_id; + struct ssdfs_maptbl_peb_relation pebr; + u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE; + struct completion *end = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !sb_lebs || !sb_pebs); + + if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) { + SSDFS_ERR("invalid sb_seg_type %#x\n", + sb_seg_type); + return -EINVAL; + } + + SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type]; + cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + next_leb = (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type]; + reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type]; + + prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type]; + cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + next_peb = (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type]; + reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_peb %llu, next_peb %llu, " + "cur_leb %llu, next_leb %llu\n", + cur_peb, next_peb, cur_leb, next_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + + (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb; + (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb; + + if (prev_leb >= U64_MAX) { + (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb; + (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb; + + if (fsi->pebs_per_seg == 1) { + (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = + reserved_leb; + (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = + reserved_peb; + + (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = + U64_MAX; + (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = + U64_MAX; + } else { + /* + * do nothing + */ + } + } else { + err = ssdfs_erase_dirty_prev_sb_segs(fsi, prev_leb); + if (unlikely(err)) { + SSDFS_ERR("fail erase dirty PEBs: " + "prev_leb %llu, err %d\n", + prev_leb, err); + goto finish_move_sb_seg; + } + + if (fsi->pebs_per_seg == 1) { + (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = + prev_leb; + (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = + prev_peb; + + (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = + U64_MAX; + (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = + U64_MAX; + + (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb; + (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb; + } else { + (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = + reserved_leb; + (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = + reserved_peb; + + seg_id = SSDFS_LEB2SEG(fsi, prev_leb); + if (seg_id >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid seg_id for leb_id %llu\n", + prev_leb); + goto finish_move_sb_seg; + } + + prev_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, 0); + if (prev_leb >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid leb_id for seg_id %llu\n", + seg_id); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, prev_leb, + peb_type, + &pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, + prev_leb, + peb_type, + &pebr, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB %llu to PEB: " 
+ "err %d\n", prev_leb, err); + goto finish_move_sb_seg; + } + + prev_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(prev_peb == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = + prev_leb; + (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = + prev_peb; + + (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb; + (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_leb %llu, cur_peb %llu, " + "next_leb %llu, next_peb %llu, " + "reserved_leb %llu, reserved_peb %llu, " + "prev_leb %llu, prev_peb %llu\n", + (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type], + (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type], + (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type], + (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type], + (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type], + (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type], + (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type], + (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type]); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_move_sb_seg: + return err; +} +#else +static int ssdfs_move_on_first_peb_next_sb_seg(struct super_block *sb, + int sb_seg_type, + sb_pebs_array *sb_lebs, + sb_pebs_array *sb_pebs) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct ssdfs_segment_bmap *segbmap = fsi->segbmap; + struct ssdfs_maptbl_cache *maptbl_cache = &fsi->maptbl_cache; + u64 prev_leb, cur_leb, next_leb, reserved_leb; + u64 prev_peb, cur_peb, next_peb, reserved_peb; + u64 new_leb = U64_MAX, new_peb = U64_MAX; + u64 reserved_seg; + u64 prev_seg, cur_seg; + struct ssdfs_maptbl_peb_relation pebr; + u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE; + struct completion *end = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !sb_lebs || !sb_pebs); + + if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) { + SSDFS_ERR("invalid sb_seg_type %#x\n", + sb_seg_type); + return -EINVAL; + } + + SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type]; + cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + next_leb = (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type]; + reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type]; + + prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type]; + cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + next_peb = (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type]; + reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_peb %llu, next_peb %llu, " + "cur_leb %llu, next_leb %llu\n", + cur_peb, next_peb, cur_leb, next_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_reserve_clean_segment(sb, sb_seg_type, cur_leb, + &reserved_seg); + if (unlikely(err)) { + SSDFS_ERR("fail to reserve clean segment: err %d\n", err); + goto finish_move_sb_seg; + } + + new_leb = ssdfs_get_leb_id_for_peb_index(fsi, reserved_seg, 0); + if (new_leb >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu\n", reserved_seg); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, new_leb, + peb_type, + &pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, new_leb, + peb_type, + &pebr, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB %llu to PEB: err %d\n", + new_leb, err); + goto 
finish_move_sb_seg; + } + + new_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(new_peb == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb; + (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb; + + (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb; + (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb; + + (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_leb; + (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_peb; + + (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = new_leb; + (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = new_peb; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_leb %llu, cur_peb %llu, " + "next_leb %llu, next_peb %llu, " + "reserved_leb %llu, reserved_peb %llu, " + "new_leb %llu, new_peb %llu\n", + cur_leb, cur_peb, + next_leb, next_peb, + reserved_leb, reserved_peb, + new_leb, new_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (prev_leb == U64_MAX) + goto finish_move_sb_seg; + + prev_seg = SSDFS_LEB2SEG(fsi, prev_leb); + cur_seg = SSDFS_LEB2SEG(fsi, cur_leb); + + if (prev_seg != cur_seg) { + err = ssdfs_segbmap_change_state(segbmap, prev_seg, + SSDFS_SEG_DIRTY, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + goto finish_move_sb_seg; + } + + err = ssdfs_segbmap_change_state(segbmap, prev_seg, + SSDFS_SEG_DIRTY, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change segment state: " + "seg %llu, state %#x, err %d\n", + prev_seg, SSDFS_SEG_DIRTY, err); + goto finish_move_sb_seg; + } + } + + err = ssdfs_maptbl_change_peb_state(fsi, prev_leb, peb_type, + SSDFS_MAPTBL_DIRTY_PEB_STATE, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_change_peb_state(fsi, + prev_leb, peb_type, + SSDFS_MAPTBL_DIRTY_PEB_STATE, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change the PEB state: " + "leb_id %llu, new_state %#x, err %d\n", + prev_leb, SSDFS_MAPTBL_DIRTY_PEB_STATE, err); + goto finish_move_sb_seg; + } + + err = ssdfs_maptbl_cache_forget_leb2peb(maptbl_cache, prev_leb, + SSDFS_PEB_STATE_CONSISTENT); + if (unlikely(err)) { + SSDFS_ERR("fail to forget prev_leb %llu, err %d\n", + prev_leb, err); + goto finish_move_sb_seg; + } + +finish_move_sb_seg: + return err; +} +#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */ + +static int ssdfs_move_on_next_sb_seg(struct super_block *sb, + int sb_seg_type, + sb_pebs_array *sb_lebs, + sb_pebs_array *sb_pebs) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + u64 cur_leb, next_leb; + u64 cur_peb; + u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE; + struct completion *end = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !sb_lebs || !sb_pebs); + + if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) { + SSDFS_ERR("invalid sb_seg_type %#x\n", + sb_seg_type); + return -EINVAL; + } + + SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type]; + + next_leb = cur_leb + 1; + + err = ssdfs_maptbl_change_peb_state(fsi, cur_leb, peb_type, + SSDFS_MAPTBL_USED_PEB_STATE, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = 
ssdfs_maptbl_change_peb_state(fsi, + cur_leb, peb_type, + SSDFS_MAPTBL_USED_PEB_STATE, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change the PEB state: " + "leb_id %llu, new_state %#x, err %d\n", + cur_leb, SSDFS_MAPTBL_USED_PEB_STATE, err); + return err; + } + + if (!ssdfs_sb_seg_exhausted(fsi, cur_leb, next_leb)) { + err = ssdfs_move_on_next_peb_in_sb_seg(sb, sb_seg_type, + sb_lebs, sb_pebs); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to move on next PEB of segment: " + "cur_leb %llu, next_leb %llu\n", + cur_leb, next_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_move_on_first_peb_next_sb_seg; + } + } else { +try_move_on_first_peb_next_sb_seg: + err = ssdfs_move_on_first_peb_next_sb_seg(sb, sb_seg_type, + sb_lebs, sb_pebs); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to move on next sb segment: " + "sb_seg_type %#x, cur_leb %llu, " + "cur_peb %llu, err %d\n", + sb_seg_type, cur_leb, + cur_peb, err); + return err; + } + + return 0; +} + +static int ssdfs_move_on_next_sb_segs_pair(struct super_block *sb) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + sb_pebs_array sb_lebs; + sb_pebs_array sb_pebs; + size_t array_size; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb %p", sb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(fsi->fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) || + !(fsi->fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG)) { + SSDFS_ERR("volume hasn't segbmap or maptbl\n"); + return -EIO; + } + + array_size = sizeof(u64); + array_size *= SSDFS_SB_CHAIN_MAX; + array_size *= SSDFS_SB_SEG_COPY_MAX; + + down_read(&fsi->sb_segs_sem); + ssdfs_memcpy(sb_lebs, 0, array_size, + fsi->sb_lebs, 0, array_size, + array_size); + ssdfs_memcpy(sb_pebs, 0, array_size, + fsi->sb_pebs, 0, array_size, + array_size); + up_read(&fsi->sb_segs_sem); + + for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) { + err = ssdfs_move_on_next_sb_seg(sb, i, &sb_lebs, &sb_pebs); + if (unlikely(err)) { + SSDFS_ERR("fail to move on next sb PEB: err %d\n", + err); + return err; + } + } + + down_write(&fsi->sb_segs_sem); + ssdfs_memcpy(fsi->sb_lebs, 0, array_size, + sb_lebs, 0, array_size, + array_size); + ssdfs_memcpy(fsi->sb_pebs, 0, array_size, + sb_pebs, 0, array_size, + array_size); + up_write(&fsi->sb_segs_sem); + + return 0; +} + +static +int ssdfs_prepare_sb_log(struct super_block *sb, + struct ssdfs_peb_extent *last_sb_log) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !last_sb_log); + + SSDFS_DBG("sb %p, last_sb_log %p\n", + sb, last_sb_log); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_define_next_sb_log_place(sb, last_sb_log); + switch (err) { + case -EFBIG: /* current sb segment is exhausted */ + case -EIO: /* current sb segment is corrupted */ + err = ssdfs_move_on_next_sb_segs_pair(sb); + if (err) { + SSDFS_ERR("fail to move on next sb segs pair: err %d\n", + err); + return err; + } + err = ssdfs_define_next_sb_log_place(sb, last_sb_log); + if (unlikely(err)) { + SSDFS_ERR("unable to define next sb log place: err %d\n", + err); + return err; + } + break; + + default: + if (err) { + SSDFS_ERR("unable to define next sb log place: err %d\n", + err); + return err; + } + break; + } + + return 0; +} + +static void +ssdfs_prepare_maptbl_cache_descriptor(struct ssdfs_metadata_descriptor *desc, + u32 offset, + struct ssdfs_payload_content *payload, + u32 payload_size) +{ + unsigned i; + u32 csum = ~0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !payload); + + SSDFS_DBG("desc %p, offset %u, payload %p\n", + desc, offset, 
payload); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc->offset = cpu_to_le32(offset); + desc->size = cpu_to_le32(payload_size); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(payload_size >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc->check.bytes = cpu_to_le16((u16)payload_size); + desc->check.flags = cpu_to_le16(SSDFS_CRC32); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pagevec_count(&payload->pvec) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < pagevec_count(&payload->pvec); i++) { + struct page *page = payload->pvec.pages[i]; + struct ssdfs_maptbl_cache_header *hdr; + u16 bytes_count; + void *kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + bytes_count = le16_to_cpu(hdr->bytes_count); + + csum = crc32(csum, kaddr, bytes_count); + + kunmap_local(kaddr); + ssdfs_unlock_page(page); + } + + desc->check.csum = cpu_to_le32(csum); +} + +static +int ssdfs_prepare_snapshot_rules_for_commit(struct ssdfs_fs_info *fsi, + struct ssdfs_metadata_descriptor *desc, + u32 offset) +{ + struct ssdfs_snapshot_rules_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_snapshot_rules_header); + size_t info_size = sizeof(struct ssdfs_snapshot_rule_info); + struct ssdfs_snapshot_rule_item *item = NULL; + u32 payload_off; + u32 item_off; + u32 pagesize = fsi->pagesize; + u16 items_count = 0; + u16 items_capacity = 0; + u32 area_size = 0; + struct list_head *this, *next; + u32 csum = ~0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !desc); + + SSDFS_DBG("fsi %p, offset %u\n", + fsi, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_snapshot_rules_list_empty(&fsi->snapshots.rules_list)) { + SSDFS_DBG("snapshot rules list is empty\n"); + return -ENODATA; + } + + payload_off = offsetof(struct ssdfs_log_footer, payload); + hdr = SSDFS_SNRU_HDR((u8 *)fsi->sbi.vs_buf + payload_off); + memset(hdr, 0, hdr_size); + area_size = pagesize - payload_off; + item_off = payload_off + hdr_size; + + items_capacity = (u16)((area_size - hdr_size) / info_size); + area_size = min_t(u32, area_size, (u32)items_capacity * info_size); + + spin_lock(&fsi->snapshots.rules_list.lock); + list_for_each_safe(this, next, &fsi->snapshots.rules_list.list) { + item = list_entry(this, struct ssdfs_snapshot_rule_item, list); + + err = ssdfs_memcpy(fsi->sbi.vs_buf, item_off, pagesize, + &item->rule, 0, info_size, + info_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_copy_items; + } + + item_off += info_size; + items_count++; + } +finish_copy_items: + spin_unlock(&fsi->snapshots.rules_list.lock); + + if (unlikely(err)) + return err; + + hdr->magic = cpu_to_le32(SSDFS_SNAPSHOT_RULES_MAGIC); + hdr->item_size = cpu_to_le16(info_size); + hdr->flags = cpu_to_le16(0); + + if (items_count == 0 || items_count > items_capacity) { + SSDFS_ERR("invalid items number: " + "items_count %u, items_capacity %u, " + "area_size %u, item_size %zu\n", + items_count, items_capacity, + area_size, info_size); + return -ERANGE; + } + + hdr->items_count = cpu_to_le16(items_count); + hdr->items_capacity = cpu_to_le16(items_capacity); + hdr->area_size = cpu_to_le16(area_size); + + desc->offset = cpu_to_le32(offset); + desc->size = cpu_to_le32(area_size); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(area_size >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc->check.bytes = cpu_to_le16(area_size); + desc->check.flags = cpu_to_le16(SSDFS_CRC32); + + csum = crc32(csum, 
hdr, area_size); + desc->check.csum = cpu_to_le32(csum); + + return 0; +} + +static int __ssdfs_commit_sb_log(struct super_block *sb, + u64 timestamp, u64 cno, + struct ssdfs_peb_extent *last_sb_log, + struct ssdfs_sb_log_payload *payload) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor footer_desc[SSDFS_LOG_FOOTER_DESC_MAX]; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t hdr_array_bytes = desc_size * SSDFS_SEG_HDR_DESC_MAX; + size_t footer_array_bytes = desc_size * SSDFS_LOG_FOOTER_DESC_MAX; + struct ssdfs_metadata_descriptor *cur_hdr_desc; + struct page *page; + struct ssdfs_segment_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + struct ssdfs_log_footer *footer; + size_t footer_size = sizeof(struct ssdfs_log_footer); + void *kaddr = NULL; + loff_t peb_offset, offset; + u32 flags = 0; + u32 written = 0; + unsigned i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !last_sb_log); + BUG_ON(!SSDFS_FS_I(sb)->devops); + BUG_ON(!SSDFS_FS_I(sb)->devops->writepage); + BUG_ON((last_sb_log->page_offset + last_sb_log->pages_count) > + (ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize)); + BUG_ON((last_sb_log->leb_id * SSDFS_FS_I(sb)->pebs_per_seg) >= + SSDFS_FS_I(sb)->nsegs); + BUG_ON(last_sb_log->peb_id > + div_u64(ULLONG_MAX, SSDFS_FS_I(sb)->pages_per_peb)); + BUG_ON((last_sb_log->peb_id * SSDFS_FS_I(sb)->pages_per_peb) > + (ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize)); + + SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, " + "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n", + sb, last_sb_log->leb_id, last_sb_log->peb_id, + last_sb_log->page_offset, last_sb_log->pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = SSDFS_FS_I(sb); + hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + footer = SSDFS_LF(fsi->sbi.vs_buf); + + memset(hdr_desc, 0, hdr_array_bytes); + memset(footer_desc, 0, footer_array_bytes); + + offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize; + offset += PAGE_SIZE; + + cur_hdr_desc = &hdr_desc[SSDFS_MAPTBL_CACHE_INDEX]; + ssdfs_prepare_maptbl_cache_descriptor(cur_hdr_desc, (u32)offset, + &payload->maptbl_cache, + payload->maptbl_cache.bytes_count); + + offset += payload->maptbl_cache.bytes_count; + + cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX]; + cur_hdr_desc->offset = cpu_to_le32(offset); + cur_hdr_desc->size = cpu_to_le32(footer_size); + + ssdfs_memcpy(hdr->desc_array, 0, hdr_array_bytes, + hdr_desc, 0, hdr_array_bytes, + hdr_array_bytes); + + hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] = + SSDFS_PEB_UNKNOWN_MIGRATION_ID; + hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] = + SSDFS_PEB_UNKNOWN_MIGRATION_ID; + + err = ssdfs_prepare_segment_header_for_commit(fsi, + last_sb_log->pages_count, + SSDFS_SB_SEG_TYPE, + SSDFS_LOG_HAS_FOOTER | + SSDFS_LOG_HAS_MAPTBL_CACHE, + timestamp, cno, + hdr); + if (err) { + SSDFS_ERR("fail to prepare segment header: err %d\n", err); + return err; + } + + offset += offsetof(struct ssdfs_log_footer, payload); + cur_hdr_desc = &footer_desc[SSDFS_SNAPSHOT_RULES_AREA_INDEX]; + + err = ssdfs_prepare_snapshot_rules_for_commit(fsi, cur_hdr_desc, + (u32)offset); + if (err == -ENODATA) { + err = 0; + SSDFS_DBG("snapshot rules list is empty\n"); + } else if (err) { + SSDFS_ERR("fail to prepare snapshot rules: err %d\n", err); + return err; + } else + flags |= SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES; + + ssdfs_memcpy(footer->desc_array, 0, footer_array_bytes, + footer_desc, 0, 
footer_array_bytes, + footer_array_bytes); + + err = ssdfs_prepare_log_footer_for_commit(fsi, last_sb_log->pages_count, + flags, timestamp, + cno, footer); + if (err) { + SSDFS_ERR("fail to prepare log footer: err %d\n", err); + return err; + } + + page = ssdfs_super_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return err; + } + + /* ->writepage() calls put_page() */ + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* write segment header */ + ssdfs_lock_page(page); + ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + fsi->sbi.vh_buf, 0, hdr_size, + hdr_size); + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + SetPageDirty(page); + ssdfs_unlock_page(page); + + peb_offset = last_sb_log->peb_id * fsi->pages_per_peb; + peb_offset <<= fsi->log_pagesize; + offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_offset > (ULLONG_MAX - (offset + fsi->pagesize))); +#endif /* CONFIG_SSDFS_DEBUG */ + + offset += peb_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu\n", offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = fsi->devops->writepage(sb, offset, page, 0, hdr_size); + if (err) { + SSDFS_ERR("fail to write segment header: " + "offset %llu, size %zu\n", + (u64)offset, hdr_size); + goto cleanup_after_failure; + } + + ssdfs_lock_page(page); + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ssdfs_unlock_page(page); + + offset += fsi->pagesize; + + for (i = 0; i < pagevec_count(&payload->maptbl_cache.pvec); i++) { + struct page *payload_page = payload->maptbl_cache.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!payload_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* ->writepage() calls put_page() */ + ssdfs_get_page(payload_page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + payload_page, + page_ref_count(payload_page)); + + kaddr = kmap_local_page(payload_page); + SSDFS_DBG("PAYLOAD PAGE %d\n", i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(payload_page); + ssdfs_set_page_private(payload_page, 0); + SetPageUptodate(payload_page); + SetPageDirty(payload_page); + ssdfs_unlock_page(payload_page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu\n", offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = fsi->devops->writepage(sb, offset, payload_page, + 0, PAGE_SIZE); + if (err) { + SSDFS_ERR("fail to write maptbl cache page: " + "offset %llu, page_index %u, size %zu\n", + (u64)offset, i, PAGE_SIZE); + goto cleanup_after_failure; + } + + ssdfs_lock_page(payload_page); + ClearPageUptodate(payload_page); + ssdfs_clear_page_private(page, 0); + ssdfs_unlock_page(payload_page); + + offset += PAGE_SIZE; + } + + /* TODO: write metadata payload */ + + /* ->writepage() calls put_page() */ + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* write log footer */ + written = 0; + + while (written < fsi->sbi.vs_buf_size) { + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + memset(kaddr, 0, PAGE_SIZE); + ssdfs_memcpy(kaddr, 0, PAGE_SIZE, + fsi->sbi.vs_buf, written, fsi->sbi.vs_buf_size, + PAGE_SIZE); + flush_dcache_page(page); + kunmap_local(kaddr); + 
ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + SetPageDirty(page); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu\n", offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = fsi->devops->writepage(sb, offset, page, 0, PAGE_SIZE); + if (err) { + SSDFS_ERR("fail to write log footer: " + "offset %llu, size %zu\n", + (u64)offset, PAGE_SIZE); + goto cleanup_after_failure; + } + + ssdfs_lock_page(page); + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ssdfs_unlock_page(page); + + written += PAGE_SIZE; + offset += PAGE_SIZE; + }; + + ssdfs_super_free_page(page); + return 0; + +cleanup_after_failure: + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_super_free_page(page); + + return err; +} + +static int +__ssdfs_commit_sb_log_inline(struct super_block *sb, + u64 timestamp, u64 cno, + struct ssdfs_peb_extent *last_sb_log, + struct ssdfs_sb_log_payload *payload, + u32 payload_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor footer_desc[SSDFS_LOG_FOOTER_DESC_MAX]; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t hdr_array_bytes = desc_size * SSDFS_SEG_HDR_DESC_MAX; + size_t footer_array_bytes = desc_size * SSDFS_LOG_FOOTER_DESC_MAX; + struct ssdfs_metadata_descriptor *cur_hdr_desc; + struct page *page; + struct page *payload_page; + struct ssdfs_segment_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + struct ssdfs_log_footer *footer; + size_t footer_size = sizeof(struct ssdfs_log_footer); + void *kaddr = NULL; + loff_t peb_offset, offset; + u32 inline_capacity; + void *payload_buf; + u32 flags = 0; + u32 written = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !last_sb_log); + BUG_ON(!SSDFS_FS_I(sb)->devops); + BUG_ON(!SSDFS_FS_I(sb)->devops->writepage); + BUG_ON((last_sb_log->page_offset + last_sb_log->pages_count) > + (ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize)); + BUG_ON((last_sb_log->leb_id * SSDFS_FS_I(sb)->pebs_per_seg) >= + SSDFS_FS_I(sb)->nsegs); + BUG_ON(last_sb_log->peb_id > + div_u64(ULLONG_MAX, SSDFS_FS_I(sb)->pages_per_peb)); + BUG_ON((last_sb_log->peb_id * SSDFS_FS_I(sb)->pages_per_peb) > + (ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize)); + + SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, " + "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n", + sb, last_sb_log->leb_id, last_sb_log->peb_id, + last_sb_log->page_offset, last_sb_log->pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = SSDFS_FS_I(sb); + hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + footer = SSDFS_LF(fsi->sbi.vs_buf); + + memset(hdr_desc, 0, hdr_array_bytes); + memset(footer_desc, 0, footer_array_bytes); + + offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize; + offset += hdr_size; + + cur_hdr_desc = &hdr_desc[SSDFS_MAPTBL_CACHE_INDEX]; + ssdfs_prepare_maptbl_cache_descriptor(cur_hdr_desc, (u32)offset, + &payload->maptbl_cache, + payload_size); + + offset += payload_size; + + offset += fsi->pagesize - 1; + offset = (offset >> fsi->log_pagesize) << fsi->log_pagesize; + + cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX]; + cur_hdr_desc->offset = cpu_to_le32(offset); + cur_hdr_desc->size = cpu_to_le32(footer_size); + + ssdfs_memcpy(hdr->desc_array, 0, hdr_array_bytes, + hdr_desc, 0, hdr_array_bytes, + hdr_array_bytes); + + 
hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] = + SSDFS_PEB_UNKNOWN_MIGRATION_ID; + hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] = + SSDFS_PEB_UNKNOWN_MIGRATION_ID; + + err = ssdfs_prepare_segment_header_for_commit(fsi, + last_sb_log->pages_count, + SSDFS_SB_SEG_TYPE, + SSDFS_LOG_HAS_FOOTER | + SSDFS_LOG_HAS_MAPTBL_CACHE, + timestamp, cno, + hdr); + if (err) { + SSDFS_ERR("fail to prepare segment header: err %d\n", err); + return err; + } + + offset += offsetof(struct ssdfs_log_footer, payload); + cur_hdr_desc = &footer_desc[SSDFS_SNAPSHOT_RULES_AREA_INDEX]; + + err = ssdfs_prepare_snapshot_rules_for_commit(fsi, cur_hdr_desc, + (u32)offset); + if (err == -ENODATA) { + err = 0; + SSDFS_DBG("snapshot rules list is empty\n"); + } else if (err) { + SSDFS_ERR("fail to prepare snapshot rules: err %d\n", err); + return err; + } else + flags |= SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES; + + ssdfs_memcpy(footer->desc_array, 0, footer_array_bytes, + footer_desc, 0, footer_array_bytes, + footer_array_bytes); + + err = ssdfs_prepare_log_footer_for_commit(fsi, last_sb_log->pages_count, + flags, timestamp, + cno, footer); + if (err) { + SSDFS_ERR("fail to prepare log footer: err %d\n", err); + return err; + } + + if (pagevec_count(&payload->maptbl_cache.pvec) != 1) { + SSDFS_WARN("payload contains several memory pages\n"); + return -ERANGE; + } + + inline_capacity = PAGE_SIZE - hdr_size; + + if (payload_size > inline_capacity) { + SSDFS_ERR("payload_size %u > inline_capacity %u\n", + payload_size, inline_capacity); + return -ERANGE; + } + + payload_buf = ssdfs_super_kmalloc(inline_capacity, GFP_KERNEL); + if (!payload_buf) { + SSDFS_ERR("fail to allocate payload buffer\n"); + return -ENOMEM; + } + + page = ssdfs_super_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + ssdfs_super_kfree(payload_buf); + return err; + } + + /* ->writepage() calls put_page() */ + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + payload_page = payload->maptbl_cache.pvec.pages[0]; + if (!payload_page) { + err = -ERANGE; + SSDFS_ERR("invalid payload page\n"); + goto free_payload_buffer; + } + + ssdfs_lock_page(payload_page); + err = ssdfs_memcpy_from_page(payload_buf, 0, inline_capacity, + payload_page, 0, PAGE_SIZE, + payload_size); + ssdfs_unlock_page(payload_page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto free_payload_buffer; + } + + /* write segment header + payload */ + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + ssdfs_memcpy(kaddr, 0, PAGE_SIZE, + fsi->sbi.vh_buf, 0, hdr_size, + hdr_size); + err = ssdfs_memcpy(kaddr, hdr_size, PAGE_SIZE, + payload_buf, 0, inline_capacity, + payload_size); + flush_dcache_page(page); + kunmap_local(kaddr); + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + SetPageDirty(page); + } + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto free_payload_buffer; + } + +free_payload_buffer: + ssdfs_super_kfree(payload_buf); + + if (unlikely(err)) + goto cleanup_after_failure; + + peb_offset = last_sb_log->peb_id * fsi->pages_per_peb; + peb_offset <<= fsi->log_pagesize; + offset = (loff_t)last_sb_log->page_offset << fsi->log_pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_offset > (ULLONG_MAX - (offset + fsi->pagesize))); +#endif /* CONFIG_SSDFS_DEBUG */ + + offset += peb_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu\n", offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = fsi->devops->writepage(sb, offset, page, 0, + hdr_size + payload_size); + if (err) { + SSDFS_ERR("fail to write segment header: " + "offset %llu, size %zu\n", + (u64)offset, hdr_size + payload_size); + goto cleanup_after_failure; + } + + ssdfs_lock_page(page); + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ssdfs_unlock_page(page); + + offset += fsi->pagesize; + + /* ->writepage() calls put_page() */ + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* write log footer */ + written = 0; + + while (written < fsi->sbi.vs_buf_size) { + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + memset(kaddr, 0, PAGE_SIZE); + ssdfs_memcpy(kaddr, 0, PAGE_SIZE, + fsi->sbi.vs_buf, written, fsi->sbi.vs_buf_size, + PAGE_SIZE); + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + SetPageDirty(page); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu\n", offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = fsi->devops->writepage(sb, offset, page, 0, PAGE_SIZE); + if (err) { + SSDFS_ERR("fail to write log footer: " + "offset %llu, size %zu\n", + (u64)offset, PAGE_SIZE); + goto cleanup_after_failure; + } + + ssdfs_lock_page(page); + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ssdfs_unlock_page(page); + + written += PAGE_SIZE; + offset += PAGE_SIZE; + }; + + ssdfs_super_free_page(page); + return 0; + +cleanup_after_failure: + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* 
CONFIG_SSDFS_DEBUG */ + + ssdfs_super_free_page(page); + + return err; +} + +static int ssdfs_commit_sb_log(struct super_block *sb, + u64 timestamp, u64 cno, + struct ssdfs_peb_extent *last_sb_log, + struct ssdfs_sb_log_payload *payload) +{ + size_t hdr_size = sizeof(struct ssdfs_segment_header); + u32 inline_capacity; + u32 payload_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !last_sb_log || !payload); + + SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, " + "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n", + sb, last_sb_log->leb_id, last_sb_log->peb_id, + last_sb_log->page_offset, last_sb_log->pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + inline_capacity = PAGE_SIZE - hdr_size; + payload_size = ssdfs_sb_payload_size(&payload->maptbl_cache.pvec); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inline_capacity %u, payload_size %u\n", + inline_capacity, payload_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (payload_size > inline_capacity) { + err = __ssdfs_commit_sb_log(sb, timestamp, cno, + last_sb_log, payload); + } else { + err = __ssdfs_commit_sb_log_inline(sb, timestamp, cno, + last_sb_log, + payload, payload_size); + } + + if (unlikely(err)) + SSDFS_ERR("fail to commit sb log: err %d\n", err); + + return err; +} + +static +int ssdfs_commit_super(struct super_block *sb, u16 fs_state, + struct ssdfs_peb_extent *last_sb_log, + struct ssdfs_sb_log_payload *payload) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + __le64 cur_segs[SSDFS_CUR_SEGS_COUNT]; + size_t size = sizeof(__le64) * SSDFS_CUR_SEGS_COUNT; + u64 timestamp = ssdfs_current_timestamp(); + u64 cno = ssdfs_current_cno(sb); + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sb || !last_sb_log || !payload); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("sb %p, fs_state %u", sb, fs_state); +#else + SSDFS_DBG("sb %p, fs_state %u", sb, fs_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + BUG_ON(fs_state > SSDFS_LAST_KNOWN_FS_STATE); + + if (le16_to_cpu(fsi->vs->state) == SSDFS_ERROR_FS && + !ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE)) { + SSDFS_DBG("refuse commit superblock: fs erroneous state\n"); + return 0; + } + + err = ssdfs_prepare_volume_header_for_commit(fsi, fsi->vh); + if (unlikely(err)) { + SSDFS_CRIT("volume header is inconsistent: err %d\n", err); + goto finish_commit_super; + } + + err = ssdfs_prepare_current_segment_ids(fsi, cur_segs, size); + if (unlikely(err)) { + SSDFS_CRIT("fail to prepare current segments IDs: err %d\n", + err); + goto finish_commit_super; + } + + err = ssdfs_prepare_volume_state_info_for_commit(fsi, fs_state, + cur_segs, size, + timestamp, + cno, + fsi->vs); + if (unlikely(err)) { + SSDFS_CRIT("volume state info is inconsistent: err %d\n", err); + goto finish_commit_super; + } + + for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) { + last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i]; + last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i]; + err = ssdfs_commit_sb_log(sb, timestamp, cno, + last_sb_log, payload); + if (err) { + SSDFS_ERR("fail to commit superblock log: " + "leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "err %d\n", + last_sb_log->leb_id, + last_sb_log->peb_id, + last_sb_log->page_offset, + last_sb_log->pages_count, + err); + goto finish_commit_super; + } + } + + last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]; + last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]; + + ssdfs_memcpy(&fsi->sbi.last_log, + 
0, sizeof(struct ssdfs_peb_extent), + last_sb_log, + 0, sizeof(struct ssdfs_peb_extent), + sizeof(struct ssdfs_peb_extent)); + +finish_commit_super: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + static void ssdfs_memory_page_locks_checker_init(void) { #ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING

From patchwork Sat Feb 25 01:08:17 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151910
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 06/76] ssdfs: segment header + log footer operations
Date: Fri, 24 Feb 2023 17:08:17 -0800
Message-Id: <20230225010927.813929-7-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

SSDFS is a Log-structured File System (LFS): the volume is a sequence of segments, and every segment can contain one or several erase blocks. Any write operation into an erase block creates a log, so the content of every erase block is a sequence of logs.

A log can be full or partial. A full log starts with a header and finishes with a footer; its size is fixed and defined during the mkfs phase, although the tunefs tool can change this value later. If a commit operation has not enough data to prepare a full log, then a partial log is created. A partial log starts with a partial log header and has no footer; the partial log header can be imagined as a mixture of segment header and log footer.

The segment header can be considered static superblock info. It contains metadata that never changes after volume creation (logical block size, for example) or changes rarely (number of segments in the volume, for example). The log footer can be considered the dynamic part of the superblock because it contains frequently updated metadata (for example, the root node of the inodes b-tree).
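Since every log opens with a magic signature, a reader can classify the log flavor before deciding whether to expect a footer. The sketch below assumes the magic key values defined by the on-disk layout (SSDFS_SEGMENT_HDR_MAGIC for a full log, SSDFS_PARTIAL_LOG_HDR_MAGIC for a partial one); the ssdfs_classify_log() helper and its enum are purely illustrative and not part of this patch:

enum ssdfs_log_kind {
	SSDFS_FULL_LOG,		/* segment header + payload + log footer */
	SSDFS_PARTIAL_LOG,	/* partial log header + payload, no footer */
	SSDFS_UNKNOWN_LOG,	/* corrupted or unsupported log */
};

/* hypothetical helper: classify the log that starts at @kaddr */
static enum ssdfs_log_kind ssdfs_classify_log(void *kaddr)
{
	struct ssdfs_signature *magic = kaddr;

	if (le16_to_cpu(magic->key) == SSDFS_SEGMENT_HDR_MAGIC)
		return SSDFS_FULL_LOG;
	if (le16_to_cpu(magic->key) == SSDFS_PARTIAL_LOG_HDR_MAGIC)
		return SSDFS_PARTIAL_LOG;

	return SSDFS_UNKNOWN_LOG;
}

A reader that finds a full log validates both the header and the footer; for a partial log, the partial log header itself carries the frequently updated state that a full log keeps in its footer.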
[172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.15.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:15:46 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 06/76] ssdfs: segment header + log footer operations Date: Fri, 24 Feb 2023 17:08:17 -0800 Message-Id: <20230225010927.813929-7-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org SSDFS is Log-structured File System (LFS). It means that volume is a sequence of segments that can contain one or several erase blocks. Any write operation into erase block is a creation of log. And content of every erase block is a sequence of logs. Log can be full or partial. Full log starts by header and it is finished by footer. The size of full log is fixed and it is defined during mkfs phase. However, tunefs tool can change this value. But if commit operation has not enough data to prepare the full log, then partial log is created. Partial log starts with partial log header and it hasn't footer. The partial log can be imagined like mixture of segment header and log footer. Segment header can be considered like static superblock info. It contains metadata that not changed at all after volume creation (logical block size, for example) or changed rarely (number of segments in the volume, for example). Log footer can be considered like dynamic part of superblock because it contains frequently updated metadata (for example, root node of inodes b-tree). Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/log_footer.c | 901 +++++++++++++++++++++++++++ fs/ssdfs/volume_header.c | 1256 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 2157 insertions(+) create mode 100644 fs/ssdfs/log_footer.c create mode 100644 fs/ssdfs/volume_header.c diff --git a/fs/ssdfs/log_footer.c b/fs/ssdfs/log_footer.c new file mode 100644 index 000000000000..f56a268f310e --- /dev/null +++ b/fs/ssdfs/log_footer.c @@ -0,0 +1,901 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/log_footer.c - operations with log footer. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment.h" +#include "current_segment.h" + +#include + +/* + * __is_ssdfs_log_footer_magic_valid() - check log footer's magic + * @magic: pointer on magic value + */ +bool __is_ssdfs_log_footer_magic_valid(struct ssdfs_signature *magic) +{ + return le16_to_cpu(magic->key) == SSDFS_LOG_FOOTER_MAGIC; +} + +/* + * is_ssdfs_log_footer_magic_valid() - check log footer's magic + * @footer: log footer + */ +bool is_ssdfs_log_footer_magic_valid(struct ssdfs_log_footer *footer) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!footer); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __is_ssdfs_log_footer_magic_valid(&footer->volume_state.magic); +} + +/* + * is_ssdfs_log_footer_csum_valid() - check log footer's checksum + * @buf: buffer with log footer + * @size: size of buffer in bytes + */ +bool is_ssdfs_log_footer_csum_valid(void *buf, size_t buf_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_csum_valid(&SSDFS_LF(buf)->volume_state.check, buf, buf_size); +} + +/* + * is_ssdfs_volume_state_info_consistent() - check volume state consistency + * @fsi: pointer on shared file system object + * @buf: log header + * @footer: log footer + * @dev_size: partition size in bytes + * + * RETURN: + * [true] - volume state metadata is consistent. + * [false] - volume state metadata is corrupted. 
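+ *
+ * Note (added commentary): the check verifies that nsegs and free_pages
+ * fit the device size, that the creation checkpoint does not exceed the
+ * write checkpoint, and that log size, FS state and FS errors stay in
+ * known ranges.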
+ */
+bool is_ssdfs_volume_state_info_consistent(struct ssdfs_fs_info *fsi,
+					   void *buf,
+					   struct ssdfs_log_footer *footer,
+					   u64 dev_size)
+{
+	struct ssdfs_signature *magic;
+	u64 nsegs;
+	u64 free_pages;
+	u8 log_segsize = U8_MAX;
+	u32 seg_size = U32_MAX;
+	u32 page_size = U32_MAX;
+	u64 cno = U64_MAX;
+	u16 log_pages = U16_MAX;
+	u32 log_bytes = U32_MAX;
+	u64 pages_count;
+	u32 pages_per_seg;
+	u32 remainder;
+	u16 fs_state;
+	u16 fs_errors;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!buf || !footer);
+
+	SSDFS_DBG("buf %p, footer %p, dev_size %llu\n",
+		  buf, footer, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		SSDFS_DBG("valid magic is not detected\n");
+		return false;
+	}
+
+	if (__is_ssdfs_segment_header_magic_valid(magic)) {
+		struct ssdfs_segment_header *hdr;
+		struct ssdfs_volume_header *vh;
+
+		hdr = SSDFS_SEG_HDR(buf);
+		vh = SSDFS_VH(buf);
+
+		log_segsize = vh->log_segsize;
+		seg_size = 1 << vh->log_segsize;
+		page_size = 1 << vh->log_pagesize;
+		cno = le64_to_cpu(hdr->cno);
+		log_pages = le16_to_cpu(hdr->log_pages);
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		struct ssdfs_partial_log_header *pl_hdr;
+
+		pl_hdr = SSDFS_PLH(buf);
+
+		log_segsize = pl_hdr->log_segsize;
+		seg_size = 1 << pl_hdr->log_segsize;
+		page_size = 1 << pl_hdr->log_pagesize;
+		cno = le64_to_cpu(pl_hdr->cno);
+		log_pages = le16_to_cpu(pl_hdr->log_pages);
+	} else {
+		SSDFS_DBG("log header is corrupted\n");
+		return false;
+	}
+
+	nsegs = le64_to_cpu(footer->volume_state.nsegs);
+
+	if (nsegs == 0 || nsegs > (dev_size >> log_segsize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, dev_size %llu, seg_size %u\n",
+			  nsegs, dev_size, seg_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	free_pages = le64_to_cpu(footer->volume_state.free_pages);
+
+	pages_count = div_u64_rem(dev_size, page_size, &remainder);
+	if (remainder) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dev_size %llu is unaligned on page_size %u\n",
+			  dev_size, page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (free_pages > pages_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free_pages %llu is greater than pages_count %llu\n",
+			  free_pages, pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	pages_per_seg = seg_size / page_size;
+	if (nsegs <= div_u64(free_pages, pages_per_seg)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, free_pages %llu, "
+			  "pages_per_seg %u\n",
+			  nsegs, free_pages, pages_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (cno > le64_to_cpu(footer->cno)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("create_cno %llu is greater than write_cno %llu\n",
+			  cno, le64_to_cpu(footer->cno));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	log_bytes = (u32)log_pages * fsi->pagesize;
+	if (le32_to_cpu(footer->log_bytes) > log_bytes) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("footer log_bytes %u > hdr log_bytes %u\n",
+			  le32_to_cpu(footer->log_bytes),
+			  log_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	fs_state = le16_to_cpu(footer->volume_state.state);
+	if (fs_state > SSDFS_LAST_KNOWN_FS_STATE) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unknown FS state %#x\n",
+			  fs_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	fs_errors = le16_to_cpu(footer->volume_state.errors);
+	if (fs_errors > SSDFS_LAST_KNOWN_FS_ERROR) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unknown FS error %#x\n",
+			  fs_errors);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_check_log_footer() - check log footer consistency
+ * @fsi: pointer on shared file system object
+ * @buf: log header
+ * @footer: log footer
+ * @silent: show error or not?
+ *
+ * This function checks consistency of log footer.
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA - valid magic is not detected.
+ * %-EIO - log footer is corrupted.
+ */
+int ssdfs_check_log_footer(struct ssdfs_fs_info *fsi,
+			   void *buf,
+			   struct ssdfs_log_footer *footer,
+			   bool silent)
+{
+	struct ssdfs_volume_state *vs;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	u64 dev_size;
+	bool major_magic_valid, minor_magic_valid;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !buf || !footer);
+
+	SSDFS_DBG("fsi %p, buf %p, footer %p, silent %#x\n",
+		  fsi, buf, footer, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vs = SSDFS_VS(footer);
+
+	major_magic_valid = is_ssdfs_magic_valid(&vs->magic);
+	minor_magic_valid = is_ssdfs_log_footer_magic_valid(footer);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+		else
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid log footer magic signature\n");
+		else
+			SSDFS_DBG("invalid log footer magic signature\n");
+		return -EIO;
+	}
+
+	if (!is_ssdfs_log_footer_csum_valid(footer, footer_size)) {
+		if (!silent)
+			SSDFS_ERR("invalid checksum of log footer\n");
+		else
+			SSDFS_DBG("invalid checksum of log footer\n");
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_volume_state_info_consistent(fsi, buf,
+						   footer, dev_size)) {
+		if (!silent)
+			SSDFS_ERR("log footer is corrupted\n");
+		else
+			SSDFS_DBG("log footer is corrupted\n");
+		return -EIO;
+	}
+
+	if (le32_to_cpu(footer->log_flags) & ~SSDFS_LOG_FOOTER_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+		} else {
+			SSDFS_DBG("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+		}
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_unchecked_log_footer() - read log footer without check
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @bytes_off: offset inside PEB in bytes
+ * @buf: buffer for log footer
+ * @silent: show error or not?
+ * @log_pages: number of pages in the log
+ *
+ * This function reads the log footer without
+ * the consistency check.
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA - valid magic is not detected.
+ * %-EIO - log footer is corrupted.
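+ *
+ * On success, @log_pages receives the size of the log in pages, taken
+ * from either the log footer or the partial log header. A minimal usage
+ * sketch (added commentary, error handling elided):
+ *
+ *	u32 log_pages;
+ *
+ *	err = ssdfs_read_unchecked_log_footer(fsi, peb_id, bytes_off,
+ *					      buf, true, &log_pages);
+ *	if (!err)
+ *		bytes_off += log_pages * fsi->pagesize;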
+ */ +int ssdfs_read_unchecked_log_footer(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 bytes_off, + void *buf, bool silent, + u32 *log_pages) +{ + struct ssdfs_signature *magic; + struct ssdfs_log_footer *footer; + struct ssdfs_volume_state *vs; + size_t footer_size = sizeof(struct ssdfs_log_footer); + struct ssdfs_partial_log_header *pl_hdr; + size_t hdr_size = sizeof(struct ssdfs_partial_log_header); + bool major_magic_valid, minor_magic_valid; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->devops->read); + BUG_ON(!buf || !log_pages); + BUG_ON(bytes_off >= (fsi->pages_per_peb * fsi->pagesize)); + + SSDFS_DBG("peb_id %llu, bytes_off %u, buf %p\n", + peb_id, bytes_off, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + *log_pages = U32_MAX; + + err = ssdfs_unaligned_read_buffer(fsi, peb_id, bytes_off, + buf, footer_size); + if (unlikely(err)) { + if (!silent) { + SSDFS_ERR("fail to read log footer: " + "peb_id %llu, bytes_off %u, err %d\n", + peb_id, bytes_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to read log footer: " + "peb_id %llu, bytes_off %u, err %d\n", + peb_id, bytes_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return err; + } + + magic = (struct ssdfs_signature *)buf; + + if (!is_ssdfs_magic_valid(magic)) { + if (!silent) + SSDFS_ERR("valid magic is not detected\n"); + else + SSDFS_DBG("valid magic is not detected\n"); + + return -ENODATA; + } + + if (__is_ssdfs_log_footer_magic_valid(magic)) { + footer = SSDFS_LF(buf); + vs = SSDFS_VS(footer); + + major_magic_valid = is_ssdfs_magic_valid(&vs->magic); + minor_magic_valid = is_ssdfs_log_footer_magic_valid(footer); + + if (!major_magic_valid && !minor_magic_valid) { + if (!silent) + SSDFS_ERR("valid magic doesn't detected\n"); + else + SSDFS_DBG("valid magic doesn't detected\n"); + return -ENODATA; + } else if (!major_magic_valid) { + if (!silent) + SSDFS_ERR("invalid SSDFS magic signature\n"); + else + SSDFS_DBG("invalid SSDFS magic signature\n"); + return -EIO; + } else if (!minor_magic_valid) { + if (!silent) + SSDFS_ERR("invalid log footer magic\n"); + else + SSDFS_DBG("invalid log footer magic\n"); + return -EIO; + } + + if (!is_ssdfs_log_footer_csum_valid(footer, footer_size)) { + if (!silent) + SSDFS_ERR("invalid checksum of log footer\n"); + else + SSDFS_DBG("invalid checksum of log footer\n"); + return -EIO; + } + + *log_pages = le32_to_cpu(footer->log_bytes); + *log_pages /= fsi->pagesize; + + if (*log_pages == 0 || *log_pages >= fsi->pages_per_peb) { + if (!silent) + SSDFS_ERR("invalid log pages %u\n", *log_pages); + else + SSDFS_DBG("invalid log pages %u\n", *log_pages); + return -EIO; + } + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + pl_hdr = SSDFS_PLH(buf); + + major_magic_valid = is_ssdfs_magic_valid(&pl_hdr->magic); + minor_magic_valid = + is_ssdfs_partial_log_header_magic_valid(&pl_hdr->magic); + + if (!major_magic_valid && !minor_magic_valid) { + if (!silent) + SSDFS_ERR("valid magic doesn't detected\n"); + else + SSDFS_DBG("valid magic doesn't detected\n"); + return -ENODATA; + } else if (!major_magic_valid) { + if (!silent) + SSDFS_ERR("invalid SSDFS magic signature\n"); + else + SSDFS_DBG("invalid SSDFS magic signature\n"); + return -EIO; + } else if (!minor_magic_valid) { + if (!silent) + SSDFS_ERR("invalid partial log header magic\n"); + else + SSDFS_DBG("invalid partial log header magic\n"); + return -EIO; + } + + if (!is_ssdfs_partial_log_header_csum_valid(pl_hdr, hdr_size)) { + if (!silent) + SSDFS_ERR("invalid checksum of footer\n"); + else + 
SSDFS_DBG("invalid checksum of footer\n"); + return -EIO; + } + + *log_pages = le32_to_cpu(pl_hdr->log_bytes); + *log_pages /= fsi->pagesize; + + if (*log_pages == 0 || *log_pages >= fsi->pages_per_peb) { + if (!silent) + SSDFS_ERR("invalid log pages %u\n", *log_pages); + else + SSDFS_DBG("invalid log pages %u\n", *log_pages); + return -EIO; + } + } else { + if (!silent) { + SSDFS_ERR("log footer is corrupted: " + "peb_id %llu, bytes_off %u\n", + peb_id, bytes_off); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb_id %llu, bytes_off %u\n", + peb_id, bytes_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return -EIO; + } + + return 0; +} + +/* + * ssdfs_read_checked_log_footer() - read and check log footer + * @fsi: pointer on shared file system object + * @log_hdr: log header + * @peb_id: PEB identification number + * @bytes_off: offset inside PEB in bytes + * @buf: buffer for log footer + * @silent: show error or not? + * + * This function reads and checks consistency of log footer. + * + * RETURN: + * [success] - log footer is consistent. + * [failure] - error code: + * + * %-ENODATA - valid magic doesn't detected. + * %-EIO - log footer is corrupted. + */ +int ssdfs_read_checked_log_footer(struct ssdfs_fs_info *fsi, void *log_hdr, + u64 peb_id, u32 bytes_off, void *buf, + bool silent) +{ + struct ssdfs_signature *magic; + struct ssdfs_log_footer *footer; + struct ssdfs_partial_log_header *pl_hdr; + size_t footer_size = sizeof(struct ssdfs_log_footer); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->devops->read); + BUG_ON(!log_hdr || !buf); + BUG_ON(bytes_off >= (fsi->pages_per_peb * fsi->pagesize)); + + SSDFS_DBG("peb_id %llu, bytes_off %u, buf %p\n", + peb_id, bytes_off, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_unaligned_read_buffer(fsi, peb_id, bytes_off, + buf, footer_size); + if (unlikely(err)) { + if (!silent) { + SSDFS_ERR("fail to read log footer: " + "peb_id %llu, bytes_off %u, err %d\n", + peb_id, bytes_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to read log footer: " + "peb_id %llu, bytes_off %u, err %d\n", + peb_id, bytes_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return err; + } + + magic = (struct ssdfs_signature *)buf; + + if (!is_ssdfs_magic_valid(magic)) { + if (!silent) + SSDFS_ERR("valid magic is not detected\n"); + else + SSDFS_DBG("valid magic is not detected\n"); + + return -ENODATA; + } + + if (__is_ssdfs_log_footer_magic_valid(magic)) { + footer = SSDFS_LF(buf); + + err = ssdfs_check_log_footer(fsi, log_hdr, footer, silent); + if (err) { + if (!silent) { + SSDFS_ERR("log footer is corrupted: " + "peb_id %llu, bytes_off %u, err %d\n", + peb_id, bytes_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb_id %llu, bytes_off %u, err %d\n", + peb_id, bytes_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return err; + } + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + pl_hdr = SSDFS_PLH(buf); + + err = ssdfs_check_partial_log_header(fsi, pl_hdr, silent); + if (unlikely(err)) { + if (!silent) { + SSDFS_ERR("partial log header is corrupted: " + "peb_id %llu, bytes_off %u\n", + peb_id, bytes_off); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("partial log header is corrupted: " + "peb_id %llu, bytes_off %u\n", + peb_id, bytes_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; + } + } else { + if (!silent) { + SSDFS_ERR("log footer is corrupted: " + "peb_id %llu, bytes_off %u\n", + peb_id, bytes_off); + } else 
{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb_id %llu, bytes_off %u\n", + peb_id, bytes_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return -EIO; + } + + return 0; +} + +/* + * ssdfs_store_nsegs() - store volume's segments number in volume state + * @fsi: pointer on shared file system object + * @vs: volume state [out] + * + * This function stores volume's segments number in volume state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOLCK - volume is under resize. + */ +static inline +int ssdfs_store_nsegs(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_state *vs) +{ + mutex_lock(&fsi->resize_mutex); + vs->nsegs = cpu_to_le64(fsi->nsegs); + mutex_unlock(&fsi->resize_mutex); + + return 0; +} + +/* + * ssdfs_prepare_current_segment_ids() - prepare current segment IDs + * @fsi: pointer on shared file system object + * @array: pointer on array of IDs [out] + * @size: size the array in bytes + * + * This function prepares the current segment IDs. + * + * RETURN: + * [success] + * [failure] - error code. + */ +int ssdfs_prepare_current_segment_ids(struct ssdfs_fs_info *fsi, + __le64 *array, + size_t size) +{ + size_t count = size / sizeof(__le64); + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !array); + + SSDFS_DBG("fsi %p, array %p, size %zu\n", + fsi, array, size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (size != (sizeof(__le64) * SSDFS_CUR_SEGS_COUNT)) { + SSDFS_ERR("invalid array size %zu\n", + size); + return -EINVAL; + } + + memset(array, 0xFF, size); + + if (fsi->cur_segs) { + down_read(&fsi->cur_segs->lock); + for (i = 0; i < count; i++) { + struct ssdfs_segment_info *real_seg; + u64 seg; + + if (!fsi->cur_segs->objects[i]) + continue; + + ssdfs_current_segment_lock(fsi->cur_segs->objects[i]); + + real_seg = fsi->cur_segs->objects[i]->real_seg; + if (real_seg) { + seg = real_seg->seg_id; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %d, seg_id %llu\n", + i, seg); +#endif /* CONFIG_SSDFS_DEBUG */ + array[i] = cpu_to_le64(seg); + } else { + seg = fsi->cur_segs->objects[i]->seg_id; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %d, seg_id %llu\n", + i, seg); +#endif /* CONFIG_SSDFS_DEBUG */ + array[i] = cpu_to_le64(seg); + } + + ssdfs_current_segment_unlock(fsi->cur_segs->objects[i]); + } + up_read(&fsi->cur_segs->lock); + } + + return 0; +} + +/* + * ssdfs_prepare_volume_state_info_for_commit() - prepare volume state + * @fsi: pointer on shared file system object + * @fs_state: file system state + * @array: pointer on array of IDs + * @size: size the array in bytes + * @last_log_time: log creation timestamp + * @last_log_cno: last log checkpoint + * @vs: volume state [out] + * + * This function prepares volume state info for commit. + * + * RETURN: + * [success] + * [failure] - error code. 
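+ *
+ * Note (added commentary): free_pages, timestamp, checkpoint and flags
+ * are sampled under volume_state_lock; the b-tree roots and options are
+ * then copied from the cached fsi->vs.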
+ */ +int ssdfs_prepare_volume_state_info_for_commit(struct ssdfs_fs_info *fsi, + u16 fs_state, + __le64 *cur_segs, + size_t size, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_volume_state *vs) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !vs); + + SSDFS_DBG("fsi %p, fs_state %#x\n", fsi, fs_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (size != (sizeof(__le64) * SSDFS_CUR_SEGS_COUNT)) { + SSDFS_ERR("invalid array size %zu\n", + size); + return -EINVAL; + } + + err = ssdfs_store_nsegs(fsi, vs); + if (err) { + SSDFS_DBG("unable to store segments number: err %d\n", err); + return err; + } + + vs->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + vs->magic.version.major = SSDFS_MAJOR_REVISION; + vs->magic.version.minor = SSDFS_MINOR_REVISION; + + spin_lock(&fsi->volume_state_lock); + + fsi->fs_mod_time = last_log_time; + fsi->fs_state = fs_state; + + vs->free_pages = cpu_to_le64(fsi->free_pages); + vs->timestamp = cpu_to_le64(last_log_time); + vs->cno = cpu_to_le64(last_log_cno); + vs->flags = cpu_to_le32(fsi->fs_flags); + vs->state = cpu_to_le16(fs_state); + vs->errors = cpu_to_le16(fsi->fs_errors); + vs->feature_compat = cpu_to_le64(fsi->fs_feature_compat); + vs->feature_compat_ro = cpu_to_le64(fsi->fs_feature_compat_ro); + vs->feature_incompat = cpu_to_le64(fsi->fs_feature_incompat); + + ssdfs_memcpy(vs->uuid, 0, SSDFS_UUID_SIZE, + fsi->vs->uuid, 0, SSDFS_UUID_SIZE, + SSDFS_UUID_SIZE); + ssdfs_memcpy(vs->label, 0, SSDFS_VOLUME_LABEL_MAX, + fsi->vs->label, 0, SSDFS_VOLUME_LABEL_MAX, + SSDFS_VOLUME_LABEL_MAX); + ssdfs_memcpy(vs->cur_segs, 0, size, + cur_segs, 0, size, + size); + + vs->migration_threshold = cpu_to_le16(fsi->migration_threshold); + vs->open_zones = cpu_to_le32(atomic_read(&fsi->open_zones)); + + spin_unlock(&fsi->volume_state_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("open_zones %d\n", + atomic_read(&fsi->open_zones)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(&vs->blkbmap, + 0, sizeof(struct ssdfs_blk_bmap_options), + &fsi->vs->blkbmap, + 0, sizeof(struct ssdfs_blk_bmap_options), + sizeof(struct ssdfs_blk_bmap_options)); + ssdfs_memcpy(&vs->blk2off_tbl, + 0, sizeof(struct ssdfs_blk2off_tbl_options), + &fsi->vs->blk2off_tbl, + 0, sizeof(struct ssdfs_blk2off_tbl_options), + sizeof(struct ssdfs_blk2off_tbl_options)); + + ssdfs_memcpy(&vs->user_data, + 0, sizeof(struct ssdfs_user_data_options), + &fsi->vs->user_data, + 0, sizeof(struct ssdfs_user_data_options), + sizeof(struct ssdfs_user_data_options)); + ssdfs_memcpy(&vs->root_folder, + 0, sizeof(struct ssdfs_inode), + &fsi->vs->root_folder, + 0, sizeof(struct ssdfs_inode), + sizeof(struct ssdfs_inode)); + + ssdfs_memcpy(&vs->inodes_btree, + 0, sizeof(struct ssdfs_inodes_btree), + &fsi->vs->inodes_btree, + 0, sizeof(struct ssdfs_inodes_btree), + sizeof(struct ssdfs_inodes_btree)); + ssdfs_memcpy(&vs->shared_extents_btree, + 0, sizeof(struct ssdfs_shared_extents_btree), + &fsi->vs->shared_extents_btree, + 0, sizeof(struct ssdfs_shared_extents_btree), + sizeof(struct ssdfs_shared_extents_btree)); + ssdfs_memcpy(&vs->shared_dict_btree, + 0, sizeof(struct ssdfs_shared_dictionary_btree), + &fsi->vs->shared_dict_btree, + 0, sizeof(struct ssdfs_shared_dictionary_btree), + sizeof(struct ssdfs_shared_dictionary_btree)); + ssdfs_memcpy(&vs->snapshots_btree, + 0, sizeof(struct ssdfs_snapshots_btree), + &fsi->vs->snapshots_btree, + 0, sizeof(struct ssdfs_snapshots_btree), + sizeof(struct ssdfs_snapshots_btree)); + + return 0; +} + +/* + * ssdfs_prepare_log_footer_for_commit() - prepare log footer for 
commit
+ * @fsi: pointer on shared file system object
+ * @log_pages: count of pages in the log
+ * @log_flags: log's flags
+ * @last_log_time: log creation timestamp
+ * @last_log_cno: last log checkpoint
+ * @footer: log footer [out]
+ *
+ * This function prepares log footer for commit.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input values.
+ */
+int ssdfs_prepare_log_footer_for_commit(struct ssdfs_fs_info *fsi,
+					u32 log_pages,
+					u32 log_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_log_footer *footer)
+{
+	u16 data_size = sizeof(struct ssdfs_log_footer);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, log_pages %u, log_flags %#x, footer %p\n",
+		  fsi, log_pages, log_flags, footer);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	footer->volume_state.magic.key = cpu_to_le16(SSDFS_LOG_FOOTER_MAGIC);
+
+	footer->timestamp = cpu_to_le64(last_log_time);
+	footer->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages >= (U32_MAX >> fsi->log_pagesize)) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	footer->log_bytes = cpu_to_le32(log_pages << fsi->log_pagesize);
+
+	if (log_flags & ~SSDFS_LOG_FOOTER_FLAG_MASK) {
+		SSDFS_ERR("unknown log flags %#x\n", log_flags);
+		return -EINVAL;
+	}
+
+	footer->log_flags = cpu_to_le32(log_flags);
+
+	footer->volume_state.check.bytes = cpu_to_le16(data_size);
+	footer->volume_state.check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&footer->volume_state.check,
+				   footer, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}

diff --git a/fs/ssdfs/volume_header.c b/fs/ssdfs/volume_header.c
new file mode 100644
index 000000000000..e992c3cdf335
--- /dev/null
+++ b/fs/ssdfs/volume_header.c
@@ -0,0 +1,1256 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/volume_header.c - operations with volume header.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" + +#include + +/* + * __is_ssdfs_segment_header_magic_valid() - check segment header's magic + * @magic: pointer on magic value + */ +bool __is_ssdfs_segment_header_magic_valid(struct ssdfs_signature *magic) +{ + return le16_to_cpu(magic->key) == SSDFS_SEGMENT_HDR_MAGIC; +} + +/* + * is_ssdfs_segment_header_magic_valid() - check segment header's magic + * @hdr: segment header + */ +bool is_ssdfs_segment_header_magic_valid(struct ssdfs_segment_header *hdr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __is_ssdfs_segment_header_magic_valid(&hdr->volume_hdr.magic); +} + +/* + * is_ssdfs_partial_log_header_magic_valid() - check partial log header's magic + * @magic: pointer on magic value + */ +bool is_ssdfs_partial_log_header_magic_valid(struct ssdfs_signature *magic) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!magic); +#endif /* CONFIG_SSDFS_DEBUG */ + + return le16_to_cpu(magic->key) == SSDFS_PARTIAL_LOG_HDR_MAGIC; +} + +/* + * is_ssdfs_volume_header_csum_valid() - check volume header checksum + * @vh_buf: volume header buffer + * @buf_size: size of buffer in bytes + */ +bool is_ssdfs_volume_header_csum_valid(void *vh_buf, size_t buf_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_csum_valid(&SSDFS_VH(vh_buf)->check, vh_buf, buf_size); +} + +/* + * is_ssdfs_partial_log_header_csum_valid() - check partial log header checksum + * @plh_buf: partial log header buffer + * @buf_size: size of buffer in bytes + */ +bool is_ssdfs_partial_log_header_csum_valid(void *plh_buf, size_t buf_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!plh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_csum_valid(&SSDFS_PLH(plh_buf)->check, plh_buf, buf_size); +} + +static inline +void ssdfs_show_volume_header(struct ssdfs_volume_header *hdr) +{ + SSDFS_ERR("MAGIC: common %#x, key %#x, " + "version (major %u, minor %u)\n", + le32_to_cpu(hdr->magic.common), + le16_to_cpu(hdr->magic.key), + hdr->magic.version.major, + hdr->magic.version.minor); + SSDFS_ERR("CHECK: bytes %u, flags %#x, csum %#x\n", + le16_to_cpu(hdr->check.bytes), + le16_to_cpu(hdr->check.flags), + le32_to_cpu(hdr->check.csum)); + SSDFS_ERR("KEY VALUES: log_pagesize %u, log_erasesize %u, " + "log_segsize %u, log_pebs_per_seg %u, " + "megabytes_per_peb %u, pebs_per_seg %u, " + "create_time %llu, create_cno %llu, flags %#x\n", + hdr->log_pagesize, + hdr->log_erasesize, + hdr->log_segsize, + hdr->log_pebs_per_seg, + le16_to_cpu(hdr->megabytes_per_peb), + le16_to_cpu(hdr->pebs_per_seg), + le64_to_cpu(hdr->create_time), + le64_to_cpu(hdr->create_cno), + le32_to_cpu(hdr->flags)); +} + +/* + * is_ssdfs_volume_header_consistent() - check volume header consistency + * @fsi: pointer on shared file system object + * @vh: volume header + * @dev_size: partition size in bytes + * + * RETURN: + * [true] - volume header is consistent. + * [false] - volume header is corrupted. 
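+ *
+ * Note (added commentary): the check validates page/erase/segment size
+ * sanity and ensures that the superblock segments' LEB/PEB identifiers
+ * are unique and fit the device size.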
+ */
+bool is_ssdfs_volume_header_consistent(struct ssdfs_fs_info *fsi,
+				       struct ssdfs_volume_header *vh,
+				       u64 dev_size)
+{
+	u32 page_size;
+	u64 erase_size;
+	u32 seg_size;
+	u32 pebs_per_seg;
+	u64 leb_array[SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX] = {0};
+	u64 peb_array[SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX] = {0};
+	int array_index = 0;
+	int i, j, k;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_size = 1 << vh->log_pagesize;
+	erase_size = 1 << vh->log_erasesize;
+	seg_size = 1 << vh->log_segsize;
+	pebs_per_seg = 1 << vh->log_pebs_per_seg;
+
+	if (page_size >= erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_size %u >= erase_size %llu\n",
+			  page_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (page_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+		/* do nothing */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unexpected page_size %u\n", page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (erase_size) {
+	case SSDFS_128KB:
+	case SSDFS_256KB:
+	case SSDFS_512KB:
+	case SSDFS_2MB:
+	case SSDFS_8MB:
+	case SSDFS_16MB:
+	case SSDFS_32MB:
+	case SSDFS_64MB:
+	case SSDFS_128MB:
+	case SSDFS_256MB:
+	case SSDFS_512MB:
+	case SSDFS_1GB:
+	case SSDFS_2GB:
+	case SSDFS_8GB:
+	case SSDFS_16GB:
+	case SSDFS_32GB:
+	case SSDFS_64GB:
+		/* do nothing */
+		break;
+
+	default:
+		if (fsi->is_zns_device) {
+			u64 zone_size = le16_to_cpu(vh->megabytes_per_peb);
+
+			zone_size *= SSDFS_1MB;
+
+			if (fsi->zone_size != zone_size) {
+				SSDFS_ERR("invalid zone size: "
+					  "size1 %llu != size2 %llu\n",
+					  fsi->zone_size, zone_size);
+				return false;
+			}
+
+			erase_size = zone_size;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unexpected erase_size %llu\n", erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return false;
+		}
+	}
+
+	if (seg_size < erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u < erase_size %llu\n",
+			  seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (pebs_per_seg != (seg_size >> vh->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("pebs_per_seg %u != (seg_size %u / erase_size %llu)\n",
+			  pebs_per_seg, seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (seg_size >= dev_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u >= dev_size %llu\n",
+			  seg_size, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	for (i = 0; i < SSDFS_SB_CHAIN_MAX; i++) {
+		for (j = 0; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+			u64 leb_id = le64_to_cpu(vh->sb_pebs[i][j].leb_id);
+			u64 peb_id = le64_to_cpu(vh->sb_pebs[i][j].peb_id);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("i %d, j %d, LEB %llu, PEB %llu\n",
+				  i, j, leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			for (k = 0; k < array_index; k++) {
+				if (leb_id == leb_array[k]) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("corrupted LEB number: "
+						  "leb_id %llu, "
+						  "leb_array[%d] %llu\n",
+						  leb_id, k,
+						  leb_array[k]);
+#endif /* CONFIG_SSDFS_DEBUG */
+					return false;
+				}
+
+				if (peb_id == peb_array[k]) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("corrupted PEB number: "
+						  "peb_id %llu, "
+						  "peb_array[%d] %llu\n",
+						  peb_id, k,
+						  peb_array[k]);
+#endif /* CONFIG_SSDFS_DEBUG */
+					return false;
+				}
+			}
+
+			if (i == SSDFS_PREV_SB_SEG &&
+			    leb_id == U64_MAX && peb_id == U64_MAX) {
+				/* prev id is U64_MAX after volume creation */
+				continue;
+			}
+
+			if (i == SSDFS_RESERVED_SB_SEG &&
+			    leb_id == U64_MAX && 
peb_id == U64_MAX) { + /* + * The reserved seg could be U64_MAX + * if there is no clean segment. + */ + continue; + } + + if (leb_id >= (dev_size >> vh->log_erasesize)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("corrupted LEB number %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + leb_array[array_index] = leb_id; + peb_array[array_index] = peb_id; + + array_index++; + } + } + + return true; +} + +/* + * ssdfs_check_segment_header() - check segment header consistency + * @fsi: pointer on shared file system object + * @hdr: segment header + * @silent: show error or not? + * + * This function checks consistency of segment header. + * + * RETURN: + * [success] - segment header is consistent. + * [failure] - error code: + * + * %-ENODATA - valid magic doesn't detected. + * %-EIO - segment header is corrupted. + */ +int ssdfs_check_segment_header(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_header *hdr, + bool silent) +{ + struct ssdfs_volume_header *vh; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + bool major_magic_valid, minor_magic_valid; + u64 dev_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !hdr); + + SSDFS_DBG("fsi %p, hdr %p, silent %#x\n", fsi, hdr, silent); +#endif /* CONFIG_SSDFS_DEBUG */ + + vh = SSDFS_VH(hdr); + + major_magic_valid = is_ssdfs_magic_valid(&vh->magic); + minor_magic_valid = is_ssdfs_segment_header_magic_valid(hdr); + + if (!major_magic_valid && !minor_magic_valid) { + if (!silent) { + SSDFS_ERR("valid magic doesn't detected\n"); + ssdfs_show_volume_header(vh); + } else + SSDFS_DBG("valid magic doesn't detected\n"); + return -ENODATA; + } else if (!major_magic_valid) { + if (!silent) { + SSDFS_ERR("invalid SSDFS magic signature\n"); + ssdfs_show_volume_header(vh); + } else + SSDFS_DBG("invalid SSDFS magic signature\n"); + return -EIO; + } else if (!minor_magic_valid) { + if (!silent) { + SSDFS_ERR("invalid segment header magic signature\n"); + ssdfs_show_volume_header(vh); + } else + SSDFS_DBG("invalid segment header magic signature\n"); + return -EIO; + } + + if (!is_ssdfs_volume_header_csum_valid(hdr, hdr_size)) { + if (!silent) { + SSDFS_ERR("invalid checksum of volume header\n"); + ssdfs_show_volume_header(vh); + } else + SSDFS_DBG("invalid checksum of volume header\n"); + return -EIO; + } + + dev_size = fsi->devops->device_size(fsi->sb); + if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) { + if (!silent) { + SSDFS_ERR("volume header is corrupted\n"); + ssdfs_show_volume_header(vh); + } else + SSDFS_DBG("volume header is corrupted\n"); + return -EIO; + } + + if (SSDFS_VH_CNO(vh) > SSDFS_SEG_CNO(hdr)) { + if (!silent) { + SSDFS_ERR("invalid checkpoint/timestamp\n"); + ssdfs_show_volume_header(vh); + } else + SSDFS_DBG("invalid checkpoint/timestamp\n"); + return -EIO; + } + + if (le16_to_cpu(hdr->log_pages) > fsi->pages_per_peb) { + if (!silent) { + SSDFS_ERR("log_pages %u > pages_per_peb %u\n", + le16_to_cpu(hdr->log_pages), + fsi->pages_per_peb); + ssdfs_show_volume_header(vh); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_pages %u > pages_per_peb %u\n", + le16_to_cpu(hdr->log_pages), + fsi->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return -EIO; + } + + if (le16_to_cpu(hdr->seg_type) > SSDFS_LAST_KNOWN_SEG_TYPE) { + if (!silent) { + SSDFS_ERR("unknown seg_type %#x\n", + le16_to_cpu(hdr->seg_type)); + ssdfs_show_volume_header(vh); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unknown seg_type %#x\n", + le16_to_cpu(hdr->seg_type)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return 
-EIO;
+	}
+
+	if (le32_to_cpu(hdr->seg_flags) & ~SSDFS_SEG_HDR_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * is_ssdfs_partial_log_header_consistent() - check partial header consistency
+ * @fsi: pointer on shared file system object
+ * @ph: partial log header
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true] - partial log header is consistent.
+ * [false] - partial log header is corrupted.
+ */
+bool is_ssdfs_partial_log_header_consistent(struct ssdfs_fs_info *fsi,
+					    struct ssdfs_partial_log_header *ph,
+					    u64 dev_size)
+{
+	u32 page_size;
+	u64 erase_size;
+	u32 seg_size;
+	u32 pebs_per_seg;
+	u64 nsegs;
+	u64 free_pages;
+	u64 pages_count;
+	u32 remainder;
+	u32 pages_per_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ph);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_size = 1 << ph->log_pagesize;
+	erase_size = 1 << ph->log_erasesize;
+	seg_size = 1 << ph->log_segsize;
+	pebs_per_seg = 1 << ph->log_pebs_per_seg;
+
+	if (page_size >= erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_size %u >= erase_size %llu\n",
+			  page_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (page_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+		/* do nothing */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unexpected page_size %u\n", page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (erase_size) {
+	case SSDFS_128KB:
+	case SSDFS_256KB:
+	case SSDFS_512KB:
+	case SSDFS_2MB:
+	case SSDFS_8MB:
+	case SSDFS_16MB:
+	case SSDFS_32MB:
+	case SSDFS_64MB:
+	case SSDFS_128MB:
+	case SSDFS_256MB:
+	case SSDFS_512MB:
+	case SSDFS_1GB:
+	case SSDFS_2GB:
+	case SSDFS_8GB:
+	case SSDFS_16GB:
+	case SSDFS_32GB:
+	case SSDFS_64GB:
+		/* do nothing */
+		break;
+
+	default:
+		if (fsi->is_zns_device) {
+			u64 zone_size = le16_to_cpu(fsi->vh->megabytes_per_peb);
+
+			zone_size *= SSDFS_1MB;
+
+			if (fsi->zone_size != zone_size) {
+				SSDFS_ERR("invalid zone size: "
+					  "size1 %llu != size2 %llu\n",
+					  fsi->zone_size, zone_size);
+				return false;
+			}
+
+			erase_size = zone_size;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unexpected erase_size %llu\n", erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return false;
+		}
+	}
+
+	if (seg_size < erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u < erase_size %llu\n",
+			  seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (pebs_per_seg != (seg_size >> ph->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("pebs_per_seg %u != (seg_size %u / erase_size %llu)\n",
+			  pebs_per_seg, seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (seg_size >= dev_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u >= dev_size %llu\n",
+			  seg_size, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	nsegs = le64_to_cpu(ph->nsegs);
+
+	if (nsegs == 0 || nsegs > (dev_size >> ph->log_segsize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, dev_size %llu, seg_size %u\n",
+			  nsegs, dev_size, seg_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	free_pages = le64_to_cpu(ph->free_pages);
+
+	pages_count = div_u64_rem(dev_size, page_size, &remainder);
+	if (remainder) {
+#ifdef 
CONFIG_SSDFS_DEBUG + SSDFS_DBG("dev_size %llu is unaligned on page_size %u\n", + dev_size, page_size); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (free_pages > pages_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %llu is greater than pages_count %llu\n", + free_pages, pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + pages_per_seg = seg_size / page_size; + if (nsegs <= div_u64(free_pages, pages_per_seg)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("invalid nsegs %llu, free_pages %llu, " + "pages_per_seg %u\n", + nsegs, free_pages, pages_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + return true; +} + +/* + * ssdfs_check_partial_log_header() - check partial log header consistency + * @fsi: pointer on shared file system object + * @hdr: partial log header + * @silent: show error or not? + * + * This function checks consistency of partial log header. + * + * RETURN: + * [success] - partial log header is consistent. + * [failure] - error code: + * + * %-ENODATA - valid magic doesn't detected. + * %-EIO - partial log header is corrupted. + */ +int ssdfs_check_partial_log_header(struct ssdfs_fs_info *fsi, + struct ssdfs_partial_log_header *hdr, + bool silent) +{ + size_t hdr_size = sizeof(struct ssdfs_partial_log_header); + bool major_magic_valid, minor_magic_valid; + u64 dev_size; + u32 log_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !hdr); + + SSDFS_DBG("fsi %p, hdr %p, silent %#x\n", fsi, hdr, silent); +#endif /* CONFIG_SSDFS_DEBUG */ + + major_magic_valid = is_ssdfs_magic_valid(&hdr->magic); + minor_magic_valid = + is_ssdfs_partial_log_header_magic_valid(&hdr->magic); + + if (!major_magic_valid && !minor_magic_valid) { + if (!silent) + SSDFS_ERR("valid magic doesn't detected\n"); + else + SSDFS_DBG("valid magic doesn't detected\n"); + return -ENODATA; + } else if (!major_magic_valid) { + if (!silent) + SSDFS_ERR("invalid SSDFS magic signature\n"); + else + SSDFS_DBG("invalid SSDFS magic signature\n"); + return -EIO; + } else if (!minor_magic_valid) { + if (!silent) + SSDFS_ERR("invalid partial log header magic\n"); + else + SSDFS_DBG("invalid partial log header magic\n"); + return -EIO; + } + + if (!is_ssdfs_partial_log_header_csum_valid(hdr, hdr_size)) { + if (!silent) + SSDFS_ERR("invalid checksum of partial log header\n"); + else + SSDFS_DBG("invalid checksum of partial log header\n"); + return -EIO; + } + + dev_size = fsi->devops->device_size(fsi->sb); + if (!is_ssdfs_partial_log_header_consistent(fsi, hdr, dev_size)) { + if (!silent) + SSDFS_ERR("partial log header is corrupted\n"); + else + SSDFS_DBG("partial log header is corrupted\n"); + return -EIO; + } + + if (le16_to_cpu(hdr->log_pages) > fsi->pages_per_peb) { + if (!silent) { + SSDFS_ERR("log_pages %u > pages_per_peb %u\n", + le16_to_cpu(hdr->log_pages), + fsi->pages_per_peb); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_pages %u > pages_per_peb %u\n", + le16_to_cpu(hdr->log_pages), + fsi->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return -EIO; + } + + log_bytes = (u32)le16_to_cpu(hdr->log_pages) * fsi->pagesize; + if (le32_to_cpu(hdr->log_bytes) > log_bytes) { + if (!silent) { + SSDFS_ERR("calculated log_bytes %u < log_bytes %u\n", + log_bytes, + le32_to_cpu(hdr->log_bytes)); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("calculated log_bytes %u < log_bytes %u\n", + log_bytes, + le32_to_cpu(hdr->log_bytes)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return -EIO; + } + + if (le16_to_cpu(hdr->seg_type) > SSDFS_LAST_KNOWN_SEG_TYPE) { + if (!silent) 
{ + SSDFS_ERR("unknown seg_type %#x\n", + le16_to_cpu(hdr->seg_type)); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unknown seg_type %#x\n", + le16_to_cpu(hdr->seg_type)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return -EIO; + } + + if (le32_to_cpu(hdr->pl_flags) & ~SSDFS_SEG_HDR_FLAG_MASK) { + if (!silent) { + SSDFS_ERR("corrupted pl_flags %#x\n", + le32_to_cpu(hdr->pl_flags)); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("corrupted pl_flags %#x\n", + le32_to_cpu(hdr->pl_flags)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return -EIO; + } + + return 0; +} + +/* + * ssdfs_read_checked_segment_header() - read and check segment header + * @fsi: pointer on shared file system object + * @peb_id: PEB identification number + * @pages_off: offset from PEB's begin in pages + * @buf: buffer + * @silent: show error or not? + * + * This function reads and checks consistency of segment header. + * + * RETURN: + * [success] - segment header is consistent. + * [failure] - error code: + * + * %-ENODATA - valid magic doesn't detected. + * %-EIO - segment header is corrupted. + */ +int ssdfs_read_checked_segment_header(struct ssdfs_fs_info *fsi, + u64 peb_id, u32 pages_off, + void *buf, bool silent) +{ + struct ssdfs_signature *magic; + struct ssdfs_segment_header *hdr; + struct ssdfs_partial_log_header *pl_hdr; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + u64 offset = 0; + size_t read_bytes; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, pages_off %u, buf %p, silent %#x\n", + peb_id, pages_off, buf, silent); + + BUG_ON(!fsi); + BUG_ON(!fsi->devops->read); + BUG_ON(!buf); + BUG_ON(pages_off >= fsi->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_id == 0 && pages_off == 0) + offset = SSDFS_RESERVED_VBR_SIZE; + else + offset = (u64)pages_off * fsi->pagesize; + + err = ssdfs_aligned_read_buffer(fsi, peb_id, offset, + buf, hdr_size, + &read_bytes); + if (unlikely(err)) { + if (!silent) { + SSDFS_ERR("fail to read segment header: " + "peb_id %llu, pages_off %u, err %d\n", + peb_id, pages_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to read segment header: " + "peb_id %llu, pages_off %u, err %d\n", + peb_id, pages_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return err; + } + + if (unlikely(read_bytes != hdr_size)) { + if (!silent) { + SSDFS_ERR("fail to read segment header: " + "peb_id %llu, pages_off %u: " + "read_bytes %zu != hdr_size %zu\n", + peb_id, pages_off, read_bytes, hdr_size); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to read segment header: " + "peb_id %llu, pages_off %u: " + "read_bytes %zu != hdr_size %zu\n", + peb_id, pages_off, read_bytes, hdr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return -ERANGE; + } + + magic = (struct ssdfs_signature *)buf; + + if (!is_ssdfs_magic_valid(magic)) { + if (!silent) + SSDFS_ERR("valid magic is not detected\n"); + else + SSDFS_DBG("valid magic is not detected\n"); + + return -ENODATA; + } + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + hdr = SSDFS_SEG_HDR(buf); + + err = ssdfs_check_segment_header(fsi, hdr, silent); + if (unlikely(err)) { + if (!silent) { + SSDFS_ERR("segment header is corrupted: " + "peb_id %llu, pages_off %u, err %d\n", + peb_id, pages_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment header is corrupted: " + "peb_id %llu, pages_off %u, err %d\n", + peb_id, pages_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; + } + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + pl_hdr = SSDFS_PLH(buf); 
+ + err = ssdfs_check_partial_log_header(fsi, pl_hdr, silent); + if (unlikely(err)) { + if (!silent) { + SSDFS_ERR("partial log header is corrupted: " + "peb_id %llu, pages_off %u\n", + peb_id, pages_off); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("partial log header is corrupted: " + "peb_id %llu, pages_off %u\n", + peb_id, pages_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; + } + } else { + if (!silent) { + SSDFS_ERR("log header is corrupted: " + "peb_id %llu, pages_off %u\n", + peb_id, pages_off); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log header is corrupted: " + "peb_id %llu, pages_off %u\n", + peb_id, pages_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return -EIO; + } + + return 0; +} + +/* + * ssdfs_create_volume_header() - initialize volume header from the scratch + * @fsi: pointer on shared file system object + * @vh: volume header + */ +void ssdfs_create_volume_header(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh) +{ + u64 erase_size; + u32 megabytes_per_peb; + u32 flags; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !vh); + + SSDFS_DBG("fsi %p, vh %p\n", fsi, vh); + SSDFS_DBG("fsi->log_pagesize %u, fsi->log_erasesize %u, " + "fsi->log_segsize %u, fsi->log_pebs_per_seg %u\n", + fsi->log_pagesize, + fsi->log_erasesize, + fsi->log_segsize, + fsi->log_pebs_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + vh->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + vh->magic.key = cpu_to_le16(SSDFS_SEGMENT_HDR_MAGIC); + vh->magic.version.major = SSDFS_MAJOR_REVISION; + vh->magic.version.minor = SSDFS_MINOR_REVISION; + + vh->log_pagesize = fsi->log_pagesize; + vh->log_erasesize = fsi->log_erasesize; + vh->log_segsize = fsi->log_segsize; + vh->log_pebs_per_seg = fsi->log_pebs_per_seg; + + megabytes_per_peb = fsi->erasesize / SSDFS_1MB; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(megabytes_per_peb >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + vh->megabytes_per_peb = cpu_to_le16((u16)megabytes_per_peb); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(fsi->pebs_per_seg >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + vh->pebs_per_seg = cpu_to_le16((u16)fsi->pebs_per_seg); + + vh->create_time = cpu_to_le64(fsi->fs_ctime); + vh->create_cno = cpu_to_le64(fsi->fs_cno); + + vh->lebs_per_peb_index = cpu_to_le32(fsi->lebs_per_peb_index); + vh->create_threads_per_seg = cpu_to_le16(fsi->create_threads_per_seg); + + vh->flags = cpu_to_le32(0); + + if (fsi->is_zns_device) { + flags = le32_to_cpu(vh->flags); + flags |= SSDFS_VH_ZNS_BASED_VOLUME; + + erase_size = 1 << fsi->log_erasesize; + if (erase_size != fsi->zone_size) + flags |= SSDFS_VH_UNALIGNED_ZONE; + + vh->flags = cpu_to_le32(flags); + } + + vh->sb_seg_log_pages = cpu_to_le16(fsi->sb_seg_log_pages); + vh->segbmap_log_pages = cpu_to_le16(fsi->segbmap_log_pages); + vh->maptbl_log_pages = cpu_to_le16(fsi->maptbl_log_pages); + vh->lnodes_seg_log_pages = cpu_to_le16(fsi->lnodes_seg_log_pages); + vh->hnodes_seg_log_pages = cpu_to_le16(fsi->hnodes_seg_log_pages); + vh->inodes_seg_log_pages = cpu_to_le16(fsi->inodes_seg_log_pages); + vh->user_data_log_pages = cpu_to_le16(fsi->user_data_log_pages); + + ssdfs_memcpy(&vh->segbmap, + 0, sizeof(struct ssdfs_segbmap_sb_header), + &fsi->vh->segbmap, + 0, sizeof(struct ssdfs_segbmap_sb_header), + sizeof(struct ssdfs_segbmap_sb_header)); + ssdfs_memcpy(&vh->maptbl, + 0, sizeof(struct ssdfs_maptbl_sb_header), + &fsi->vh->maptbl, + 0, sizeof(struct ssdfs_maptbl_sb_header), + sizeof(struct ssdfs_maptbl_sb_header)); + ssdfs_memcpy(&vh->dentries_btree, + 0, sizeof(struct 
ssdfs_dentries_btree_descriptor), + &fsi->vh->dentries_btree, + 0, sizeof(struct ssdfs_dentries_btree_descriptor), + sizeof(struct ssdfs_dentries_btree_descriptor)); + ssdfs_memcpy(&vh->extents_btree, + 0, sizeof(struct ssdfs_extents_btree_descriptor), + &fsi->vh->extents_btree, + 0, sizeof(struct ssdfs_extents_btree_descriptor), + sizeof(struct ssdfs_extents_btree_descriptor)); + ssdfs_memcpy(&vh->xattr_btree, + 0, sizeof(struct ssdfs_xattr_btree_descriptor), + &fsi->vh->xattr_btree, + 0, sizeof(struct ssdfs_xattr_btree_descriptor), + sizeof(struct ssdfs_xattr_btree_descriptor)); + ssdfs_memcpy(&vh->invextree, + 0, sizeof(struct ssdfs_invalidated_extents_btree), + &fsi->vh->invextree, + 0, sizeof(struct ssdfs_invalidated_extents_btree), + sizeof(struct ssdfs_invalidated_extents_btree)); +} + +/* + * ssdfs_store_sb_segs_array() - store sb segments array + * @fsi: pointer on shared file system object + * @vh: volume header + */ +static inline +void ssdfs_store_sb_segs_array(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh) +{ + int i, j; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fsi %p, vh %p\n", fsi, vh); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&fsi->sb_segs_sem); + + for (i = SSDFS_CUR_SB_SEG; i < SSDFS_SB_CHAIN_MAX; i++) { + for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) { + vh->sb_pebs[i][j].leb_id = + cpu_to_le64(fsi->sb_lebs[i][j]); + vh->sb_pebs[i][j].peb_id = + cpu_to_le64(fsi->sb_pebs[i][j]); + } + } + + up_read(&fsi->sb_segs_sem); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sb_lebs[CUR][MAIN] %llu, sb_pebs[CUR][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[CUR][COPY] %llu, sb_pebs[CUR][COPY] %llu\n", + fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("sb_lebs[NEXT][MAIN] %llu, sb_pebs[NEXT][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[NEXT][COPY] %llu, sb_pebs[NEXT][COPY] %llu\n", + fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("sb_lebs[RESERVED][MAIN] %llu, sb_pebs[RESERVED][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[RESERVED][COPY] %llu, sb_pebs[RESERVED][COPY] %llu\n", + fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("sb_lebs[PREV][MAIN] %llu, sb_pebs[PREV][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[PREV][COPY] %llu, sb_pebs[PREV][COPY] %llu\n", + fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG]); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_prepare_volume_header_for_commit() - prepare volume header for commit + * @fsi: pointer on shared file system object + * @vh: volume header + */ +int ssdfs_prepare_volume_header_for_commit(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct super_block *sb = fsi->sb; + u64 dev_size; + + SSDFS_DBG("fsi %p, vh %p\n", fsi, vh); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_store_sb_segs_array(fsi, vh); + +#ifdef CONFIG_SSDFS_DEBUG + dev_size = fsi->devops->device_size(sb); + if 
(!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) { + SSDFS_ERR("volume header is inconsistent\n"); + return -EIO; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_prepare_segment_header_for_commit() - prepare segment header + * @fsi: pointer on shared file system object + * @log_pages: full log pages count + * @seg_type: segment type + * @seg_flags: segment flags + * @last_log_time: log creation time + * @last_log_cno: log checkpoint + * @hdr: segment header [out] + */ +int ssdfs_prepare_segment_header_for_commit(struct ssdfs_fs_info *fsi, + u32 log_pages, + u16 seg_type, + u32 seg_flags, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_segment_header *hdr) +{ + u16 data_size = sizeof(struct ssdfs_segment_header); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fsi %p, hdr %p, " + "log_pages %u, seg_type %#x, seg_flags %#x\n", + fsi, hdr, log_pages, seg_type, seg_flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr->timestamp = cpu_to_le64(last_log_time); + hdr->cno = cpu_to_le64(last_log_cno); + + if (log_pages > fsi->pages_per_seg || log_pages > U16_MAX) { + SSDFS_ERR("invalid value of log_pages %u\n", log_pages); + return -EINVAL; + } + + hdr->log_pages = cpu_to_le16((u16)log_pages); + + if (seg_type == SSDFS_UNKNOWN_SEG_TYPE || + seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) { + SSDFS_ERR("invalid value of seg_type %#x\n", seg_type); + return -EINVAL; + } + + hdr->seg_type = cpu_to_le16(seg_type); + hdr->seg_flags = cpu_to_le32(seg_flags); + + hdr->volume_hdr.check.bytes = cpu_to_le16(data_size); + hdr->volume_hdr.check.flags = cpu_to_le16(SSDFS_CRC32); + + err = ssdfs_calculate_csum(&hdr->volume_hdr.check, + hdr, data_size); + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_prepare_partial_log_header_for_commit() - prepare partial log header + * @fsi: pointer on shared file system object + * @sequence_id: sequence ID of the partial log inside the full log + * @log_pages: log pages count + * @seg_type: segment type + * @pl_flags: partial log's flags + * @last_log_time: log creation time + * @last_log_cno: log checkpoint + * @hdr: partial log's header [out] + */ +int ssdfs_prepare_partial_log_header_for_commit(struct ssdfs_fs_info *fsi, + int sequence_id, + u32 log_pages, + u16 seg_type, + u32 pl_flags, + u64 last_log_time, + u64 last_log_cno, + struct ssdfs_partial_log_header *hdr) +{ + u16 data_size = sizeof(struct ssdfs_partial_log_header); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fsi %p, hdr %p, sequence_id %d, " + "log_pages %u, seg_type %#x, pl_flags %#x\n", + fsi, hdr, sequence_id, log_pages, seg_type, pl_flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + hdr->magic.key = cpu_to_le16(SSDFS_PARTIAL_LOG_HDR_MAGIC); + hdr->magic.version.major = SSDFS_MAJOR_REVISION; + hdr->magic.version.minor = SSDFS_MINOR_REVISION; + + hdr->timestamp = cpu_to_le64(last_log_time); + hdr->cno = cpu_to_le64(last_log_cno); + + if (log_pages > fsi->pages_per_seg || log_pages > U16_MAX) { + SSDFS_ERR("invalid value of log_pages %u\n", log_pages); + return -EINVAL; + } + + hdr->log_pages = cpu_to_le16((u16)log_pages); + hdr->log_bytes = cpu_to_le32(log_pages << fsi->log_pagesize); + + if (seg_type == SSDFS_UNKNOWN_SEG_TYPE || + seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) { + SSDFS_ERR("invalid value of seg_type %#x\n", seg_type); + return -EINVAL; + } + + hdr->seg_type = cpu_to_le16(seg_type); + hdr->pl_flags = cpu_to_le32(pl_flags); + + 
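/*
+	 * Note (added commentary): free_pages and flags are sampled under
+	 * volume_state_lock, while nsegs is protected by the resize mutex.
+	 */
+	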
spin_lock(&fsi->volume_state_lock);
+	hdr->free_pages = cpu_to_le64(fsi->free_pages);
+	hdr->flags = cpu_to_le32(fsi->fs_flags);
+	spin_unlock(&fsi->volume_state_lock);
+
+	mutex_lock(&fsi->resize_mutex);
+	hdr->nsegs = cpu_to_le64(fsi->nsegs);
+	mutex_unlock(&fsi->resize_mutex);
+
+	ssdfs_memcpy(&hdr->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     &fsi->vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	ssdfs_memcpy(&hdr->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     &fsi->vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     sizeof(struct ssdfs_inodes_btree));
+	ssdfs_memcpy(&hdr->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     &fsi->vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     sizeof(struct ssdfs_shared_extents_btree));
+	ssdfs_memcpy(&hdr->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     &fsi->vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     sizeof(struct ssdfs_shared_dictionary_btree));
+	ssdfs_memcpy(&hdr->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     &fsi->vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     sizeof(struct ssdfs_snapshots_btree));
+	ssdfs_memcpy(&hdr->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     &fsi->vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     sizeof(struct ssdfs_invalidated_extents_btree));
+
+	hdr->sequence_id = cpu_to_le32(sequence_id);
+
+	hdr->log_pagesize = fsi->log_pagesize;
+	hdr->log_erasesize = fsi->log_erasesize;
+	hdr->log_segsize = fsi->log_segsize;
+	hdr->log_pebs_per_seg = fsi->log_pebs_per_seg;
+	hdr->lebs_per_peb_index = cpu_to_le32(fsi->lebs_per_peb_index);
+	hdr->create_threads_per_seg = cpu_to_le16(fsi->create_threads_per_seg);
+
+	hdr->open_zones = cpu_to_le32(atomic_read(&fsi->open_zones));
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->check.bytes = cpu_to_le16(data_size);
+	hdr->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&hdr->check,
+				   hdr, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
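The three *_for_commit() helpers above share one calling convention: the
caller owns the raw header buffer, and the helper stamps the timestamp and
checkpoint, validates the log geometry, and seals the result with a CRC32
in the embedded check area. A minimal sketch of such a call site follows;
this is a hedged illustration, not the patchset's actual flush code:
sketch_seal_full_log() and its parameters are hypothetical, and the
SSDFS_LOG_HAS_FOOTER flag is assumed from the on-disk layout's segment
flags.

static int sketch_seal_full_log(struct ssdfs_fs_info *fsi,
				struct ssdfs_segment_header *hdr,
				u32 log_pages, u64 cno)
{
	/*
	 * Stamps creation time and checkpoint, validates log_pages
	 * against fsi->pages_per_seg and the segment type against
	 * SSDFS_LAST_KNOWN_SEG_TYPE, then checksums the header via
	 * ssdfs_calculate_csum().
	 */
	return ssdfs_prepare_segment_header_for_commit(fsi, log_pages,
						SSDFS_SB_SEG_TYPE,
						SSDFS_LOG_HAS_FOOTER,
						ssdfs_current_timestamp(),
						cno, hdr);
}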
From patchwork Sat Feb 25 01:08:18 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151912
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 07/76] ssdfs: basic mount logic implementation
Date: Fri, 24 Feb 2023 17:08:18 -0800
Message-Id: <20230225010927.813929-8-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This patch implements the logic of searching for and recovering the
last actual superblock state.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/recovery.c | 3144 +++++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/recovery.h |  446 ++++++
 2 files changed, 3590 insertions(+)
 create mode 100644 fs/ssdfs/recovery.c
 create mode 100644 fs/ssdfs/recovery.h

diff --git a/fs/ssdfs/recovery.c b/fs/ssdfs/recovery.c
new file mode 100644
index 000000000000..dcb56ac0d682
--- /dev/null
+++ b/fs/ssdfs/recovery.c
@@ -0,0 +1,3144 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/recovery.c - searching actual state and recovery on mount code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb.h" +#include "offset_translation_table.h" +#include "segment_bitmap.h" +#include "peb_mapping_table.h" +#include "recovery.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_recovery_page_leaks; +atomic64_t ssdfs_recovery_memory_leaks; +atomic64_t ssdfs_recovery_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_recovery_cache_leaks_increment(void *kaddr) + * void ssdfs_recovery_cache_leaks_decrement(void *kaddr) + * void *ssdfs_recovery_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_recovery_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_recovery_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_recovery_kfree(void *kaddr) + * struct page *ssdfs_recovery_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_recovery_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_recovery_free_page(struct page *page) + * void ssdfs_recovery_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(recovery) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(recovery) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_recovery_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_recovery_page_leaks, 0); + atomic64_set(&ssdfs_recovery_memory_leaks, 0); + atomic64_set(&ssdfs_recovery_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_recovery_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_recovery_page_leaks) != 0) { + SSDFS_ERR("RECOVERY: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_recovery_page_leaks)); + } + + if (atomic64_read(&ssdfs_recovery_memory_leaks) != 0) { + SSDFS_ERR("RECOVERY: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_recovery_memory_leaks)); + } + + if (atomic64_read(&ssdfs_recovery_cache_leaks) != 0) { + SSDFS_ERR("RECOVERY: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_recovery_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +int ssdfs_init_sb_info(struct ssdfs_fs_info *fsi, + struct ssdfs_sb_info *sbi) +{ + void *vh_buf = NULL; + void *vs_buf = NULL; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t footer_size = sizeof(struct ssdfs_log_footer); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sbi %p, hdr_size %zu, footer_size %zu\n", + sbi, hdr_size, footer_size); + + BUG_ON(!sbi); +#endif /* CONFIG_SSDFS_DEBUG */ + + sbi->vh_buf = NULL; + sbi->vs_buf = NULL; + + hdr_size = max_t(size_t, hdr_size, (size_t)SSDFS_4KB); + sbi->vh_buf_size = hdr_size; + footer_size = max_t(size_t, footer_size, (size_t)SSDFS_4KB); + sbi->vs_buf_size = footer_size; + + vh_buf = ssdfs_recovery_kzalloc(sbi->vh_buf_size, GFP_KERNEL); + vs_buf = ssdfs_recovery_kzalloc(sbi->vs_buf_size, GFP_KERNEL); + if (unlikely(!vh_buf || !vs_buf)) { + SSDFS_ERR("unable to allocate superblock buffers\n"); + err = -ENOMEM; + goto free_buf; + } + + sbi->vh_buf = vh_buf; + sbi->vs_buf = vs_buf; + + return 0; + +free_buf: + ssdfs_recovery_kfree(vh_buf); + ssdfs_recovery_kfree(vs_buf); + return err; 
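/*
+	 * Note: the shared error path above frees both buffers even when
+	 * only one allocation succeeded, so ssdfs_recovery_kfree() is
+	 * expected to tolerate a NULL pointer, like kfree().
+	 */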
+} + +void ssdfs_destruct_sb_info(struct ssdfs_sb_info *sbi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sbi); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!sbi->vh_buf || !sbi->vs_buf) + return; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sbi %p, sbi->vh_buf %p, sbi->vs_buf %p, " + "sbi->last_log.leb_id %llu, sbi->last_log.peb_id %llu, " + "sbi->last_log.page_offset %u, " + "sbi->last_log.pages_count %u\n", + sbi, sbi->vh_buf, sbi->vs_buf, sbi->last_log.leb_id, + sbi->last_log.peb_id, sbi->last_log.page_offset, + sbi->last_log.pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_recovery_kfree(sbi->vh_buf); + ssdfs_recovery_kfree(sbi->vs_buf); + sbi->vh_buf = NULL; + sbi->vh_buf_size = 0; + sbi->vs_buf = NULL; + sbi->vs_buf_size = 0; + memset(&sbi->last_log, 0, sizeof(struct ssdfs_peb_extent)); +} + +void ssdfs_backup_sb_info(struct ssdfs_fs_info *fsi) +{ + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t footer_size = sizeof(struct ssdfs_log_footer); + size_t extent_size = sizeof(struct ssdfs_peb_extent); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf || !fsi->sbi.vs_buf); + BUG_ON(!fsi->sbi_backup.vh_buf || !fsi->sbi_backup.vs_buf); + + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + fsi->sbi.last_log.leb_id, + fsi->sbi.last_log.peb_id, + fsi->sbi.last_log.page_offset, + fsi->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(fsi->sbi_backup.vh_buf, 0, hdr_size, + fsi->sbi.vh_buf, 0, hdr_size, + hdr_size); + ssdfs_memcpy(fsi->sbi_backup.vs_buf, 0, footer_size, + fsi->sbi.vs_buf, 0, footer_size, + footer_size); + ssdfs_memcpy(&fsi->sbi_backup.last_log, 0, extent_size, + &fsi->sbi.last_log, 0, extent_size, + extent_size); +} + +void ssdfs_copy_sb_info(struct ssdfs_fs_info *fsi, + struct ssdfs_recovery_env *env) +{ + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t vhdr_size = sizeof(struct ssdfs_volume_header); + size_t footer_size = sizeof(struct ssdfs_log_footer); + size_t extent_size = sizeof(struct ssdfs_peb_extent); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf || !fsi->sbi.vs_buf); + BUG_ON(!fsi->sbi_backup.vh_buf || !fsi->sbi_backup.vs_buf); + BUG_ON(!env); + BUG_ON(!env->sbi.vh_buf || !env->sbi.vs_buf); + BUG_ON(!env->sbi_backup.vh_buf || !env->sbi_backup.vs_buf); + + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + env->sbi.last_log.leb_id, + env->sbi.last_log.peb_id, + env->sbi.last_log.page_offset, + env->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(fsi->sbi.vh_buf, 0, hdr_size, + env->sbi.vh_buf, 0, hdr_size, + hdr_size); + ssdfs_memcpy(fsi->sbi.vs_buf, 0, footer_size, + env->sbi.vs_buf, 0, footer_size, + footer_size); + ssdfs_memcpy(&fsi->sbi.last_log, 0, extent_size, + &env->sbi.last_log, 0, extent_size, + extent_size); + ssdfs_memcpy(fsi->sbi_backup.vh_buf, 0, hdr_size, + env->sbi_backup.vh_buf, 0, 
hdr_size, + hdr_size); + ssdfs_memcpy(fsi->sbi_backup.vs_buf, 0, footer_size, + env->sbi_backup.vs_buf, 0, footer_size, + footer_size); + ssdfs_memcpy(&fsi->sbi_backup.last_log, 0, extent_size, + &env->sbi_backup.last_log, 0, extent_size, + extent_size); + ssdfs_memcpy(&fsi->last_vh, 0, vhdr_size, + &env->last_vh, 0, vhdr_size, + vhdr_size); +} + +void ssdfs_restore_sb_info(struct ssdfs_fs_info *fsi) +{ + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t footer_size = sizeof(struct ssdfs_log_footer); + size_t extent_size = sizeof(struct ssdfs_peb_extent); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf || !fsi->sbi.vs_buf); + BUG_ON(!fsi->sbi_backup.vh_buf || !fsi->sbi_backup.vs_buf); + + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + fsi->sbi.last_log.leb_id, + fsi->sbi.last_log.peb_id, + fsi->sbi.last_log.page_offset, + fsi->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(fsi->sbi.vh_buf, 0, hdr_size, + fsi->sbi_backup.vh_buf, 0, hdr_size, + hdr_size); + ssdfs_memcpy(fsi->sbi.vs_buf, 0, footer_size, + fsi->sbi_backup.vs_buf, 0, footer_size, + footer_size); + ssdfs_memcpy(&fsi->sbi.last_log, 0, extent_size, + &fsi->sbi_backup.last_log, 0, extent_size, + extent_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + fsi->sbi.last_log.leb_id, + fsi->sbi.last_log.peb_id, + fsi->sbi.last_log.page_offset, + fsi->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(fsi->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static int find_seg_with_valid_start_peb(struct ssdfs_fs_info *fsi, + size_t seg_size, + loff_t *offset, + u64 threshold, + int silent, + int op_type) +{ + struct super_block *sb = fsi->sb; + loff_t off; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + struct ssdfs_volume_header *vh; + bool magic_valid = false; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fsi %p, seg_size %zu, start_offset %llu, " + "threshold %llu, silent %#x, op_type %#x\n", + fsi, seg_size, (unsigned long long)*offset, + threshold, silent, op_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (op_type) { + case SSDFS_USE_PEB_ISBAD_OP: + if (!fsi->devops->peb_isbad) { + SSDFS_ERR("unable to detect bad PEB\n"); + return -EOPNOTSUPP; + } + break; + + case SSDFS_USE_READ_OP: + if (!fsi->devops->read) { + SSDFS_ERR("unable to read from device\n"); + return -EOPNOTSUPP; + } + break; + + default: + BUG(); + }; + + if (*offset != SSDFS_RESERVED_VBR_SIZE) + off = (*offset / seg_size) * seg_size; + else + off = *offset; + + while (off < threshold) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("off %llu\n", (u64)off); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (op_type) { + case SSDFS_USE_PEB_ISBAD_OP: + err = fsi->devops->peb_isbad(sb, off); + magic_valid = true; + break; + + case SSDFS_USE_READ_OP: + err = fsi->devops->read(sb, off, hdr_size, + fsi->sbi.vh_buf); + vh = SSDFS_VH(fsi->sbi.vh_buf); + magic_valid 
= is_ssdfs_magic_valid(&vh->magic); + break; + + default: + BUG(); + }; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("HEADER DUMP: magic_valid %#x, err %d\n", + magic_valid, err); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + fsi->sbi.vh_buf, hdr_size); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!err) { + if (magic_valid) { + *offset = off; + return 0; + } + } else if (!silent) { + SSDFS_NOTICE("offset %llu is in bad PEB\n", + (unsigned long long)off); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu is in bad PEB\n", + (unsigned long long)off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + off += 2 * seg_size; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find valid PEB\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENODATA; +} + +static int ssdfs_find_any_valid_volume_header(struct ssdfs_fs_info *fsi, + loff_t offset, + int silent) +{ + struct super_block *sb; + size_t seg_size = SSDFS_128KB; + loff_t start_offset = offset; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + u64 dev_size; + u64 threshold; + struct ssdfs_volume_header *vh; + bool magic_valid, crc_valid; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); + BUG_ON(!fsi->devops->read); + + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, silent %#x\n", + fsi, fsi->sbi.vh_buf, silent); +#endif /* CONFIG_SSDFS_DEBUG */ + + sb = fsi->sb; + dev_size = fsi->devops->device_size(sb); + +try_seg_size: + threshold = SSDFS_MAPTBL_PROTECTION_STEP; + threshold *= SSDFS_MAPTBL_PROTECTION_RANGE; + threshold *= seg_size; + threshold = min_t(u64, dev_size, threshold + offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu, dev_size %llu, threshold %llu\n", + offset, dev_size, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (fsi->devops->peb_isbad) { + err = fsi->devops->peb_isbad(sb, offset); + if (err) { + if (!silent) { + SSDFS_NOTICE("offset %llu is in bad PEB\n", + (unsigned long long)offset); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %llu is in bad PEB\n", + (unsigned long long)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + offset += seg_size; + err = find_seg_with_valid_start_peb(fsi, seg_size, + &offset, threshold, + silent, + SSDFS_USE_PEB_ISBAD_OP); + if (err) { + switch (seg_size) { + case SSDFS_128KB: + offset = start_offset; + seg_size = SSDFS_256KB; + goto try_seg_size; + + case SSDFS_256KB: + offset = start_offset; + seg_size = SSDFS_512KB; + goto try_seg_size; + + case SSDFS_512KB: + offset = start_offset; + seg_size = SSDFS_2MB; + goto try_seg_size; + + case SSDFS_2MB: + offset = start_offset; + seg_size = SSDFS_8MB; + goto try_seg_size; + + default: + /* finish search */ + break; + } + + SSDFS_NOTICE("unable to find valid start PEB: " + "err %d\n", err); + return err; + } + } + } + + err = find_seg_with_valid_start_peb(fsi, seg_size, &offset, + threshold, silent, + SSDFS_USE_READ_OP); + if (unlikely(err)) { + switch (seg_size) { + case SSDFS_128KB: + offset = start_offset; + seg_size = SSDFS_256KB; + goto try_seg_size; + + case SSDFS_256KB: + offset = start_offset; + seg_size = SSDFS_512KB; + goto try_seg_size; + + case SSDFS_512KB: + offset = start_offset; + seg_size = SSDFS_2MB; + goto try_seg_size; + + case SSDFS_2MB: + offset = start_offset; + seg_size = SSDFS_8MB; + goto try_seg_size; + + default: + /* finish search */ + break; + } + + SSDFS_NOTICE("unable to find valid start PEB\n"); + return err; + } + + vh = SSDFS_VH(fsi->sbi.vh_buf); + + seg_size = 1 << vh->log_segsize; + + magic_valid = 
is_ssdfs_magic_valid(&vh->magic); + crc_valid = is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, + hdr_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("magic_valid %#x, crc_valid %#x\n", + magic_valid, crc_valid); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!magic_valid && !crc_valid) { + if (!silent) + SSDFS_NOTICE("valid magic is not detected\n"); + else + SSDFS_DBG("valid magic is not detected\n"); + return -ENOENT; + } else if ((magic_valid && !crc_valid) || (!magic_valid && crc_valid)) { + loff_t start_off; + +try_again: + start_off = offset; + if (offset >= (threshold - seg_size)) { + if (!silent) + SSDFS_NOTICE("valid magic is not detected\n"); + else + SSDFS_DBG("valid magic is not detected\n"); + return -ENOENT; + } + + if (fsi->devops->peb_isbad) { + err = find_seg_with_valid_start_peb(fsi, seg_size, + &offset, threshold, + silent, + SSDFS_USE_PEB_ISBAD_OP); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find valid start PEB: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + } + + if (start_off == offset) + offset += seg_size; + + err = find_seg_with_valid_start_peb(fsi, seg_size, &offset, + threshold, silent, + SSDFS_USE_READ_OP); + if (unlikely(err)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find valid start PEB: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + magic_valid = is_ssdfs_magic_valid(&vh->magic); + crc_valid = is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, + hdr_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("magic_valid %#x, crc_valid %#x\n", + magic_valid, crc_valid); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(magic_valid && crc_valid)) { + if (!silent) + SSDFS_NOTICE("valid magic is not detected\n"); + else + SSDFS_DBG("valid magic is not detected\n"); + return -ENOENT; + } + } + + if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) + goto try_again; + + fsi->pagesize = 1 << vh->log_pagesize; + + if (fsi->is_zns_device) { + fsi->erasesize = fsi->zone_size; + fsi->segsize = fsi->erasesize * le16_to_cpu(vh->pebs_per_seg); + } else { + fsi->erasesize = 1 << vh->log_erasesize; + fsi->segsize = 1 << vh->log_segsize; + } + + fsi->pages_per_seg = fsi->segsize / fsi->pagesize; + fsi->pages_per_peb = fsi->erasesize / fsi->pagesize; + fsi->pebs_per_seg = 1 << vh->log_pebs_per_seg; + + return 0; +} + +static int ssdfs_read_checked_sb_info(struct ssdfs_fs_info *fsi, u64 peb_id, + u32 pages_off, bool silent) +{ + u32 lf_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p, peb_id %llu, pages_off %u, silent %#x\n", + fsi, peb_id, pages_off, silent); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_read_checked_segment_header(fsi, peb_id, pages_off, + fsi->sbi.vh_buf, silent); + if (err) { + if (!silent) { + SSDFS_ERR("volume header is corrupted: " + "peb_id %llu, offset %d, err %d\n", + peb_id, pages_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("volume header is corrupted: " + "peb_id %llu, offset %d, err %d\n", + peb_id, pages_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + return err; + } + + lf_off = SSDFS_LOG_FOOTER_OFF(fsi->sbi.vh_buf); + + err = ssdfs_read_checked_log_footer(fsi, SSDFS_SEG_HDR(fsi->sbi.vh_buf), + peb_id, lf_off, fsi->sbi.vs_buf, + silent); + if (err) { + if (!silent) { + SSDFS_ERR("log footer is corrupted: " + "peb_id %llu, offset %d, err %d\n", + peb_id, lf_off, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb_id %llu, offset %d, err %d\n", + peb_id, lf_off, err); +#endif /* 
CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	return 0;
+}
+
+static int ssdfs_read_checked_sb_info2(struct ssdfs_fs_info *fsi, u64 peb_id,
+					u32 pages_off, bool silent,
+					u32 *cur_off)
+{
+	u32 bytes_off;
+	u32 log_pages;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, peb_id %llu, pages_off %u, silent %#x\n",
+		  fsi, peb_id, pages_off, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bytes_off = pages_off * fsi->pagesize;
+
+	err = ssdfs_read_unchecked_log_footer(fsi, peb_id, bytes_off,
+					      fsi->sbi.vs_buf, silent,
+					      &log_pages);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("fail to read the log footer: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fail to read the log footer: "
+				  "peb_id %llu, offset %u, err %d\n",
+				  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	if (log_pages == 0 ||
+	    log_pages > fsi->pages_per_peb ||
+	    pages_off < log_pages) {
+		if (!silent) {
+			SSDFS_ERR("invalid log_pages %u\n", log_pages);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("invalid log_pages %u\n", log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ERANGE;
+	}
+
+	pages_off -= log_pages - 1;
+	*cur_off -= log_pages - 1;
+
+	err = ssdfs_read_checked_segment_header(fsi, peb_id, pages_off,
+						fsi->sbi.vh_buf, silent);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("volume header is corrupted: "
+				  "peb_id %llu, offset %d, err %d\n",
+				  peb_id, pages_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("volume header is corrupted: "
+				  "peb_id %llu, offset %d, err %d\n",
+				  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	err = ssdfs_check_log_footer(fsi,
+				     SSDFS_SEG_HDR(fsi->sbi.vh_buf),
+				     SSDFS_LF(fsi->sbi.vs_buf),
+				     silent);
+	if (err) {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u, err %d\n",
+				  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	return 0;
+}
+
+static int ssdfs_find_any_valid_sb_segment(struct ssdfs_fs_info *fsi,
+					   u64 start_peb_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_segment_header *seg_hdr;
+	u64 dev_size;
+	loff_t offset = start_peb_id * fsi->erasesize;
+	loff_t step = SSDFS_RESERVED_SB_SEGS * SSDFS_128KB;
+	u64 last_cno, cno;
+	__le64 peb1, peb2;
+	__le64 leb1, leb2;
+	u64 checked_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->sbi.vh_buf);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, start_peb_id %llu\n",
+		  fsi, fsi->sbi.vh_buf, start_peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	i = SSDFS_SB_CHAIN_MAX;
+	dev_size = fsi->devops->device_size(fsi->sb);
+	memset(checked_pebs, 0xFF, sizeof(checked_pebs));
+
+try_next_volume_portion:
+	ssdfs_memcpy(&fsi->last_vh, 0, vh_size,
+		     fsi->sbi.vh_buf, 0, vh_size,
+		     vh_size);
+	last_cno = le64_to_cpu(SSDFS_SEG_HDR(fsi->sbi.vh_buf)->cno);
+
+try_again:
+	switch
(i) { + case SSDFS_SB_CHAIN_MAX: + i = SSDFS_CUR_SB_SEG; + break; + + case SSDFS_CUR_SB_SEG: + i = SSDFS_NEXT_SB_SEG; + break; + + case SSDFS_NEXT_SB_SEG: + i = SSDFS_RESERVED_SB_SEG; + break; + + default: + offset += step; + + if (offset >= dev_size) + goto fail_find_sb_seg; + + err = ssdfs_find_any_valid_volume_header(fsi, offset, true); + if (err) + goto fail_find_sb_seg; + else { + i = SSDFS_SB_CHAIN_MAX; + goto try_next_volume_portion; + } + break; + } + + err = -ENODATA; + + for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) { + u64 leb_id = le64_to_cpu(fsi->last_vh.sb_pebs[i][j].leb_id); + u64 peb_id = le64_to_cpu(fsi->last_vh.sb_pebs[i][j].peb_id); + u16 seg_type; + + if (peb_id == U64_MAX || leb_id == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid peb_id %llu, leb_id %llu\n", + leb_id, peb_id); + goto fail_find_sb_seg; + } + + if (start_peb_id > peb_id) + continue; + + if (checked_pebs[i][j] == peb_id) + continue; + else + checked_pebs[i][j] = peb_id; + + if ((peb_id * fsi->erasesize) < dev_size) + offset = peb_id * fsi->erasesize; + + err = ssdfs_read_checked_sb_info(fsi, peb_id, + 0, true); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu is corrupted: err %d\n", + peb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + fsi->sbi.last_log.leb_id = leb_id; + fsi->sbi.last_log.peb_id = peb_id; + fsi->sbi.last_log.page_offset = 0; + fsi->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + + seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + seg_type = SSDFS_SEG_TYPE(seg_hdr); + + if (seg_type == SSDFS_SB_SEG_TYPE) + return 0; + else { + err = -EIO; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB %llu is not sb segment\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (!err) + goto compare_vh_info; + } + + if (err) { + ssdfs_memcpy(fsi->sbi.vh_buf, 0, vh_size, + &fsi->last_vh, 0, vh_size, + vh_size); + goto try_again; + } + +compare_vh_info: + vh = SSDFS_VH(fsi->sbi.vh_buf); + seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + leb1 = fsi->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id; + leb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id; + peb1 = fsi->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id; + peb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id; + cno = le64_to_cpu(seg_hdr->cno); + + if (cno > last_cno && (leb1 != leb2 || peb1 != peb2)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cno %llu, last_cno %llu, " + "leb1 %llu, leb2 %llu, " + "peb1 %llu, peb2 %llu\n", + cno, last_cno, + le64_to_cpu(leb1), le64_to_cpu(leb2), + le64_to_cpu(peb1), le64_to_cpu(peb2)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_again; + } + +fail_find_sb_seg: + SSDFS_CRIT("unable to find any valid segment with superblocks chain\n"); + return -EIO; +} + +static inline bool is_sb_peb_exhausted2(struct ssdfs_fs_info *fsi, + u64 leb_id, u64 peb_id) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + struct ssdfs_peb_extent checking_page; + u64 pages_per_peb; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); + BUG_ON(!fsi->devops->read); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, " + "leb_id %llu, peb_id %llu\n", + fsi, fsi->sbi.vh_buf, + leb_id, peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fsi->devops->can_write_page) { + SSDFS_CRIT("fail to find latest valid sb info: " + 
"can_write_page is not supported\n"); + return true; + } + + if (leb_id >= U64_MAX || peb_id >= U64_MAX) { + SSDFS_ERR("invalid leb_id %llu or peb_id %llu\n", + leb_id, peb_id); + return true; + } + + checking_page.leb_id = leb_id; + checking_page.peb_id = peb_id; + + if (fsi->is_zns_device) { + pages_per_peb = div64_u64(fsi->zone_capacity, fsi->pagesize); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pages_per_peb >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + checking_page.page_offset = (u32)pages_per_peb - 2; + } else { + checking_page.page_offset = fsi->pages_per_peb - 2; + } + + checking_page.pages_count = 1; + + err = ssdfs_can_write_sb_log(fsi->sb, &checking_page); + if (!err) + return false; + + return true; +} + +static inline bool is_cur_main_sb_peb_exhausted2(struct ssdfs_fs_info *fsi) +{ + u64 leb_id; + u64 peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, " + "leb_id %llu, peb_id %llu\n", + fsi, fsi->sbi.vh_buf, + leb_id, peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_sb_peb_exhausted2(fsi, leb_id, peb_id); +} + +static inline bool is_cur_copy_sb_peb_exhausted2(struct ssdfs_fs_info *fsi) +{ + u64 leb_id; + u64 peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + leb_id = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + peb_id = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p, " + "leb_id %llu, peb_id %llu\n", + fsi, fsi->sbi.vh_buf, + leb_id, peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_sb_peb_exhausted2(fsi, leb_id, peb_id); +} + +static int ssdfs_find_latest_valid_sb_segment(struct ssdfs_fs_info *fsi) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + struct ssdfs_volume_header *last_vh; + u64 cur_main_sb_peb, cur_copy_sb_peb; + u64 cno1, cno2; + u64 cur_peb, next_peb, prev_peb; + u64 cur_leb, next_leb, prev_leb; + u16 seg_type; + loff_t offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); + BUG_ON(!fsi->devops->read); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p\n", fsi, fsi->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + +try_next_peb: + last_vh = SSDFS_VH(fsi->sbi.vh_buf); + cur_main_sb_peb = SSDFS_MAIN_SB_PEB(last_vh, SSDFS_CUR_SB_SEG); + cur_copy_sb_peb = SSDFS_COPY_SB_PEB(last_vh, SSDFS_CUR_SB_SEG); + + if (cur_main_sb_peb != fsi->sbi.last_log.peb_id && + cur_copy_sb_peb != fsi->sbi.last_log.peb_id) { + SSDFS_ERR("volume header is corrupted\n"); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_main_sb_peb %llu, cur_copy_sb_peb %llu, " + "read PEB %llu\n", + cur_main_sb_peb, cur_copy_sb_peb, + fsi->sbi.last_log.peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -EIO; + goto end_search; + } + + if (cur_main_sb_peb == fsi->sbi.last_log.peb_id) { + if (!is_cur_main_sb_peb_exhausted2(fsi)) + goto end_search; + } else { + if (!is_cur_copy_sb_peb_exhausted2(fsi)) + goto end_search; + } + + ssdfs_backup_sb_info(fsi); + + next_leb = 
SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + if (next_leb == U64_MAX || next_peb == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n", + next_leb, next_peb); + goto end_search; + } + + err = ssdfs_read_checked_sb_info(fsi, next_peb, 0, true); + if (!err) { + fsi->sbi.last_log.leb_id = next_leb; + fsi->sbi.last_log.peb_id = next_peb; + fsi->sbi.last_log.page_offset = 0; + fsi->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + goto check_volume_header; + } else { + ssdfs_restore_sb_info(fsi); + err = 0; /* try to read the backup copy */ + } + + next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + if (next_leb == U64_MAX || next_peb == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n", + next_leb, next_peb); + goto end_search; + } + + err = ssdfs_read_checked_sb_info(fsi, next_peb, 0, true); + if (err) { + if (err == -EIO) { + /* next sb segments are corrupted */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next sb PEB %llu is corrupted\n", + next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + /* next sb segments are invalid */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next sb PEB %llu is invalid\n", + next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + ssdfs_restore_sb_info(fsi); + + offset = next_peb * fsi->erasesize; + + err = ssdfs_find_any_valid_volume_header(fsi, offset, true); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find any valid header: " + "peb_id %llu\n", + next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto rollback_valid_vh; + } + + err = ssdfs_find_any_valid_sb_segment(fsi, next_peb); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find any valid sb seg: " + "peb_id %llu\n", + next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto rollback_valid_vh; + } else + goto try_next_peb; + } + + fsi->sbi.last_log.leb_id = next_leb; + fsi->sbi.last_log.peb_id = next_peb; + fsi->sbi.last_log.page_offset = 0; + fsi->sbi.last_log.pages_count = SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + +check_volume_header: + seg_type = SSDFS_SEG_TYPE(SSDFS_SEG_HDR(fsi->sbi.vh_buf)); + if (seg_type != SSDFS_SB_SEG_TYPE) { + SSDFS_DBG("invalid segment type\n"); + err = 0; + goto mount_fs_read_only; + } + + cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf); + cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf); + if (cno1 >= cno2) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last cno %llu is not lesser than read cno %llu\n", + cno1, cno2); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_peb %llu doesn't equal to cur_peb %llu\n", + next_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + prev_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_peb %llu doesn't equal to cur_peb %llu\n", + prev_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + next_leb = 
SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_leb %llu doesn't equal to cur_leb %llu\n", + next_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + prev_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_leb %llu doesn't equal to cur_leb %llu\n", + prev_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_peb %llu doesn't equal to cur_peb %llu\n", + next_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + prev_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_peb %llu doesn't equal to cur_peb %llu\n", + prev_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_leb %llu doesn't equal to cur_leb %llu\n", + next_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + prev_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(fsi->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_leb %llu doesn't equal to cur_leb %llu\n", + prev_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + goto mount_fs_read_only; + } + + goto try_next_peb; + +mount_fs_read_only: + SSDFS_NOTICE("unable to mount in RW mode: " + "chain of superblock's segments is broken\n"); + fsi->sb->s_flags |= SB_RDONLY; + +rollback_valid_vh: + ssdfs_restore_sb_info(fsi); + +end_search: + return err; +} + +static inline +u64 ssdfs_swap_current_sb_peb(struct ssdfs_volume_header *vh, u64 peb) +{ + if (peb == SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG)) + return SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG); + else if (peb == SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG)) + return SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG); + + BUG(); + return ULLONG_MAX; +} + +static inline +u64 ssdfs_swap_current_sb_leb(struct ssdfs_volume_header *vh, u64 leb) +{ + if (leb == SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG)) + return SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG); + else if (leb == SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG)) + return SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG); + + BUG(); + return ULLONG_MAX; +} + +/* + * This method expects that first volume header and log footer + * are checked yet and they are valid. 
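+ *
+ * The lookup is a binary search over log-aligned page offsets inside
+ * the current sb PEB: the window [low_off, high_off) shrinks in units
+ * of full logs until the newest log whose checkpoint number (cno)
+ * still grows is found. E.g. with 128 pages per PEB and 8 pages per
+ * log there are 16 candidate logs, so about four probes suffice
+ * instead of reading all 16 logs sequentially.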
+ */ +static int ssdfs_find_latest_valid_sb_info(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_segment_header *last_seg_hdr; + u64 leb, peb; + u32 cur_off, low_off, high_off; + u32 log_pages; + u64 pages_per_peb; + int err = 0; +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); + BUG_ON(!fsi->devops->read); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(fsi->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(fsi->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p\n", fsi, fsi->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_backup_sb_info(fsi); + last_seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + leb = fsi->sbi.last_log.leb_id; + peb = fsi->sbi.last_log.peb_id; + log_pages = SSDFS_LOG_PAGES(last_seg_hdr); + + if (fsi->is_zns_device) + pages_per_peb = div64_u64(fsi->zone_capacity, fsi->pagesize); + else + pages_per_peb = fsi->pages_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pages_per_peb >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + low_off = fsi->sbi.last_log.page_offset; + high_off = (u32)pages_per_peb; + cur_off = low_off + log_pages; + + do { + u32 diff_pages, diff_logs; + u64 cno1, cno2; + u64 copy_leb, copy_peb; + u32 peb_pages_off; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cur_off >= pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_pages_off = cur_off % (u32)pages_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_pages_off > U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (leb == U64_MAX || peb == U64_MAX) { + err = -ENODATA; + break; + } + + err = ssdfs_read_checked_sb_info(fsi, peb, + peb_pages_off, true); + cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf); + cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf); + if (err == -EIO || cno1 >= cno2) { + void *buf = fsi->sbi_backup.vh_buf; + + copy_peb = ssdfs_swap_current_sb_peb(buf, peb); + copy_leb = ssdfs_swap_current_sb_leb(buf, leb); + if (copy_leb == U64_MAX || copy_peb == U64_MAX) { + err = -ERANGE; + break; + } + + err = ssdfs_read_checked_sb_info(fsi, copy_peb, + peb_pages_off, true); + cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf); + cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf); + if (!err) { + peb = copy_peb; + leb = copy_leb; + fsi->sbi.last_log.leb_id = leb; + fsi->sbi.last_log.peb_id = peb; + fsi->sbi.last_log.page_offset = cur_off; + fsi->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + } + } else { + fsi->sbi.last_log.leb_id = leb; + fsi->sbi.last_log.peb_id = peb; + fsi->sbi.last_log.page_offset = cur_off; + fsi->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + } + + if (err == -ENODATA || err == -EIO || cno1 >= cno2) { + err = !err ? 
-EIO : err; + high_off = cur_off; + } else if (err) { + /* we have internal error */ + break; + } else { + ssdfs_backup_sb_info(fsi); + low_off = cur_off; + } + + diff_pages = high_off - low_off; + diff_logs = (diff_pages / log_pages) / 2; + cur_off = low_off + (diff_logs * log_pages); + } while (cur_off > low_off && cur_off < high_off); + + if (err) { + if (err == -ENODATA || err == -EIO) { + /* previous read log was valid */ + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %u, low_off %u, high_off %u\n", + cur_off, low_off, high_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("fail to find valid volume header: err %d\n", + err); + } + + ssdfs_restore_sb_info(fsi); + } + + return err; +} + +/* + * This method expects that first volume header and log footer + * are checked yet and they are valid. + */ +static int ssdfs_find_latest_valid_sb_info2(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_segment_header *last_seg_hdr; + struct ssdfs_peb_extent checking_page; + u64 leb, peb; + u32 cur_off, low_off, high_off; + u32 log_pages; + u32 start_offset; + u32 found_log_off; + u64 cno1, cno2; + u64 copy_leb, copy_peb; + u32 peb_pages_off; + u64 pages_per_peb; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->sbi.vh_buf); + + SSDFS_DBG("fsi %p, fsi->sbi.vh_buf %p\n", fsi, fsi->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fsi->devops->can_write_page) { + SSDFS_CRIT("fail to find latest valid sb info: " + "can_write_page is not supported\n"); + return -EOPNOTSUPP; + } + + ssdfs_backup_sb_info(fsi); + last_seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + leb = fsi->sbi.last_log.leb_id; + peb = fsi->sbi.last_log.peb_id; + + if (leb == U64_MAX || peb == U64_MAX) { + ssdfs_restore_sb_info(fsi); + SSDFS_ERR("invalid leb_id %llu or peb_id %llu\n", + leb, peb); + return -ERANGE; + } + + if (fsi->is_zns_device) + pages_per_peb = div64_u64(fsi->zone_capacity, fsi->pagesize); + else + pages_per_peb = fsi->pages_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pages_per_peb >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = SSDFS_LOG_PAGES(last_seg_hdr); + start_offset = fsi->sbi.last_log.page_offset + log_pages; + low_off = start_offset; + high_off = (u32)pages_per_peb; + cur_off = low_off; + + checking_page.leb_id = leb; + checking_page.peb_id = peb; + checking_page.page_offset = cur_off; + checking_page.pages_count = 1; + + err = ssdfs_can_write_sb_log(fsi->sb, &checking_page); + if (err == -EIO) { + /* correct low bound */ + err = 0; + low_off++; + } else if (err) { + SSDFS_ERR("fail to check for write PEB %llu\n", + peb); + return err; + } else { + ssdfs_restore_sb_info(fsi); + + /* previous read log was valid */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %u, low_off %u, high_off %u\n", + cur_off, low_off, high_off); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + cur_off = high_off - 1; + + do { + u32 diff_pages; + + checking_page.leb_id = leb; + checking_page.peb_id = peb; + checking_page.page_offset = cur_off; + checking_page.pages_count = 1; + + err = ssdfs_can_write_sb_log(fsi->sb, &checking_page); + if (err == -EIO) { + /* correct low bound */ + err = 0; + low_off = cur_off; + } else if (err) { + SSDFS_ERR("fail to check for write PEB %llu\n", + peb); + return err; + } else { + /* correct upper bound */ + high_off = cur_off; + } + + diff_pages = (high_off - low_off) / 2; + cur_off = low_off + diff_pages; + } while (cur_off > low_off && cur_off < high_off); + + peb_pages_off = cur_off % (u32)pages_per_peb; + +#ifdef 
CONFIG_SSDFS_DEBUG + BUG_ON(peb_pages_off > U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + found_log_off = cur_off; + err = ssdfs_read_checked_sb_info2(fsi, peb, peb_pages_off, true, + &found_log_off); + cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf); + cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf); + + if (err == -EIO || cno1 >= cno2) { + void *buf = fsi->sbi_backup.vh_buf; + + copy_peb = ssdfs_swap_current_sb_peb(buf, peb); + copy_leb = ssdfs_swap_current_sb_leb(buf, leb); + if (copy_leb == U64_MAX || copy_peb == U64_MAX) { + err = -ERANGE; + goto finish_find_latest_sb_info; + } + + found_log_off = cur_off; + err = ssdfs_read_checked_sb_info2(fsi, copy_peb, + peb_pages_off, true, + &found_log_off); + cno1 = SSDFS_SEG_CNO(fsi->sbi_backup.vh_buf); + cno2 = SSDFS_SEG_CNO(fsi->sbi.vh_buf); + if (!err) { + peb = copy_peb; + leb = copy_leb; + fsi->sbi.last_log.leb_id = leb; + fsi->sbi.last_log.peb_id = peb; + fsi->sbi.last_log.page_offset = found_log_off; + fsi->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + } + } else { + fsi->sbi.last_log.leb_id = leb; + fsi->sbi.last_log.peb_id = peb; + fsi->sbi.last_log.page_offset = found_log_off; + fsi->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(fsi->sbi.vh_buf); + } + +finish_find_latest_sb_info: + if (err) { + if (err == -ENODATA || err == -EIO) { + /* previous read log was valid */ + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %u, low_off %u, high_off %u\n", + cur_off, low_off, high_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("fail to find valid volume header: err %d\n", + err); + } + + ssdfs_restore_sb_info(fsi); + } + + return err; +} + +static int ssdfs_check_fs_state(struct ssdfs_fs_info *fsi) +{ + if (fsi->sb->s_flags & SB_RDONLY) + return 0; + + switch (fsi->fs_state) { + case SSDFS_MOUNTED_FS: + SSDFS_NOTICE("unable to mount in RW mode: " + "file system didn't unmounted cleanly: " + "Please, run fsck utility\n"); + fsi->sb->s_flags |= SB_RDONLY; + return -EROFS; + + case SSDFS_ERROR_FS: + if (!ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE)) { + SSDFS_NOTICE("unable to mount in RW mode: " + "file system contains errors: " + "Please, run fsck utility\n"); + fsi->sb->s_flags |= SB_RDONLY; + return -EROFS; + } + break; + }; + + return 0; +} + +static int ssdfs_check_feature_compatibility(struct ssdfs_fs_info *fsi) +{ + u64 features; + + features = fsi->fs_feature_incompat & ~SSDFS_FEATURE_INCOMPAT_SUPP; + if (features) { + SSDFS_NOTICE("unable to mount: " + "unsupported incompatible features %llu\n", + features); + return -EOPNOTSUPP; + } + + features = fsi->fs_feature_compat_ro & ~SSDFS_FEATURE_COMPAT_RO_SUPP; + if (!(fsi->sb->s_flags & SB_RDONLY) && features) { + SSDFS_NOTICE("unable to mount in RW mode: " + "unsupported RO compatible features %llu\n", + features); + fsi->sb->s_flags |= SB_RDONLY; + return -EROFS; + } + + features = fsi->fs_feature_compat & ~SSDFS_FEATURE_COMPAT_SUPP; + if (features) + SSDFS_WARN("unknown compatible features %llu\n", features); + + return 0; +} + +static inline void ssdfs_init_sb_segs_array(struct ssdfs_fs_info *fsi) +{ + int i, j; + + for (i = SSDFS_CUR_SB_SEG; i < SSDFS_SB_CHAIN_MAX; i++) { + for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) { + fsi->sb_lebs[i][j] = + le64_to_cpu(fsi->vh->sb_pebs[i][j].leb_id); + fsi->sb_pebs[i][j] = + le64_to_cpu(fsi->vh->sb_pebs[i][j].peb_id); + } + } +} + +static int ssdfs_initialize_fs_info(struct ssdfs_fs_info *fsi) +{ + int err; + + init_rwsem(&fsi->volume_sem); + + fsi->vh = SSDFS_VH(fsi->sbi.vh_buf); + fsi->vs = 
SSDFS_VS(fsi->sbi.vs_buf); + + fsi->sb_seg_log_pages = le16_to_cpu(fsi->vh->sb_seg_log_pages); + fsi->segbmap_log_pages = le16_to_cpu(fsi->vh->segbmap_log_pages); + fsi->maptbl_log_pages = le16_to_cpu(fsi->vh->maptbl_log_pages); + fsi->lnodes_seg_log_pages = le16_to_cpu(fsi->vh->lnodes_seg_log_pages); + fsi->hnodes_seg_log_pages = le16_to_cpu(fsi->vh->hnodes_seg_log_pages); + fsi->inodes_seg_log_pages = le16_to_cpu(fsi->vh->inodes_seg_log_pages); + fsi->user_data_log_pages = le16_to_cpu(fsi->vh->user_data_log_pages); + + /* Static volume information */ + fsi->log_pagesize = fsi->vh->log_pagesize; + fsi->pagesize = 1 << fsi->vh->log_pagesize; + fsi->log_erasesize = fsi->vh->log_erasesize; + fsi->log_segsize = fsi->vh->log_segsize; + fsi->segsize = 1 << fsi->vh->log_segsize; + fsi->log_pebs_per_seg = fsi->vh->log_pebs_per_seg; + fsi->pebs_per_seg = 1 << fsi->vh->log_pebs_per_seg; + fsi->pages_per_peb = fsi->erasesize / fsi->pagesize; + fsi->pages_per_seg = fsi->segsize / fsi->pagesize; + fsi->lebs_per_peb_index = le32_to_cpu(fsi->vh->lebs_per_peb_index); + + if (fsi->is_zns_device) { + u64 peb_pages_capacity = + fsi->zone_capacity >> fsi->vh->log_pagesize; + + fsi->erasesize = fsi->zone_size; + fsi->segsize = fsi->erasesize * + le16_to_cpu(fsi->vh->pebs_per_seg); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_pages_capacity >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi->peb_pages_capacity = (u32)peb_pages_capacity; + atomic_set(&fsi->open_zones, le32_to_cpu(fsi->vs->open_zones)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("open_zones %d\n", + atomic_read(&fsi->open_zones)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + fsi->erasesize = 1 << fsi->vh->log_erasesize; + fsi->segsize = 1 << fsi->vh->log_segsize; + fsi->peb_pages_capacity = fsi->pages_per_peb; + } + + if (fsi->pages_per_peb > U16_MAX) + fsi->leb_pages_capacity = U16_MAX; + else + fsi->leb_pages_capacity = fsi->pages_per_peb; + + fsi->fs_ctime = le64_to_cpu(fsi->vh->create_time); + fsi->fs_cno = le64_to_cpu(fsi->vh->create_cno); + fsi->raw_inode_size = le16_to_cpu(fsi->vs->inodes_btree.desc.item_size); + fsi->create_threads_per_seg = + le16_to_cpu(fsi->vh->create_threads_per_seg); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("STATIC VOLUME INFO:\n"); + SSDFS_DBG("pagesize %u, erasesize %u, segsize %u\n", + fsi->pagesize, fsi->erasesize, fsi->segsize); + SSDFS_DBG("pebs_per_seg %u, pages_per_peb %u, " + "pages_per_seg %u, lebs_per_peb_index %u\n", + fsi->pebs_per_seg, fsi->pages_per_peb, + fsi->pages_per_seg, fsi->lebs_per_peb_index); + SSDFS_DBG("zone_size %llu, zone_capacity %llu, " + "leb_pages_capacity %u, peb_pages_capacity %u, " + "open_zones %d\n", + fsi->zone_size, fsi->zone_capacity, + fsi->leb_pages_capacity, fsi->peb_pages_capacity, + atomic_read(&fsi->open_zones)); + SSDFS_DBG("fs_ctime %llu, fs_cno %llu, " + "raw_inode_size %u, create_threads_per_seg %u\n", + (u64)fsi->fs_ctime, (u64)fsi->fs_cno, + fsi->raw_inode_size, + fsi->create_threads_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* Mutable volume info */ + init_rwsem(&fsi->sb_segs_sem); + ssdfs_init_sb_segs_array(fsi); + + mutex_init(&fsi->resize_mutex); + fsi->nsegs = le64_to_cpu(fsi->vs->nsegs); + + spin_lock_init(&fsi->volume_state_lock); + + fsi->free_pages = 0; + fsi->reserved_new_user_data_pages = 0; + fsi->updated_user_data_pages = 0; + fsi->flushing_user_data_requests = 0; + fsi->fs_mount_time = ssdfs_current_timestamp(); + fsi->fs_mod_time = le64_to_cpu(fsi->vs->timestamp); + ssdfs_init_boot_vs_mount_timediff(fsi); + fsi->fs_mount_cno = 
le64_to_cpu(fsi->vs->cno); + fsi->fs_flags = le32_to_cpu(fsi->vs->flags); + fsi->fs_state = le16_to_cpu(fsi->vs->state); + + fsi->fs_errors = le16_to_cpu(fsi->vs->errors); + ssdfs_initialize_fs_errors_option(fsi); + + fsi->fs_feature_compat = le64_to_cpu(fsi->vs->feature_compat); + fsi->fs_feature_compat_ro = le64_to_cpu(fsi->vs->feature_compat_ro); + fsi->fs_feature_incompat = le64_to_cpu(fsi->vs->feature_incompat); + + ssdfs_memcpy(fsi->fs_uuid, 0, SSDFS_UUID_SIZE, + fsi->vs->uuid, 0, SSDFS_UUID_SIZE, + SSDFS_UUID_SIZE); + ssdfs_memcpy(fsi->fs_label, 0, SSDFS_VOLUME_LABEL_MAX, + fsi->vs->label, 0, SSDFS_VOLUME_LABEL_MAX, + SSDFS_VOLUME_LABEL_MAX); + + fsi->metadata_options.blk_bmap.flags = + le16_to_cpu(fsi->vs->blkbmap.flags); + fsi->metadata_options.blk_bmap.compression = + fsi->vs->blkbmap.compression; + fsi->metadata_options.blk2off_tbl.flags = + le16_to_cpu(fsi->vs->blk2off_tbl.flags); + fsi->metadata_options.blk2off_tbl.compression = + fsi->vs->blk2off_tbl.compression; + fsi->metadata_options.user_data.flags = + le16_to_cpu(fsi->vs->user_data.flags); + fsi->metadata_options.user_data.compression = + fsi->vs->user_data.compression; + fsi->metadata_options.user_data.migration_threshold = + le16_to_cpu(fsi->vs->user_data.migration_threshold); + + fsi->migration_threshold = le16_to_cpu(fsi->vs->migration_threshold); + if (fsi->migration_threshold == 0 || + fsi->migration_threshold >= U16_MAX) { + /* use default value */ + fsi->migration_threshold = fsi->pebs_per_seg; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MUTABLE VOLUME INFO:\n"); + SSDFS_DBG("sb_lebs[CUR][MAIN] %llu, sb_pebs[CUR][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[CUR][COPY] %llu, sb_pebs[CUR][COPY] %llu\n", + fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("sb_lebs[NEXT][MAIN] %llu, sb_pebs[NEXT][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[NEXT][COPY] %llu, sb_pebs[NEXT][COPY] %llu\n", + fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("sb_lebs[RESERVED][MAIN] %llu, sb_pebs[RESERVED][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[RESERVED][COPY] %llu, sb_pebs[RESERVED][COPY] %llu\n", + fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("sb_lebs[PREV][MAIN] %llu, sb_pebs[PREV][MAIN] %llu\n", + fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG], + fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG]); + SSDFS_DBG("sb_lebs[PREV][COPY] %llu, sb_pebs[PREV][COPY] %llu\n", + fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG], + fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG]); + SSDFS_DBG("nsegs %llu, free_pages %llu\n", + fsi->nsegs, fsi->free_pages); + SSDFS_DBG("fs_mount_time %llu, fs_mod_time %llu, fs_mount_cno %llu\n", + fsi->fs_mount_time, fsi->fs_mod_time, fsi->fs_mount_cno); + SSDFS_DBG("fs_flags %#x, fs_state %#x, fs_errors %#x\n", + fsi->fs_flags, fsi->fs_state, fsi->fs_errors); + SSDFS_DBG("fs_feature_compat %llu, fs_feature_compat_ro %llu, " + "fs_feature_incompat %llu\n", + fsi->fs_feature_compat, fsi->fs_feature_compat_ro, + fsi->fs_feature_incompat); + SSDFS_DBG("migration_threshold %u\n", + 
fsi->migration_threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi->sb->s_blocksize = fsi->pagesize; + fsi->sb->s_blocksize_bits = blksize_bits(fsi->pagesize); + + ssdfs_maptbl_cache_init(&fsi->maptbl_cache); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("VOLUME HEADER DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + fsi->vh, fsi->pagesize); + SSDFS_DBG("END\n"); + + SSDFS_DBG("VOLUME STATE DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + fsi->vs, fsi->pagesize); + SSDFS_DBG("END\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_check_fs_state(fsi); + if (err && err != -EROFS) + return err; + + err = ssdfs_check_feature_compatibility(fsi); + if (err) + return err; + + if (fsi->leb_pages_capacity >= U16_MAX) { +#ifdef CONFIG_SSDFS_TESTING + SSDFS_DBG("Continue in testing mode: " + "leb_pages_capacity %u, peb_pages_capacity %u\n", + fsi->leb_pages_capacity, + fsi->peb_pages_capacity); + return 0; +#else + SSDFS_NOTICE("unable to mount in RW mode: " + "Please, format volume with bigger logical block size.\n"); + SSDFS_NOTICE("STATIC VOLUME INFO:\n"); + SSDFS_NOTICE("pagesize %u, erasesize %u, segsize %u\n", + fsi->pagesize, fsi->erasesize, fsi->segsize); + SSDFS_NOTICE("pebs_per_seg %u, pages_per_peb %u, " + "pages_per_seg %u\n", + fsi->pebs_per_seg, fsi->pages_per_peb, + fsi->pages_per_seg); + SSDFS_NOTICE("zone_size %llu, zone_capacity %llu, " + "leb_pages_capacity %u, peb_pages_capacity %u\n", + fsi->zone_size, fsi->zone_capacity, + fsi->leb_pages_capacity, fsi->peb_pages_capacity); + + fsi->sb->s_flags |= SB_RDONLY; + return -EROFS; +#endif /* CONFIG_SSDFS_TESTING */ + } + + return 0; +} + +static +int ssdfs_check_maptbl_cache_header(struct ssdfs_maptbl_cache_header *hdr, + u16 sequence_id, + u64 prev_end_leb) +{ + size_t bytes_count, calculated; + u64 start_leb, end_leb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("maptbl_cache_hdr %p\n", hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (hdr->magic.common != cpu_to_le32(SSDFS_SUPER_MAGIC) || + hdr->magic.key != cpu_to_le16(SSDFS_MAPTBL_CACHE_MAGIC)) { + SSDFS_ERR("invalid maptbl cache magic signature\n"); + return -EIO; + } + + if (le16_to_cpu(hdr->sequence_id) != sequence_id) { + SSDFS_ERR("invalid sequence_id\n"); + return -EIO; + } + + bytes_count = le16_to_cpu(hdr->bytes_count); + + if (bytes_count > PAGE_SIZE) { + SSDFS_ERR("invalid bytes_count %zu\n", + bytes_count); + return -EIO; + } + + calculated = le16_to_cpu(hdr->items_count) * + sizeof(struct ssdfs_leb2peb_pair); + + if (bytes_count < calculated) { + SSDFS_ERR("bytes_count %zu < calculated %zu\n", + bytes_count, calculated); + return -EIO; + } + + start_leb = le64_to_cpu(hdr->start_leb); + end_leb = le64_to_cpu(hdr->end_leb); + + if (start_leb > end_leb || + (prev_end_leb != U64_MAX && prev_end_leb >= start_leb)) { + SSDFS_ERR("invalid LEB range: start_leb %llu, " + "end_leb %llu, prev_end_leb %llu\n", + start_leb, end_leb, prev_end_leb); + return -EIO; + } + + return 0; +} + +static int ssdfs_read_maptbl_cache(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_segment_header *seg_hdr; + struct ssdfs_metadata_descriptor *meta_desc; + struct ssdfs_maptbl_cache_header *maptbl_cache_hdr; + u32 read_off; + u32 read_bytes = 0; + u32 bytes_count; + u32 pages_count; + u64 peb_id; + struct page *page; + void *kaddr; + u64 prev_end_leb; + u32 csum = ~0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->devops->read); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_hdr = 
SSDFS_SEG_HDR(fsi->sbi.vh_buf); + + if (!ssdfs_log_has_maptbl_cache(seg_hdr)) { + SSDFS_ERR("sb segment hasn't maptbl cache\n"); + return -EIO; + } + + down_write(&fsi->maptbl_cache.lock); + + meta_desc = &seg_hdr->desc_array[SSDFS_MAPTBL_CACHE_INDEX]; + read_off = le32_to_cpu(meta_desc->offset); + bytes_count = le32_to_cpu(meta_desc->size); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(bytes_count >= INT_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_id = fsi->sbi.last_log.peb_id; + + pages_count = (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT; + + for (i = 0; i < pages_count; i++) { + struct ssdfs_maptbl_cache *cache = &fsi->maptbl_cache; + size_t size; + + size = min_t(size_t, (size_t)PAGE_SIZE, + (size_t)(bytes_count - read_bytes)); + + page = ssdfs_maptbl_cache_add_pagevec_page(cache); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + goto finish_read_maptbl_cache; + } + + ssdfs_lock_page(page); + + kaddr = kmap_local_page(page); + err = ssdfs_unaligned_read_buffer(fsi, peb_id, + read_off, kaddr, size); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (unlikely(err)) { + ssdfs_unlock_page(page); + SSDFS_ERR("fail to read page: " + "peb %llu, offset %u, size %zu, err %d\n", + peb_id, read_off, size, err); + goto finish_read_maptbl_cache; + } + + ssdfs_unlock_page(page); + + read_off += size; + read_bytes += size; + } + + prev_end_leb = U64_MAX; + + for (i = 0; i < pages_count; i++) { + page = fsi->maptbl_cache.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(i >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + maptbl_cache_hdr = SSDFS_MAPTBL_CACHE_HDR(kaddr); + + err = ssdfs_check_maptbl_cache_header(maptbl_cache_hdr, + (u16)i, + prev_end_leb); + if (unlikely(err)) { + SSDFS_ERR("invalid maptbl cache header: " + "page_index %d, err %d\n", + i, err); + goto unlock_cur_page; + } + + prev_end_leb = le64_to_cpu(maptbl_cache_hdr->end_leb); + + csum = crc32(csum, kaddr, + le16_to_cpu(maptbl_cache_hdr->bytes_count)); + +unlock_cur_page: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) + goto finish_read_maptbl_cache; + } + + if (csum != le32_to_cpu(meta_desc->check.csum)) { + err = -EIO; + SSDFS_ERR("invalid checksum\n"); + goto finish_read_maptbl_cache; + } + + if (bytes_count < PAGE_SIZE) + bytes_count = PAGE_SIZE; + + atomic_set(&fsi->maptbl_cache.bytes_count, (int)bytes_count); + +finish_read_maptbl_cache: + up_write(&fsi->maptbl_cache.lock); + + return err; +} + +static inline bool is_ssdfs_snapshot_rules_exist(struct ssdfs_fs_info *fsi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_log_footer_has_snapshot_rules(SSDFS_LF(fsi->vs)); +} + +static inline +int ssdfs_check_snapshot_rules_header(struct ssdfs_snapshot_rules_header *hdr) +{ + size_t item_size = sizeof(struct ssdfs_snapshot_rule_info); + u16 items_count; + u16 items_capacity; + u32 area_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (le32_to_cpu(hdr->magic) != SSDFS_SNAPSHOT_RULES_MAGIC) { + SSDFS_ERR("invalid snapshot rules magic %#x\n", + le32_to_cpu(hdr->magic)); + return -EIO; + } + + if (le16_to_cpu(hdr->item_size) != item_size) { + SSDFS_ERR("invalid item size %u\n", + le16_to_cpu(hdr->item_size)); + return -EIO; + } + + items_count = le16_to_cpu(hdr->items_count); + items_capacity = le16_to_cpu(hdr->items_capacity); + + if (items_count > items_capacity) { + 
SSDFS_ERR("corrupted header: " + "items_count %u > items_capacity %u\n", + items_count, items_capacity); + return -EIO; + } + + area_size = le32_to_cpu(hdr->area_size); + + if (area_size != ((u32)items_capacity * item_size)) { + SSDFS_ERR("corrupted header: " + "area_size %u, items_capacity %u, " + "item_size %zu\n", + area_size, items_capacity, item_size); + return -EIO; + } + + return 0; +} + +static inline int ssdfs_read_snapshot_rules(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_log_footer *footer; + struct ssdfs_snapshot_rules_list *rules_list; + struct ssdfs_metadata_descriptor *meta_desc; + struct ssdfs_snapshot_rules_header snap_rules_hdr; + size_t sr_hdr_size = sizeof(struct ssdfs_snapshot_rules_header); + struct ssdfs_snapshot_rule_info info; + size_t rule_size = sizeof(struct ssdfs_snapshot_rule_info); + struct pagevec pvec; + u32 read_off; + u32 read_bytes = 0; + u32 bytes_count; + u32 pages_count; + u64 peb_id; + struct page *page; + void *kaddr; + u32 csum = ~0; + u16 items_count; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!fsi->devops->read); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + footer = SSDFS_LF(fsi->sbi.vs_buf); + rules_list = &fsi->snapshots.rules_list; + + if (!ssdfs_log_footer_has_snapshot_rules(footer)) { + SSDFS_ERR("footer hasn't snapshot rules table\n"); + return -EIO; + } + + meta_desc = &footer->desc_array[SSDFS_SNAPSHOT_RULES_AREA_INDEX]; + read_off = le32_to_cpu(meta_desc->offset); + bytes_count = le32_to_cpu(meta_desc->size); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(bytes_count >= INT_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_id = fsi->sbi.last_log.peb_id; + + pages_count = (bytes_count + PAGE_SIZE - 1) >> PAGE_SHIFT; + pagevec_init(&pvec); + + for (i = 0; i < pages_count; i++) { + size_t size; + + size = min_t(size_t, (size_t)PAGE_SIZE, + (size_t)(bytes_count - read_bytes)); + + page = ssdfs_snapshot_rules_add_pagevec_page(&pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? 
-ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + goto finish_read_snapshot_rules; + } + + ssdfs_lock_page(page); + + kaddr = kmap_local_page(page); + err = ssdfs_unaligned_read_buffer(fsi, peb_id, + read_off, kaddr, size); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (unlikely(err)) { + ssdfs_unlock_page(page); + SSDFS_ERR("fail to read page: " + "peb %llu, offset %u, size %zu, err %d\n", + peb_id, read_off, size, err); + goto finish_read_snapshot_rules; + } + + ssdfs_unlock_page(page); + + read_off += size; + read_bytes += size; + } + + page = pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + ssdfs_memcpy_from_page(&snap_rules_hdr, 0, sr_hdr_size, + page, 0, PAGE_SIZE, + sr_hdr_size); + ssdfs_unlock_page(page); + + err = ssdfs_check_snapshot_rules_header(&snap_rules_hdr); + if (unlikely(err)) { + SSDFS_ERR("invalid snapshot rules header: " + "err %d\n", err); + goto finish_read_snapshot_rules; + } + + for (i = 0; i < pages_count; i++) { + page = pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(i >= U16_MAX); + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + csum = crc32(csum, kaddr, le16_to_cpu(meta_desc->check.bytes)); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + } + + if (csum != le32_to_cpu(meta_desc->check.csum)) { + err = -EIO; + SSDFS_ERR("invalid checksum\n"); + goto finish_read_snapshot_rules; + } + + items_count = le16_to_cpu(snap_rules_hdr.items_count); + read_off = sr_hdr_size; + + for (i = 0; i < items_count; i++) { + struct ssdfs_snapshot_rule_item *ptr; + + err = ssdfs_unaligned_read_pagevec(&pvec, read_off, + rule_size, &info); + if (unlikely(err)) { + SSDFS_ERR("fail to read a snapshot rule: " + "read_off %u, index %d, err %d\n", + read_off, i, err); + goto finish_read_snapshot_rules; + } + + ptr = ssdfs_snapshot_rule_alloc(); + if (!ptr) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate rule item\n"); + goto finish_read_snapshot_rules; + } + + ssdfs_memcpy(&ptr->rule, 0, rule_size, + &info, 0, rule_size, + rule_size); + + ssdfs_snapshot_rules_list_add_tail(rules_list, ptr); + + read_off += rule_size; + } + +finish_read_snapshot_rules: + ssdfs_snapshot_rules_pagevec_release(&pvec); + return err; +} + +static int ssdfs_init_recovery_environment(struct ssdfs_fs_info *fsi, + struct ssdfs_volume_header *vh, + u64 pebs_per_volume, + struct ssdfs_recovery_env *env) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !vh || !env); + + SSDFS_DBG("fsi %p, vh %p, env %p\n", fsi, vh, env); +#endif /* CONFIG_SSDFS_DEBUG */ + + env->found = NULL; + env->err = 0; + env->fsi = fsi; + env->pebs_per_volume = pebs_per_volume; + + atomic_set(&env->state, SSDFS_RECOVERY_UNKNOWN_STATE); + + err = ssdfs_init_sb_info(fsi, &env->sbi); + if (likely(!err)) + err = ssdfs_init_sb_info(fsi, &env->sbi_backup); + + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb info: err %d\n", err); + return err; + } + + return 0; +} + +static inline bool has_thread_finished(struct ssdfs_recovery_env *env) +{ + switch (atomic_read(&env->state)) { + case SSDFS_RECOVERY_FAILED: + case SSDFS_RECOVERY_FINISHED: + return true; + + case SSDFS_START_RECOVERY: + return false; + } + + return true; +} + +static inline u16 ssdfs_get_pebs_per_stripe(u64 pebs_per_volume, + u64 processed_pebs, + u32 fragments_count, + u16 pebs_per_fragment, + u16 stripes_per_fragment, + u16 pebs_per_stripe) +{ + u64 fragment_index; + u64 
pebs_per_aligned_fragments; + u64 pebs_per_last_fragment; + u64 calculated = U16_MAX; + u32 remainder; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebs_per_volume %llu, processed_pebs %llu, " + "fragments_count %u, pebs_per_fragment %u, " + "stripes_per_fragment %u, pebs_per_stripe %u\n", + pebs_per_volume, processed_pebs, + fragments_count, pebs_per_fragment, + stripes_per_fragment, pebs_per_stripe); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (fragments_count == 0) { + SSDFS_WARN("invalid fragments_count %u\n", + fragments_count); + return pebs_per_stripe; + } + + fragment_index = processed_pebs / pebs_per_fragment; + + if (fragment_index >= fragments_count) { + SSDFS_WARN("fragment_index %llu >= fragments_count %u\n", + fragment_index, fragments_count); + return pebs_per_stripe; + } + + if ((fragment_index + 1) < fragments_count) + calculated = pebs_per_stripe; + else { + pebs_per_aligned_fragments = fragments_count - 1; + pebs_per_aligned_fragments *= pebs_per_fragment; + + if (pebs_per_aligned_fragments >= pebs_per_volume) { + SSDFS_WARN("calculated %llu >= pebs_per_volume %llu\n", + pebs_per_aligned_fragments, + pebs_per_volume); + return 0; + } + + pebs_per_last_fragment = pebs_per_volume - + pebs_per_aligned_fragments; + calculated = pebs_per_last_fragment / stripes_per_fragment; + + div_u64_rem(pebs_per_last_fragment, + (u64)stripes_per_fragment, &remainder); + + if (remainder != 0) + calculated++; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("calculated: fragment_index %llu, pebs_per_stripe %llu\n", + fragment_index, calculated); + + BUG_ON(calculated > pebs_per_stripe); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u16)calculated; +} + +static inline +void ssdfs_init_found_pebs_details(struct ssdfs_found_protected_pebs *ptr) +{ + int i, j; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("ptr %p\n", ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr->start_peb = U64_MAX; + ptr->pebs_count = U32_MAX; + ptr->lower_offset = U64_MAX; + ptr->middle_offset = U64_MAX; + ptr->upper_offset = U64_MAX; + ptr->current_offset = U64_MAX; + ptr->search_phase = SSDFS_RECOVERY_NO_SEARCH; + + for (i = 0; i < SSDFS_PROTECTED_PEB_CHAIN_MAX; i++) { + struct ssdfs_found_protected_peb *cur_peb; + + cur_peb = &ptr->array[i]; + + cur_peb->peb.peb_id = U64_MAX; + cur_peb->peb.is_superblock_peb = false; + cur_peb->peb.state = SSDFS_PEB_NOT_CHECKED; + + for (j = 0; j < SSDFS_SB_CHAIN_MAX; j++) { + struct ssdfs_superblock_pebs_pair *cur_pair; + struct ssdfs_found_peb *cur_sb_peb; + + cur_pair = &cur_peb->found.sb_pebs[j]; + + cur_sb_peb = &cur_pair->pair[SSDFS_MAIN_SB_SEG]; + cur_sb_peb->peb_id = U64_MAX; + cur_sb_peb->is_superblock_peb = false; + cur_sb_peb->state = SSDFS_PEB_NOT_CHECKED; + + cur_sb_peb = &cur_pair->pair[SSDFS_COPY_SB_SEG]; + cur_sb_peb->peb_id = U64_MAX; + cur_sb_peb->is_superblock_peb = false; + cur_sb_peb->state = SSDFS_PEB_NOT_CHECKED; + } + } +} + +static inline +int ssdfs_start_recovery_thread_activity(struct ssdfs_recovery_env *env, + struct ssdfs_found_protected_pebs *found, + u64 start_peb, u32 pebs_count, int search_phase) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || env->found || !found); + + SSDFS_DBG("env %p, found %p, start_peb %llu, " + "pebs_count %u, search_phase %#x\n", + env, found, start_peb, + pebs_count, search_phase); +#endif /* CONFIG_SSDFS_DEBUG */ + + env->found = found; + env->err = 0; + + if (search_phase == SSDFS_RECOVERY_FAST_SEARCH) { + env->found->start_peb = start_peb; + env->found->pebs_count = pebs_count; + } else if (search_phase == 
SSDFS_RECOVERY_SLOW_SEARCH) { + struct ssdfs_found_protected_peb *protected; + u64 lower_peb_id; + u64 upper_peb_id; + u64 last_cno_peb_id; + + if (env->found->start_peb != start_peb || + env->found->pebs_count != pebs_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore search in fragment: " + "found (start_peb %llu, pebs_count %u), " + "start_peb %llu, pebs_count %u\n", + env->found->start_peb, + env->found->pebs_count, + start_peb, pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + env->err = -ENODATA; + atomic_set(&env->state, SSDFS_RECOVERY_FAILED); + return -ENODATA; + } + + protected = &env->found->array[SSDFS_LOWER_PEB_INDEX]; + lower_peb_id = protected->peb.peb_id; + + protected = &env->found->array[SSDFS_UPPER_PEB_INDEX]; + upper_peb_id = protected->peb.peb_id; + + protected = &env->found->array[SSDFS_LAST_CNO_PEB_INDEX]; + last_cno_peb_id = protected->peb.peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("protected PEBs: " + "lower %llu, upper %llu, last_cno_peb %llu\n", + lower_peb_id, upper_peb_id, last_cno_peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (lower_peb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore search in fragment: " + "found (start_peb %llu, pebs_count %u), " + "start_peb %llu, pebs_count %u, " + "lower %llu, upper %llu, " + "last_cno_peb %llu\n", + env->found->start_peb, + env->found->pebs_count, + start_peb, pebs_count, + lower_peb_id, upper_peb_id, + last_cno_peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + env->err = -ENODATA; + atomic_set(&env->state, SSDFS_RECOVERY_FAILED); + return -ENODATA; + } else if (lower_peb_id == env->found->start_peb && + upper_peb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore search in fragment: " + "found (start_peb %llu, pebs_count %u), " + "start_peb %llu, pebs_count %u, " + "lower %llu, upper %llu, " + "last_cno_peb %llu\n", + env->found->start_peb, + env->found->pebs_count, + start_peb, pebs_count, + lower_peb_id, upper_peb_id, + last_cno_peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + env->err = -ENODATA; + atomic_set(&env->state, SSDFS_RECOVERY_FAILED); + return -ENODATA; + } + } else { + SSDFS_ERR("unexpected search phase %#x\n", + search_phase); + return -ERANGE; + } + + env->found->search_phase = search_phase; + atomic_set(&env->state, SSDFS_START_RECOVERY); + wake_up(&env->request_wait_queue); + + return 0; +} + +static inline +int ssdfs_wait_recovery_thread_finish(struct ssdfs_fs_info *fsi, + struct ssdfs_recovery_env *env, + u32 stripe_id, + bool *has_sb_peb_found) +{ + struct ssdfs_segment_header *seg_hdr; + wait_queue_head_t *wq; + u64 cno1, cno2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !has_sb_peb_found); + + SSDFS_DBG("env %p, has_sb_peb_found %p, stripe_id %u\n", + env, has_sb_peb_found, stripe_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* + * Do not change has_sb_peb_found + * if nothing has been found!!!! 
+ */ + + wq = &env->result_wait_queue; + + wait_event_interruptible_timeout(*wq, + has_thread_finished(env), + SSDFS_DEFAULT_TIMEOUT); + + switch (atomic_read(&env->state)) { + case SSDFS_RECOVERY_FINISHED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("stripe %u has SB segment\n", + stripe_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf); + cno1 = le64_to_cpu(seg_hdr->cno); + seg_hdr = SSDFS_SEG_HDR(env->sbi.vh_buf); + cno2 = le64_to_cpu(seg_hdr->cno); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cno1 %llu, cno2 %llu\n", + cno1, cno2); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cno1 <= cno2) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("copy sb info: " + "stripe_id %u\n", + stripe_id); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_copy_sb_info(fsi, env); + *has_sb_peb_found = true; + } + break; + + case SSDFS_RECOVERY_FAILED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("stripe %u has nothing\n", + stripe_id); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_START_RECOVERY: + err = -ERANGE; + SSDFS_WARN("thread is working too long: " + "stripe %u\n", + stripe_id); + atomic_set(&env->state, SSDFS_RECOVERY_FAILED); + break; + + default: + BUG(); + } + + env->found = NULL; + return err; +} + +int ssdfs_gather_superblock_info(struct ssdfs_fs_info *fsi, int silent) +{ + struct ssdfs_volume_header *vh; + struct ssdfs_recovery_env *array = NULL; + struct ssdfs_found_protected_pebs *found_pebs = NULL; + u64 dev_size; + u32 erasesize; + u64 pebs_per_volume; + u32 fragments_count = 0; + u16 pebs_per_fragment = 0; + u16 stripes_per_fragment = 0; + u16 pebs_per_stripe = 0; + u32 stripes_count = 0; + u32 threads_count; + u32 jobs_count; + u32 processed_stripes = 0; + u64 processed_pebs = 0; + bool has_sb_peb_found1, has_sb_peb_found2; + bool has_iteration_succeeded; + u16 calculated; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, silent %#x\n", fsi, silent); +#else + SSDFS_DBG("fsi %p, silent %#x\n", fsi, silent); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_init_sb_info(fsi, &fsi->sbi); + if (likely(!err)) { + err = ssdfs_init_sb_info(fsi, &fsi->sbi_backup); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb info: err %d\n", err); + goto free_buf; + } + + err = ssdfs_find_any_valid_volume_header(fsi, + SSDFS_RESERVED_VBR_SIZE, + silent); + if (err) + goto forget_buf; + + vh = SSDFS_VH(fsi->sbi.vh_buf); + fragments_count = le32_to_cpu(vh->maptbl.fragments_count); + pebs_per_fragment = le16_to_cpu(vh->maptbl.pebs_per_fragment); + pebs_per_stripe = le16_to_cpu(vh->maptbl.pebs_per_stripe); + stripes_per_fragment = le16_to_cpu(vh->maptbl.stripes_per_fragment); + + dev_size = fsi->devops->device_size(fsi->sb); + erasesize = 1 << vh->log_erasesize; + pebs_per_volume = div_u64(dev_size, erasesize); + + stripes_count = fragments_count * stripes_per_fragment; + threads_count = min_t(u32, SSDFS_RECOVERY_THREADS, stripes_count); + + has_sb_peb_found1 = false; + has_sb_peb_found2 = false; + + found_pebs = ssdfs_recovery_kcalloc(stripes_count, + sizeof(struct ssdfs_found_protected_pebs), + GFP_KERNEL); + if (!found_pebs) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate the PEBs details array\n"); + goto free_environment; + } + + for (i = 0; i < stripes_count; i++) { + ssdfs_init_found_pebs_details(&found_pebs[i]); + } + + array = ssdfs_recovery_kcalloc(threads_count, + sizeof(struct ssdfs_recovery_env), + GFP_KERNEL); + if (!array) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate the environment\n"); + goto free_environment; + } + + for (i = 
0; i < threads_count; i++) { + err = ssdfs_init_recovery_environment(fsi, vh, + pebs_per_volume, &array[i]); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare sb info: err %d\n", err); + + for (; i >= 0; i--) { + ssdfs_destruct_sb_info(&array[i].sbi); + ssdfs_destruct_sb_info(&array[i].sbi_backup); + } + + goto free_environment; + } + } + + for (i = 0; i < threads_count; i++) { + err = ssdfs_recovery_start_thread(&array[i], i); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start thread: " + "id %u, err %d\n", + i, err); + } + + for (; i >= 0; i--) + ssdfs_recovery_stop_thread(&array[i]); + + goto destruct_sb_info; + } + } + + jobs_count = 1; + + processed_stripes = 0; + processed_pebs = 0; + + while (processed_pebs < pebs_per_volume) { + /* Fast search phase */ + has_iteration_succeeded = false; + + if (processed_stripes >= stripes_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("processed_stripes %u >= stripes_count %u\n", + processed_stripes, stripes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_slow_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FAST_SEARCH: jobs_count %u\n", jobs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < jobs_count; i++) { + calculated = + ssdfs_get_pebs_per_stripe(pebs_per_volume, + processed_pebs, + fragments_count, + pebs_per_fragment, + stripes_per_fragment, + pebs_per_stripe); + + if ((processed_pebs + calculated) > pebs_per_volume) + calculated = pebs_per_volume - processed_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, start_peb %llu, pebs_count %u\n", + i, processed_pebs, calculated); + SSDFS_DBG("pebs_per_volume %llu, processed_pebs %llu\n", + pebs_per_volume, processed_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_start_recovery_thread_activity(&array[i], + &found_pebs[processed_stripes + i], + processed_pebs, calculated, + SSDFS_RECOVERY_FAST_SEARCH); + if (err) { + SSDFS_ERR("fail to start thread's activity: " + "err %d\n", err); + goto finish_sb_peb_search; + } + + processed_pebs += calculated; + } + + for (i = 0; i < jobs_count; i++) { + err = ssdfs_wait_recovery_thread_finish(fsi, + &array[i], + processed_stripes + i, + &has_iteration_succeeded); + if (unlikely(err)) { + has_sb_peb_found1 = false; + goto finish_sb_peb_search; + } + + switch (array[i].err) { + case 0: + /* SB PEB has been found */ + /* continue logic */ + break; + + case -ENODATA: + case -ENOENT: + case -EAGAIN: + case -E2BIG: + /* SB PEB has not been found */ + /* continue logic */ + break; + + default: + /* Something is going wrong */ + /* stop execution */ + err = array[i].err; + has_sb_peb_found1 = false; + SSDFS_ERR("fail to find valid SB PEB: " + "err %d\n", err); + goto finish_sb_peb_search; + } + } + + if (has_iteration_succeeded) { + has_sb_peb_found1 = true; + goto finish_sb_peb_search; + } + + processed_stripes += jobs_count; + + jobs_count <<= 1; + jobs_count = min_t(u32, jobs_count, threads_count); + jobs_count = min_t(u32, jobs_count, + stripes_count - processed_stripes); + }; + +try_slow_search: + jobs_count = 1; + + processed_stripes = 0; + processed_pebs = 0; + + while (processed_pebs < pebs_per_volume) { + /* Slow search phase */ + has_iteration_succeeded = false; + + if (processed_stripes >= stripes_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("processed_stripes %u >= stripes_count %u\n", + processed_stripes, stripes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_sb_peb_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SLOW_SEARCH: 
jobs_count %u\n", jobs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < jobs_count; i++) { + calculated = + ssdfs_get_pebs_per_stripe(pebs_per_volume, + processed_pebs, + fragments_count, + pebs_per_fragment, + stripes_per_fragment, + pebs_per_stripe); + + if ((processed_pebs + calculated) > pebs_per_volume) + calculated = pebs_per_volume - processed_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, start_peb %llu, pebs_count %u\n", + i, processed_pebs, calculated); + SSDFS_DBG("pebs_per_volume %llu, processed_pebs %llu\n", + pebs_per_volume, processed_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_start_recovery_thread_activity(&array[i], + &found_pebs[processed_stripes + i], + processed_pebs, calculated, + SSDFS_RECOVERY_SLOW_SEARCH); + if (err == -ENODATA) { + /* thread continues to sleep */ + /* continue logic */ + } else if (err) { + SSDFS_ERR("fail to start thread's activity: " + "err %d\n", err); + goto finish_sb_peb_search; + } + + processed_pebs += calculated; + } + + for (i = 0; i < jobs_count; i++) { + err = ssdfs_wait_recovery_thread_finish(fsi, + &array[i], + processed_stripes + i, + &has_iteration_succeeded); + if (unlikely(err)) { + has_sb_peb_found2 = false; + goto finish_sb_peb_search; + } + + switch (array[i].err) { + case 0: + /* SB PEB has been found */ + /* continue logic */ + break; + + case -ENODATA: + case -ENOENT: + case -EAGAIN: + case -E2BIG: + /* SB PEB has not been found */ + /* continue logic */ + break; + + default: + /* Something is going wrong */ + /* stop execution */ + err = array[i].err; + has_sb_peb_found2 = false; + SSDFS_ERR("fail to find valid SB PEB: " + "err %d\n", err); + goto finish_sb_peb_search; + } + } + + if (has_iteration_succeeded) { + has_sb_peb_found2 = true; + goto finish_sb_peb_search; + } + + processed_stripes += jobs_count; + + jobs_count <<= 1; + jobs_count = min_t(u32, jobs_count, threads_count); + jobs_count = min_t(u32, jobs_count, + stripes_count - processed_stripes); + }; + +finish_sb_peb_search: + for (i = 0; i < threads_count; i++) + ssdfs_recovery_stop_thread(&array[i]); + +destruct_sb_info: + for (i = 0; i < threads_count; i++) { + ssdfs_destruct_sb_info(&array[i].sbi); + ssdfs_destruct_sb_info(&array[i].sbi_backup); + } + +free_environment: + if (found_pebs) { + ssdfs_recovery_kfree(found_pebs); + found_pebs = NULL; + } + + if (array) { + ssdfs_recovery_kfree(array); + array = NULL; + } + + switch (err) { + case 0: + /* SB PEB has been found */ + /* continue logic */ + break; + + case -ENODATA: + case -ENOENT: + case -EAGAIN: + case -E2BIG: + /* SB PEB has not been found */ + /* continue logic */ + break; + + default: + /* Something is going wrong */ + /* stop execution */ + SSDFS_ERR("fail to find valid SB PEB: err %d\n", err); + goto forget_buf; + } + + if (has_sb_peb_found1) + SSDFS_DBG("FAST_SEARCH: found SB seg\n"); + else if (has_sb_peb_found2) + SSDFS_DBG("SLOW_SEARCH: found SB seg\n"); + + if (!has_sb_peb_found1 && !has_sb_peb_found2) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_ERR("unable to find latest valid sb segment: " + "trying old algorithm!!!\n"); + BUG(); +#else + SSDFS_ERR("unable to find latest valid sb segment: " + "trying old algorithm!!!\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_find_any_valid_sb_segment(fsi, 0); + if (err) + goto forget_buf; + + err = ssdfs_find_latest_valid_sb_segment(fsi); + if (err) + goto forget_buf; + } + + err = ssdfs_find_latest_valid_sb_info2(fsi); + if (err) { + SSDFS_ERR("unable to find latest valid sb info: " + "trying old algorithm!!!\n"); + + 
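/* + * Fallback step (descriptive comment, added for clarity): the newer + * ssdfs_find_latest_valid_sb_info2() helper failed above, so retry + * with the legacy ssdfs_find_latest_valid_sb_info() before giving up + * on the volume. + */ + 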
err = ssdfs_find_latest_valid_sb_info(fsi); + if (err) + goto forget_buf; + } + + err = ssdfs_initialize_fs_info(fsi); + if (err && err != -EROFS) + goto forget_buf; + + err = ssdfs_read_maptbl_cache(fsi); + if (err) + goto forget_buf; + + if (is_ssdfs_snapshot_rules_exist(fsi)) { + err = ssdfs_read_snapshot_rules(fsi); + if (err) + goto forget_buf; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("DONE: gather superblock info\n"); +#else + SSDFS_DBG("DONE: gather superblock info\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +forget_buf: + fsi->vh = NULL; + fsi->vs = NULL; + +free_buf: + ssdfs_destruct_sb_info(&fsi->sbi); + ssdfs_destruct_sb_info(&fsi->sbi_backup); + return err; +} diff --git a/fs/ssdfs/recovery.h b/fs/ssdfs/recovery.h new file mode 100644 index 000000000000..aead1ebe29e6 --- /dev/null +++ b/fs/ssdfs/recovery.h @@ -0,0 +1,446 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/recovery.h - recovery logic declarations. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#ifndef _SSDFS_RECOVERY_H +#define _SSDFS_RECOVERY_H + +#define SSDFS_RESERVED_SB_SEGS (6) +#define SSDFS_RECOVERY_THREADS (12) + +/* + * struct ssdfs_found_peb - found PEB details + * @peb_id: PEB's ID + * @cno: PEB's starting checkpoint + * @is_superblock_peb: has superblock PEB been found? + * @state: PEB's state + */ +struct ssdfs_found_peb { + u64 peb_id; + u64 cno; + bool is_superblock_peb; + int state; +}; + +/* + * States of found PEB + */ +enum { + SSDFS_PEB_NOT_CHECKED, + SSDFS_FOUND_PEB_VALID, + SSDFS_FOUND_PEB_INVALID, + SSDFS_FOUND_PEB_STATE_MAX +}; + +/* + * struct ssdfs_superblock_pebs_pair - pair of superblock PEBs + * @pair: main and copy superblock PEBs + */ +struct ssdfs_superblock_pebs_pair { + struct ssdfs_found_peb pair[SSDFS_SB_SEG_COPY_MAX]; +}; + +/* + * struct ssdfs_found_superblock_pebs - found superblock PEBs + * sb_pebs: array of superblock PEBs details + */ +struct ssdfs_found_superblock_pebs { + struct ssdfs_superblock_pebs_pair sb_pebs[SSDFS_SB_CHAIN_MAX]; +}; + +/* + * struct ssdfs_found_protected_peb - protected PEB details + * @peb: protected PEB details + * @found: superblock PEBs details + */ +struct ssdfs_found_protected_peb { + struct ssdfs_found_peb peb; + struct ssdfs_found_superblock_pebs found; +}; + +/* + * struct ssdfs_found_protected_pebs - found protected PEBs + * @start_peb: starting PEB ID in fragment + * @pebs_count: PEBs count in fragment + * @lower_offset: lower offset bound + * @middle_offset: middle offset + * @upper_offset: upper offset bound + * @current_offset: current position of the search + * @search_phase: current search phase + * array: array of protected PEBs details + */ +struct ssdfs_found_protected_pebs { + u64 start_peb; + u32 pebs_count; + + u64 lower_offset; + u64 middle_offset; + u64 upper_offset; + u64 current_offset; + int search_phase; + +#define SSDFS_LOWER_PEB_INDEX (0) +#define SSDFS_UPPER_PEB_INDEX (1) +#define SSDFS_LAST_CNO_PEB_INDEX (2) +#define SSDFS_PROTECTED_PEB_CHAIN_MAX (3) + struct ssdfs_found_protected_peb array[SSDFS_PROTECTED_PEB_CHAIN_MAX]; +}; + +/* + * struct ssdfs_recovery_env - recovery environment + * @found: found PEBs' details + * @err: result of the search + * @state: recovery thread's state + * @pebs_per_volume: PEBs number per volume + * @last_vh: buffer for last valid volume header + * @sbi: superblock info + * @sbi_backup: backup copy 
of superblock info + * @request_wait_queue: request wait queue of recovery thread + * @result_wait_queue: result wait queue of recovery thread + * @thread: descriptor of recovery thread + * @fsi: file system info object + */ +struct ssdfs_recovery_env { + struct ssdfs_found_protected_pebs *found; + + int err; + atomic_t state; + u64 pebs_per_volume; + + struct ssdfs_volume_header last_vh; + struct ssdfs_sb_info sbi; + struct ssdfs_sb_info sbi_backup; + + wait_queue_head_t request_wait_queue; + wait_queue_head_t result_wait_queue; + struct ssdfs_thread_info thread; + struct ssdfs_fs_info *fsi; +}; + +/* + * Search phases + */ +enum { + SSDFS_RECOVERY_NO_SEARCH, + SSDFS_RECOVERY_FAST_SEARCH, + SSDFS_RECOVERY_SLOW_SEARCH, + SSDFS_RECOVERY_FIRST_SLOW_TRY, + SSDFS_RECOVERY_SECOND_SLOW_TRY, + SSDFS_RECOVERY_THIRD_SLOW_TRY, + SSDFS_RECOVERY_SEARCH_PHASES_MAX +}; + +/* + * Recovery thread's state + */ +enum { + SSDFS_RECOVERY_UNKNOWN_STATE, + SSDFS_START_RECOVERY, + SSDFS_RECOVERY_FAILED, + SSDFS_RECOVERY_FINISHED, + SSDFS_RECOVERY_STATE_MAX +}; + +/* + * Operation types + */ +enum { + SSDFS_USE_PEB_ISBAD_OP, + SSDFS_USE_READ_OP, +}; + +/* + * Inline functions + */ + +static inline +struct ssdfs_found_peb * +CUR_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_CUR_SB_SEG].pair[SSDFS_MAIN_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +CUR_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_CUR_SB_SEG].pair[SSDFS_COPY_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +NEXT_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_NEXT_SB_SEG].pair[SSDFS_MAIN_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +NEXT_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_NEXT_SB_SEG].pair[SSDFS_COPY_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +RESERVED_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_RESERVED_SB_SEG].pair[SSDFS_MAIN_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +RESERVED_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_RESERVED_SB_SEG].pair[SSDFS_COPY_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +PREV_MAIN_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_PREV_SB_SEG].pair[SSDFS_MAIN_SB_SEG]; +} + +static inline +struct ssdfs_found_peb * +PREV_COPY_SB_PEB(struct ssdfs_found_superblock_pebs *ptr) +{ + return &ptr->sb_pebs[SSDFS_PREV_SB_SEG].pair[SSDFS_COPY_SB_SEG]; +} + +static inline +bool IS_INSIDE_STRIPE(struct ssdfs_found_protected_pebs *ptr, + struct ssdfs_found_peb *found) +{ + return found->peb_id >= ptr->start_peb && + found->peb_id < (ptr->start_peb + ptr->pebs_count); +} + +static inline +u64 SSDFS_RECOVERY_LOW_OFF(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (env->found->search_phase) { + case SSDFS_RECOVERY_FAST_SEARCH: + return env->found->lower_offset; + + case SSDFS_RECOVERY_SLOW_SEARCH: + case SSDFS_RECOVERY_FIRST_SLOW_TRY: + return env->found->middle_offset; + + case SSDFS_RECOVERY_SECOND_SLOW_TRY: + return env->found->lower_offset; + + case SSDFS_RECOVERY_THIRD_SLOW_TRY: + if (env->found->start_peb == 0) + return SSDFS_RESERVED_VBR_SIZE; + else + return env->found->start_peb * env->fsi->erasesize; + } + + return U64_MAX; +} + +static inline +u64 SSDFS_RECOVERY_UPPER_OFF(struct ssdfs_recovery_env 
*env) +{ + u64 calculated_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->found); + BUG_ON(env->pebs_per_volume == 0); + BUG_ON(env->pebs_per_volume >= U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (env->found->search_phase) { + case SSDFS_RECOVERY_FAST_SEARCH: + calculated_peb = div_u64(env->found->middle_offset, + env->fsi->erasesize); + calculated_peb += SSDFS_MAPTBL_PROTECTION_STEP - 1; + if (calculated_peb >= env->pebs_per_volume) + calculated_peb = env->pebs_per_volume - 1; + + return calculated_peb * env->fsi->erasesize; + + case SSDFS_RECOVERY_SLOW_SEARCH: + case SSDFS_RECOVERY_FIRST_SLOW_TRY: + return env->found->upper_offset; + + case SSDFS_RECOVERY_SECOND_SLOW_TRY: + return env->found->middle_offset; + + case SSDFS_RECOVERY_THIRD_SLOW_TRY: + return env->found->lower_offset; + } + + return U64_MAX; +} + +static inline +u64 *SSDFS_RECOVERY_CUR_OFF_PTR(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + return &env->found->current_offset; +} + +static inline +void SSDFS_RECOVERY_SET_FAST_SEARCH_TRY(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + *SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->lower_offset; + env->found->search_phase = SSDFS_RECOVERY_FAST_SEARCH; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_offset %llu, " + "middle_offset %llu, " + "upper_offset %llu, " + "current_offset %llu, " + "search_phase %#x\n", + env->found->lower_offset, + env->found->middle_offset, + env->found->upper_offset, + env->found->current_offset, + env->found->search_phase); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static inline +void SSDFS_RECOVERY_SET_FIRST_SLOW_TRY(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + *SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->middle_offset; + env->found->search_phase = SSDFS_RECOVERY_FIRST_SLOW_TRY; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_offset %llu, " + "middle_offset %llu, " + "upper_offset %llu, " + "current_offset %llu, " + "search_phase %#x\n", + env->found->lower_offset, + env->found->middle_offset, + env->found->upper_offset, + env->found->current_offset, + env->found->search_phase); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static inline +bool is_second_slow_try_possible(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + return env->found->lower_offset < env->found->middle_offset; +} + +static inline +void SSDFS_RECOVERY_SET_SECOND_SLOW_TRY(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + *SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->lower_offset; + env->found->search_phase = SSDFS_RECOVERY_SECOND_SLOW_TRY; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_offset %llu, " + "middle_offset %llu, " + "upper_offset %llu, " + "current_offset %llu, " + "search_phase %#x\n", + env->found->lower_offset, + env->found->middle_offset, + env->found->upper_offset, + env->found->current_offset, + env->found->search_phase); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static inline +bool is_third_slow_try_possible(struct ssdfs_recovery_env *env) +{ + u64 offset; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + offset = env->found->start_peb * env->fsi->erasesize; + return offset < 
env->found->lower_offset; +} + +static inline +void SSDFS_RECOVERY_SET_THIRD_SLOW_TRY(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->found); +#endif /* CONFIG_SSDFS_DEBUG */ + + *SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->lower_offset; + env->found->search_phase = SSDFS_RECOVERY_THIRD_SLOW_TRY; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_offset %llu, " + "middle_offset %llu, " + "upper_offset %llu, " + "current_offset %llu, " + "search_phase %#x\n", + env->found->lower_offset, + env->found->middle_offset, + env->found->upper_offset, + env->found->current_offset, + env->found->search_phase); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * Recovery API + */ +int ssdfs_recovery_start_thread(struct ssdfs_recovery_env *env, + u32 id); +int ssdfs_recovery_stop_thread(struct ssdfs_recovery_env *env); +void ssdfs_backup_sb_info2(struct ssdfs_recovery_env *env); +void ssdfs_restore_sb_info2(struct ssdfs_recovery_env *env); +int ssdfs_read_checked_sb_info3(struct ssdfs_recovery_env *env, + u64 peb_id, u32 pages_off); +int __ssdfs_find_any_valid_volume_header2(struct ssdfs_recovery_env *env, + u64 start_offset, + u64 end_offset, + u64 step); +int ssdfs_find_any_valid_sb_segment2(struct ssdfs_recovery_env *env, + u64 threshold_peb); +bool is_cur_main_sb_peb_exhausted(struct ssdfs_recovery_env *env); +bool is_cur_copy_sb_peb_exhausted(struct ssdfs_recovery_env *env); +int ssdfs_check_next_sb_pebs_pair(struct ssdfs_recovery_env *env); +int ssdfs_check_reserved_sb_pebs_pair(struct ssdfs_recovery_env *env); +int ssdfs_find_latest_valid_sb_segment2(struct ssdfs_recovery_env *env); +int ssdfs_find_last_sb_seg_outside_fragment(struct ssdfs_recovery_env *env); +int ssdfs_recovery_try_fast_search(struct ssdfs_recovery_env *env); +int ssdfs_recovery_try_slow_search(struct ssdfs_recovery_env *env); + +#endif /* _SSDFS_RECOVERY_H */ 

From patchwork Sat Feb 25 01:08:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151913 From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 08/76] ssdfs: search last actual superblock Date: Fri, 24 Feb 2023 17:08:19 -0800 Message-Id: <20230225010927.813929-9-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org 

SSDFS is a pure log-structured file system (LFS), which means there is no fixed superblock position on the volume. SSDFS keeps superblock information in every segment header and log footer; effectively, every log contains a copy of the superblock. However, the driver still needs to find the specialized superblock segment and the last actual superblock state for proper initialization of the file system instance. The search logic is split into several steps: (1) find any valid segment header and extract information about the current, next, and reserved superblock segment locations, (2) find the latest valid superblock segment, (3) find the latest valid superblock state inside the superblock segment. The search logic splits the file system volume into several portions and starts searching in the first portion with the fast search algorithm, which checks every 50th erase block in the portion. If the first portion doesn't contain the last superblock segment, the search logic starts several threads that look for the last actual and valid superblock segment using the same fast search logic. Finally, if the fast search algorithm is unable to find the last actual superblock segment, the file system driver repeats the search with the slow search algorithm, which simply checks every erase block in the portion. Usually the fast search algorithm is enough, but if the volume is corrupted, the slow search logic can be used to find a consistent superblock state and to try to recover the volume.
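To make the two passes concrete, here is a rough sketch of the stepped scan over one portion of the volume. It is illustration only: read_segment_header(), FAST_SEARCH_STRIDE, and find_last_sb_peb() are hypothetical names, not the driver's actual API, and the real implementation spreads portions across recovery threads and validates magics and checksums along the way:

	/* Hypothetical sketch; not the actual driver code. */
	#define FAST_SEARCH_STRIDE	50	/* fast pass: every 50th erase block */
	#define SLOW_SEARCH_STRIDE	1	/* slow pass: every erase block */

	/* Assumed helper: returns 0 if the PEB holds a valid segment header. */
	int read_segment_header(u64 peb_id);

	static u64 scan_portion(u64 start_peb, u64 pebs_count, u64 stride)
	{
		u64 peb_id;

		for (peb_id = start_peb;
		     peb_id < start_peb + pebs_count;
		     peb_id += stride) {
			if (read_segment_header(peb_id) == 0)
				return peb_id;	/* candidate superblock PEB */
		}

		return U64_MAX;	/* nothing valid in this portion */
	}

	static u64 find_last_sb_peb(u64 start_peb, u64 pebs_count)
	{
		u64 peb_id;

		/* Try the sparse pass first... */
		peb_id = scan_portion(start_peb, pebs_count,
				      FAST_SEARCH_STRIDE);
		if (peb_id == U64_MAX) {
			/* ...and only then fall back to checking
			 * every erase block in the portion. */
			peb_id = scan_portion(start_peb, pebs_count,
					      SLOW_SEARCH_STRIDE);
		}

		return peb_id;
	}

The driver's real entry points for these two passes are ssdfs_recovery_try_fast_search() and ssdfs_recovery_try_slow_search(), declared in recovery.h above.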
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/recovery_fast_search.c | 1194 ++++++++++++++++++++++++++++++ fs/ssdfs/recovery_slow_search.c | 585 +++++++++++++++ fs/ssdfs/recovery_thread.c | 1196 +++++++++++++++++++++++++++++++ 3 files changed, 2975 insertions(+) create mode 100644 fs/ssdfs/recovery_fast_search.c create mode 100644 fs/ssdfs/recovery_slow_search.c create mode 100644 fs/ssdfs/recovery_thread.c diff --git a/fs/ssdfs/recovery_fast_search.c b/fs/ssdfs/recovery_fast_search.c new file mode 100644 index 000000000000..70c97331fccb --- /dev/null +++ b/fs/ssdfs/recovery_fast_search.c @@ -0,0 +1,1194 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/recovery_fast_search.c - fast superblock search. + * + * Copyright (c) 2020-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb.h" +#include "segment_bitmap.h" +#include "peb_mapping_table.h" +#include "recovery.h" + +#include + +static inline +bool IS_SB_PEB(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + int type; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); +#endif /* CONFIG_SSDFS_DEBUG */ + + type = le16_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->seg_type); + + if (type == SSDFS_SB_SEG_TYPE) + return true; + + return false; +} + +static inline +void STORE_PEB_INFO(struct ssdfs_found_peb *peb, + u64 peb_id, u64 cno, + int type, int state) +{ + peb->peb_id = peb_id; + peb->cno = cno; + if (type == SSDFS_SB_SEG_TYPE) + peb->is_superblock_peb = true; + else + peb->is_superblock_peb = false; + peb->state = state; +} + +static inline +void STORE_SB_PEB_INFO(struct ssdfs_found_peb *peb, + u64 peb_id) +{ + STORE_PEB_INFO(peb, peb_id, U64_MAX, + SSDFS_UNKNOWN_SEG_TYPE, + SSDFS_PEB_NOT_CHECKED); +} + +static inline +void STORE_MAIN_SB_PEB_INFO(struct ssdfs_recovery_env *env, + struct ssdfs_found_protected_peb *ptr, + int sb_seg_index) +{ + struct ssdfs_superblock_pebs_pair *pair; + struct ssdfs_found_peb *sb_peb; + u64 peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + BUG_ON(sb_seg_index < SSDFS_CUR_SB_SEG || + sb_seg_index >= SSDFS_SB_CHAIN_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + pair = &ptr->found.sb_pebs[sb_seg_index]; + sb_peb = &pair->pair[SSDFS_MAIN_SB_SEG]; + peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), sb_seg_index); + + STORE_SB_PEB_INFO(sb_peb, peb_id); +} + +static inline +void STORE_COPY_SB_PEB_INFO(struct ssdfs_recovery_env *env, + struct ssdfs_found_protected_peb *ptr, + int sb_seg_index) +{ + struct ssdfs_superblock_pebs_pair *pair; + struct ssdfs_found_peb *sb_peb; + u64 peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + BUG_ON(sb_seg_index < SSDFS_CUR_SB_SEG || + sb_seg_index >= SSDFS_SB_CHAIN_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + pair = &ptr->found.sb_pebs[sb_seg_index]; + sb_peb = &pair->pair[SSDFS_COPY_SB_SEG]; + peb_id = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf), sb_seg_index); + + STORE_SB_PEB_INFO(sb_peb, peb_id); +} + +static inline +void 
ssdfs_store_superblock_pebs_info(struct ssdfs_recovery_env *env, + int peb_index) +{ + struct ssdfs_found_protected_peb *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(peb_index < SSDFS_LOWER_PEB_INDEX || + peb_index >= SSDFS_PROTECTED_PEB_CHAIN_MAX); + + SSDFS_DBG("env %p, peb_index %d\n", + env, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = &env->found->array[peb_index]; + + STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_CUR_SB_SEG); + STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_CUR_SB_SEG); + + STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_NEXT_SB_SEG); + STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_NEXT_SB_SEG); + + STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_RESERVED_SB_SEG); + STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_RESERVED_SB_SEG); + + STORE_MAIN_SB_PEB_INFO(env, ptr, SSDFS_PREV_SB_SEG); + STORE_COPY_SB_PEB_INFO(env, ptr, SSDFS_PREV_SB_SEG); +} + +static inline +void ssdfs_store_protected_peb_info(struct ssdfs_recovery_env *env, + int peb_index, + u64 peb_id) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + struct ssdfs_found_protected_peb *ptr; + u64 cno; + int type; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(peb_index < SSDFS_LOWER_PEB_INDEX || + peb_index >= SSDFS_PROTECTED_PEB_CHAIN_MAX); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("env %p, peb_index %d, peb_id %llu\n", + env, peb_index, peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno); + type = le16_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->seg_type); + + ptr = &env->found->array[peb_index]; + STORE_PEB_INFO(&ptr->peb, peb_id, cno, type, SSDFS_FOUND_PEB_VALID); + ssdfs_store_superblock_pebs_info(env, peb_index); +} + +static +int ssdfs_calculate_recovery_search_bounds(struct ssdfs_recovery_env *env, + u64 dev_size, + u64 *lower_peb, loff_t *lower_off, + u64 *upper_peb, loff_t *upper_off) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found || !env->fsi); + BUG_ON(!lower_peb || !lower_off); + BUG_ON(!upper_peb || !upper_off); + + SSDFS_DBG("env %p, start_peb %llu, " + "pebs_count %u, dev_size %llu\n", + env, env->found->start_peb, + env->found->pebs_count, dev_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + *lower_peb = env->found->start_peb; + if (*lower_peb == 0) + *lower_off = SSDFS_RESERVED_VBR_SIZE; + else + *lower_off = *lower_peb * env->fsi->erasesize; + + if (*lower_off >= dev_size) { + SSDFS_ERR("invalid offset: lower_off %llu, " + "dev_size %llu\n", + (unsigned long long)*lower_off, + dev_size); + return -ERANGE; + } + + *upper_peb = env->found->pebs_count - 1; + *upper_peb /= SSDFS_MAPTBL_PROTECTION_STEP; + *upper_peb *= SSDFS_MAPTBL_PROTECTION_STEP; + *upper_peb += env->found->start_peb; + *upper_off = *upper_peb * env->fsi->erasesize; + + if (*upper_off >= dev_size) { + *upper_off = min_t(u64, *upper_off, + dev_size - env->fsi->erasesize); + *upper_peb = *upper_off / env->fsi->erasesize; + *upper_peb -= env->found->start_peb; + *upper_peb /= SSDFS_MAPTBL_PROTECTION_STEP; + *upper_peb *= SSDFS_MAPTBL_PROTECTION_STEP; + *upper_peb += env->found->start_peb; + *upper_off = *upper_peb * env->fsi->erasesize; + } + + return 0; +} + +static +int ssdfs_find_valid_protected_pebs(struct ssdfs_recovery_env *env) +{ + struct super_block *sb = env->fsi->sb; + u64 dev_size = env->fsi->devops->device_size(sb); + u64 
lower_peb, upper_peb; + loff_t lower_off, upper_off; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t vh_size = sizeof(struct ssdfs_volume_header); + struct ssdfs_volume_header *vh; + struct ssdfs_found_protected_peb *found; + bool magic_valid = false; + u64 cno = U64_MAX, last_cno = U64_MAX; + int err; + + if (!env->found) { + SSDFS_ERR("unable to find protected PEBs\n"); + return -EOPNOTSUPP; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("env %p, start_peb %llu, pebs_count %u\n", + env, env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!env->fsi->devops->read) { + SSDFS_ERR("unable to read from device\n"); + return -EOPNOTSUPP; + } + + env->found->lower_offset = dev_size; + env->found->middle_offset = dev_size; + env->found->upper_offset = dev_size; + + err = ssdfs_calculate_recovery_search_bounds(env, dev_size, + &lower_peb, &lower_off, + &upper_peb, &upper_off); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate search bounds: " + "err %d\n", err); + return err; + } + + env->found->lower_offset = lower_off; + env->found->middle_offset = lower_off; + env->found->upper_offset = upper_off; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_peb %llu, upper_peb %llu\n", + lower_peb, upper_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (lower_peb <= upper_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_peb %llu, lower_off %llu\n", + lower_peb, (u64)lower_off); + SSDFS_DBG("upper_peb %llu, upper_off %llu\n", + upper_peb, (u64)upper_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = env->fsi->devops->read(sb, + lower_off, + hdr_size, + env->sbi.vh_buf); + vh = SSDFS_VH(env->sbi.vh_buf); + magic_valid = is_ssdfs_magic_valid(&vh->magic); + cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno); + + if (!err && magic_valid) { + found = &env->found->array[SSDFS_LOWER_PEB_INDEX]; + + if (found->peb.peb_id >= U64_MAX) { + ssdfs_store_protected_peb_info(env, + SSDFS_LOWER_PEB_INDEX, + lower_peb); + + env->found->lower_offset = lower_off; + + ssdfs_memcpy(&env->last_vh, 0, vh_size, + env->sbi.vh_buf, 0, vh_size, + vh_size); + ssdfs_backup_sb_info2(env); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FOUND: lower_peb %llu, " + "lower_bound %llu\n", + lower_peb, lower_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto define_last_cno_peb; + } + + ssdfs_store_protected_peb_info(env, + SSDFS_UPPER_PEB_INDEX, + lower_peb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FOUND: lower_peb %llu, " + "lower_bound %llu\n", + lower_peb, lower_off); +#endif /* CONFIG_SSDFS_DEBUG */ + +define_last_cno_peb: + if (last_cno >= U64_MAX) { + env->found->middle_offset = lower_off; + ssdfs_store_protected_peb_info(env, + SSDFS_LAST_CNO_PEB_INDEX, + lower_peb); + ssdfs_memcpy(&env->last_vh, 0, vh_size, + env->sbi.vh_buf, 0, vh_size, + vh_size); + ssdfs_backup_sb_info2(env); + last_cno = cno; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FOUND: lower_peb %llu, " + "middle_offset %llu, " + "cno %llu\n", + lower_peb, lower_off, cno); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (cno > last_cno) { + env->found->middle_offset = lower_off; + ssdfs_store_protected_peb_info(env, + SSDFS_LAST_CNO_PEB_INDEX, + lower_peb); + ssdfs_memcpy(&env->last_vh, 0, vh_size, + env->sbi.vh_buf, 0, vh_size, + vh_size); + ssdfs_backup_sb_info2(env); + last_cno = cno; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FOUND: lower_peb %llu, " + "middle_offset %llu, " + "cno %llu\n", + lower_peb, lower_off, cno); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + ssdfs_restore_sb_info2(env); +#ifdef CONFIG_SSDFS_DEBUG 
+ SSDFS_DBG("ignore valid PEB: " + "lower_peb %llu, lower_off %llu, " + "cno %llu, last_cno %llu\n", + lower_peb, lower_off, + cno, last_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } else { + ssdfs_restore_sb_info2(env); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu (offset %llu) is corrupted\n", + lower_peb, + (unsigned long long)lower_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + lower_peb += SSDFS_MAPTBL_PROTECTION_STEP; + lower_off = lower_peb * env->fsi->erasesize; + + if (kthread_should_stop()) + goto finish_search; + } + + found = &env->found->array[SSDFS_UPPER_PEB_INDEX]; + + if (found->peb.peb_id >= U64_MAX) + goto finish_search; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("env->lower_offset %llu, " + "env->middle_offset %llu, " + "env->upper_offset %llu\n", + env->found->lower_offset, + env->found->middle_offset, + env->found->upper_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + SSDFS_RECOVERY_SET_FAST_SEARCH_TRY(env); + + return 0; + +finish_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find valid PEB\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + SSDFS_RECOVERY_SET_FAST_SEARCH_TRY(env); + + return -ENODATA; +} + +static inline +int ssdfs_read_sb_peb_checked(struct ssdfs_recovery_env *env, + u64 peb_id) +{ + struct ssdfs_volume_header *vh; + size_t vh_size = sizeof(struct ssdfs_volume_header); + bool magic_valid = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->fsi->sb); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("peb_id %llu\n", peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_read_checked_sb_info3(env, peb_id, 0); + vh = SSDFS_VH(env->sbi.vh_buf); + magic_valid = is_ssdfs_magic_valid(&vh->magic); + + if (err || !magic_valid) { + err = -ENODATA; + ssdfs_restore_sb_info2(env); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu is corrupted\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } else { + ssdfs_memcpy(&env->last_vh, 0, vh_size, + env->sbi.vh_buf, 0, vh_size, + vh_size); + ssdfs_backup_sb_info2(env); + goto finish_check; + } + +finish_check: + return err; +} + +int ssdfs_find_last_sb_seg_outside_fragment(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + struct super_block *sb; + struct ssdfs_volume_header *vh; + u64 leb_id; + u64 peb_id; + bool magic_valid = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->fsi->sb); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + sb = env->fsi->sb; + err = -ENODATA; + + leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + + do { + err = ssdfs_read_sb_peb_checked(env, peb_id); + vh = SSDFS_VH(env->sbi.vh_buf); + magic_valid = is_ssdfs_magic_valid(&vh->magic); + + if (err == -ENODATA) + goto finish_search; + else if (err) { + SSDFS_ERR("fail to read peb %llu\n", + peb_id); + goto finish_search; + } else { + u64 new_leb_id; + u64 new_peb_id; + + new_leb_id = + SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + new_peb_id = + SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + + if (new_leb_id != leb_id || new_peb_id != peb_id) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG 
+ SSDFS_DBG("SB segment not found: " + "peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } + + env->sbi.last_log.leb_id = leb_id; + env->sbi.last_log.peb_id = peb_id; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + + if (IS_SB_PEB(env)) { + if (is_cur_main_sb_peb_exhausted(env)) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu is exhausted\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_next_sb_peb; + } else { + err = 0; + goto finish_search; + } + } else { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SB segment not found: " + "peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } + } + +try_next_sb_peb: + if (kthread_should_stop()) { + err = -ENODATA; + goto finish_search; + } + + leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + } while (magic_valid); + +finish_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search outside fragment is finished: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +static +int ssdfs_check_cur_main_sb_peb(struct ssdfs_recovery_env *env) +{ + struct ssdfs_volume_header *vh; + u64 leb_id; + u64 peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + vh = SSDFS_VH(env->sbi.vh_buf); + leb_id = SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG); + peb_id = SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG); + + ssdfs_backup_sb_info2(env); + + err = ssdfs_read_sb_peb_checked(env, peb_id); + if (err == -ENODATA) + goto finish_check; + else if (err) { + SSDFS_ERR("fail to read peb %llu\n", + peb_id); + goto finish_check; + } else { + u64 new_leb_id; + u64 new_peb_id; + + vh = SSDFS_VH(env->sbi.vh_buf); + new_leb_id = SSDFS_MAIN_SB_LEB(vh, SSDFS_CUR_SB_SEG); + new_peb_id = SSDFS_MAIN_SB_PEB(vh, SSDFS_CUR_SB_SEG); + + if (new_leb_id != leb_id || new_peb_id != peb_id) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SB segment not found: " + "peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } + + env->sbi.last_log.leb_id = leb_id; + env->sbi.last_log.peb_id = peb_id; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + + if (IS_SB_PEB(env)) { + if (is_cur_main_sb_peb_exhausted(env)) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu is exhausted\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } else { + err = 0; + goto finish_check; + } + } else { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SB segment not found: " + "peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } + } + +finish_check: + return err; +} + +static +int ssdfs_check_cur_copy_sb_peb(struct ssdfs_recovery_env *env) +{ + struct ssdfs_volume_header *vh; + u64 leb_id; + u64 peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + vh = SSDFS_VH(env->sbi.vh_buf); + leb_id = SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG); + peb_id = SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG); + + ssdfs_backup_sb_info2(env); + + err = ssdfs_read_sb_peb_checked(env, peb_id); + if (err == 
-ENODATA) + goto finish_check; + else if (err) { + SSDFS_ERR("fail to read peb %llu\n", + peb_id); + goto finish_check; + } else { + u64 new_leb_id; + u64 new_peb_id; + + vh = SSDFS_VH(env->sbi.vh_buf); + new_leb_id = SSDFS_COPY_SB_LEB(vh, SSDFS_CUR_SB_SEG); + new_peb_id = SSDFS_COPY_SB_PEB(vh, SSDFS_CUR_SB_SEG); + + if (new_leb_id != leb_id || new_peb_id != peb_id) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SB segment not found: " + "peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } + + env->sbi.last_log.leb_id = leb_id; + env->sbi.last_log.peb_id = peb_id; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + + if (IS_SB_PEB(env)) { + if (is_cur_copy_sb_peb_exhausted(env)) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu is exhausted\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } else { + err = 0; + goto finish_check; + } + } else { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SB segment not found: " + "peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check; + } + } + +finish_check: + return err; +} + +static +int ssdfs_find_last_sb_seg_inside_fragment(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi || !env->fsi->sb); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + +try_next_peb: + if (kthread_should_stop()) { + err = -ENODATA; + goto finish_search; + } + + err = ssdfs_check_cur_main_sb_peb(env); + if (err == -ENODATA) + goto try_cur_copy_sb_peb; + else if (err == -ENOENT) + goto check_next_sb_pebs_pair; + else if (err) + goto finish_search; + else + goto finish_search; + +try_cur_copy_sb_peb: + if (kthread_should_stop()) { + err = -ENODATA; + goto finish_search; + } + + err = ssdfs_check_cur_copy_sb_peb(env); + if (err == -ENODATA || err == -ENOENT) + goto check_next_sb_pebs_pair; + else if (err) + goto finish_search; + else + goto finish_search; + +check_next_sb_pebs_pair: + if (kthread_should_stop()) { + err = -ENODATA; + goto finish_search; + } + + err = ssdfs_check_next_sb_pebs_pair(env); + if (err == -E2BIG) { + err = ssdfs_find_last_sb_seg_outside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + /* unable to find anything */ + goto check_reserved_sb_pebs_pair; + } else if (err) { + SSDFS_ERR("search outside fragment has failed: " + "err %d\n", err); + goto finish_search; + } else + goto finish_search; + } else if (!err) + goto try_next_peb; + +check_reserved_sb_pebs_pair: + if (kthread_should_stop()) { + err = -ENODATA; + goto finish_search; + } + + err = ssdfs_check_reserved_sb_pebs_pair(env); + if (err == -E2BIG) { + err = ssdfs_find_last_sb_seg_outside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + /* unable to find anything */ + goto finish_search; + } else if (err) { + SSDFS_ERR("search outside fragment has failed: " + "err %d\n", err); + goto finish_search; + } else + goto finish_search; + } else if (!err) + goto try_next_peb; + +finish_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search inside fragment is finished: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
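+	/*
+	 * Zero means the last superblock segment has been found;
+	 * otherwise the current, next, and reserved SB PEB pairs
+	 * have been tried without success.
+	 */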
return err; +} + +static +int ssdfs_find_last_sb_seg_starting_from_peb(struct ssdfs_recovery_env *env, + struct ssdfs_found_peb *ptr) +{ + struct super_block *sb; + struct ssdfs_volume_header *vh; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t vh_size = sizeof(struct ssdfs_volume_header); + u64 offset; + u64 threshold_peb; + u64 peb_id; + u64 cno = U64_MAX; + bool magic_valid = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found || !env->fsi || !env->fsi->sb); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!env->fsi->devops->read); + BUG_ON(!ptr); + BUG_ON(ptr->peb_id >= U64_MAX); + + SSDFS_DBG("peb_id %llu, start_peb %llu, pebs_count %u\n", + ptr->peb_id, + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + sb = env->fsi->sb; + threshold_peb = env->found->start_peb + env->found->pebs_count; + peb_id = ptr->peb_id; + offset = peb_id * env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, offset %llu\n", + peb_id, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = env->fsi->devops->read(sb, offset, hdr_size, + env->sbi.vh_buf); + vh = SSDFS_VH(env->sbi.vh_buf); + magic_valid = is_ssdfs_magic_valid(&vh->magic); + + if (err || !magic_valid) { + ssdfs_restore_sb_info2(env); + ptr->state = SSDFS_FOUND_PEB_INVALID; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu is corrupted\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + if (ptr->peb_id >= env->found->start_peb && + ptr->peb_id < threshold_peb) { + /* try again */ + return -EAGAIN; + } else { + /* PEB is out of range */ + return -E2BIG; + } + } else { + ssdfs_memcpy(&env->last_vh, 0, vh_size, + env->sbi.vh_buf, 0, vh_size, + vh_size); + ssdfs_backup_sb_info2(env); + cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno); + ptr->cno = cno; + ptr->is_superblock_peb = IS_SB_PEB(env); + ptr->state = SSDFS_FOUND_PEB_VALID; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, cno %llu, is_superblock_peb %#x\n", + peb_id, cno, ptr->is_superblock_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (ptr->peb_id >= env->found->start_peb && + ptr->peb_id < threshold_peb) { + err = ssdfs_find_last_sb_seg_inside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + ssdfs_restore_sb_info2(env); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("nothing has been found inside fragment: " + "peb_id %llu\n", + ptr->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EAGAIN; + } else if (err) { + SSDFS_ERR("search inside fragment has failed: " + "err %d\n", err); + return err; + } + } else { + err = ssdfs_find_last_sb_seg_outside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + ssdfs_restore_sb_info2(env); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("nothing has been found outside fragment: " + "peb_id %llu\n", + ptr->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -E2BIG; + } else if (err) { + SSDFS_ERR("search outside fragment has failed: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +static +int ssdfs_find_last_sb_seg_for_protected_peb(struct ssdfs_recovery_env *env) +{ + struct super_block *sb; + struct ssdfs_found_protected_peb *protected_peb; + struct ssdfs_found_peb *cur_peb; + u64 dev_size; + u64 threshold_peb; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found || !env->fsi || !env->fsi->sb); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!env->fsi->devops->read); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + sb = env->fsi->sb; + dev_size = 
env->fsi->devops->device_size(env->fsi->sb);
+	threshold_peb = env->found->start_peb + env->found->pebs_count;
+
+	protected_peb = &env->found->array[SSDFS_LAST_CNO_PEB_INDEX];
+
+	if (protected_peb->peb.peb_id >= U64_MAX) {
+		SSDFS_ERR("protected PEB hasn't been found\n");
+		return -ERANGE;
+	}
+
+	cur_peb = CUR_MAIN_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = CUR_COPY_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = NEXT_MAIN_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = NEXT_COPY_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = RESERVED_MAIN_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue search */
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err %d\n", err);
+		goto finish_search;
+	} else
+		goto finish_search;
+
+	cur_peb = RESERVED_COPY_SB_PEB(&protected_peb->found);
+	if (cur_peb->peb_id >= U64_MAX) {
+		SSDFS_ERR("peb_id is invalid\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_find_last_sb_seg_starting_from_peb(env, cur_peb);
+	if (err == -EAGAIN || err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing was found for peb %llu\n",
+			  cur_peb->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search;
+	} else if (err) {
+		SSDFS_ERR("fail to find last superblock segment: "
+			  "err 
%d\n", err); + goto finish_search; + } else + goto finish_search; + +finish_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search is finished: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +static +int ssdfs_recovery_protected_section_fast_search(struct ssdfs_recovery_env *env) +{ + u64 threshold_peb; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize; + + err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb); + if (err) + return err; + + if (kthread_should_stop()) + return -ENOENT; + + err = ssdfs_find_latest_valid_sb_segment2(env); + if (err) + return err; + + return 0; +} + +int ssdfs_recovery_try_fast_search(struct ssdfs_recovery_env *env) +{ + struct ssdfs_found_protected_peb *found; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, start_peb %llu, pebs_count %u\n", + env, env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_find_valid_protected_pebs(env); + if (err == -ENODATA) { + found = &env->found->array[SSDFS_LOWER_PEB_INDEX]; + + if (found->peb.peb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no valid protected PEBs in fragment: " + "start_peb %llu, pebs_count %u\n", + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fast_search; + } else { + /* search only in the last valid section */ + err = ssdfs_recovery_protected_section_fast_search(env); + goto finish_fast_search; + } + } else if (err) { + SSDFS_ERR("fail to find protected PEBs: " + "start_peb %llu, pebs_count %u, err %d\n", + env->found->start_peb, + env->found->pebs_count, err); + goto finish_fast_search; + } + + err = ssdfs_find_last_sb_seg_for_protected_peb(env); + if (err == -EAGAIN) { + *SSDFS_RECOVERY_CUR_OFF_PTR(env) = env->found->middle_offset; + err = ssdfs_recovery_protected_section_fast_search(env); + if (err == -ENODATA || err == -E2BIG) { + SSDFS_DBG("SEARCH FINISHED: " + "nothing was found\n"); + goto finish_fast_search; + } else if (err) { + SSDFS_ERR("fail to find last SB segment: " + "err %d\n", err); + goto finish_fast_search; + } + } else if (err == -ENODATA || err == -E2BIG) { + SSDFS_DBG("SEARCH FINISHED: " + "nothing was found\n"); + goto finish_fast_search; + } else if (err) { + SSDFS_ERR("fail to find last SB segment: " + "err %d\n", err); + goto finish_fast_search; + } + +finish_fast_search: + return err; +} diff --git a/fs/ssdfs/recovery_slow_search.c b/fs/ssdfs/recovery_slow_search.c new file mode 100644 index 000000000000..ca4d12b24ab3 --- /dev/null +++ b/fs/ssdfs/recovery_slow_search.c @@ -0,0 +1,585 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/recovery_slow_search.c - slow superblock search. + * + * Copyright (c) 2020-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb.h" +#include "segment_bitmap.h" +#include "peb_mapping_table.h" +#include "recovery.h" + +#include + +int ssdfs_find_latest_valid_sb_segment2(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + struct ssdfs_volume_header *last_vh; + u64 dev_size; + u64 cur_main_sb_peb, cur_copy_sb_peb; + u64 start_peb, next_peb; + u64 start_offset; + u64 step; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!env->fsi->devops->read); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + dev_size = env->fsi->devops->device_size(env->fsi->sb); + step = env->fsi->erasesize; + +try_next_peb: + if (kthread_should_stop()) { + err = -ENODATA; + goto rollback_valid_vh; + } + + last_vh = SSDFS_VH(env->sbi.vh_buf); + cur_main_sb_peb = SSDFS_MAIN_SB_PEB(last_vh, SSDFS_CUR_SB_SEG); + cur_copy_sb_peb = SSDFS_COPY_SB_PEB(last_vh, SSDFS_CUR_SB_SEG); + + if (cur_main_sb_peb != env->sbi.last_log.peb_id && + cur_copy_sb_peb != env->sbi.last_log.peb_id) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("volume header is corrupted\n"); + SSDFS_DBG("cur_main_sb_peb %llu, cur_copy_sb_peb %llu, " + "read PEB %llu\n", + cur_main_sb_peb, cur_copy_sb_peb, + env->sbi.last_log.peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto continue_search; + } + + if (cur_main_sb_peb == env->sbi.last_log.peb_id) { + if (!is_cur_main_sb_peb_exhausted(env)) + goto end_search; + } else { + if (!is_cur_copy_sb_peb_exhausted(env)) + goto end_search; + } + + err = ssdfs_check_next_sb_pebs_pair(env); + if (err == -E2BIG) + goto continue_search; + else if (err == -ENODATA || err == -ENOENT) + goto check_reserved_sb_pebs_pair; + else if (!err) + goto try_next_peb; + +check_reserved_sb_pebs_pair: + if (kthread_should_stop()) { + err = -ENODATA; + goto rollback_valid_vh; + } + + err = ssdfs_check_reserved_sb_pebs_pair(env); + if (err == -E2BIG || err == -ENODATA || err == -ENOENT) + goto continue_search; + else if (!err) + goto try_next_peb; + +continue_search: + if (kthread_should_stop()) { + err = -ENODATA; + goto rollback_valid_vh; + } + + start_offset = *SSDFS_RECOVERY_CUR_OFF_PTR(env) + env->fsi->erasesize; + start_peb = start_offset / env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_peb %llu, start_offset %llu, " + "end_offset %llu\n", + start_peb, start_offset, + SSDFS_RECOVERY_UPPER_OFF(env)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_find_any_valid_volume_header2(env, + start_offset, + SSDFS_RECOVERY_UPPER_OFF(env), + step); + if (err == -E2BIG) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find any valid header: " + "peb_id %llu\n", + start_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto end_search; + } else if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find any valid header: " + "peb_id %llu\n", + start_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto rollback_valid_vh; + } + + if (kthread_should_stop()) { + err = -ENODATA; + goto rollback_valid_vh; + } + + if (*SSDFS_RECOVERY_CUR_OFF_PTR(env) >= U64_MAX) { + err = -ENODATA; + 
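+		/* the current offset is invalid, roll back to the backup */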
goto rollback_valid_vh; + } + + next_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize; + + err = ssdfs_find_any_valid_sb_segment2(env, next_peb); + if (err == -E2BIG) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find any valid header: " + "peb_id %llu\n", + start_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto end_search; + } else if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find any valid sb seg: " + "peb_id %llu\n", + next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto rollback_valid_vh; + } else + goto try_next_peb; + +rollback_valid_vh: + ssdfs_restore_sb_info2(env); + +end_search: + return err; +} + +static inline +bool need_continue_search(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, upper_off %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + SSDFS_RECOVERY_UPPER_OFF(env)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return *SSDFS_RECOVERY_CUR_OFF_PTR(env) < SSDFS_RECOVERY_UPPER_OFF(env); +} + +static +int ssdfs_recovery_first_phase_slow_search(struct ssdfs_recovery_env *env) +{ + u64 threshold_peb; + u64 peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + +try_another_search: + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_first_phase; + } + + threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, threshold_peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + threshold_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb); + if (err == -E2BIG) { + ssdfs_restore_sb_info2(env); + err = ssdfs_find_last_sb_seg_outside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_first_phase; + } + + if (need_continue_search(env)) { + ssdfs_restore_sb_info2(env); + + peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / + env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_find_any_valid_volume_header2(env, + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + SSDFS_RECOVERY_UPPER_OFF(env), + env->fsi->erasesize); + if (err) { + SSDFS_DBG("valid magic is not found\n"); + goto finish_first_phase; + } else + goto try_another_search; + } else + goto finish_first_phase; + } else + goto finish_first_phase; + } else if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) + err = -ENOENT; + else + err = -EAGAIN; + + goto finish_first_phase; + } else if (err) + goto finish_first_phase; + + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_first_phase; + } + + err = ssdfs_find_latest_valid_sb_segment2(env); + if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_first_phase; + } + + if (need_continue_search(env)) { + ssdfs_restore_sb_info2(env); + + peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / + env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_find_any_valid_volume_header2(env, + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + SSDFS_RECOVERY_UPPER_OFF(env), + env->fsi->erasesize); + if (err) { + SSDFS_DBG("valid magic is not found\n"); + goto finish_first_phase; + } else + goto 
try_another_search; + } else + goto finish_first_phase; + } + +finish_first_phase: + return err; +} + +static +int ssdfs_recovery_second_phase_slow_search(struct ssdfs_recovery_env *env) +{ + u64 threshold_peb; + u64 peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_second_slow_try_possible(env)) { + SSDFS_DBG("there is no room for second slow try\n"); + return -EAGAIN; + } + + SSDFS_RECOVERY_SET_SECOND_SLOW_TRY(env); + +try_another_search: + if (kthread_should_stop()) + return -ENOENT; + + peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / + env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_find_any_valid_volume_header2(env, + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + SSDFS_RECOVERY_UPPER_OFF(env), + env->fsi->erasesize); + if (err) { + SSDFS_DBG("valid magic is not detected\n"); + return err; + } + + if (kthread_should_stop()) + return -ENOENT; + + threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, threshold_peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + threshold_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb); + if (err == -E2BIG) { + ssdfs_restore_sb_info2(env); + err = ssdfs_find_last_sb_seg_outside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_second_phase; + } + + if (need_continue_search(env)) { + ssdfs_restore_sb_info2(env); + goto try_another_search; + } else + goto finish_second_phase; + } else + goto finish_second_phase; + } else if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) + err = -ENOENT; + else + err = -EAGAIN; + + goto finish_second_phase; + } else if (err) + goto finish_second_phase; + + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_second_phase; + } + + err = ssdfs_find_latest_valid_sb_segment2(env); + if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_second_phase; + } + + if (need_continue_search(env)) { + ssdfs_restore_sb_info2(env); + goto try_another_search; + } else + goto finish_second_phase; + } + +finish_second_phase: + return err; +} + +static +int ssdfs_recovery_third_phase_slow_search(struct ssdfs_recovery_env *env) +{ + u64 threshold_peb; + u64 peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_third_slow_try_possible(env)) { + SSDFS_DBG("there is no room for third slow try\n"); + return -ENODATA; + } + + SSDFS_RECOVERY_SET_THIRD_SLOW_TRY(env); + +try_another_search: + if (kthread_should_stop()) + return -ENOENT; + + peb_id = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / + env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_find_any_valid_volume_header2(env, + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + SSDFS_RECOVERY_UPPER_OFF(env), + env->fsi->erasesize); + if (err) { + SSDFS_DBG("valid magic is not detected\n"); + return err; + } + + if (kthread_should_stop()) + return -ENOENT; + + 
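+	/*
+	 * A valid volume header has been found; convert the current
+	 * byte offset into a PEB index and search for a superblock
+	 * segment starting from it.
+	 */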
threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %llu, threshold_peb %llu\n", + *SSDFS_RECOVERY_CUR_OFF_PTR(env), + threshold_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_find_any_valid_sb_segment2(env, threshold_peb); + if (err == -E2BIG) { + ssdfs_restore_sb_info2(env); + err = ssdfs_find_last_sb_seg_outside_fragment(env); + if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_third_phase; + } + + if (need_continue_search(env)) { + ssdfs_restore_sb_info2(env); + goto try_another_search; + } else + goto finish_third_phase; + } else + goto finish_third_phase; + } else if (err) + goto finish_third_phase; + + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_third_phase; + } + + err = ssdfs_find_latest_valid_sb_segment2(env); + if (err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_third_phase; + } + + if (need_continue_search(env)) { + ssdfs_restore_sb_info2(env); + goto try_another_search; + } else + goto finish_third_phase; + } + +finish_third_phase: + return err; +} + +int ssdfs_recovery_try_slow_search(struct ssdfs_recovery_env *env) +{ + struct ssdfs_found_protected_peb *protected_peb; + struct ssdfs_volume_header *vh; + size_t vh_size = sizeof(struct ssdfs_volume_header); + bool magic_valid = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found); + BUG_ON(!env->sbi.vh_buf); + + SSDFS_DBG("env %p, start_peb %llu, pebs_count %u\n", + env, env->found->start_peb, env->found->pebs_count); + SSDFS_DBG("env->lower_offset %llu, env->upper_offset %llu\n", + env->found->lower_offset, env->found->upper_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + protected_peb = &env->found->array[SSDFS_LAST_CNO_PEB_INDEX]; + + if (protected_peb->peb.peb_id >= U64_MAX) { + SSDFS_DBG("fragment is empty\n"); + return -ENODATA; + } + + err = ssdfs_read_checked_sb_info3(env, protected_peb->peb.peb_id, 0); + vh = SSDFS_VH(env->sbi.vh_buf); + magic_valid = is_ssdfs_magic_valid(&vh->magic); + + if (err || !magic_valid) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu is corrupted\n", + protected_peb->peb.peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else { + ssdfs_memcpy(&env->last_vh, 0, vh_size, + env->sbi.vh_buf, 0, vh_size, + vh_size); + ssdfs_backup_sb_info2(env); + } + + if (env->found->start_peb == 0) + env->found->lower_offset = SSDFS_RESERVED_VBR_SIZE; + else { + env->found->lower_offset = + env->found->start_peb * env->fsi->erasesize; + } + + env->found->upper_offset = (env->found->start_peb + + env->found->pebs_count - 1); + env->found->upper_offset *= env->fsi->erasesize; + + SSDFS_RECOVERY_SET_FIRST_SLOW_TRY(env); + + err = ssdfs_recovery_first_phase_slow_search(env); + if (err == -EAGAIN || err == -E2BIG || + err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_search; + } + + err = ssdfs_recovery_second_phase_slow_search(env); + if (err == -EAGAIN || err == -E2BIG || + err == -ENODATA || err == -ENOENT) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_search; + } + + err = ssdfs_recovery_third_phase_slow_search(env); + } + } + +finish_search: + return err; +} diff --git a/fs/ssdfs/recovery_thread.c b/fs/ssdfs/recovery_thread.c new file mode 100644 index 000000000000..cd1424762059 --- /dev/null +++ b/fs/ssdfs/recovery_thread.c @@ -0,0 +1,1196 @@ +// SPDX-License-Identifier: 
BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/recovery_thread.c - recovery thread's logic. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb.h" +#include "segment_bitmap.h" +#include "peb_mapping_table.h" +#include "recovery.h" + +#include + +void ssdfs_backup_sb_info2(struct ssdfs_recovery_env *env) +{ + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t footer_size = sizeof(struct ssdfs_log_footer); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + BUG_ON(!env->sbi.vh_buf || !env->sbi.vs_buf); + BUG_ON(!env->sbi_backup.vh_buf || !env->sbi_backup.vs_buf); + + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + env->sbi.last_log.leb_id, + env->sbi.last_log.peb_id, + env->sbi.last_log.page_offset, + env->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(env->sbi_backup.vh_buf, 0, hdr_size, + env->sbi.vh_buf, 0, hdr_size, + hdr_size); + ssdfs_memcpy(env->sbi_backup.vs_buf, 0, footer_size, + env->sbi.vs_buf, 0, footer_size, + footer_size); + ssdfs_memcpy(&env->sbi_backup.last_log, + 0, sizeof(struct ssdfs_peb_extent), + &env->sbi.last_log, + 0, sizeof(struct ssdfs_peb_extent), + sizeof(struct ssdfs_peb_extent)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + env->sbi.last_log.leb_id, + env->sbi.last_log.peb_id, + env->sbi.last_log.page_offset, + env->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +void ssdfs_restore_sb_info2(struct ssdfs_recovery_env *env) +{ + size_t hdr_size = sizeof(struct ssdfs_segment_header); + size_t footer_size = sizeof(struct ssdfs_log_footer); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + BUG_ON(!env->sbi.vh_buf || !env->sbi.vs_buf); + BUG_ON(!env->sbi_backup.vh_buf || !env->sbi_backup.vs_buf); + + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + env->sbi.last_log.leb_id, + env->sbi.last_log.peb_id, + env->sbi.last_log.page_offset, + env->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(env->sbi.vh_buf, 0, hdr_size, + env->sbi_backup.vh_buf, 0, hdr_size, + hdr_size); + ssdfs_memcpy(env->sbi.vs_buf, 0, footer_size, + env->sbi_backup.vs_buf, 0, footer_size, + footer_size); + ssdfs_memcpy(&env->sbi.last_log, + 0, sizeof(struct ssdfs_peb_extent), + &env->sbi_backup.last_log, + 0, 
sizeof(struct ssdfs_peb_extent), + sizeof(struct ssdfs_peb_extent)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_log: leb_id %llu, peb_id %llu, " + "page_offset %u, pages_count %u, " + "volume state: free_pages %llu, timestamp %#llx, " + "cno %#llx, fs_state %#x\n", + env->sbi.last_log.leb_id, + env->sbi.last_log.peb_id, + env->sbi.last_log.page_offset, + env->sbi.last_log.pages_count, + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->free_pages), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->timestamp), + le64_to_cpu(SSDFS_VS(env->sbi.vs_buf)->cno), + le16_to_cpu(SSDFS_VS(env->sbi.vs_buf)->state)); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +int ssdfs_read_checked_sb_info3(struct ssdfs_recovery_env *env, + u64 peb_id, u32 pages_off) +{ + u32 lf_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + + SSDFS_DBG("env %p, peb_id %llu, pages_off %u\n", + env, peb_id, pages_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_read_checked_segment_header(env->fsi, peb_id, pages_off, + env->sbi.vh_buf, true); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("volume header is corrupted: " + "peb_id %llu, offset %d, err %d\n", + peb_id, pages_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + lf_off = SSDFS_LOG_FOOTER_OFF(env->sbi.vh_buf); + + err = ssdfs_read_checked_log_footer(env->fsi, + SSDFS_SEG_HDR(env->sbi.vh_buf), + peb_id, lf_off, env->sbi.vs_buf, + true); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb_id %llu, offset %d, err %d\n", + peb_id, lf_off, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + return 0; +} + +static inline +int ssdfs_read_and_check_volume_header(struct ssdfs_recovery_env *env, + u64 offset) +{ + struct super_block *sb; + struct ssdfs_volume_header *vh; + size_t hdr_size = sizeof(struct ssdfs_segment_header); + u64 dev_size; + bool magic_valid, crc_valid, hdr_consistent; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->fsi->devops->read); + + SSDFS_DBG("env %p, offset %llu\n", + env, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + sb = env->fsi->sb; + dev_size = env->fsi->devops->device_size(sb); + + err = env->fsi->devops->read(sb, offset, hdr_size, + env->sbi.vh_buf); + if (err) + goto found_corrupted_peb; + + err = -ENODATA; + + vh = SSDFS_VH(env->sbi.vh_buf); + magic_valid = is_ssdfs_magic_valid(&vh->magic); + if (magic_valid) { + crc_valid = is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, + hdr_size); + hdr_consistent = is_ssdfs_volume_header_consistent(env->fsi, vh, + dev_size); + + if (crc_valid && hdr_consistent) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found offset %llu\n", + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + } + +found_corrupted_peb: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu (offset %llu) is corrupted\n", + offset / env->fsi->erasesize, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +int __ssdfs_find_any_valid_volume_header2(struct ssdfs_recovery_env *env, + u64 start_offset, + u64 end_offset, + u64 step) +{ + u64 dev_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->fsi); + BUG_ON(!env->fsi->devops->read); + + SSDFS_DBG("env %p, start_offset %llu, " + "end_offset %llu, step %llu\n", + env, start_offset, end_offset, step); +#endif /* CONFIG_SSDFS_DEBUG */ + + dev_size = env->fsi->devops->device_size(env->fsi->sb); + end_offset = min_t(u64, dev_size, end_offset); + + *SSDFS_RECOVERY_CUR_OFF_PTR(env) = start_offset; + + if (start_offset >= end_offset) { + err = -E2BIG; +#ifdef 
CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start_offset %llu, end_offset %llu, err %d\n",
+			  start_offset, end_offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	while (*SSDFS_RECOVERY_CUR_OFF_PTR(env) < end_offset) {
+		if (kthread_should_stop())
+			return -ENOENT;
+
+		err = ssdfs_read_and_check_volume_header(env,
+					*SSDFS_RECOVERY_CUR_OFF_PTR(env));
+		if (!err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found offset %llu\n",
+				  *SSDFS_RECOVERY_CUR_OFF_PTR(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+
+		*SSDFS_RECOVERY_CUR_OFF_PTR(env) += step;
+	}
+
+	return -E2BIG;
+}
+
+int ssdfs_find_any_valid_sb_segment2(struct ssdfs_recovery_env *env,
+				     u64 threshold_peb)
+{
+	size_t vh_size = sizeof(struct ssdfs_volume_header);
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_segment_header *seg_hdr;
+	u64 dev_size;
+	u64 start_peb;
+	loff_t start_offset, next_offset;
+	u64 last_cno, cno;
+	__le64 peb1, peb2;
+	__le64 leb1, leb2;
+	u64 checked_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+	u64 step;
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->found || !env->fsi);
+	BUG_ON(!env->fsi->devops->read);
+
+	SSDFS_DBG("env %p, start_peb %llu, "
+		  "pebs_count %u, threshold_peb %llu\n",
+		  env, env->found->start_peb,
+		  env->found->pebs_count, threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dev_size = env->fsi->devops->device_size(env->fsi->sb);
+	step = env->fsi->erasesize;
+
+	start_peb = max_t(u64,
+			*SSDFS_RECOVERY_CUR_OFF_PTR(env) / env->fsi->erasesize,
+			threshold_peb);
+	start_offset = start_peb * env->fsi->erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_peb %llu, start_offset %llu, "
+		  "end_offset %llu\n",
+		  start_peb, start_offset,
+		  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*SSDFS_RECOVERY_CUR_OFF_PTR(env) = start_offset;
+
+	if (start_offset >= SSDFS_RECOVERY_UPPER_OFF(env)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start_offset %llu >= end_offset %llu\n",
+			  start_offset, SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -E2BIG;
+	}
+
+	i = SSDFS_SB_CHAIN_MAX;
+	memset(checked_pebs, 0xFF, sizeof(checked_pebs));
+
+try_next_volume_portion:
+	ssdfs_memcpy(&env->last_vh, 0, vh_size,
+		     env->sbi.vh_buf, 0, vh_size,
+		     vh_size);
+	last_cno = le64_to_cpu(SSDFS_SEG_HDR(env->sbi.vh_buf)->cno);
+
+try_again:
+	if (kthread_should_stop())
+		return -ENODATA;
+
+	switch (i) {
+	case SSDFS_SB_CHAIN_MAX:
+		i = SSDFS_CUR_SB_SEG;
+		break;
+
+	case SSDFS_CUR_SB_SEG:
+		i = SSDFS_NEXT_SB_SEG;
+		break;
+
+	case SSDFS_NEXT_SB_SEG:
+		i = SSDFS_RESERVED_SB_SEG;
+		break;
+
+	default:
+		start_offset = (threshold_peb * env->fsi->erasesize) + step;
+		start_offset = max_t(u64, start_offset,
+				*SSDFS_RECOVERY_CUR_OFF_PTR(env) + step);
+		*SSDFS_RECOVERY_CUR_OFF_PTR(env) = start_offset;
+		err = __ssdfs_find_any_valid_volume_header2(env, start_offset,
+				SSDFS_RECOVERY_UPPER_OFF(env), step);
+		if (!err) {
+			i = SSDFS_SB_CHAIN_MAX;
+			threshold_peb = *SSDFS_RECOVERY_CUR_OFF_PTR(env);
+			threshold_peb /= env->fsi->erasesize;
+			goto try_next_volume_portion;
+		}
+
+		/* the fragment is checked completely */
+		return err;
+	}
+
+	err = -ENODATA;
+
+	for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+		u64 leb_id = le64_to_cpu(env->last_vh.sb_pebs[i][j].leb_id);
+		u64 peb_id = le64_to_cpu(env->last_vh.sb_pebs[i][j].peb_id);
+		u16 seg_type;
+		u32 erasesize = env->fsi->erasesize;
+
+		if (kthread_should_stop())
+			return -ENODATA;
+
+		if (peb_id == U64_MAX || leb_id == U64_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("invalid leb_id %llu, peb_id %llu, "
+				  "sb_chain %d, sb_copy %d\n",
+				  leb_id, peb_id, i, j);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("leb_id %llu, peb_id %llu, "
+			  "checked_peb %llu, threshold_peb %llu\n",
+			  leb_id, peb_id,
+			  checked_pebs[i][j],
+			  threshold_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (checked_pebs[i][j] == peb_id)
+			continue;
+		else
+			checked_pebs[i][j] = peb_id;
+
+		next_offset = peb_id * erasesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb_id %llu, next_offset %llu, "
+			  "cur_offset %llu, end_offset %llu\n",
+			  peb_id, next_offset,
+			  *SSDFS_RECOVERY_CUR_OFF_PTR(env),
+			  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (next_offset >= SSDFS_RECOVERY_UPPER_OFF(env)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find valid SB segment: "
+				  "next_offset %llu >= end_offset %llu\n",
+				  next_offset,
+				  SSDFS_RECOVERY_UPPER_OFF(env));
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		if ((env->found->start_peb * erasesize) > next_offset) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find valid SB segment: "
+				  "next_offset %llu < start_offset %llu\n",
+				  next_offset,
+				  env->found->start_peb * erasesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		err = ssdfs_read_checked_sb_info3(env, peb_id, 0);
+		if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("peb_id %llu is corrupted: err %d\n",
+				  peb_id, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		env->sbi.last_log.leb_id = leb_id;
+		env->sbi.last_log.peb_id = peb_id;
+		env->sbi.last_log.page_offset = 0;
+		env->sbi.last_log.pages_count =
+			SSDFS_LOG_PAGES(env->sbi.vh_buf);
+
+		seg_hdr = SSDFS_SEG_HDR(env->sbi.vh_buf);
+		seg_type = SSDFS_SEG_TYPE(seg_hdr);
+
+		if (seg_type == SSDFS_SB_SEG_TYPE) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("PEB %llu has been found\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		} else {
+			err = -EIO;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("PEB %llu is not a sb segment\n",
+				  peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		if (!err)
+			goto compare_vh_info;
+	}
+
+	if (err) {
+		ssdfs_memcpy(env->sbi.vh_buf, 0, vh_size,
+			     &env->last_vh, 0, vh_size,
+			     vh_size);
+		goto try_again;
+	}
+
+compare_vh_info:
+	vh = SSDFS_VH(env->sbi.vh_buf);
+	seg_hdr = SSDFS_SEG_HDR(env->sbi.vh_buf);
+	leb1 = env->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id;
+	leb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].leb_id;
+	peb1 = env->last_vh.sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id;
+	peb2 = vh->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG].peb_id;
+	cno = le64_to_cpu(seg_hdr->cno);
+
+	if (cno > last_cno && (leb1 != leb2 || peb1 != peb2)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cno %llu, last_cno %llu, "
+			  "leb1 %llu, leb2 %llu, "
+			  "peb1 %llu, peb2 %llu\n",
+			  cno, last_cno,
+			  le64_to_cpu(leb1), le64_to_cpu(leb2),
+			  le64_to_cpu(peb1), le64_to_cpu(peb2));
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto try_again;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("unable to find any valid segment with superblock chain\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+static inline
+bool is_sb_peb_exhausted(struct ssdfs_recovery_env *env,
+			 u64 leb_id, u64 peb_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_peb_extent checking_page;
+	u64 pages_per_peb;
+	u16 sb_seg_log_pages;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+	BUG_ON(!env->fsi->devops->read);
+	BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic));
+	BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size));
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  env, env->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb_seg_log_pages =
+		le16_to_cpu(SSDFS_VH(env->sbi.vh_buf)->sb_seg_log_pages);
+
+	if (!env->fsi->devops->can_write_page) {
+		SSDFS_CRIT("fail to find latest valid sb info: "
+			   "can_write_page is not supported\n");
+		return true;
+	}
+
+	if (leb_id >= U64_MAX || peb_id >= U64_MAX) {
+		SSDFS_ERR("invalid leb_id %llu or peb_id %llu\n",
+			  leb_id, peb_id);
+		return true;
+	}
+
+	if (env->fsi->is_zns_device) {
+		pages_per_peb = div64_u64(env->fsi->zone_capacity,
+					  env->fsi->pagesize);
+	} else
+		pages_per_peb = env->fsi->pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(pages_per_peb >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	checking_page.leb_id = leb_id;
+	checking_page.peb_id = peb_id;
+	checking_page.page_offset = (u32)pages_per_peb - sb_seg_log_pages;
+	checking_page.pages_count = 1;
+
+	err = ssdfs_can_write_sb_log(env->fsi->sb, &checking_page);
+	if (!err)
+		return false;
+
+	return true;
+}
+
+bool is_cur_main_sb_peb_exhausted(struct ssdfs_recovery_env *env)
+{
+	u64 leb_id;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  env, env->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_sb_peb_exhausted(env, leb_id, peb_id);
+}
+
+bool is_cur_copy_sb_peb_exhausted(struct ssdfs_recovery_env *env)
+{
+	u64 leb_id;
+	u64 peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+	peb_id = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+				   SSDFS_CUR_SB_SEG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p, "
+		  "leb_id %llu, peb_id %llu\n",
+		  env, env->sbi.vh_buf,
+		  leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_sb_peb_exhausted(env, leb_id, peb_id);
+}
+
+static
+int ssdfs_check_sb_segs_sequence(struct ssdfs_recovery_env *env)
+{
+	u16 seg_type;
+	u64 cno1, cno2;
+	u64 cur_peb, next_peb, prev_peb;
+	u64 cur_leb, next_leb, prev_leb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env || !env->fsi);
+	BUG_ON(!env->sbi.vh_buf);
+
+	SSDFS_DBG("env %p, env->sbi.vh_buf %p\n", env, env->sbi.vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	seg_type = SSDFS_SEG_TYPE(SSDFS_SEG_HDR(env->sbi.vh_buf));
+	if (seg_type != SSDFS_SB_SEG_TYPE) {
+		SSDFS_DBG("invalid segment type\n");
+		return -ENODATA;
+	}
+
+	cno1 = SSDFS_SEG_CNO(env->sbi_backup.vh_buf);
+	cno2 = SSDFS_SEG_CNO(env->sbi.vh_buf);
+	if (cno1 >= cno2) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last cno %llu is not less than read cno %llu\n",
+			  cno1, cno2);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf),
+				     SSDFS_NEXT_SB_SEG);
+	cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf),
+				    SSDFS_CUR_SB_SEG);
+	if (next_peb != cur_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
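+		/*
+		 * The "next" SB PEB recorded in the backed up header
+		 * must match the "current" SB PEB of the newly read
+		 * header; a mismatch means a broken superblock chain.
+		 */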
SSDFS_DBG("next_peb %llu doesn't equal to cur_peb %llu\n", + next_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + prev_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_peb %llu doesn't equal to cur_peb %llu\n", + prev_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + next_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_leb %llu doesn't equal to cur_leb %llu\n", + next_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + prev_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_leb %llu doesn't equal to cur_leb %llu\n", + prev_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_peb %llu doesn't equal to cur_peb %llu\n", + next_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + prev_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_peb != cur_peb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_peb %llu doesn't equal to cur_peb %llu\n", + prev_peb, cur_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_NEXT_SB_SEG); + cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_CUR_SB_SEG); + if (next_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_leb %llu doesn't equal to cur_leb %llu\n", + next_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + prev_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_PREV_SB_SEG); + cur_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi_backup.vh_buf), + SSDFS_CUR_SB_SEG); + if (prev_leb != cur_leb) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_leb %llu doesn't equal to cur_leb %llu\n", + prev_leb, cur_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + return 0; +} + +int ssdfs_check_next_sb_pebs_pair(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + u64 next_leb; + u64 next_peb; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p, " + "env->start_peb %llu, env->pebs_count %u\n", + env, env->sbi.vh_buf, + env->found->start_peb, env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + next_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + next_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + if (next_leb == U64_MAX 
|| next_peb == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n", + next_leb, next_peb); + goto end_next_peb_check; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN: next_leb %llu, next_peb %llu\n", + next_leb, next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (next_peb < env->found->start_peb || + next_peb >= (env->found->start_peb + env->found->pebs_count)) { + err = -E2BIG; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_peb %llu, start_peb %llu, pebs_count %u\n", + next_peb, + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto end_next_peb_check; + } + + ssdfs_backup_sb_info2(env); + + err = ssdfs_read_checked_sb_info3(env, next_peb, 0); + if (!err) { + env->sbi.last_log.leb_id = next_leb; + env->sbi.last_log.peb_id = next_peb; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + + err = ssdfs_check_sb_segs_sequence(env); + if (!err) + goto end_next_peb_check; + } + + ssdfs_restore_sb_info2(env); + err = 0; /* try to read the backup copy */ + + next_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + next_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_NEXT_SB_SEG); + if (next_leb >= U64_MAX || next_peb >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid next_leb %llu, next_peb %llu\n", + next_leb, next_peb); + goto end_next_peb_check; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("COPY: next_leb %llu, next_peb %llu\n", + next_leb, next_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (next_peb < env->found->start_peb || + next_peb >= (env->found->start_peb + env->found->pebs_count)) { + err = -E2BIG; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_peb %llu, start_peb %llu, pebs_count %u\n", + next_peb, + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto end_next_peb_check; + } + + err = ssdfs_read_checked_sb_info3(env, next_peb, 0); + if (!err) { + env->sbi.last_log.leb_id = next_leb; + env->sbi.last_log.peb_id = next_peb; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + + err = ssdfs_check_sb_segs_sequence(env); + if (!err) + goto end_next_peb_check; + } + + ssdfs_restore_sb_info2(env); + +end_next_peb_check: + return err; +} + +int ssdfs_check_reserved_sb_pebs_pair(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + size_t hdr_size = sizeof(struct ssdfs_segment_header); +#endif /* CONFIG_SSDFS_DEBUG */ + u64 reserved_leb; + u64 reserved_peb; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->found || !env->fsi); + BUG_ON(!env->sbi.vh_buf); + BUG_ON(!is_ssdfs_magic_valid(&SSDFS_VH(env->sbi.vh_buf)->magic)); + BUG_ON(!is_ssdfs_volume_header_csum_valid(env->sbi.vh_buf, hdr_size)); + + SSDFS_DBG("env %p, env->sbi.vh_buf %p, " + "start_peb %llu, pebs_count %u\n", + env, env->sbi.vh_buf, + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + reserved_leb = SSDFS_MAIN_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_RESERVED_SB_SEG); + reserved_peb = SSDFS_MAIN_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_RESERVED_SB_SEG); + if (reserved_leb >= U64_MAX || reserved_peb >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid reserved_leb %llu, reserved_peb %llu\n", + reserved_leb, reserved_peb); + goto end_reserved_peb_check; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN: reserved_leb %llu, reserved_peb %llu\n", + reserved_leb, reserved_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if 
(reserved_peb < env->found->start_peb || + reserved_peb >= (env->found->start_peb + env->found->pebs_count)) { + err = -E2BIG; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_peb %llu, start_peb %llu, pebs_count %u\n", + reserved_peb, + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto end_reserved_peb_check; + } + + ssdfs_backup_sb_info2(env); + + err = ssdfs_read_checked_sb_info3(env, reserved_peb, 0); + if (!err) { + env->sbi.last_log.leb_id = reserved_leb; + env->sbi.last_log.peb_id = reserved_peb; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + goto end_reserved_peb_check; + } + + ssdfs_restore_sb_info2(env); + err = 0; /* try to read the backup copy */ + + reserved_leb = SSDFS_COPY_SB_LEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_RESERVED_SB_SEG); + reserved_peb = SSDFS_COPY_SB_PEB(SSDFS_VH(env->sbi.vh_buf), + SSDFS_RESERVED_SB_SEG); + if (reserved_leb >= U64_MAX || reserved_peb >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid reserved_leb %llu, reserved_peb %llu\n", + reserved_leb, reserved_peb); + goto end_reserved_peb_check; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("COPY: reserved_leb %llu, reserved_peb %llu\n", + reserved_leb, reserved_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (reserved_peb < env->found->start_peb || + reserved_peb >= (env->found->start_peb + env->found->pebs_count)) { + err = -E2BIG; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_peb %llu, start_peb %llu, pebs_count %u\n", + reserved_peb, + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto end_reserved_peb_check; + } + + err = ssdfs_read_checked_sb_info3(env, reserved_peb, 0); + if (!err) { + env->sbi.last_log.leb_id = reserved_leb; + env->sbi.last_log.peb_id = reserved_peb; + env->sbi.last_log.page_offset = 0; + env->sbi.last_log.pages_count = + SSDFS_LOG_PAGES(env->sbi.vh_buf); + goto end_reserved_peb_check; + } + + ssdfs_restore_sb_info2(env); + +end_reserved_peb_check: + return err; +} + +static inline +bool has_recovery_job(struct ssdfs_recovery_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); +#endif /* CONFIG_SSDFS_DEBUG */ + + return atomic_read(&env->state) == SSDFS_START_RECOVERY; +} + +int ssdfs_recovery_thread_func(void *data); + +static +struct ssdfs_thread_descriptor recovery_thread = { + .threadfn = ssdfs_recovery_thread_func, + .fmt = "ssdfs-recovery-%u", +}; + +#define RECOVERY_THREAD_WAKE_CONDITION(env) \ + (kthread_should_stop() || has_recovery_job(env)) + +/* + * ssdfs_recovery_thread_func() - main function of recovery thread + * @data: pointer on data object + * + * This function is the main function of the recovery thread. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input.
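+ * %-ENOENT - recovery thread was requested to stop during the search. + * %-ERANGE - search has not been requested.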
+ */ +int ssdfs_recovery_thread_func(void *data) +{ + struct ssdfs_recovery_env *env = data; + wait_queue_head_t *wait_queue; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + if (!env) { + SSDFS_ERR("pointer on environment is NULL\n"); + return -EINVAL; + } + + SSDFS_DBG("recovery thread: env %p\n", env); +#endif /* CONFIG_SSDFS_DEBUG */ + + wait_queue = &env->request_wait_queue; + +repeat: + if (kthread_should_stop()) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("stop recovery thread: env %p\n", env); +#endif /* CONFIG_SSDFS_DEBUG */ + complete_all(&env->thread.full_stop); + return 0; + } + + if (atomic_read(&env->state) != SSDFS_START_RECOVERY) + goto sleep_recovery_thread; + + if (env->found->start_peb >= U64_MAX || + env->found->pebs_count >= U32_MAX) { + err = -EINVAL; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("invalid input: " + "start_peb %llu, pebs_count %u\n", + env->found->start_peb, + env->found->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_recovery; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_peb %llu, pebs_count %u\n", + env->found->start_peb, + env->found->pebs_count); + SSDFS_DBG("search_phase %#x\n", + env->found->search_phase); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (env->found->search_phase) { + case SSDFS_RECOVERY_FAST_SEARCH: + err = ssdfs_recovery_try_fast_search(env); + if (err) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_recovery; + } + } + break; + + case SSDFS_RECOVERY_SLOW_SEARCH: + err = ssdfs_recovery_try_slow_search(env); + if (err) { + if (kthread_should_stop()) { + err = -ENOENT; + goto finish_recovery; + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("search has not been requested: " + "search_phase %#x\n", + env->found->search_phase); + goto finish_recovery; + } + +finish_recovery: + env->err = err; + + if (env->err) + atomic_set(&env->state, SSDFS_RECOVERY_FAILED); + else + atomic_set(&env->state, SSDFS_RECOVERY_FINISHED); + + wake_up_all(&env->result_wait_queue); + +sleep_recovery_thread: + wait_event_interruptible(*wait_queue, + RECOVERY_THREAD_WAKE_CONDITION(env)); + goto repeat; +} + +/* + * ssdfs_recovery_start_thread() - start recovery's thread + * @env: recovery environment + * @id: thread's ID + */ +int ssdfs_recovery_start_thread(struct ssdfs_recovery_env *env, + u32 id) +{ + ssdfs_threadfn threadfn; + const char *fmt; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + + SSDFS_DBG("env %p, id %u\n", env, id); +#endif /* CONFIG_SSDFS_DEBUG */ + + threadfn = recovery_thread.threadfn; + fmt = recovery_thread.fmt; + + env->thread.task = kthread_create(threadfn, env, fmt, id); + if (IS_ERR_OR_NULL(env->thread.task)) { + err = (env->thread.task == NULL ? -ENOMEM : + PTR_ERR(env->thread.task)); + if (err == -EINTR) { + /* + * Ignore this error. 
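+ * kthread_create() returns -EINTR when the caller receives + * a fatal signal while waiting for the thread to be spawned, + * so there is nothing to clean up here.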
+ */ + } else { + if (err == 0) + err = -ERANGE; + SSDFS_ERR("fail to start recovery thread: " + "id %u, err %d\n", id, err); + } + + return err; + } + + init_waitqueue_head(&env->request_wait_queue); + init_waitqueue_entry(&env->thread.wait, env->thread.task); + add_wait_queue(&env->request_wait_queue, &env->thread.wait); + init_waitqueue_head(&env->result_wait_queue); + init_completion(&env->thread.full_stop); + + wake_up_process(env->thread.task); + + return 0; +} + +/* + * ssdfs_recovery_stop_thread() - stop recovery thread + * @env: recovery environment + */ +int ssdfs_recovery_stop_thread(struct ssdfs_recovery_env *env) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env); + + SSDFS_DBG("env %p\n", env); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!env->thread.task) + return 0; + + err = kthread_stop(env->thread.task); + if (err == -EINTR) { + /* + * Ignore this error. + * The wake_up_process() was never called. + */ + return 0; + } else if (unlikely(err)) { + SSDFS_WARN("thread function had some issue: err %d\n", + err); + return err; + } + + finish_wait(&env->request_wait_queue, &env->thread.wait); + env->thread.task = NULL; + + err = SSDFS_WAIT_COMPLETION(&env->thread.full_stop); + if (unlikely(err)) { + SSDFS_ERR("stop thread fails: err %d\n", err); + return err; + } + + return 0; +} From patchwork Sat Feb 25 01:08:20 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151914
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 09/76] ssdfs: internal array/sequence primitives Date: Fri, 24 Feb 2023 17:08:20 -0800 Message-Id: <20230225010927.813929-10-slava@dubeyko.com> In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> X-Mailing-List: linux-fsdevel@vger.kernel.org

Page vector implements the simple concept of a dynamically growing page set. For example, the block bitmap requires 32 memory pages to represent a 2GB erase block. Page vector has a simple interface: (1) create - allocate page vector's metadata limited by capacity (2) destroy - deallocate page vector's metadata (3) init/reinit - clean metadata and set count to zero (4) allocate - allocate a memory page and add it to the tail of the sequence (5) add - add a memory page to the tail of the sequence (6) remove - remove the memory page for the requested index (7) release - free all pages and remove them from the page vector

Dynamic array implements the concept of a dynamically growing sequence of fixed-sized items on top of the page vector primitive. Dynamic array has the API: (1) create - create a dynamic array for the requested capacity and item size (2) destroy - destroy the dynamic array (3) get_locked - get the item locked for an index in the array (4) release - release and unlock the item for an index (5) set - set the item for an index (6) copy_content - copy the content of the dynamic array into a buffer

Sequence array is a specialized structure whose goal is to provide access to items via pointers on the basis of ID numbers. It means that every item has a dedicated ID, but the sequence array could contain only some portion of the existing items. The initialization phase has the goal of adding some limited number of existing items into the sequence array. The ID numbers can be reverted from some maximum number (threshold) back to zero.
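To illustrate how these primitives fit together, here is a minimal usage sketch of the sequence array (the full API list follows below). It is only a sketch: the payload allocations and the free_item() destructor are hypothetical, while the ssdfs_* calls follow the signatures introduced by this patch. Note that a freshly created array has no allocated IDs yet, so it must be primed with init_item() before add_item() can allocate new IDs.

static void free_item(void *item)
{
	/* hypothetical destructor; the ssdfs_free_item callback
	 * is assumed to take the raw item pointer */
	kfree(item);
}

static int sequence_array_usage_example(void)
{
	struct ssdfs_sequence_array *array;
	unsigned long id = 0;
	void *first, *second;
	int err;

	/* IDs grow up to the threshold (127 here) and then wrap to zero */
	array = ssdfs_create_sequence_array(127);
	if (IS_ERR(array))
		return PTR_ERR(array);

	/* hypothetical payloads */
	first = kzalloc(32, GFP_KERNEL);
	second = kzalloc(32, GFP_KERNEL);
	if (!first || !second) {
		kfree(first);
		kfree(second);
		err = -ENOMEM;
		goto out;
	}

	/* prime the array: inserts @first with the explicit ID 0 */
	err = ssdfs_sequence_array_init_item(array, 0, first);
	if (err) {
		kfree(first);
		kfree(second);
		goto out;
	}

	/* the array allocates the next ID (1) for @second */
	err = ssdfs_sequence_array_add_item(array, second, &id);
	if (err) {
		kfree(second);
		goto out;
	}

	WARN_ON(ssdfs_sequence_array_get_item(array, id) != second);

out:
	/* destroy() walks the radix tree and calls free_item() on every
	 * item that was actually inserted */
	ssdfs_destroy_sequence_array(array, free_item);
	return err;
}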
Sequence array has API: (1) create - create sequence array (2) destroy - destroy sequence array (3) init_item - init item for requested ID (4) add_item - add item to the tail of sequence (5) get_item - get pointer on item for requested ID (6) apply_for_all - apply an action/function for all items (7) change_state - change item state for requested ID (8) change_all_state - change state of all items in sequence Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/dynamic_array.c | 781 ++++++++++++++++++++++++++++++++++++++ fs/ssdfs/dynamic_array.h | 96 +++++ fs/ssdfs/page_vector.c | 437 +++++++++++++++++++++ fs/ssdfs/page_vector.h | 64 ++++ fs/ssdfs/sequence_array.c | 639 +++++++++++++++++++++++++++++++ fs/ssdfs/sequence_array.h | 119 ++++++ 6 files changed, 2136 insertions(+) create mode 100644 fs/ssdfs/dynamic_array.c create mode 100644 fs/ssdfs/dynamic_array.h create mode 100644 fs/ssdfs/page_vector.c create mode 100644 fs/ssdfs/page_vector.h create mode 100644 fs/ssdfs/sequence_array.c create mode 100644 fs/ssdfs/sequence_array.h diff --git a/fs/ssdfs/dynamic_array.c b/fs/ssdfs/dynamic_array.c new file mode 100644 index 000000000000..ae7e121f61d0 --- /dev/null +++ b/fs/ssdfs/dynamic_array.c @@ -0,0 +1,781 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/dynamic_array.c - dynamic array implementation. + * + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * Copyright (c) 2022-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "dynamic_array.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_dynamic_array_page_leaks; +atomic64_t ssdfs_dynamic_array_memory_leaks; +atomic64_t ssdfs_dynamic_array_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_dynamic_array_cache_leaks_increment(void *kaddr) + * void ssdfs_dynamic_array_cache_leaks_decrement(void *kaddr) + * void *ssdfs_dynamic_array_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_dynamic_array_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_dynamic_array_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_dynamic_array_kfree(void *kaddr) + * struct page *ssdfs_dynamic_array_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_dynamic_array_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_dynamic_array_free_page(struct page *page) + * void ssdfs_dynamic_array_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(dynamic_array) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(dynamic_array) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_dynamic_array_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_dynamic_array_page_leaks, 0); + atomic64_set(&ssdfs_dynamic_array_memory_leaks, 0); + atomic64_set(&ssdfs_dynamic_array_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_dynamic_array_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_dynamic_array_page_leaks) != 0) { + SSDFS_ERR("DYNAMIC ARRAY: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_dynamic_array_page_leaks)); + } + + if
(atomic64_read(&ssdfs_dynamic_array_memory_leaks) != 0) { + SSDFS_ERR("DYNAMIC ARRAY: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_dynamic_array_memory_leaks)); + } + + if (atomic64_read(&ssdfs_dynamic_array_cache_leaks) != 0) { + SSDFS_ERR("DYNAMIC ARRAY: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_dynamic_array_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_dynamic_array_create() - create dynamic array + * @array: pointer on dynamic array object + * @capacity: maximum number of items in array + * @item_size: item size in bytes + * @alloc_pattern: pattern to init memory pages + */ +int ssdfs_dynamic_array_create(struct ssdfs_dynamic_array *array, + u32 capacity, size_t item_size, + u8 alloc_pattern) +{ + struct page *page; + u64 max_threshold = (u64)ssdfs_page_vector_max_threshold() * PAGE_SIZE; + u32 pages_count; + u64 bytes_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, capacity %u, item_size %zu\n", + array, capacity, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_ABSENT; + array->alloc_pattern = alloc_pattern; + + if (capacity == 0) { + SSDFS_ERR("invalid capacity %u\n", + capacity); + return -EINVAL; + } + + if (item_size == 0 || item_size > PAGE_SIZE) { + SSDFS_ERR("invalid item_size %zu\n", + item_size); + return -EINVAL; + } + + array->capacity = capacity; + array->item_size = item_size; + array->items_per_mem_page = PAGE_SIZE / item_size; + + pages_count = capacity + array->items_per_mem_page - 1; + pages_count /= array->items_per_mem_page; + + if (pages_count == 0) + pages_count = 1; + + bytes_count = (u64)capacity * item_size; + + if (bytes_count > max_threshold) { + SSDFS_ERR("invalid request: " + "bytes_count %llu > max_threshold %llu, " + "capacity %u, item_size %zu\n", + bytes_count, max_threshold, + capacity, item_size); + return -EINVAL; + } + + if (bytes_count > PAGE_SIZE) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pages_count >= ssdfs_page_vector_max_threshold()); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_vector_create(&array->pvec, pages_count); + if (unlikely(err)) { + SSDFS_ERR("fail to create page vector: " + "bytes_count %llu, pages_count %u, " + "err %d\n", + bytes_count, pages_count, err); + return err; + } + + err = ssdfs_page_vector_init(&array->pvec); + if (unlikely(err)) { + ssdfs_page_vector_destroy(&array->pvec); + SSDFS_ERR("fail to init page vector: " + "bytes_count %llu, pages_count %u, " + "err %d\n", + bytes_count, pages_count, err); + return err; + } + + page = ssdfs_page_vector_allocate(&array->pvec); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return err; + } + + ssdfs_lock_page(page); + ssdfs_memset_page(page, 0, PAGE_SIZE, + array->alloc_pattern, PAGE_SIZE); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->bytes_count = PAGE_SIZE; + array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC; + } else { + array->buf = ssdfs_dynamic_array_kzalloc(bytes_count, + GFP_KERNEL); + if (!array->buf) { + SSDFS_ERR("fail to allocate memory: " + "bytes_count %llu\n", + bytes_count); + return -ENOMEM; + } + + memset(array->buf, array->alloc_pattern, bytes_count); + + array->bytes_count = bytes_count; + array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER; + } + + return 0; +} + +/* + * ssdfs_dynamic_array_destroy() - destroy dynamic array + * @array: pointer on dynamic array object + */ +void ssdfs_dynamic_array_destroy(struct ssdfs_dynamic_array *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, capacity %u, " + "item_size %zu, bytes_count %u\n", + array, array->capacity, + array->item_size, array->bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + ssdfs_page_vector_release(&array->pvec); + ssdfs_page_vector_destroy(&array->pvec); + break; + + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + if (array->buf) + ssdfs_dynamic_array_kfree(array->buf); + break; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + break; + } + + array->capacity = 0; + array->item_size = 0; + array->items_per_mem_page = 0; + array->bytes_count = 0; + array->state = SSDFS_DYNAMIC_ARRAY_STORAGE_ABSENT; +} + +/* + * ssdfs_dynamic_array_get_locked() - get locked item + * @array: pointer on dynamic array object + * @index: item index + * + * This method tries to get a pointer on an item. If a short buffer + * (< 4K) represents the dynamic array, then the logic is pretty + * straightforward. Otherwise, the memory page is locked. The release + * method should be called to unlock the memory page. + * + * RETURN: + * [success] - pointer on requested item. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-E2BIG - request is out of array capacity. + * %-ERANGE - internal error.
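+ * %-ENOMEM - fail to allocate memory page.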
+ */ +void *ssdfs_dynamic_array_get_locked(struct ssdfs_dynamic_array *array, + u32 index) +{ + struct page *page; + void *ptr = NULL; + u64 max_threshold = (u64)ssdfs_page_vector_max_threshold() * PAGE_SIZE; + u64 item_offset = 0; + u64 page_index; + u32 page_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, index %u, capacity %u, " + "item_size %zu, bytes_count %u\n", + array, index, array->capacity, + array->item_size, array->bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + /* continue logic */ + break; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + return ERR_PTR(-ERANGE); + } + + if (array->item_size == 0 || array->item_size > PAGE_SIZE) { + SSDFS_ERR("invalid item_size %zu\n", + array->item_size); + return ERR_PTR(-ERANGE); + } + + if (array->capacity == 0) { + SSDFS_ERR("invalid capacity %u\n", + array->capacity); + return ERR_PTR(-ERANGE); + } + + if (array->bytes_count == 0) { + SSDFS_ERR("invalid bytes_count %u\n", + array->bytes_count); + return ERR_PTR(-ERANGE); + } + + if (index >= array->capacity) { + SSDFS_WARN("invalid index: index %u, capacity %u\n", + index, array->capacity); + return ERR_PTR(-ERANGE); + } + + item_offset = (u64)array->item_size * index; + + if (item_offset >= max_threshold) { + SSDFS_ERR("invalid item_offset: " + "index %u, item_size %zu, " + "item_offset %llu, bytes_count %u, " + "max_threshold %llu\n", + index, array->item_size, + item_offset, array->bytes_count, + max_threshold); + return ERR_PTR(-E2BIG); + } + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + page_index = index / array->items_per_mem_page; + page_off = index % array->items_per_mem_page; + page_off *= array->item_size; + + if (page_index >= ssdfs_page_vector_capacity(&array->pvec)) { + SSDFS_ERR("invalid page index: " + "page_index %llu, item_offset %llu\n", + page_index, item_offset); + return ERR_PTR(-E2BIG); + } + + while (page_index >= ssdfs_page_vector_count(&array->pvec)) { + page = ssdfs_page_vector_allocate(&array->pvec); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return ERR_PTR(err); + } + + ssdfs_lock_page(page); + ssdfs_memset_page(page, 0, PAGE_SIZE, + array->alloc_pattern, PAGE_SIZE); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->bytes_count += PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("array %p, index %u, capacity %u, " + "item_size %zu, bytes_count %u, " + "index %u, item_offset %llu, " + "page_index %llu, page_count %u\n", + array, index, array->capacity, + array->item_size, array->bytes_count, + index, item_offset, page_index, + ssdfs_page_vector_count(&array->pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + page = array->pvec.pages[page_index]; + + ssdfs_lock_page(page); + ptr = kmap_local_page(page); + ptr = (u8 *)ptr + page_off; + break; + + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + ptr = (u8 *)array->buf + item_offset; + break; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + return ERR_PTR(-ERANGE); + } + + return ptr; +} + +/* + * ssdfs_dynamic_array_release() - release item + * @array: pointer on dynamic array object + * @index: item index + * @ptr: pointer on item + * + * This method tries to release item pointer. 
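+ * For the page vector case it unmaps and unlocks the memory page that was locked by ssdfs_dynamic_array_get_locked(); for the plain buffer case it is a no-op.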
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-E2BIG - request is out of array capacity. + * %-ERANGE - internal error. + */ +int ssdfs_dynamic_array_release(struct ssdfs_dynamic_array *array, + u32 index, void *ptr) +{ + struct page *page; + u64 item_offset = 0; + u64 page_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !ptr); + + SSDFS_DBG("array %p, index %u, capacity %u, " + "item_size %zu, bytes_count %u\n", + array, index, array->capacity, + array->item_size, array->bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + /* continue logic */ + break; + + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + /* do nothing */ + return 0; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + return -ERANGE; + } + + if (array->item_size == 0 || array->item_size > PAGE_SIZE) { + SSDFS_ERR("invalid item_size %zu\n", + array->item_size); + return -ERANGE; + } + + if (array->capacity == 0) { + SSDFS_ERR("invalid capacity %u\n", + array->capacity); + return -ERANGE; + } + + if (array->bytes_count == 0) { + SSDFS_ERR("invalid bytes_count %u\n", + array->bytes_count); + return -ERANGE; + } + + if (index >= array->capacity) { + SSDFS_ERR("invalid index: index %u, capacity %u\n", + index, array->capacity); + return -ERANGE; + } + + item_offset = (u64)array->item_size * index; + + if (item_offset >= array->bytes_count) { + SSDFS_ERR("invalid item_offset: " + "index %u, item_size %zu, " + "item_offset %llu, bytes_count %u\n", + index, array->item_size, + item_offset, array->bytes_count); + return -E2BIG; + } + + page_index = index / array->items_per_mem_page; + + if (page_index >= ssdfs_page_vector_count(&array->pvec)) { + SSDFS_ERR("invalid page index: " + "page_index %llu, item_offset %llu\n", + page_index, item_offset); + return -E2BIG; + } + + page = array->pvec.pages[page_index]; + + kunmap_local(ptr); + ssdfs_unlock_page(page); + + return 0; +} + +/* + * ssdfs_dynamic_array_set() - store item into dynamic array + * @array: pointer on dynamic array object + * @index: item index + * @item: pointer on item + * + * This method tries to store item into dynamic array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-E2BIG - request is out of array capacity. + * %-ERANGE - internal error.
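+ * %-ENOMEM - fail to allocate memory page.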
+ */ +int ssdfs_dynamic_array_set(struct ssdfs_dynamic_array *array, + u32 index, void *item) +{ + struct page *page; + void *kaddr = NULL; + u64 max_threshold = (u64)ssdfs_page_vector_max_threshold() * PAGE_SIZE; + u64 item_offset = 0; + u64 page_index; + u32 page_off; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !item); + + SSDFS_DBG("array %p, index %u, capacity %u, " + "item_size %zu, bytes_count %u\n", + array, index, array->capacity, + array->item_size, array->bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + /* continue logic */ + break; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + return -ERANGE; + } + + if (array->item_size == 0 || array->item_size > PAGE_SIZE) { + SSDFS_ERR("invalid item_size %zu\n", + array->item_size); + return -ERANGE; + } + + if (array->capacity == 0) { + SSDFS_ERR("invalid capacity %u\n", + array->capacity); + return -ERANGE; + } + + if (array->bytes_count == 0) { + SSDFS_ERR("invalid bytes_count %u\n", + array->bytes_count); + return -ERANGE; + } + + if (index >= array->capacity) { + SSDFS_ERR("invalid index: index %u, capacity %u\n", + index, array->capacity); + return -ERANGE; + } + + item_offset = (u64)array->item_size * index; + + if (item_offset >= max_threshold) { + SSDFS_ERR("invalid item_offset: " + "index %u, item_size %zu, " + "item_offset %llu, bytes_count %u, " + "max_threshold %llu\n", + index, array->item_size, + item_offset, array->bytes_count, + max_threshold); + return -E2BIG; + } + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + page_index = index / array->items_per_mem_page; + page_off = index % array->items_per_mem_page; + page_off *= array->item_size; + + if (page_index >= ssdfs_page_vector_capacity(&array->pvec)) { + SSDFS_ERR("invalid page index: " + "page_index %llu, item_offset %llu\n", + page_index, item_offset); + return -E2BIG; + } + + while (page_index >= ssdfs_page_vector_count(&array->pvec)) { + page = ssdfs_page_vector_allocate(&array->pvec); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return err; + } + + ssdfs_lock_page(page); + ssdfs_memset_page(page, 0, PAGE_SIZE, + array->alloc_pattern, PAGE_SIZE); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->bytes_count += PAGE_SIZE; + } + + page = array->pvec.pages[page_index]; + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = ssdfs_memcpy(kaddr, page_off, PAGE_SIZE, + item, 0, array->item_size, + array->item_size); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + break; + + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + err = ssdfs_memcpy(array->buf, item_offset, array->bytes_count, + item, 0, array->item_size, + array->item_size); + break; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + return -ERANGE; + } + + if (unlikely(err)) { + SSDFS_ERR("fail to set item: index %u, err %d\n", + index, err); + } + + return err; +} + +/* + * ssdfs_dynamic_array_copy_content() - copy the whole dynamic array + * @array: pointer on dynamic array object + * @copy_buf: pointer on copy buffer + * @buf_size: size of the buffer in bytes + * + * This method tries to copy the whole content of dynamic array.
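+ * At most @buf_size bytes are copied; the copy stops early once @copy_buf is full.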
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_dynamic_array_copy_content(struct ssdfs_dynamic_array *array, + void *copy_buf, size_t buf_size) +{ + struct page *page; + u32 copied_bytes = 0; + u32 pages_count; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !copy_buf); + + SSDFS_DBG("array %p, capacity %u, " + "item_size %zu, bytes_count %u, " + "copy_buf %p, buf_size %zu\n", + array, array->capacity, + array->item_size, array->bytes_count, + copy_buf, buf_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + /* continue logic */ + break; + + default: + SSDFS_WARN("unexpected state %#x\n", array->state); + return -ERANGE; + } + + if (array->bytes_count == 0) { + SSDFS_ERR("invalid bytes_count %u\n", + array->bytes_count); + return -ERANGE; + } + + switch (array->state) { + case SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC: + pages_count = ssdfs_page_vector_count(&array->pvec); + + for (i = 0; i < pages_count; i++) { + size_t bytes_count; + + if (copied_bytes >= buf_size) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("stop copy: " + "copied_bytes %u, " + "buf_size %zu, " + "array->bytes_count %u, " + "pages_count %u\n", + copied_bytes, + buf_size, + array->bytes_count, + pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + + page = array->pvec.pages[i]; + + if (!page) { + err = -ERANGE; + SSDFS_ERR("fail to copy content: " + "copied_bytes %u, " + "array->bytes_count %u, " + "page_index %d, " + "pages_count %u\n", + copied_bytes, + array->bytes_count, + i, pages_count); + goto finish_copy_content; + } + + bytes_count = + array->item_size * array->items_per_mem_page; + bytes_count = min_t(size_t, bytes_count, + buf_size - copied_bytes); + + err = ssdfs_memcpy_from_page(copy_buf, + copied_bytes, + buf_size, + page, + 0, + PAGE_SIZE, + bytes_count); + if (unlikely(err)) { + SSDFS_ERR("fail to copy content: " + "copied_bytes %u, " + "array->bytes_count %u, " + "page_index %d, " + "pages_count %u, " + "err %d\n", + copied_bytes, + array->bytes_count, + i, pages_count, + err); + goto finish_copy_content; + } + + copied_bytes += bytes_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("array %p, capacity %u, " + "item_size %zu, bytes_count %u, " + "page_index %d, pages_count %u, " + "bytes_count %zu, copied_bytes %u\n", + array, array->capacity, + array->item_size, array->bytes_count, + i, pages_count, bytes_count, copied_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER: + err = ssdfs_memcpy(copy_buf, 0, buf_size, + array->buf, 0, array->bytes_count, + array->bytes_count); + break; + + default: + BUG(); + break; + } + +finish_copy_content: + return err; +} diff --git a/fs/ssdfs/dynamic_array.h b/fs/ssdfs/dynamic_array.h new file mode 100644 index 000000000000..3bb73510f389 --- /dev/null +++ b/fs/ssdfs/dynamic_array.h @@ -0,0 +1,96 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/dynamic_array.h - dynamic array's declarations. + * + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * Copyright (c) 2022-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#ifndef _SSDFS_DYNAMIC_ARRAY_H +#define _SSDFS_DYNAMIC_ARRAY_H + +#include "page_vector.h" + +/* + * struct ssdfs_dynamic_array - dynamic array + * @state: array state + * @item_size: size of item in bytes + * @items_per_mem_page: number of items per memory page + * @capacity: maximum available items count + * @bytes_count: currently allocated bytes count + * @alloc_pattern: pattern to init memory pages + * @pvec: vector of pages + * @buf: pointer on memory buffer + */ +struct ssdfs_dynamic_array { + int state; + size_t item_size; + u32 items_per_mem_page; + u32 capacity; + u32 bytes_count; + u8 alloc_pattern; + struct ssdfs_page_vector pvec; + void *buf; +}; + +/* Dynamic array's states */ +enum { + SSDFS_DYNAMIC_ARRAY_STORAGE_ABSENT, + SSDFS_DYNAMIC_ARRAY_STORAGE_PAGE_VEC, + SSDFS_DYNAMIC_ARRAY_STORAGE_BUFFER, + SSDFS_DYNAMIC_ARRAY_STORAGE_STATE_MAX +}; + +/* + * Inline functions + */ + +static inline +u32 ssdfs_dynamic_array_allocated_bytes(struct ssdfs_dynamic_array *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + return array->bytes_count; +} + +static inline +u32 ssdfs_dynamic_array_items_count(struct ssdfs_dynamic_array *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (array->bytes_count == 0 || array->item_size == 0) + return 0; + + return array->bytes_count / array->item_size; +} + +/* + * Dynamic array's API + */ +int ssdfs_dynamic_array_create(struct ssdfs_dynamic_array *array, + u32 capacity, size_t item_size, + u8 alloc_pattern); +void ssdfs_dynamic_array_destroy(struct ssdfs_dynamic_array *array); +void *ssdfs_dynamic_array_get_locked(struct ssdfs_dynamic_array *array, + u32 index); +int ssdfs_dynamic_array_release(struct ssdfs_dynamic_array *array, + u32 index, void *ptr); +int ssdfs_dynamic_array_set(struct ssdfs_dynamic_array *array, + u32 index, void *ptr); +int ssdfs_dynamic_array_copy_content(struct ssdfs_dynamic_array *array, + void *copy_buf, size_t buf_size); + +#endif /* _SSDFS_DYNAMIC_ARRAY_H */ diff --git a/fs/ssdfs/page_vector.c b/fs/ssdfs/page_vector.c new file mode 100644 index 000000000000..b130d99df31b --- /dev/null +++ b/fs/ssdfs/page_vector.c @@ -0,0 +1,437 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/page_vector.c - page vector implementation. + * + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * Copyright (c) 2022-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_vector.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_page_vector_page_leaks; +atomic64_t ssdfs_page_vector_memory_leaks; +atomic64_t ssdfs_page_vector_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_page_vector_cache_leaks_increment(void *kaddr) + * void ssdfs_page_vector_cache_leaks_decrement(void *kaddr) + * void *ssdfs_page_vector_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_page_vector_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_page_vector_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_page_vector_kfree(void *kaddr) + * struct page *ssdfs_page_vector_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_page_vector_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_page_vector_free_page(struct page *page) + * void ssdfs_page_vector_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(page_vector) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(page_vector) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_page_vector_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_page_vector_page_leaks, 0); + atomic64_set(&ssdfs_page_vector_memory_leaks, 0); + atomic64_set(&ssdfs_page_vector_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_page_vector_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_page_vector_page_leaks) != 0) { + SSDFS_ERR("PAGE VECTOR: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_page_vector_page_leaks)); + } + + if (atomic64_read(&ssdfs_page_vector_memory_leaks) != 0) { + SSDFS_ERR("PAGE VECTOR: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_page_vector_memory_leaks)); + } + + if (atomic64_read(&ssdfs_page_vector_cache_leaks) != 0) { + SSDFS_ERR("PAGE VECTOR: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_page_vector_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_page_vector_create() - create page vector + * @array: pointer on page vector + * @capacity: max number of memory pages in vector + */ +int ssdfs_page_vector_create(struct ssdfs_page_vector *array, + u32 capacity) +{ + size_t size = sizeof(struct page *); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->count = 0; + array->capacity = 0; + + size *= capacity; + array->pages = ssdfs_page_vector_kzalloc(size, GFP_KERNEL); + if (!array->pages) { + SSDFS_ERR("fail to allocate memory: size %zu\n", + size); + return -ENOMEM; + } + + array->capacity = capacity; + + return 0; +} + +/* + * ssdfs_page_vector_destroy() - destroy page vector + * @array: pointer on page vector + */ +void ssdfs_page_vector_destroy(struct ssdfs_page_vector *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i; + + BUG_ON(!array); + + if (array->count > 0) { + SSDFS_ERR("invalid state: count %u\n", + array->count); + } + + for (i = 0; i < array->capacity; i++) { + struct page *page = array->pages[i]; + + if (page) + SSDFS_ERR("page %d is not released\n", i); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + array->count = 0; + + if (array->pages) { +#ifdef CONFIG_SSDFS_DEBUG + if (array->capacity == 0) { + SSDFS_ERR("invalid state: 
capacity %u\n", + array->capacity); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + array->capacity = 0; + ssdfs_page_vector_kfree(array->pages); + array->pages = NULL; + } +} + +/* + * ssdfs_page_vector_init() - init page vector + * @array: pointer on page vector + */ +int ssdfs_page_vector_init(struct ssdfs_page_vector *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + if (!array->pages) { + SSDFS_ERR("fail to init\n"); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + array->count = 0; + + if (array->capacity == 0) { + SSDFS_ERR("invalid state: capacity %u\n", + array->capacity); + return -ERANGE; + } else { + memset(array->pages, 0, + sizeof(struct page *) * array->capacity); + } + + return 0; +} + +/* + * ssdfs_page_vector_reinit() - reinit page vector + * @array: pointer on page vector + */ +int ssdfs_page_vector_reinit(struct ssdfs_page_vector *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i; + + BUG_ON(!array); + + if (!array->pages) { + SSDFS_ERR("fail to reinit\n"); + return -ERANGE; + } + + for (i = 0; i < array->capacity; i++) { + struct page *page = array->pages[i]; + + if (page) + SSDFS_WARN("page %d is not released\n", i); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + array->count = 0; + + if (array->capacity == 0) { + SSDFS_ERR("invalid state: capacity %u\n", + array->capacity); + return -ERANGE; + } else { + memset(array->pages, 0, + sizeof(struct page *) * array->capacity); + } + + return 0; +} + +/* + * ssdfs_page_vector_count() - count of pages in page vector + * @array: pointer on page vector + */ +u32 ssdfs_page_vector_count(struct ssdfs_page_vector *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + return array->count; +} + +/* + * ssdfs_page_vector_space() - free space in page vector + * @array: pointer on page vector + */ +u32 ssdfs_page_vector_space(struct ssdfs_page_vector *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + if (array->count > array->capacity) { + SSDFS_ERR("count %u is bigger than max %u\n", + array->count, array->capacity); + return 0; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return array->capacity - array->count; +} + +/* + * ssdfs_page_vector_capacity() - capacity of page vector + * @array: pointer on page vector + */ +u32 ssdfs_page_vector_capacity(struct ssdfs_page_vector *array) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + return array->capacity; +} + +/* + * ssdfs_page_vector_add() - add page in page vector + * @array: pointer on page vector + * @page: memory page + */ +int ssdfs_page_vector_add(struct ssdfs_page_vector *array, + struct page *page) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !page); + + if (array->count >= array->capacity) { + SSDFS_ERR("array is full: count %u\n", + array->count); + return -ENOSPC; + } + + if (!array->pages) { + SSDFS_ERR("fail to add page: " + "count %u, capacity %u\n", + array->count, array->capacity); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + array->pages[array->count] = page; + array->count++; + + ssdfs_page_vector_account_page(page); + + return 0; +} + +/* + * ssdfs_page_vector_allocate() - allocate + add page + * @array: pointer on page vector + */ +struct page *ssdfs_page_vector_allocate(struct ssdfs_page_vector *array) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ssdfs_page_vector_space(array) == 0) { + SSDFS_ERR("page vector hasn't space\n"); + return ERR_PTR(-E2BIG); + } + + page = 
ssdfs_page_vector_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + /* + * ssdfs_page_vector_add() accounts page + */ + ssdfs_page_vector_forget_page(page); + + err = ssdfs_page_vector_add(array, page); + if (unlikely(err)) { + SSDFS_ERR("fail to add page: err %d\n", + err); + ssdfs_free_page(page); + return ERR_PTR(err); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("array %p, page vector count %u\n", + array->pages, ssdfs_page_vector_count(array)); + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_DBG("page %p, allocated_pages %lld\n", + page, atomic64_read(&ssdfs_page_vector_page_leaks)); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +#endif /* CONFIG_SSDFS_DEBUG */ + + return page; +} + +/* + * ssdfs_page_vector_remove() - remove page + * @array: pointer on page vector + * @page_index: index of the page + */ +struct page *ssdfs_page_vector_remove(struct ssdfs_page_vector *array, + u32 page_index) +{ + struct page *page; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ssdfs_page_vector_count(array) == 0) { + SSDFS_ERR("page vector is empty\n"); + return ERR_PTR(-ENODATA); + } + + if (array->count > array->capacity) { + SSDFS_ERR("page vector is corrupted: " + "array->count %u, array->capacity %u\n", + array->count, array->capacity); + return ERR_PTR(-ERANGE); + } + + if (page_index >= array->count) { + SSDFS_ERR("page index is out of range: " + "page_index %u, array->count %u\n", + page_index, array->count); + return ERR_PTR(-ENOENT); + } + + page = array->pages[page_index]; + + if (!page) { + SSDFS_ERR("page index is absent: " + "page_index %u, array->count %u\n", + page_index, array->count); + return ERR_PTR(-ENOENT); + } + + ssdfs_page_vector_forget_page(page); + array->pages[page_index] = NULL; + + return page; +} + +/* + * ssdfs_page_vector_release() - release pages from page vector + * @array: pointer on page vector + */ +void ssdfs_page_vector_release(struct ssdfs_page_vector *array) +{ + struct page *page; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + if (!array->pages) { + SSDFS_ERR("fail to release: " + "count %u, capacity %u\n", + array->count, array->capacity); + return; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < ssdfs_page_vector_count(array); i++) { + page = array->pages[i]; + + if (!page) + continue; + + ssdfs_page_vector_free_page(page); + array->pages[i] = NULL; + +#ifdef CONFIG_SSDFS_DEBUG +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_DBG("page %p, allocated_pages %lld\n", + page, + atomic64_read(&ssdfs_page_vector_page_leaks)); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +#endif /* CONFIG_SSDFS_DEBUG */ + } + + ssdfs_page_vector_reinit(array); +} diff --git a/fs/ssdfs/page_vector.h b/fs/ssdfs/page_vector.h new file mode 100644 index 000000000000..4a4a6bcaed32 --- /dev/null +++ b/fs/ssdfs/page_vector.h @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/page_vector.h - page vector's declarations. + * + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * Copyright (c) 2022-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#ifndef _SSDFS_PAGE_VECTOR_H +#define _SSDFS_PAGE_VECTOR_H + +/* + * struct ssdfs_page_vector - vector of memory pages + * @count: current number of pages in vector + * @capacity: max number of pages in vector + * @pages: array of pointers on pages + */ +struct ssdfs_page_vector { + u32 count; + u32 capacity; + struct page **pages; +}; + +/* + * Inline functions + */ + +/* + * ssdfs_page_vector_max_threshold() - maximum possible capacity + */ +static inline +u32 ssdfs_page_vector_max_threshold(void) +{ + return S32_MAX; +} + +/* + * Page vector's API + */ +int ssdfs_page_vector_create(struct ssdfs_page_vector *array, + u32 capacity); +void ssdfs_page_vector_destroy(struct ssdfs_page_vector *array); +int ssdfs_page_vector_init(struct ssdfs_page_vector *array); +int ssdfs_page_vector_reinit(struct ssdfs_page_vector *array); +u32 ssdfs_page_vector_count(struct ssdfs_page_vector *array); +u32 ssdfs_page_vector_space(struct ssdfs_page_vector *array); +u32 ssdfs_page_vector_capacity(struct ssdfs_page_vector *array); +struct page *ssdfs_page_vector_allocate(struct ssdfs_page_vector *array); +int ssdfs_page_vector_add(struct ssdfs_page_vector *array, + struct page *page); +struct page *ssdfs_page_vector_remove(struct ssdfs_page_vector *array, + u32 page_index); +void ssdfs_page_vector_release(struct ssdfs_page_vector *array); + +#endif /* _SSDFS_PAGE_VECTOR_H */ diff --git a/fs/ssdfs/sequence_array.c b/fs/ssdfs/sequence_array.c new file mode 100644 index 000000000000..696fb88ab208 --- /dev/null +++ b/fs/ssdfs/sequence_array.c @@ -0,0 +1,639 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/sequence_array.c - sequence array implementation. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "sequence_array.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_seq_arr_page_leaks; +atomic64_t ssdfs_seq_arr_memory_leaks; +atomic64_t ssdfs_seq_arr_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_seq_arr_cache_leaks_increment(void *kaddr) + * void ssdfs_seq_arr_cache_leaks_decrement(void *kaddr) + * void *ssdfs_seq_arr_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_seq_arr_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_seq_arr_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_seq_arr_kfree(void *kaddr) + * struct page *ssdfs_seq_arr_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_seq_arr_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_seq_arr_free_page(struct page *page) + * void ssdfs_seq_arr_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(seq_arr) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(seq_arr) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_seq_arr_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_seq_arr_page_leaks, 0); + atomic64_set(&ssdfs_seq_arr_memory_leaks, 0); + atomic64_set(&ssdfs_seq_arr_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_seq_arr_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_seq_arr_page_leaks) != 0) { + SSDFS_ERR("SEQUENCE ARRAY: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_seq_arr_page_leaks)); + } + + if (atomic64_read(&ssdfs_seq_arr_memory_leaks) != 0) { + SSDFS_ERR("SEQUENCE ARRAY: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_seq_arr_memory_leaks)); + } + + if (atomic64_read(&ssdfs_seq_arr_cache_leaks) != 0) { + SSDFS_ERR("SEQUENCE ARRAY: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_seq_arr_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_create_sequence_array() - create sequence array + * @revert_threshold: threshold of rolling the ID sequence back to zero + * + * This method tries to allocate memory and to create + * the sequence array. + * + * RETURN: + * [success] - pointer on created sequence array + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - fail to allocate memory.
+ */ +struct ssdfs_sequence_array * +ssdfs_create_sequence_array(unsigned long revert_threshold) +{ + struct ssdfs_sequence_array *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("revert_threshold %lu\n", revert_threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (revert_threshold == 0) { + SSDFS_ERR("invalid revert_threshold %lu\n", + revert_threshold); + return ERR_PTR(-EINVAL); + } + + ptr = ssdfs_seq_arr_kmalloc(sizeof(struct ssdfs_sequence_array), + GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory\n"); + return ERR_PTR(-ENOMEM); + } + + ptr->revert_threshold = revert_threshold; + spin_lock_init(&ptr->lock); + ptr->last_allocated_id = SSDFS_SEQUENCE_ARRAY_INVALID_ID; + INIT_RADIX_TREE(&ptr->map, GFP_ATOMIC); + + return ptr; +} + +/* + * ssdfs_destroy_sequence_array() - destroy sequence array + * @array: pointer on sequence array object + * @free_item: pointer on function that can free item + * + * This method tries to delete all items from the radix tree, + * to free memory of every item and to free the memory of + * sequence array itself. + */ +void ssdfs_destroy_sequence_array(struct ssdfs_sequence_array *array, + ssdfs_free_item free_item) +{ + struct radix_tree_iter iter; + void __rcu **slot; + void *item_ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !free_item); + + SSDFS_DBG("array %p\n", array); +#endif /* CONFIG_SSDFS_DEBUG */ + + rcu_read_lock(); + spin_lock(&array->lock); + radix_tree_for_each_slot(slot, &array->map, &iter, 0) { + item_ptr = rcu_dereference_raw(*slot); + + spin_unlock(&array->lock); + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %llu, ptr %p\n", + (u64)iter.index, item_ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!item_ptr) { + SSDFS_WARN("empty node pointer: " + "index %llu\n", + (u64)iter.index); + } else { + free_item(item_ptr); + } + + rcu_read_lock(); + spin_lock(&array->lock); + + radix_tree_iter_delete(&array->map, &iter, slot); + } + array->last_allocated_id = SSDFS_SEQUENCE_ARRAY_INVALID_ID; + spin_unlock(&array->lock); + rcu_read_unlock(); + + ssdfs_seq_arr_kfree(array); +} + +/* + * ssdfs_sequence_array_init_item() - initialize the array by item + * @array: pointer on sequence array object + * @id: ID of inserting item + * @item: pointer on inserting item + * + * This method tries to initialize the array by item. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. 
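+ * %-ENOMEM - fail to preload the radix tree. + * %-EEXIST - item with @id is already present in the radix tree.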
+ */ +int ssdfs_sequence_array_init_item(struct ssdfs_sequence_array *array, + unsigned long id, void *item) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !item); + + SSDFS_DBG("array %p, id %lu, item %p\n", + array, id, item); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (id > array->revert_threshold) { + SSDFS_ERR("invalid input: " + "id %lu, revert_threshold %lu\n", + id, array->revert_threshold); + return -EINVAL; + } + + err = radix_tree_preload(GFP_NOFS); + if (unlikely(err)) { + SSDFS_ERR("fail to preload radix tree: err %d\n", + err); + return err; + } + + spin_lock(&array->lock); + err = radix_tree_insert(&array->map, id, item); + spin_unlock(&array->lock); + + radix_tree_preload_end(); + + if (unlikely(err)) { + SSDFS_ERR("fail to add item into radix tree: " + "id %llu, item %p, err %d\n", + (u64)id, item, err); + return err; + } + + spin_lock(&array->lock); + if (array->last_allocated_id == SSDFS_SEQUENCE_ARRAY_INVALID_ID) + array->last_allocated_id = id; + spin_unlock(&array->lock); + + return 0; +} + +/* + * ssdfs_sequence_array_add_item() - add new item into array + * @array: pointer on sequence array object + * @item: pointer on adding item + * @id: pointer on ID value [out] + * + * This method tries to add a new item into the array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_sequence_array_add_item(struct ssdfs_sequence_array *array, + void *item, unsigned long *id) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !item || !id); + + SSDFS_DBG("array %p, item %p, id %p\n", + array, item, id); +#endif /* CONFIG_SSDFS_DEBUG */ + + *id = SSDFS_SEQUENCE_ARRAY_INVALID_ID; + + err = radix_tree_preload(GFP_NOFS); + if (unlikely(err)) { + SSDFS_ERR("fail to preload radix tree: err %d\n", + err); + return err; + } + + spin_lock(&array->lock); + + if (array->last_allocated_id == SSDFS_SEQUENCE_ARRAY_INVALID_ID) { + err = -ERANGE; + goto finish_add_item; + } else { + if ((array->last_allocated_id + 1) > array->revert_threshold) { + *id = 0; + array->last_allocated_id = 0; + } else { + array->last_allocated_id++; + *id = array->last_allocated_id; + } + } + + if (*id > array->revert_threshold) { + err = -ERANGE; + goto finish_add_item; + } + + err = radix_tree_insert(&array->map, *id, item); + +finish_add_item: + spin_unlock(&array->lock); + + radix_tree_preload_end(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %lu\n", *id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("fail to add item into radix tree: " + "id %llu, last_allocated_id %lu, " + "item %p, err %d\n", + (u64)*id, array->last_allocated_id, + item, err); + return err; + } + + return 0; +} + +/* + * ssdfs_sequence_array_get_item() - retrieve item from array + * @array: pointer on sequence array object + * @id: ID value + * + * This method tries to retrieve the pointer on an item + * with @id value. + * + * RETURN: + * [success] - pointer on existing item. + * [failure] - error code: + * + * %-ENOENT - item is absent. 
+ */ +void *ssdfs_sequence_array_get_item(struct ssdfs_sequence_array *array, + unsigned long id) +{ + void *item_ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, id %lu\n", + array, id); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&array->lock); + item_ptr = radix_tree_lookup(&array->map, id); + spin_unlock(&array->lock); + + if (!item_ptr) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find the item: id %llu\n", + (u64)id); +#endif /* CONFIG_SSDFS_DEBUG */ + return ERR_PTR(-ENOENT); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("item_ptr %p\n", item_ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + return item_ptr; +} + +/* + * ssdfs_sequence_array_apply_for_all() - apply action for all items + * @array: pointer on sequence array object + * @apply_action: pointer on method that needs to be applied + * + * This method tries to apply an action to all items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_sequence_array_apply_for_all(struct ssdfs_sequence_array *array, + ssdfs_apply_action apply_action) +{ + struct radix_tree_iter iter; + void **slot; + void *item_ptr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !apply_action); + + SSDFS_DBG("array %p\n", array); +#endif /* CONFIG_SSDFS_DEBUG */ + + rcu_read_lock(); + + spin_lock(&array->lock); + radix_tree_for_each_slot(slot, &array->map, &iter, 0) { + item_ptr = radix_tree_deref_slot(slot); + if (unlikely(!item_ptr)) { + SSDFS_WARN("empty item ptr: id %llu\n", + (u64)iter.index); + continue; + } + spin_unlock(&array->lock); + + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %llu, item_ptr %p\n", + (u64)iter.index, item_ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = apply_action(item_ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to apply action: " + "id %llu, err %d\n", + (u64)iter.index, err); + goto finish_apply_to_all; + } + + rcu_read_lock(); + + spin_lock(&array->lock); + } + spin_unlock(&array->lock); + + rcu_read_unlock(); + +finish_apply_to_all: + if (unlikely(err)) { + SSDFS_ERR("fail to apply action for all items: " + "err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_sequence_array_change_state() - change item's state + * @array: pointer on sequence array object + * @id: ID value + * @old_tag: old tag value + * @new_tag: new tag value + * @change_state: pointer on method of changing item's state + * @old_state: old item's state value + * @new_state: new item's state value + * + * This method tries to change an item's state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOENT - item is absent.
+ */ +int ssdfs_sequence_array_change_state(struct ssdfs_sequence_array *array, + unsigned long id, + int old_tag, int new_tag, + ssdfs_change_item_state change_state, + int old_state, int new_state) +{ + void *item_ptr = NULL; + int res; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !change_state); + + SSDFS_DBG("array %p, id %lu, " + "old_tag %#x, new_tag %#x, " + "old_state %#x, new_state %#x\n", + array, id, old_tag, new_tag, + old_state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + rcu_read_lock(); + + spin_lock(&array->lock); + item_ptr = radix_tree_lookup(&array->map, id); + if (item_ptr) { + if (old_tag != SSDFS_SEQUENCE_ITEM_NO_TAG) { + res = radix_tree_tag_get(&array->map, id, old_tag); + if (res != 1) + err = -ERANGE; + } + } else + err = -ENOENT; + spin_unlock(&array->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to find item id %llu with tag %#x\n", + (u64)id, old_tag); + goto finish_change_state; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %llu, item_ptr %p\n", + (u64)id, item_ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = change_state(item_ptr, old_state, new_state); + if (unlikely(err)) { + SSDFS_ERR("fail to change state: " + "id %llu, old_state %#x, " + "new_state %#x, err %d\n", + (u64)id, old_state, new_state, err); + goto finish_change_state; + } + + spin_lock(&array->lock); + item_ptr = radix_tree_tag_set(&array->map, id, new_tag); + if (old_tag != SSDFS_SEQUENCE_ITEM_NO_TAG) + radix_tree_tag_clear(&array->map, id, old_tag); + spin_unlock(&array->lock); + +finish_change_state: + rcu_read_unlock(); + + return err; +} + +/* + * ssdfs_sequence_array_change_all_states() - change state of all tagged items + * @array: pointer on sequence array object + * @old_tag: old tag value + * @new_tag: new tag value + * @change_state: pointer on method of changing item's state + * @old_state: old item's state value + * @new_state: new item's state value + * @found_items: pointer on count of found items [out] + * + * This method tries to change the state of all tagged items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_sequence_array_change_all_states(struct ssdfs_sequence_array *ptr, + int old_tag, int new_tag, + ssdfs_change_item_state change_state, + int old_state, int new_state, + unsigned long *found_items) +{ + struct radix_tree_iter iter; + void **slot; + void *item_ptr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !change_state || !found_items); + + SSDFS_DBG("array %p, " + "old_tag %#x, new_tag %#x, " + "old_state %#x, new_state %#x\n", + ptr, old_tag, new_tag, + old_state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_items = 0; + + rcu_read_lock(); + + spin_lock(&ptr->lock); + radix_tree_for_each_tagged(slot, &ptr->map, &iter, 0, old_tag) { + item_ptr = radix_tree_deref_slot(slot); + if (unlikely(!item_ptr)) { + SSDFS_WARN("empty item ptr: id %llu\n", + (u64)iter.index); + radix_tree_tag_clear(&ptr->map, iter.index, old_tag); + continue; + } + spin_unlock(&ptr->lock); + + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %llu, item_ptr %p\n", + (u64)iter.index, item_ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = change_state(item_ptr, old_state, new_state); + if (unlikely(err)) { + SSDFS_ERR("fail to change state: " + "id %llu, old_state %#x, " + "new_state %#x, err %d\n", + (u64)iter.index, old_state, + new_state, err); + goto finish_change_all_states; + } + + (*found_items)++; + + rcu_read_lock(); + + spin_lock(&ptr->lock); + radix_tree_tag_set(&ptr->map, iter.index, new_tag); + radix_tree_tag_clear(&ptr->map, iter.index, old_tag); + } + spin_unlock(&ptr->lock); + + rcu_read_unlock(); + +finish_change_all_states: + if (unlikely(err)) { + SSDFS_ERR("fail to change all items' state: " + "err %d\n", err); + return err; + } + + return 0; +} + +/* + * has_ssdfs_sequence_array_state() - check whether any item is tagged + * @array: pointer on sequence array object + * @tag: checking tag + * + * This method tries to check whether any item is tagged. + */ +bool has_ssdfs_sequence_array_state(struct ssdfs_sequence_array *array, + int tag) +{ + bool res; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, tag %#x\n", array, tag); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&array->lock); + res = radix_tree_tagged(&array->map, tag); + spin_unlock(&array->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("res %#x\n", res); +#endif /* CONFIG_SSDFS_DEBUG */ + + return res; +} diff --git a/fs/ssdfs/sequence_array.h b/fs/ssdfs/sequence_array.h new file mode 100644 index 000000000000..9a9c21e30cbe --- /dev/null +++ b/fs/ssdfs/sequence_array.h @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/sequence_array.h - sequence array's declarations. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#ifndef _SSDFS_SEQUENCE_ARRAY_H +#define _SSDFS_SEQUENCE_ARRAY_H + +#define SSDFS_SEQUENCE_ARRAY_INVALID_ID ULONG_MAX + +#define SSDFS_SEQUENCE_ITEM_NO_TAG 0 +#define SSDFS_SEQUENCE_ITEM_DIRTY_TAG 1 +#define SSDFS_SEQUENCE_ITEM_UNDER_COMMIT_TAG 2 +#define SSDFS_SEQUENCE_ITEM_COMMITED_TAG 3 + +/* + * struct ssdfs_sequence_array - sequence of pointers on items + * @revert_threshold: threshold of reverting the ID numbers' sequence + * @lock: exclusive lock + * @last_allocated_id: the latest allocated ID + * @map: pointers' radix tree + * + * The sequence array is a specialized structure that provides + * access to items via pointers on the basis of + * ID numbers.
Every item has a dedicated ID, but the + * sequence array could contain only some portion of the existing + * items. The initialization phase adds a limited + * number of existing items into the sequence array. + * The ID number can be reverted (wrapped around) from the maximum + * number (threshold) to zero. + */ +struct ssdfs_sequence_array { + unsigned long revert_threshold; + + spinlock_t lock; + unsigned long last_allocated_id; + struct radix_tree_root map; +}; + +/* function prototypes */ +typedef void (*ssdfs_free_item)(void *item); +typedef int (*ssdfs_apply_action)(void *item); +typedef int (*ssdfs_change_item_state)(void *item, + int old_state, + int new_state); + +/* + * Inline functions + */ +static inline +unsigned long ssdfs_sequence_array_last_id(struct ssdfs_sequence_array *array) +{ + unsigned long last_id = ULONG_MAX; + + spin_lock(&array->lock); + last_id = array->last_allocated_id; + spin_unlock(&array->lock); + + return last_id; +} + +static inline +void ssdfs_sequence_array_set_last_id(struct ssdfs_sequence_array *array, + unsigned long id) +{ + spin_lock(&array->lock); + array->last_allocated_id = id; + spin_unlock(&array->lock); +} + +static inline +bool is_ssdfs_sequence_array_last_id_invalid(struct ssdfs_sequence_array *ptr) +{ + bool is_invalid = false; + + spin_lock(&ptr->lock); + is_invalid = ptr->last_allocated_id == SSDFS_SEQUENCE_ARRAY_INVALID_ID; + spin_unlock(&ptr->lock); + + return is_invalid; +} + +/* + * Sequence array API + */ +struct ssdfs_sequence_array * +ssdfs_create_sequence_array(unsigned long revert_threshold); +void ssdfs_destroy_sequence_array(struct ssdfs_sequence_array *array, + ssdfs_free_item free_item); +int ssdfs_sequence_array_init_item(struct ssdfs_sequence_array *array, + unsigned long id, void *item); +int ssdfs_sequence_array_add_item(struct ssdfs_sequence_array *array, + void *item, unsigned long *id); +void *ssdfs_sequence_array_get_item(struct ssdfs_sequence_array *array, + unsigned long id); +int ssdfs_sequence_array_apply_for_all(struct ssdfs_sequence_array *array, + ssdfs_apply_action apply_action); +int ssdfs_sequence_array_change_state(struct ssdfs_sequence_array *array, + unsigned long id, + int old_tag, int new_tag, + ssdfs_change_item_state change_state, + int old_state, int new_state); +int ssdfs_sequence_array_change_all_states(struct ssdfs_sequence_array *ptr, + int old_tag, int new_tag, + ssdfs_change_item_state change_state, + int old_state, int new_state, + unsigned long *found_items); +bool has_ssdfs_sequence_array_state(struct ssdfs_sequence_array *array, + int tag); + +#endif /* _SSDFS_SEQUENCE_ARRAY_H */
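As an aside for reviewers, here is a minimal caller sketch showing how the sequence array API above is intended to fit together. It is not part of the patch: the item payload, the example_* names, and the threshold value are hypothetical, and it assumes ssdfs_create_sequence_array() reports failure via an error pointer or NULL. Note that the very first item has to be placed with ssdfs_sequence_array_init_item(), because ssdfs_sequence_array_add_item() returns -ERANGE while last_allocated_id is still SSDFS_SEQUENCE_ARRAY_INVALID_ID.

static int example_show_item(void *item)
{
	/* callback applied to every item; non-zero return stops the walk */
	pr_info("item value %d\n", *(int *)item);
	return 0;
}

static void example_free_item(void *item)
{
	kfree(item);
}

static int example_sequence_array_usage(void)
{
	struct ssdfs_sequence_array *array;
	unsigned long id;
	int *first, *second;
	int err;

	/* hypothetical threshold: IDs wrap around after 15 */
	array = ssdfs_create_sequence_array(15);
	if (IS_ERR_OR_NULL(array))
		return array ? PTR_ERR(array) : -ENOMEM;

	first = kzalloc(sizeof(*first), GFP_KERNEL);
	second = kzalloc(sizeof(*second), GFP_KERNEL);
	if (!first || !second) {
		err = -ENOMEM;
		goto destroy_array;
	}

	/* the very first item must be placed explicitly by ID */
	err = ssdfs_sequence_array_init_item(array, 0, first);
	if (err)
		goto destroy_array;
	first = NULL;	/* now owned by the array */

	/* subsequent items have the next ID allocated for them */
	err = ssdfs_sequence_array_add_item(array, second, &id);
	if (err)
		goto destroy_array;
	second = NULL;	/* now owned by the array */

	err = ssdfs_sequence_array_apply_for_all(array, example_show_item);

destroy_array:
	kfree(first);
	kfree(second);
	ssdfs_destroy_sequence_array(array, example_free_item);
	return err;
}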
From patchwork Sat Feb 25 01:08:21 2023
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 10/76] ssdfs: introduce PEB's block bitmap
Date: Fri, 24 Feb 2023 17:08:21 -0800
Message-Id: <20230225010927.813929-11-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

SSDFS splits a partition/volume into a sequence of fixed-size segments. Every segment can include one or several Logical Erase Blocks (LEBs). A LEB can be mapped into a "Physical" Erase Block (PEB). Generally speaking, a PEB is a fixed-size container that includes some number of logical blocks (or NAND flash pages). Every PEB has a block bitmap that tracks the state (free, pre-allocated, allocated, invalid) of its logical blocks and accounts for the physical space used to store the log's metadata (segment header, partial log header, footer).
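The bitmap encodes each logical block's state in two bits, so a single byte describes four blocks (the SSDFS_BLK_* constants, SSDFS_BLK_STATE_BITS, and SSDFS_BLK_STATE_MASK are declared in block_bitmap.h below). As a rough sketch, assuming the conventional packing where the low-order bit pair describes the first block covered by a byte, state extraction looks like this (the helper name is illustrative and not part of the patch):

static inline int example_get_block_state(const u8 *bmap, u32 blk)
{
	/* four 2-bit states per byte */
	u32 byte_index = blk / SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
	u32 shift = (blk % SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS)) *
			SSDFS_BLK_STATE_BITS;

	return (bmap[byte_index] >> shift) & SSDFS_BLK_STATE_MASK;
}

This per-byte layout is also why bytes like 0x00, 0x55, 0xFF, and 0xAA (SSDFS_FREE_STATES_BYTE and friends) describe a byte whose four blocks all share one state, and why the detect_*_blk[] lookup tables in block_bitmap_tables.c can answer "does this byte contain a given state?" with a single array access.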
The block bitmap implements the following API: (1) create - create empty block bitmap (2) destroy - destroy block bitmap object (3) init - initialize block bitmap by metadata from PEB's log (4) snapshot - take block bitmap snapshot for flush operation (5) forget_snapshot - free block bitmap's snapshot resources (6) lock/unlock - lock/unlock block bitmap (7) test_block/test_range - check state of block or range of blocks (8) get_free_pages - get number of free pages (9) get_used_pages - get number of used pages (10) get_invalid_pages - get number of invalid pages (11) pre_allocate - pre-allocate logical block or range of blocks (12) allocate - allocate logical block or range of blocks (13) invalidate - invalidate logical block or range of blocks (14) collect_garbage - find contiguous range of blocks in the given state (15) clean - convert the whole block bitmap into clean state Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/block_bitmap.c | 1209 ++++++++++++++++++++++++++++++++ fs/ssdfs/block_bitmap.h | 370 ++++++++++ fs/ssdfs/block_bitmap_tables.c | 310 ++++++++ 3 files changed, 1889 insertions(+) create mode 100644 fs/ssdfs/block_bitmap.c create mode 100644 fs/ssdfs/block_bitmap.h create mode 100644 fs/ssdfs/block_bitmap_tables.c diff --git a/fs/ssdfs/block_bitmap.c b/fs/ssdfs/block_bitmap.c new file mode 100644 index 000000000000..fd7e84258cf0 --- /dev/null +++ b/fs/ssdfs/block_bitmap.c @@ -0,0 +1,1209 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/block_bitmap.c - PEB's block bitmap implementation. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + * Cong Wang + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_vector.h" +#include "block_bitmap.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_block_bmap_page_leaks; +atomic64_t ssdfs_block_bmap_memory_leaks; +atomic64_t ssdfs_block_bmap_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_block_bmap_cache_leaks_increment(void *kaddr) + * void ssdfs_block_bmap_cache_leaks_decrement(void *kaddr) + * void *ssdfs_block_bmap_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_block_bmap_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_block_bmap_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_block_bmap_kfree(void *kaddr) + * struct page *ssdfs_block_bmap_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_block_bmap_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_block_bmap_free_page(struct page *page) + * void ssdfs_block_bmap_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(block_bmap) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(block_bmap) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_block_bmap_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_block_bmap_page_leaks, 0); + atomic64_set(&ssdfs_block_bmap_memory_leaks, 0); + atomic64_set(&ssdfs_block_bmap_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_block_bmap_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_block_bmap_page_leaks) != 0) { + SSDFS_ERR("BLOCK BMAP: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_block_bmap_page_leaks)); + } + + if (atomic64_read(&ssdfs_block_bmap_memory_leaks) != 0) { + SSDFS_ERR("BLOCK BMAP: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_block_bmap_memory_leaks)); + } + + if (atomic64_read(&ssdfs_block_bmap_cache_leaks) != 0) { + SSDFS_ERR("BLOCK BMAP: " + "caches suffer from %lld leaks\n", + atomic64_read(&ssdfs_block_bmap_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +extern const bool detect_free_blk[U8_MAX + 1]; +extern const bool detect_pre_allocated_blk[U8_MAX + 1]; +extern const bool detect_valid_blk[U8_MAX + 1]; +extern const bool detect_invalid_blk[U8_MAX + 1]; + +#define ALIGNED_START_BLK(blk) ({ \ + u32 aligned_blk; \ + aligned_blk = (blk >> SSDFS_BLK_STATE_BITS) << SSDFS_BLK_STATE_BITS; \ + aligned_blk; \ +}) + +#define ALIGNED_END_BLK(blk) ({ \ + u32 aligned_blk; \ + aligned_blk = blk + SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS) - 1; \ + aligned_blk >>= SSDFS_BLK_STATE_BITS; \ + aligned_blk <<= SSDFS_BLK_STATE_BITS; \ + aligned_blk; \ +}) + +#define SSDFS_BLK_BMAP_STATE_FLAGS_FNS(state, name) \ +static inline \ +bool is_block_bmap_##name(struct ssdfs_block_bmap *blk_bmap) \ +{ \ + return atomic_read(&blk_bmap->flags) & SSDFS_BLK_BMAP_##state; \ +} \ +static inline \ +void set_block_bmap_##name(struct ssdfs_block_bmap *blk_bmap) \ +{ \ + atomic_or(SSDFS_BLK_BMAP_##state, &blk_bmap->flags); \ +} \ +static inline \ +void clear_block_bmap_##name(struct ssdfs_block_bmap *blk_bmap) \ +{ \ + atomic_and(~SSDFS_BLK_BMAP_##state, &blk_bmap->flags); \ +} \ + +/* + * is_block_bmap_initialized()
+ * set_block_bmap_initialized() + * clear_block_bmap_initialized() + */ +SSDFS_BLK_BMAP_STATE_FLAGS_FNS(INITIALIZED, initialized) + +/* + * is_block_bmap_dirty() + * set_block_bmap_dirty() + * clear_block_bmap_dirty() + */ +SSDFS_BLK_BMAP_STATE_FLAGS_FNS(DIRTY, dirty) + +static +int ssdfs_cache_block_state(struct ssdfs_block_bmap *blk_bmap, + u32 blk, int blk_state); + +bool ssdfs_block_bmap_dirtied(struct ssdfs_block_bmap *blk_bmap) +{ + return is_block_bmap_dirty(blk_bmap); +} + +bool ssdfs_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap) +{ + return is_block_bmap_initialized(blk_bmap); +} + +void ssdfs_set_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap) +{ + set_block_bmap_initialized(blk_bmap); +} + +void ssdfs_block_bmap_clear_dirty_state(struct ssdfs_block_bmap *blk_bmap) +{ + SSDFS_DBG("clear dirty state\n"); + clear_block_bmap_dirty(blk_bmap); +} + +static inline +bool is_cache_invalid(struct ssdfs_block_bmap *blk_bmap, int blk_state); +static +int ssdfs_set_range_in_storage(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range, + int blk_state); +static +int ssdfs_block_bmap_find_block_in_cache(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_blk); +static +int ssdfs_block_bmap_find_block(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, int blk_state, + u32 *found_blk); + +#ifdef CONFIG_SSDFS_DEBUG +static +void ssdfs_debug_block_bitmap(struct ssdfs_block_bmap *bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + +/* + * ssdfs_block_bmap_storage_destroy() - destroy block bitmap's storage + * @storage: pointer on block bitmap's storage + */ +static +void ssdfs_block_bmap_storage_destroy(struct ssdfs_block_bmap_storage *storage) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!storage); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (storage->state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + ssdfs_page_vector_release(&storage->array); + ssdfs_page_vector_destroy(&storage->array); + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + if (storage->buf) + ssdfs_block_bmap_kfree(storage->buf); + break; + + default: + SSDFS_WARN("unexpected state %#x\n", storage->state); + break; + } + + storage->state = SSDFS_BLOCK_BMAP_STORAGE_ABSENT; +} + +/* + * ssdfs_block_bmap_destroy() - destroy PEB's block bitmap + * @blk_bmap: pointer on block bitmap + * + * This function releases the memory pages of the pagevec and + * frees the memory of the ssdfs_block_bmap structure.
+ */ +void ssdfs_block_bmap_destroy(struct ssdfs_block_bmap *blk_bmap) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + SSDFS_DBG("blk_bmap %p, items count %zu, " + "bmap bytes %zu\n", + blk_bmap, blk_bmap->items_count, + blk_bmap->bytes_count); + + if (mutex_is_locked(&blk_bmap->lock)) + SSDFS_WARN("block bitmap's mutex is locked\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) + SSDFS_WARN("block bitmap hasn't been initialized\n"); + + if (is_block_bmap_dirty(blk_bmap)) + SSDFS_WARN("block bitmap is dirty\n"); + + ssdfs_block_bmap_storage_destroy(&blk_bmap->storage); +} + +/* + * ssdfs_block_bmap_create_empty_storage() - create block bitmap's storage + * @storage: pointer on block bitmap's storage + * @bmap_bytes: number of bytes in block bitmap + */ +static +int ssdfs_block_bmap_create_empty_storage(struct ssdfs_block_bmap_storage *ptr, + size_t bmap_bytes) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("storage %p, bmap_bytes %zu\n", + ptr, bmap_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr->state = SSDFS_BLOCK_BMAP_STORAGE_ABSENT; + + if (bmap_bytes > PAGE_SIZE) { + size_t capacity = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(capacity >= U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_vector_create(&ptr->array, (u8)capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to create page vector: " + "bmap_bytes %zu, capacity %zu, err %d\n", + bmap_bytes, capacity, err); + return err; + } + + err = ssdfs_page_vector_init(&ptr->array); + if (unlikely(err)) { + ssdfs_page_vector_destroy(&ptr->array); + SSDFS_ERR("fail to init page vector: " + "bmap_bytes %zu, capacity %zu, err %d\n", + bmap_bytes, capacity, err); + return err; + } + + ptr->state = SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC; + } else { + ptr->buf = ssdfs_block_bmap_kmalloc(bmap_bytes, GFP_KERNEL); + if (!ptr->buf) { + SSDFS_ERR("fail to allocate memory: " + "bmap_bytes %zu\n", + bmap_bytes); + return -ENOMEM; + } + + ptr->state = SSDFS_BLOCK_BMAP_STORAGE_BUFFER; + } + + return 0; +} + +/* + * ssdfs_block_bmap_init_clean_storage() - init clean block bitmap + * @ptr: pointer on block bitmap object + * @bmap_pages: memory pages count in block bitmap + * + * This function initializes storage space of the clean + * block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +static +int ssdfs_block_bmap_init_clean_storage(struct ssdfs_block_bmap *ptr, + size_t bmap_pages) +{ + struct ssdfs_page_vector *array; + struct page *page; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("bmap %p, storage_state %#x, " + "bmap_bytes %zu, bmap_pages %zu\n", + ptr, ptr->storage.state, + ptr->bytes_count, bmap_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (ptr->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + array = &ptr->storage.array; + + if (ssdfs_page_vector_space(array) < bmap_pages) { + SSDFS_ERR("page vector capacity is not enough: " + "capacity %u, free_space %u, " + "bmap_pages %zu\n", + ssdfs_page_vector_capacity(array), + ssdfs_page_vector_space(array), + bmap_pages); + return -ENOMEM; + } + + for (i = 0; i < bmap_pages; i++) { + page = ssdfs_page_vector_allocate(array); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate #%d page\n", i); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + memset(ptr->storage.buf, 0, ptr->bytes_count); + break; + + default: + SSDFS_ERR("unexpected state %#x\n", ptr->storage.state); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_block_bmap_create() - construct PEB's block bitmap + * @fsi: file system info object + * @ptr: pointer on block bitmap object + * @items_count: count of described items + * @flag: define necessity to allocate memory + * @init_state: block state is used during initialization + * + * This function prepares the page vector and + * initializes the ssdfs_block_bmap structure. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EOPNOTSUPP - pagevec is too small for block bitmap + * representation. + * %-ENOMEM - unable to allocate memory. + */ +int ssdfs_block_bmap_create(struct ssdfs_fs_info *fsi, + struct ssdfs_block_bmap *ptr, + u32 items_count, + int flag, int init_state) +{ + int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + size_t bmap_bytes = 0; + size_t bmap_pages = 0; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !ptr); + + if (init_state >= SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", init_state); + return -EINVAL; + } + + SSDFS_DBG("fsi %p, pagesize %u, segsize %u, pages_per_seg %u, " + "items_count %u, flag %#x, init_state %#x\n", + fsi, fsi->pagesize, fsi->segsize, fsi->pages_per_seg, + items_count, flag, init_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + bmap_bytes = BLK_BMAP_BYTES(items_count); + bmap_pages = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE; + + if (bmap_pages > max_capacity) { + SSDFS_WARN("unable to allocate bmap with %zu pages\n", + bmap_pages); + return -EOPNOTSUPP; + } + + mutex_init(&ptr->lock); + atomic_set(&ptr->flags, 0); + ptr->bytes_count = bmap_bytes; + ptr->items_count = items_count; + ptr->metadata_items = 0; + ptr->used_blks = 0; + ptr->invalid_blks = 0; + + err = ssdfs_block_bmap_create_empty_storage(&ptr->storage, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to create empty bmap's storage: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + return err; + } + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + ptr->last_search[i].page_index = max_capacity; + ptr->last_search[i].offset = U16_MAX; + } + + if (flag == SSDFS_BLK_BMAP_INIT) + goto alloc_end; + + err = ssdfs_block_bmap_init_clean_storage(ptr, bmap_pages); + if (unlikely(err)) { + SSDFS_ERR("fail to init clean bmap's storage: " + "bmap_bytes %zu, bmap_pages %zu, err %d\n", + bmap_bytes, bmap_pages, err); + goto destroy_pagevec; + } + + if (init_state != SSDFS_BLK_FREE) { + struct ssdfs_block_bmap_range range = {0, ptr->items_count}; + + err = ssdfs_set_range_in_storage(ptr, &range, init_state); + if (unlikely(err)) { + SSDFS_ERR("fail to initialize block bmap: " + "range (start %u, len %u), " + "init_state %#x, err %d\n", + range.start, range.len, init_state, err); + goto destroy_pagevec; + } + } + + err = ssdfs_cache_block_state(ptr, 0, SSDFS_BLK_FREE); + if (unlikely(err)) { + SSDFS_ERR("fail to cache last free page: err %d\n", + err); + goto destroy_pagevec; + } + + set_block_bmap_initialized(ptr); + +alloc_end: + return 0; + +destroy_pagevec: + ssdfs_block_bmap_destroy(ptr); + return err; +} + +/* + * ssdfs_block_bmap_init_storage() - initialize block bitmap storage + *
@blk_bmap: pointer on block bitmap + * @source: prepared pagevec after reading from volume + * + * This function initializes the block bitmap's storage on + * the basis of the @source pages read from the volume. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_block_bmap_init_storage(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_page_vector *source) +{ + struct ssdfs_page_vector *array; + struct page *page; +#ifdef CONFIG_SSDFS_DEBUG + void *kaddr; +#endif /* CONFIG_SSDFS_DEBUG */ + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !source); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("bmap %p, bmap_bytes %zu\n", + blk_bmap, blk_bmap->bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + array = &blk_bmap->storage.array; + + if (blk_bmap->storage.state != SSDFS_BLOCK_BMAP_STORAGE_ABSENT) { + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + ssdfs_page_vector_release(array); + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + /* Do nothing. We have buffer already */ + break; + + default: + BUG(); + } + } else { + err = ssdfs_block_bmap_create_empty_storage(&blk_bmap->storage, + blk_bmap->bytes_count); + if (unlikely(err)) { + SSDFS_ERR("fail to create empty bmap's storage: " + "err %d\n", err); + return err; + } + } + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + for (i = 0; i < ssdfs_page_vector_count(source); i++) { + page = ssdfs_page_vector_remove(source, i); + if (IS_ERR_OR_NULL(page)) { + SSDFS_WARN("page %d is NULL\n", i); + return -ERANGE; + } + + ssdfs_lock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(page); + SSDFS_DBG("BMAP INIT\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, 32); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_vector_add(array, page); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to add page: " + "page_index %d, err %d\n", + i, err); + return err; + } + } + + err = ssdfs_page_vector_reinit(source); + if (unlikely(err)) { + SSDFS_ERR("fail to reinit page vector: " + "err %d\n", err); + return err; + } + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + if (ssdfs_page_vector_count(source) > 1) { + SSDFS_ERR("invalid source pvec size %u\n", + ssdfs_page_vector_count(source)); + return -ERANGE; + } + + page = ssdfs_page_vector_remove(source, 0); + + if (!page) { + SSDFS_WARN("page %d is NULL\n", 0); + return -ERANGE; + } + + ssdfs_lock_page(page); + + ssdfs_memcpy_from_page(blk_bmap->storage.buf, + 0, blk_bmap->bytes_count, + page, 0, PAGE_SIZE, + blk_bmap->bytes_count); + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(page); + SSDFS_DBG("BMAP INIT\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, 32); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_unlock_page(page); + + ssdfs_block_bmap_account_page(page); + ssdfs_block_bmap_free_page(page); + + ssdfs_page_vector_release(source); + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + blk_bmap->storage.state); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pvec %p, pagevec count %u\n", + source, ssdfs_page_vector_count(source)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +static +int ssdfs_block_bmap_find_range(struct ssdfs_block_bmap *blk_bmap, + u32 start,
u32 len, u32 max_blk, + int blk_state, + struct ssdfs_block_bmap_range *range); + +/* + * ssdfs_block_bmap_init() - initialize block bitmap pagevec + * @blk_bmap: pointer on block bitmap + * @source: prepared pagevec after reading from volume + * @last_free_blk: last free page saved on the volume + * @metadata_blks: count of reserved metadata blocks saved on the volume + * @invalid_blks: count of invalid blocks saved on the volume + * + * This function initializes the block bitmap's pagevec on + * the basis of the @source pages read from the volume. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + */ +int ssdfs_block_bmap_init(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_page_vector *source, + u32 last_free_blk, + u32 metadata_blks, + u32 invalid_blks) +{ + struct ssdfs_block_bmap_range found; + int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + u32 start_item; + int blk_state; + int free_pages; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !source); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, source %p, " + "last_free_blk %u, metadata_blks %u, invalid_blks %u\n", + blk_bmap, source, + last_free_blk, metadata_blks, invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_block_bmap_initialized(blk_bmap)) { + if (is_block_bmap_dirty(blk_bmap)) { + SSDFS_WARN("block bitmap has been initialized\n"); + return -ERANGE; + } + + free_pages = ssdfs_block_bmap_get_free_pages(blk_bmap); + if (unlikely(free_pages < 0)) { + err = free_pages; + SSDFS_ERR("fail to define free pages: err %d\n", + err); + return err; + } + + if (free_pages != blk_bmap->items_count) { + SSDFS_WARN("block bitmap has been initialized\n"); + return -ERANGE; + } + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + blk_bmap->last_search[i].page_index = max_capacity; + blk_bmap->last_search[i].offset = U16_MAX; + } + + ssdfs_block_bmap_storage_destroy(&blk_bmap->storage); + clear_block_bmap_initialized(blk_bmap); + } + + if (ssdfs_page_vector_count(source) == 0) { + SSDFS_ERR("fail to init because of empty pagevec\n"); + return -EINVAL; + } + + if (last_free_blk > blk_bmap->items_count) { + SSDFS_ERR("invalid values: " + "last_free_blk %u, items_count %zu\n", + last_free_blk, blk_bmap->items_count); + return -EINVAL; + } + + if (metadata_blks > blk_bmap->items_count) { + SSDFS_ERR("invalid values: " + "metadata_blks %u, items_count %zu\n", + metadata_blks, blk_bmap->items_count); + return -EINVAL; + } + + blk_bmap->metadata_items = metadata_blks; + + if (invalid_blks > blk_bmap->items_count) { + SSDFS_ERR("invalid values: " + "invalid_blks %u, last_free_blk %u, " + "items_count %zu\n", + invalid_blks, last_free_blk, + blk_bmap->items_count); + return -EINVAL; + } + + blk_bmap->invalid_blks = invalid_blks; + + err = ssdfs_block_bmap_init_storage(blk_bmap, source); + if (unlikely(err)) { + SSDFS_ERR("fail to init bmap's storage: err %d\n", + err); + return err; + } + + err = ssdfs_cache_block_state(blk_bmap, last_free_blk, SSDFS_BLK_FREE); + if (unlikely(err)) { + SSDFS_ERR("fail to cache last free page %u, err %d\n", + last_free_blk, err); + return err; + } + + blk_bmap->used_blks = 0; + + start_item = 0; + blk_state = SSDFS_BLK_VALID; + + do { + err = ssdfs_block_bmap_find_range(blk_bmap, + start_item, + blk_bmap->items_count - start_item, + blk_bmap->items_count, + blk_state, &found); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find
more valid blocks: " + "start_item %u\n", + start_item); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_pre_allocated_blocks; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find range: err %d\n", err); + return err; + } + + blk_bmap->used_blks += found.len; + start_item = found.start + found.len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("VALID_BLK: range (start %u, len %u)\n", + found.start, found.len); +#endif /* CONFIG_SSDFS_DEBUG */ + } while (start_item < blk_bmap->items_count); + +check_pre_allocated_blocks: + start_item = 0; + blk_state = SSDFS_BLK_PRE_ALLOCATED; + + do { + err = ssdfs_block_bmap_find_range(blk_bmap, + start_item, + blk_bmap->items_count - start_item, + blk_bmap->items_count, + blk_state, &found); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find more pre-allocated blocks: " + "start_item %u\n", + start_item); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_block_bmap_init; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find range: err %d\n", err); + return err; + } + + blk_bmap->used_blks += found.len; + start_item = found.start + found.len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PRE_ALLOCATED_BLK: range (start %u, len %u)\n", + found.start, found.len); +#endif /* CONFIG_SSDFS_DEBUG */ + } while (start_item < blk_bmap->items_count); + +finish_block_bmap_init: + set_block_bmap_initialized(blk_bmap); + +#ifdef CONFIG_SSDFS_DEBUG + ssdfs_debug_block_bitmap(blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_define_last_free_page() - define last free page + * @blk_bmap: pointer on block bitmap + * @found_blk: found last free page [out] + */ +static +int ssdfs_define_last_free_page(struct ssdfs_block_bmap *blk_bmap, + u32 *found_blk) +{ + int cache_type; + struct ssdfs_last_bmap_search *last_search; + u32 first_cached_blk; + u32 max_blk; + u32 items_per_long = SSDFS_ITEMS_PER_LONG(SSDFS_BLK_STATE_BITS); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p, found_blk %p\n", + blk_bmap, found_blk); + + BUG_ON(!blk_bmap || !found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + cache_type = SSDFS_GET_CACHE_TYPE(SSDFS_BLK_FREE); + max_blk = blk_bmap->items_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) { + err = ssdfs_block_bmap_find_block(blk_bmap, + 0, max_blk, + SSDFS_BLK_FREE, + found_blk); + if (err == -ENODATA) { + *found_blk = blk_bmap->items_count; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find last free block: " + "found_blk %u\n", + *found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_define_last_free_page; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find last free block: err %d\n", + err); + return err; + } + } else { + last_search = &blk_bmap->last_search[cache_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_search.cache %lx\n", last_search->cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + first_cached_blk = SSDFS_FIRST_CACHED_BLOCK(last_search); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("first_cached_blk %u\n", + first_cached_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_find_block_in_cache(blk_bmap, + first_cached_blk, + max_blk, + SSDFS_BLK_FREE, + found_blk); + if (err == -ENODATA) { + first_cached_blk += items_per_long; + err = ssdfs_block_bmap_find_block(blk_bmap, + first_cached_blk, + max_blk, + SSDFS_BLK_FREE, + found_blk); + if (err == -ENODATA) { + *found_blk = blk_bmap->items_count; +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("unable to find last free block: " + "found_blk %u\n", + *found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_define_last_free_page; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find last free block: err %d\n", + err); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find last free block: err %d\n", + err); + return err; + } + } + +finish_define_last_free_page: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last free block: %u\n", *found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_snapshot_storage() - make snapshot of bmap's storage + * @blk_bmap: pointer on block bitmap + * @snapshot: pagevec with snapshot of block bitmap state [out] + * + * This function copies pages of block bitmap's styorage into + * @snapshot pagevec. + * + * RETURN: + * [success] - @snapshot contains copy of block bitmap's state + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_block_bmap_snapshot_storage(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_page_vector *snapshot) +{ + struct ssdfs_page_vector *array; + struct page *page; +#ifdef CONFIG_SSDFS_DEBUG + void *kaddr; +#endif /* CONFIG_SSDFS_DEBUG */ + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !snapshot); + BUG_ON(ssdfs_page_vector_count(snapshot) != 0); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap's mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, snapshot %p\n", + blk_bmap, snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + array = &blk_bmap->storage.array; + + for (i = 0; i < ssdfs_page_vector_count(array); i++) { + page = ssdfs_block_bmap_alloc_page(GFP_KERNEL); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate #%d page\n", i); + return err; + } + + ssdfs_memcpy_page(page, 0, PAGE_SIZE, + array->pages[i], 0, PAGE_SIZE, + PAGE_SIZE); + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(page); + SSDFS_DBG("BMAP SNAPSHOT\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, 32); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_block_bmap_forget_page(page); + err = ssdfs_page_vector_add(snapshot, page); + if (unlikely(err)) { + SSDFS_ERR("fail to add page: " + "index %d, err %d\n", + i, err); + return err; + } + } + + for (; i < ssdfs_page_vector_capacity(array); i++) { + page = ssdfs_block_bmap_alloc_page(GFP_KERNEL); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate #%d page\n", i); + return err; + } + + ssdfs_memzero_page(page, 0, PAGE_SIZE, PAGE_SIZE); + + ssdfs_block_bmap_forget_page(page); + err = ssdfs_page_vector_add(snapshot, page); + if (unlikely(err)) { + SSDFS_ERR("fail to add page: " + "index %d, err %d\n", + i, err); + return err; + } + } + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + page = ssdfs_block_bmap_alloc_page(GFP_KERNEL); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return err; + } + + ssdfs_memcpy_to_page(page, + 0, PAGE_SIZE, + blk_bmap->storage.buf, + 0, blk_bmap->bytes_count, + blk_bmap->bytes_count); + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(page); + SSDFS_DBG("BMAP SNAPSHOT\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, 32); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_block_bmap_forget_page(page); + err = ssdfs_page_vector_add(snapshot, page); + if (unlikely(err)) { + SSDFS_ERR("fail to add page: " + "err %d\n", err); + return err; + } + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + blk_bmap->storage.state); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_block_bmap_snapshot() - make snapshot of block bitmap's pagevec + * @blk_bmap: pointer on block bitmap + * @snapshot: pagevec with snapshot of block bitmap state [out] + * @last_free_page: pointer on last free page value [out] + * @metadata_blks: pointer on reserved metadata pages count [out] + * @invalid_blks: pointer on invalid blocks count [out] + * @bytes_count: size of block bitmap in bytes [out] + * + * This function copies pages of the block bitmap's pagevec into + * the @snapshot pagevec. + * + * RETURN: + * [success] - @snapshot contains copy of block bitmap's state + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + */ +int ssdfs_block_bmap_snapshot(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_page_vector *snapshot, + u32 *last_free_page, + u32 *metadata_blks, + u32 *invalid_blks, + size_t *bytes_count) +{ + u32 used_pages; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !snapshot); + BUG_ON(!last_free_page || !metadata_blks || !bytes_count); + BUG_ON(ssdfs_page_vector_count(snapshot) != 0); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap's mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, snapshot %p, last_free_page %p, " + "metadata_blks %p, bytes_count %p\n", + blk_bmap, snapshot, last_free_page, + metadata_blks, bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -EINVAL; + } + + err = ssdfs_block_bmap_snapshot_storage(blk_bmap, snapshot); + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot bmap's storage: err %d\n", err); + goto cleanup_snapshot_pagevec; + } + + err = ssdfs_define_last_free_page(blk_bmap, last_free_page); + if (unlikely(err)) { + SSDFS_ERR("fail to define last free page: err %d\n", err); + goto cleanup_snapshot_pagevec; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("bytes_count %zu, items_count %zu, " + "metadata_items %u, used_blks %u, invalid_blks %u, " + "last_free_page %u\n", + blk_bmap->bytes_count, blk_bmap->items_count, + blk_bmap->metadata_items, blk_bmap->used_blks, + blk_bmap->invalid_blks, *last_free_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*last_free_page >= blk_bmap->items_count) { + err = -ERANGE; + SSDFS_ERR("invalid last_free_page: " + "bytes_count %zu, items_count %zu, " + "metadata_items %u, used_blks %u, invalid_blks %u, " + "last_free_page %u\n", + blk_bmap->bytes_count, blk_bmap->items_count, + blk_bmap->metadata_items, blk_bmap->used_blks, + blk_bmap->invalid_blks, *last_free_page); + goto cleanup_snapshot_pagevec; + } + + *metadata_blks = blk_bmap->metadata_items; + *invalid_blks = blk_bmap->invalid_blks; + *bytes_count = blk_bmap->bytes_count; + + used_pages =
blk_bmap->used_blks + blk_bmap->invalid_blks + + blk_bmap->metadata_items; + + if (used_pages > blk_bmap->items_count) { + err = -ERANGE; + SSDFS_ERR("invalid values: " + "bytes_count %zu, items_count %zu, " + "metadata_items %u, used_blks %u, invalid_blks %u, " + "last_free_page %u\n", + blk_bmap->bytes_count, blk_bmap->items_count, + blk_bmap->metadata_items, blk_bmap->used_blks, + blk_bmap->invalid_blks, *last_free_page); + goto cleanup_snapshot_pagevec; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("clear dirty state\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + clear_block_bmap_dirty(blk_bmap); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_free_page %u, metadata_blks %u, " + "bytes_count %zu\n", + *last_free_page, *metadata_blks, *bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + +cleanup_snapshot_pagevec: + ssdfs_page_vector_release(snapshot); + return err; +} + +void ssdfs_block_bmap_forget_snapshot(struct ssdfs_page_vector *snapshot) +{ + if (!snapshot) + return; + + ssdfs_page_vector_release(snapshot); +} diff --git a/fs/ssdfs/block_bitmap.h b/fs/ssdfs/block_bitmap.h new file mode 100644 index 000000000000..0b036eab3707 --- /dev/null +++ b/fs/ssdfs/block_bitmap.h @@ -0,0 +1,370 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/block_bitmap.h - PEB's block bitmap declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_BLOCK_BITMAP_H +#define _SSDFS_BLOCK_BITMAP_H + +#include "common_bitmap.h" + +#define SSDFS_BLK_STATE_BITS 2 +#define SSDFS_BLK_STATE_MASK 0x3 + +enum { + SSDFS_BLK_FREE = 0x0, + SSDFS_BLK_PRE_ALLOCATED = 0x1, + SSDFS_BLK_VALID = 0x3, + SSDFS_BLK_INVALID = 0x2, + SSDFS_BLK_STATE_MAX = SSDFS_BLK_VALID + 1, +}; + +#define SSDFS_FREE_STATES_BYTE 0x00 +#define SSDFS_PRE_ALLOC_STATES_BYTE 0x55 +#define SSDFS_VALID_STATES_BYTE 0xFF +#define SSDFS_INVALID_STATES_BYTE 0xAA + +#define SSDFS_BLK_BMAP_BYTE(blk_state)({ \ + u8 value; \ + switch (blk_state) { \ + case SSDFS_BLK_FREE: \ + value = SSDFS_FREE_STATES_BYTE; \ + break; \ + case SSDFS_BLK_PRE_ALLOCATED: \ + value = SSDFS_PRE_ALLOC_STATES_BYTE; \ + break; \ + case SSDFS_BLK_VALID: \ + value = SSDFS_VALID_STATES_BYTE; \ + break; \ + case SSDFS_BLK_INVALID: \ + value = SSDFS_INVALID_STATES_BYTE; \ + break; \ + default: \ + BUG(); \ + }; \ + value; \ +}) + +#define BLK_BMAP_BYTES(items_count) \ + ((items_count + SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS) - 1) / \ + SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS)) + +static inline +int SSDFS_BLK2PAGE(u32 blk, u8 item_bits, u16 *offset) +{ + u32 blks_per_byte = SSDFS_ITEMS_PER_BYTE(item_bits); + u32 blks_per_long = SSDFS_ITEMS_PER_LONG(item_bits); + u32 blks_per_page = PAGE_SIZE * blks_per_byte; + u32 off; + + if (offset) { + off = (blk % blks_per_page) / blks_per_long; + off *= sizeof(unsigned long); + BUG_ON(off >= U16_MAX); + *offset = off; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk %u, item_bits %u, blks_per_byte %u, " + "blks_per_long %u, blks_per_page %u, " + "page_index %u\n", + blk, item_bits, blks_per_byte, + blks_per_long, blks_per_page, + blk / blks_per_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + return blk / 
blks_per_page; +} + +/* + * struct ssdfs_last_bmap_search - last search in bitmap + * @page_index: index of page in pagevec + * @offset: offset of cache from page's beginning + * @cache: cached bmap's part + */ +struct ssdfs_last_bmap_search { + int page_index; + u16 offset; + unsigned long cache; +}; + +static inline +u32 SSDFS_FIRST_CACHED_BLOCK(struct ssdfs_last_bmap_search *search) +{ + u32 blks_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + u32 blks_per_page = PAGE_SIZE * blks_per_byte; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %d, offset %u, " + "blks_per_byte %u, blks_per_page %u\n", + search->page_index, + search->offset, + blks_per_byte, blks_per_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (search->page_index * blks_per_page) + + (search->offset * blks_per_byte); +} + +enum { + SSDFS_FREE_BLK_SEARCH, + SSDFS_VALID_BLK_SEARCH, + SSDFS_OTHER_BLK_SEARCH, + SSDFS_SEARCH_TYPE_MAX, +}; + +static inline +int SSDFS_GET_CACHE_TYPE(int blk_state) +{ + switch (blk_state) { + case SSDFS_BLK_FREE: + return SSDFS_FREE_BLK_SEARCH; + + case SSDFS_BLK_VALID: + return SSDFS_VALID_BLK_SEARCH; + + case SSDFS_BLK_PRE_ALLOCATED: + case SSDFS_BLK_INVALID: + return SSDFS_OTHER_BLK_SEARCH; + } + + return SSDFS_SEARCH_TYPE_MAX; +} + +#define SSDFS_BLK_BMAP_INITIALIZED (1 << 0) +#define SSDFS_BLK_BMAP_DIRTY (1 << 1) + +/* + * struct ssdfs_block_bmap_storage - block bitmap's storage + * @state: storage state + * @array: vector of pages + * @buf: pointer on memory buffer + */ +struct ssdfs_block_bmap_storage { + int state; + struct ssdfs_page_vector array; + void *buf; +}; + +/* Block bitmap's storage's states */ +enum { + SSDFS_BLOCK_BMAP_STORAGE_ABSENT, + SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC, + SSDFS_BLOCK_BMAP_STORAGE_BUFFER, + SSDFS_BLOCK_BMAP_STORAGE_STATE_MAX +}; + +/* + * struct ssdfs_block_bmap - in-core segment's block bitmap + * @lock: block bitmap lock + * @flags: block bitmap state flags + * @storage: block bitmap's storage + * @bytes_count: block bitmap size in bytes + * @items_count: items count in bitmap + * @metadata_items: count of metadata items + * @used_blks: count of valid blocks + * @invalid_blks: count of invalid blocks + * @last_search: last search/access cache array + */ +struct ssdfs_block_bmap { + struct mutex lock; + atomic_t flags; + struct ssdfs_block_bmap_storage storage; + size_t bytes_count; + size_t items_count; + u32 metadata_items; + u32 used_blks; + u32 invalid_blks; + struct ssdfs_last_bmap_search last_search[SSDFS_SEARCH_TYPE_MAX]; +}; + +/* + * compare_block_bmap_ranges() - compare two ranges + * @range1: left range + * @range2: right range + * + * RETURN: + * 0: range1 == range2 + * -1: range1 < range2 + * 1: range1 > range2 + */ +static inline +int compare_block_bmap_ranges(struct ssdfs_block_bmap_range *range1, + struct ssdfs_block_bmap_range *range2) +{ + u32 range1_end, range2_end; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!range1 || !range2); + + SSDFS_DBG("range1 (start %u, len %u), range2 (start %u, len %u)\n", + range1->start, range1->len, range2->start, range2->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range1->start == range2->start) { + if (range1->len == range2->len) + return 0; + else if (range1->len < range2->len) + return -1; + else + return 1; + } else if (range1->start < range2->start) { + range1_end = range1->start + range1->len; + range2_end = range2->start + range2->len; + + if (range2_end <= range1_end) + return 1; + else + return -1; + } + + /* range1->start > range2->start */ + return -1; +} + +/* + *
ranges_have_intersection() - do the ranges intersect? + * @range1: left range + * @range2: right range + * + * RETURN: + * [true] - ranges have intersection + * [false] - ranges don't intersect + */ +static inline +bool ranges_have_intersection(struct ssdfs_block_bmap_range *range1, + struct ssdfs_block_bmap_range *range2) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!range1 || !range2); + + SSDFS_DBG("range1 (start %u, len %u), range2 (start %u, len %u)\n", + range1->start, range1->len, range2->start, range2->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((range2->start + range2->len) <= range1->start) + return false; + else if ((range1->start + range1->len) <= range2->start) + return false; + + return true; +} + +enum { + SSDFS_BLK_BMAP_CREATE, + SSDFS_BLK_BMAP_INIT, +}; + +/* Function prototypes */ +int ssdfs_block_bmap_create(struct ssdfs_fs_info *fsi, + struct ssdfs_block_bmap *bmap, + u32 items_count, + int flag, int init_state); +void ssdfs_block_bmap_destroy(struct ssdfs_block_bmap *blk_bmap); +int ssdfs_block_bmap_init(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_page_vector *source, + u32 last_free_blk, + u32 metadata_blks, + u32 invalid_blks); +int ssdfs_block_bmap_snapshot(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_page_vector *snapshot, + u32 *last_free_page, + u32 *metadata_blks, + u32 *invalid_blks, + size_t *bytes_count); +void ssdfs_block_bmap_forget_snapshot(struct ssdfs_page_vector *snapshot); + +int ssdfs_block_bmap_lock(struct ssdfs_block_bmap *blk_bmap); +bool ssdfs_block_bmap_is_locked(struct ssdfs_block_bmap *blk_bmap); +void ssdfs_block_bmap_unlock(struct ssdfs_block_bmap *blk_bmap); + +bool ssdfs_block_bmap_dirtied(struct ssdfs_block_bmap *blk_bmap); +void ssdfs_block_bmap_clear_dirty_state(struct ssdfs_block_bmap *blk_bmap); +bool ssdfs_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap); +void ssdfs_set_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap); + +bool ssdfs_block_bmap_test_block(struct ssdfs_block_bmap *blk_bmap, + u32 blk, int blk_state); +bool ssdfs_block_bmap_test_range(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range, + int blk_state); +int ssdfs_get_block_state(struct ssdfs_block_bmap *blk_bmap, u32 blk); +int ssdfs_get_range_state(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range); +int ssdfs_block_bmap_reserve_metadata_pages(struct ssdfs_block_bmap *blk_bmap, + u32 count); +int ssdfs_block_bmap_free_metadata_pages(struct ssdfs_block_bmap *blk_bmap, + u32 count); +int ssdfs_block_bmap_get_free_pages(struct ssdfs_block_bmap *blk_bmap); +int ssdfs_block_bmap_get_used_pages(struct ssdfs_block_bmap *blk_bmap); +int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap); +int ssdfs_block_bmap_pre_allocate(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 *len, + struct ssdfs_block_bmap_range *range); +int ssdfs_block_bmap_allocate(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 *len, + struct ssdfs_block_bmap_range *range); +int ssdfs_block_bmap_invalidate(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range); +int ssdfs_block_bmap_collect_garbage(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_len, + int blk_state, + struct ssdfs_block_bmap_range *range); +int ssdfs_block_bmap_clean(struct ssdfs_block_bmap *blk_bmap); + +#define SSDFS_BLK_BMAP_FNS(state, name) \ +static inline \ +bool is_block_##name(struct ssdfs_block_bmap *blk_bmap, u32 blk) \ +{ \ + return ssdfs_block_bmap_test_block(blk_bmap, blk, \ + SSDFS_BLK_##state); \ +}
\ +static inline \ +bool is_range_##name(struct ssdfs_block_bmap *blk_bmap, \ + struct ssdfs_block_bmap_range *range) \ +{ \ + return ssdfs_block_bmap_test_range(blk_bmap, range, \ + SSDFS_BLK_##state); \ +} \ + +/* + * is_block_free() + * is_range_free() + */ +SSDFS_BLK_BMAP_FNS(FREE, free) + +/* + * is_block_pre_allocated() + * is_range_pre_allocated() + */ +SSDFS_BLK_BMAP_FNS(PRE_ALLOCATED, pre_allocated) + +/* + * is_block_valid() + * is_range_valid() + */ +SSDFS_BLK_BMAP_FNS(VALID, valid) + +/* + * is_block_invalid() + * is_range_invalid() + */ +SSDFS_BLK_BMAP_FNS(INVALID, invalid) + +#endif /* _SSDFS_BLOCK_BITMAP_H */ diff --git a/fs/ssdfs/block_bitmap_tables.c b/fs/ssdfs/block_bitmap_tables.c new file mode 100644 index 000000000000..4f7e04a8a9b6 --- /dev/null +++ b/fs/ssdfs/block_bitmap_tables.c @@ -0,0 +1,310 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/block_bitmap_tables.c - declarations of block bitmap's search tables. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include + +/* + * Table for detecting the presence of the free block + * state in a given byte. The byte being checked is used + * as an index into the array. + */ +const bool detect_free_blk[U8_MAX + 1] = { +/* 00 - 0x00 */ true, true, true, true, +/* 01 - 0x04 */ true, true, true, true, +/* 02 - 0x08 */ true, true, true, true, +/* 03 - 0x0C */ true, true, true, true, +/* 04 - 0x10 */ true, true, true, true, +/* 05 - 0x14 */ true, true, true, true, +/* 06 - 0x18 */ true, true, true, true, +/* 07 - 0x1C */ true, true, true, true, +/* 08 - 0x20 */ true, true, true, true, +/* 09 - 0x24 */ true, true, true, true, +/* 10 - 0x28 */ true, true, true, true, +/* 11 - 0x2C */ true, true, true, true, +/* 12 - 0x30 */ true, true, true, true, +/* 13 - 0x34 */ true, true, true, true, +/* 14 - 0x38 */ true, true, true, true, +/* 15 - 0x3C */ true, true, true, true, +/* 16 - 0x40 */ true, true, true, true, +/* 17 - 0x44 */ true, true, true, true, +/* 18 - 0x48 */ true, true, true, true, +/* 19 - 0x4C */ true, true, true, true, +/* 20 - 0x50 */ true, true, true, true, +/* 21 - 0x54 */ true, false, false, false, +/* 22 - 0x58 */ true, false, false, false, +/* 23 - 0x5C */ true, false, false, false, +/* 24 - 0x60 */ true, true, true, true, +/* 25 - 0x64 */ true, false, false, false, +/* 26 - 0x68 */ true, false, false, false, +/* 27 - 0x6C */ true, false, false, false, +/* 28 - 0x70 */ true, true, true, true, +/* 29 - 0x74 */ true, false, false, false, +/* 30 - 0x78 */ true, false, false, false, +/* 31 - 0x7C */ true, false, false, false, +/* 32 - 0x80 */ true, true, true, true, +/* 33 - 0x84 */ true, true, true, true, +/* 34 - 0x88 */ true, true, true, true, +/* 35 - 0x8C */ true, true, true, true, +/* 36 - 0x90 */ true, true, true, true, +/* 37 - 0x94 */ true, false, false, false, +/* 38 - 0x98 */ true, false, false, false, +/* 39 - 0x9C */ true, false, false, false, +/* 40 - 0xA0 */ true, true, true, true, +/* 41 - 0xA4 */ true, false, false, false, +/* 42 - 0xA8 */ true, false, false, false, +/* 43 - 0xAC */ true, false, false, false, +/* 44 - 0xB0 */ true, true, true, true, +/* 45 - 0xB4 */ true, false, false, false, +/* 46 - 0xB8
*/ true, false, false, false, +/* 47 - 0xBC */ true, false, false, false, +/* 48 - 0xC0 */ true, true, true, true, +/* 49 - 0xC4 */ true, true, true, true, +/* 50 - 0xC8 */ true, true, true, true, +/* 51 - 0xCC */ true, true, true, true, +/* 52 - 0xD0 */ true, true, true, true, +/* 53 - 0xD4 */ true, false, false, false, +/* 54 - 0xD8 */ true, false, false, false, +/* 55 - 0xDC */ true, false, false, false, +/* 56 - 0xE0 */ true, true, true, true, +/* 57 - 0xE4 */ true, false, false, false, +/* 58 - 0xE8 */ true, false, false, false, +/* 59 - 0xEC */ true, false, false, false, +/* 60 - 0xF0 */ true, true, true, true, +/* 61 - 0xF4 */ true, false, false, false, +/* 62 - 0xF8 */ true, false, false, false, +/* 63 - 0xFC */ true, false, false, false +}; + +/* + * Table for determination presence of pre-allocated + * block state in provided byte. Checking byte is used + * as index in array. + */ +const bool detect_pre_allocated_blk[U8_MAX + 1] = { +/* 00 - 0x00 */ false, true, false, false, +/* 01 - 0x04 */ true, true, true, true, +/* 02 - 0x08 */ false, true, false, false, +/* 03 - 0x0C */ false, true, false, false, +/* 04 - 0x10 */ true, true, true, true, +/* 05 - 0x14 */ true, true, true, true, +/* 06 - 0x18 */ true, true, true, true, +/* 07 - 0x1C */ true, true, true, true, +/* 08 - 0x20 */ false, true, false, false, +/* 09 - 0x24 */ true, true, true, true, +/* 10 - 0x28 */ false, true, false, false, +/* 11 - 0x2C */ false, true, false, false, +/* 12 - 0x30 */ false, true, false, false, +/* 13 - 0x34 */ true, true, true, true, +/* 14 - 0x38 */ false, true, false, false, +/* 15 - 0x3C */ false, true, false, false, +/* 16 - 0x40 */ true, true, true, true, +/* 17 - 0x44 */ true, true, true, true, +/* 18 - 0x48 */ true, true, true, true, +/* 19 - 0x4C */ true, true, true, true, +/* 20 - 0x50 */ true, true, true, true, +/* 21 - 0x54 */ true, true, true, true, +/* 22 - 0x58 */ true, true, true, true, +/* 23 - 0x5C */ true, true, true, true, +/* 24 - 0x60 */ true, true, true, true, +/* 25 - 0x64 */ true, true, true, true, +/* 26 - 0x68 */ true, true, true, true, +/* 27 - 0x6C */ true, true, true, true, +/* 28 - 0x70 */ true, true, true, true, +/* 29 - 0x74 */ true, true, true, true, +/* 30 - 0x78 */ true, true, true, true, +/* 31 - 0x7C */ true, true, true, true, +/* 32 - 0x80 */ false, true, false, false, +/* 33 - 0x84 */ true, true, true, true, +/* 34 - 0x88 */ false, true, false, false, +/* 35 - 0x8C */ false, true, false, false, +/* 36 - 0x90 */ true, true, true, true, +/* 37 - 0x94 */ true, true, true, true, +/* 38 - 0x98 */ true, true, true, true, +/* 39 - 0x9C */ true, true, true, true, +/* 40 - 0xA0 */ false, true, false, false, +/* 41 - 0xA4 */ true, true, true, true, +/* 42 - 0xA8 */ false, true, false, false, +/* 43 - 0xAC */ false, true, false, false, +/* 44 - 0xB0 */ false, true, false, false, +/* 45 - 0xB4 */ true, true, true, true, +/* 46 - 0xB8 */ false, true, false, false, +/* 47 - 0xBC */ false, true, false, false, +/* 48 - 0xC0 */ false, true, false, false, +/* 49 - 0xC4 */ true, true, true, true, +/* 50 - 0xC8 */ false, true, false, false, +/* 51 - 0xCC */ false, true, false, false, +/* 52 - 0xD0 */ true, true, true, true, +/* 53 - 0xD4 */ true, true, true, true, +/* 54 - 0xD8 */ true, true, true, true, +/* 55 - 0xDC */ true, true, true, true, +/* 56 - 0xE0 */ false, true, false, false, +/* 57 - 0xE4 */ true, true, true, true, +/* 58 - 0xE8 */ false, true, false, false, +/* 59 - 0xEC */ false, true, false, false, +/* 60 - 0xF0 */ false, true, false, false, +/* 61 - 0xF4 */ 
true, true, true, true, +/* 62 - 0xF8 */ false, true, false, false, +/* 63 - 0xFC */ false, true, false, false +}; + +/* + * Table for determination presence of valid block + * state in provided byte. Checking byte is used + * as index in array. + */ +const bool detect_valid_blk[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, true, +/* 01 - 0x04 */ false, false, false, true, +/* 02 - 0x08 */ false, false, false, true, +/* 03 - 0x0C */ true, true, true, true, +/* 04 - 0x10 */ false, false, false, true, +/* 05 - 0x14 */ false, false, false, true, +/* 06 - 0x18 */ false, false, false, true, +/* 07 - 0x1C */ true, true, true, true, +/* 08 - 0x20 */ false, false, false, true, +/* 09 - 0x24 */ false, false, false, true, +/* 10 - 0x28 */ false, false, false, true, +/* 11 - 0x2C */ true, true, true, true, +/* 12 - 0x30 */ true, true, true, true, +/* 13 - 0x34 */ true, true, true, true, +/* 14 - 0x38 */ true, true, true, true, +/* 15 - 0x3C */ true, true, true, true, +/* 16 - 0x40 */ false, false, false, true, +/* 17 - 0x44 */ false, false, false, true, +/* 18 - 0x48 */ false, false, false, true, +/* 19 - 0x4C */ true, true, true, true, +/* 20 - 0x50 */ false, false, false, true, +/* 21 - 0x54 */ false, false, false, true, +/* 22 - 0x58 */ false, false, false, true, +/* 23 - 0x5C */ true, true, true, true, +/* 24 - 0x60 */ false, false, false, true, +/* 25 - 0x64 */ false, false, false, true, +/* 26 - 0x68 */ false, false, false, true, +/* 27 - 0x6C */ true, true, true, true, +/* 28 - 0x70 */ true, true, true, true, +/* 29 - 0x74 */ true, true, true, true, +/* 30 - 0x78 */ true, true, true, true, +/* 31 - 0x7C */ true, true, true, true, +/* 32 - 0x80 */ false, false, false, true, +/* 33 - 0x84 */ false, false, false, true, +/* 34 - 0x88 */ false, false, false, true, +/* 35 - 0x8C */ true, true, true, true, +/* 36 - 0x90 */ false, false, false, true, +/* 37 - 0x94 */ false, false, false, true, +/* 38 - 0x98 */ false, false, false, true, +/* 39 - 0x9C */ true, true, true, true, +/* 40 - 0xA0 */ false, false, false, true, +/* 41 - 0xA4 */ false, false, false, true, +/* 42 - 0xA8 */ false, false, false, true, +/* 43 - 0xAC */ true, true, true, true, +/* 44 - 0xB0 */ true, true, true, true, +/* 45 - 0xB4 */ true, true, true, true, +/* 46 - 0xB8 */ true, true, true, true, +/* 47 - 0xBC */ true, true, true, true, +/* 48 - 0xC0 */ true, true, true, true, +/* 49 - 0xC4 */ true, true, true, true, +/* 50 - 0xC8 */ true, true, true, true, +/* 51 - 0xCC */ true, true, true, true, +/* 52 - 0xD0 */ true, true, true, true, +/* 53 - 0xD4 */ true, true, true, true, +/* 54 - 0xD8 */ true, true, true, true, +/* 55 - 0xDC */ true, true, true, true, +/* 56 - 0xE0 */ true, true, true, true, +/* 57 - 0xE4 */ true, true, true, true, +/* 58 - 0xE8 */ true, true, true, true, +/* 59 - 0xEC */ true, true, true, true, +/* 60 - 0xF0 */ true, true, true, true, +/* 61 - 0xF4 */ true, true, true, true, +/* 62 - 0xF8 */ true, true, true, true, +/* 63 - 0xFC */ true, true, true, true +}; + +/* + * Table for determination presence of invalid block + * state in provided byte. Checking byte is used + * as index in array. 
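+ *
+ * For example, the byte 0xD8 (pairs 00, 10, 01, 11, reading two bits
+ * at a time from the least significant bit) packs one block in each
+ * of the four states, so index 0xD8 is true in this table and in the
+ * three detect_*_blk tables above.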
+ */
+const bool detect_invalid_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */ false, false, true, false,
+/* 01 - 0x04 */ false, false, true, false,
+/* 02 - 0x08 */ true, true, true, true,
+/* 03 - 0x0C */ false, false, true, false,
+/* 04 - 0x10 */ false, false, true, false,
+/* 05 - 0x14 */ false, false, true, false,
+/* 06 - 0x18 */ true, true, true, true,
+/* 07 - 0x1C */ false, false, true, false,
+/* 08 - 0x20 */ true, true, true, true,
+/* 09 - 0x24 */ true, true, true, true,
+/* 10 - 0x28 */ true, true, true, true,
+/* 11 - 0x2C */ true, true, true, true,
+/* 12 - 0x30 */ false, false, true, false,
+/* 13 - 0x34 */ false, false, true, false,
+/* 14 - 0x38 */ true, true, true, true,
+/* 15 - 0x3C */ false, false, true, false,
+/* 16 - 0x40 */ false, false, true, false,
+/* 17 - 0x44 */ false, false, true, false,
+/* 18 - 0x48 */ true, true, true, true,
+/* 19 - 0x4C */ false, false, true, false,
+/* 20 - 0x50 */ false, false, true, false,
+/* 21 - 0x54 */ false, false, true, false,
+/* 22 - 0x58 */ true, true, true, true,
+/* 23 - 0x5C */ false, false, true, false,
+/* 24 - 0x60 */ true, true, true, true,
+/* 25 - 0x64 */ true, true, true, true,
+/* 26 - 0x68 */ true, true, true, true,
+/* 27 - 0x6C */ true, true, true, true,
+/* 28 - 0x70 */ false, false, true, false,
+/* 29 - 0x74 */ false, false, true, false,
+/* 30 - 0x78 */ true, true, true, true,
+/* 31 - 0x7C */ false, false, true, false,
+/* 32 - 0x80 */ true, true, true, true,
+/* 33 - 0x84 */ true, true, true, true,
+/* 34 - 0x88 */ true, true, true, true,
+/* 35 - 0x8C */ true, true, true, true,
+/* 36 - 0x90 */ true, true, true, true,
+/* 37 - 0x94 */ true, true, true, true,
+/* 38 - 0x98 */ true, true, true, true,
+/* 39 - 0x9C */ true, true, true, true,
+/* 40 - 0xA0 */ true, true, true, true,
+/* 41 - 0xA4 */ true, true, true, true,
+/* 42 - 0xA8 */ true, true, true, true,
+/* 43 - 0xAC */ true, true, true, true,
+/* 44 - 0xB0 */ true, true, true, true,
+/* 45 - 0xB4 */ true, true, true, true,
+/* 46 - 0xB8 */ true, true, true, true,
+/* 47 - 0xBC */ true, true, true, true,
+/* 48 - 0xC0 */ false, false, true, false,
+/* 49 - 0xC4 */ false, false, true, false,
+/* 50 - 0xC8 */ true, true, true, true,
+/* 51 - 0xCC */ false, false, true, false,
+/* 52 - 0xD0 */ false, false, true, false,
+/* 53 - 0xD4 */ false, false, true, false,
+/* 54 - 0xD8 */ true, true, true, true,
+/* 55 - 0xDC */ false, false, true, false,
+/* 56 - 0xE0 */ true, true, true, true,
+/* 57 - 0xE4 */ true, true, true, true,
+/* 58 - 0xE8 */ true, true, true, true,
+/* 59 - 0xEC */ true, true, true, true,
+/* 60 - 0xF0 */ false, false, true, false,
+/* 61 - 0xF4 */ false, false, true, false,
+/* 62 - 0xF8 */ true, true, true, true,
+/* 63 - 0xFC */ false, false, true, false
+};

From patchwork Sat Feb 25 01:08:22 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151918
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 11/76] ssdfs: block bitmap search operations implementation
Date: Fri, 24 Feb 2023 17:08:22 -0800
Message-Id: <20230225010927.813929-12-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

Implement the internal block bitmap search operations used by the
pre_allocate, allocate, and collect_garbage operations.
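All of these searches share one core pattern: a byte of the bitmap packs
four 2-bit block states, a 256-entry lookup table answers whether the byte
contains the wanted state at all, and only matching bytes are scanned item
by item. The following stand-alone sketch illustrates that inner loop; it
is user-space demonstration code, not part of the patch, and its names and
2-bit encoding are assumptions mirroring the detect_*_blk tables rather
than the kernel API:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define BLK_STATE_BITS	2
  #define BLK_STATE_MASK	0x3
  #define ITEMS_PER_BYTE	(8 / BLK_STATE_BITS)

  /* stand-in for the detect_*_blk tables: does @byte contain @state? */
  static bool byte_contains_state(uint8_t byte, int state)
  {
  	int i;

  	for (i = 0; i < ITEMS_PER_BYTE; i++) {
  		if (((byte >> (i * BLK_STATE_BITS)) & BLK_STATE_MASK) == state)
  			return true;
  	}
  	return false;
  }

  /*
   * Find the first item with @state, starting from item @start.
   * Returns the item index, or -1 when nothing is found (the
   * equivalent of the -ENODATA case in the functions below).
   */
  static int find_first_state(const uint8_t *bmap, size_t bytes,
  			    size_t start, int state)
  {
  	size_t byte = start / ITEMS_PER_BYTE;
  	size_t off = start % ITEMS_PER_BYTE;
  	size_t i;

  	for (; byte < bytes; byte++, off = 0) {
  		/* the table lookup lets whole bytes be skipped cheaply */
  		if (!byte_contains_state(bmap[byte], state))
  			continue;
  		for (i = off; i < ITEMS_PER_BYTE; i++) {
  			int cur = (bmap[byte] >> (i * BLK_STATE_BITS)) &
  				   BLK_STATE_MASK;
  			if (cur == state)
  				return (int)(byte * ITEMS_PER_BYTE + i);
  		}
  	}
  	return -1;
  }

  int main(void)
  {
  	/* 0x55 = four items of state 0x1; 0xD8 = states 00, 10, 01, 11 */
  	uint8_t bmap[2] = { 0x55, 0xD8 };

  	printf("%d\n", find_first_state(bmap, sizeof(bmap), 0, 0x2));
  	return 0;
  }

Compiled as-is, the example prints 5: the first byte holds no state-0x2
item, so the byte-level filter skips it, and the first match sits at
offset 1 of the second byte.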
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/block_bitmap.c | 3401 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 3401 insertions(+) diff --git a/fs/ssdfs/block_bitmap.c b/fs/ssdfs/block_bitmap.c index fd7e84258cf0..3e3ddb6ff745 100644 --- a/fs/ssdfs/block_bitmap.c +++ b/fs/ssdfs/block_bitmap.c @@ -1207,3 +1207,3404 @@ void ssdfs_block_bmap_forget_snapshot(struct ssdfs_page_vector *snapshot) ssdfs_page_vector_release(snapshot); } + +/* + * ssdfs_block_bmap_lock() - lock segment's block bitmap + * @blk_bmap: pointer on block bitmap + */ +int ssdfs_block_bmap_lock(struct ssdfs_block_bmap *blk_bmap) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p\n", blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = mutex_lock_killable(&blk_bmap->lock); + if (err) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_block_bmap_is_locked() - check that block bitmap is locked + * @blk_bmap: pointer on block bitmap + */ +bool ssdfs_block_bmap_is_locked(struct ssdfs_block_bmap *blk_bmap) +{ + return mutex_is_locked(&blk_bmap->lock); +} + +/* + * ssdfs_block_bmap_unlock() - unlock segment's block bitmap + * @blk_bmap: pointer on block bitmap + */ +void ssdfs_block_bmap_unlock(struct ssdfs_block_bmap *blk_bmap) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p\n", blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + mutex_unlock(&blk_bmap->lock); +} + +/* + * ssdfs_get_cache_type() - define cache type for block + * @blk_bmap: pointer on block bitmap + * @blk: block number + * + * RETURN: + * [success] - cache type + * [failure] - SSDFS_SEARCH_TYPE_MAX + */ +static +int ssdfs_get_cache_type(struct ssdfs_block_bmap *blk_bmap, + u32 blk) +{ + int page_index; + u16 offset; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = SSDFS_BLK2PAGE(blk, SSDFS_BLK_STATE_BITS, &offset); + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + struct ssdfs_last_bmap_search *last; + + last = &blk_bmap->last_search[i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last->page_index %d, page_index %d, " + "last->offset %u, offset %u, " + "search_type %#x\n", + last->page_index, page_index, + last->offset, offset, i); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (last->page_index == page_index && + last->offset == offset) + return i; + } + + return SSDFS_SEARCH_TYPE_MAX; +} + +/* + * is_block_state_cached() - check that block state is in cache + * @blk_bmap: pointer on block bitmap + * @blk: block number + * + * RETURN: + * [true] - block state is in cache + * [false] - cache doesn't contain block state + */ +static +bool is_block_state_cached(struct ssdfs_block_bmap *blk_bmap, + u32 blk) +{ + int cache_type; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + cache_type = ssdfs_get_cache_type(blk_bmap, blk); + + if (cache_type < 0) { + SSDFS_ERR("invalid cache type %d\n", cache_type); + return false; + } + + if (cache_type >= SSDFS_SEARCH_TYPE_MAX) + return false; + + return true; +} + +/* + * ssdfs_determine_cache_type() - detect type of cache for value + * @cache: value for caching + * + * RETURN: suggested type of cache + */ +static +int ssdfs_determine_cache_type(unsigned long cache) +{ + size_t bytes_per_long = sizeof(cache); + size_t criterion = bytes_per_long / 2; + u8 
bytes[SSDFS_BLK_STATE_MAX] = {0}; + int i; + + for (i = 0; i < bytes_per_long; i++) { + int cur_state = (int)((cache >> (i * BITS_PER_BYTE)) & 0xFF); + + switch (cur_state) { + case SSDFS_FREE_STATES_BYTE: + bytes[SSDFS_BLK_FREE]++; + break; + + case SSDFS_PRE_ALLOC_STATES_BYTE: + bytes[SSDFS_BLK_PRE_ALLOCATED]++; + break; + + case SSDFS_VALID_STATES_BYTE: + bytes[SSDFS_BLK_VALID]++; + break; + + case SSDFS_INVALID_STATES_BYTE: + bytes[SSDFS_BLK_INVALID]++; + break; + + default: + /* mix of block states */ + break; + }; + } + + if (bytes[SSDFS_BLK_FREE] > criterion) + return SSDFS_FREE_BLK_SEARCH; + else if (bytes[SSDFS_BLK_VALID] > criterion) + return SSDFS_VALID_BLK_SEARCH; + + return SSDFS_OTHER_BLK_SEARCH; +} + +/* + * ssdfs_cache_block_state() - cache block state from pagevec + * @blk_bmap: pointer on block bitmap + * @blk: segment's block + * @blk_state: state as hint for cache type determination + * + * This function retrieves state of @blk from pagevec + * and save retrieved value for requested type of cache. + * If @blk_state has SSDFS_BLK_STATE_MAX value then function + * defines block state and to cache value in proper place. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-EOPNOTSUPP - invalid page index. + */ +static +int ssdfs_cache_block_state(struct ssdfs_block_bmap *blk_bmap, + u32 blk, int blk_state) +{ + struct ssdfs_page_vector *array; + int page_index; + u16 offset; + void *kaddr; + unsigned long cache; + int cache_type; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + SSDFS_DBG("blk_bmap %p, block %u, state %#x\n", + blk_bmap, blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (is_block_state_cached(blk_bmap, blk)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been cached already\n", blk); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + page_index = SSDFS_BLK2PAGE(blk, SSDFS_BLK_STATE_BITS, &offset); + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + array = &blk_bmap->storage.array; + + if (page_index >= ssdfs_page_vector_capacity(array)) { + SSDFS_ERR("invalid page index %d\n", page_index); + return -EOPNOTSUPP; + } + + if (page_index >= ssdfs_page_vector_count(array)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("absent page index %d\n", page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + err = ssdfs_memcpy_from_page(&cache, + 0, sizeof(unsigned long), + array->pages[page_index], + offset, PAGE_SIZE, + sizeof(unsigned long)); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + if (page_index > 0) { + SSDFS_ERR("invalid page_index %d\n", page_index); + return -ERANGE; + } + + kaddr = blk_bmap->storage.buf; + err = ssdfs_memcpy(&cache, 0, sizeof(unsigned long), + kaddr, offset, blk_bmap->bytes_count, + sizeof(unsigned long)); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + break; + + default: + SSDFS_ERR("unexpected state %#x\n", blk_bmap->storage.state); + return -ERANGE; + } + + cache_type = ssdfs_determine_cache_type(cache); + BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX); + + blk_bmap->last_search[cache_type].page_index = page_index; + blk_bmap->last_search[cache_type].offset = offset; + blk_bmap->last_search[cache_type].cache = cache; + +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("last_search.cache %lx, cache_type %#x, " + "page_index %d, offset %u\n", + cache, cache_type, + page_index, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_define_bits_shift_in_cache() - calculate bit shift of block in cache + * @blk_bmap: pointer on block bitmap + * @cache_type: type of cache + * @blk: segment's block + * + * This function calculates bit shift of @blk in cache of + * @cache_type. + * + * RETURN: + * [success] - bit shift + * [failure] - error code: + * + * %-EINVAL - invalid input value. + */ +static +int ssdfs_define_bits_shift_in_cache(struct ssdfs_block_bmap *blk_bmap, + int cache_type, u32 blk) +{ + struct ssdfs_last_bmap_search *last_search; + u32 first_cached_block, diff; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (blk >= blk_bmap->items_count) { + SSDFS_ERR("invalid block %u\n", blk); + return -EINVAL; + } + + if (cache_type < 0) { + SSDFS_ERR("invalid cache type %d\n", cache_type); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, cache_type %#x, blk %u\n", + blk_bmap, cache_type, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cache_type >= SSDFS_SEARCH_TYPE_MAX) { + SSDFS_ERR("cache doesn't contain block %u\n", blk); + return -EINVAL; + } + + last_search = &blk_bmap->last_search[cache_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_search.cache %lx\n", last_search->cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + first_cached_block = SSDFS_FIRST_CACHED_BLOCK(last_search); + + if (first_cached_block > blk) { + SSDFS_ERR("first_cached_block %u > blk %u\n", + first_cached_block, blk); + return -EINVAL; + } + + diff = blk - first_cached_block; + +#ifdef CONFIG_SSDFS_DEBUG + if (diff >= (U32_MAX / SSDFS_BLK_STATE_BITS)) { + SSDFS_ERR("invalid diff %u; blk %u, first_cached_block %u\n", + diff, blk, first_cached_block); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + diff *= SSDFS_BLK_STATE_BITS; + +#ifdef CONFIG_SSDFS_DEBUG + if (diff > (BITS_PER_LONG - SSDFS_BLK_STATE_BITS)) { + SSDFS_ERR("invalid diff %u; bits_per_long %u, " + "bits_per_state %u\n", + diff, BITS_PER_LONG, SSDFS_BLK_STATE_BITS); + return -EINVAL; + } + + SSDFS_DBG("diff %u\n", diff); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (int)diff; +} + +/* + * ssdfs_get_block_state_from_cache() - retrieve block state from cache + * @blk_bmap: pointer on block bitmap + * @blk: segment's block + * + * This function retrieve state of @blk from cache. + * + * RETURN: + * [success] - state of block + * [failure] - error code: + * + * %-EINVAL - invalid input value. 
+ */ +static +int ssdfs_get_block_state_from_cache(struct ssdfs_block_bmap *blk_bmap, + u32 blk) +{ + int cache_type; + struct ssdfs_last_bmap_search *last_search; + int shift; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (blk >= blk_bmap->items_count) { + SSDFS_ERR("invalid block %u\n", blk); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + cache_type = ssdfs_get_cache_type(blk_bmap, blk); + shift = ssdfs_define_bits_shift_in_cache(blk_bmap, cache_type, blk); + if (unlikely(shift < 0)) { + SSDFS_ERR("fail to define bits shift: " + "cache_type %d, blk %u, err %d\n", + cache_type, blk, shift); + return shift; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + last_search = &blk_bmap->last_search[cache_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_search.cache %lx\n", last_search->cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (int)((last_search->cache >> shift) & SSDFS_BLK_STATE_MASK); +} + +/* + * ssdfs_set_block_state_in_cache() - set block state in cache + * @blk_bmap: pointer on block bitmap + * @blk: segment's block + * @blk_state: new state of @blk + * + * This function sets state @blk_state of @blk in cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + */ +static +int ssdfs_set_block_state_in_cache(struct ssdfs_block_bmap *blk_bmap, + u32 blk, int blk_state) +{ + int cache_type; + int shift; + unsigned long value, *cached_value; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (blk >= blk_bmap->items_count) { + SSDFS_ERR("invalid block %u\n", blk); + return -EINVAL; + } + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, block %u, blk_state %#x\n", + blk_bmap, blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + cache_type = ssdfs_get_cache_type(blk_bmap, blk); + shift = ssdfs_define_bits_shift_in_cache(blk_bmap, cache_type, blk); + if (unlikely(shift < 0)) { + SSDFS_ERR("fail to define bits shift: " + "cache_type %d, blk %u, err %d\n", + cache_type, blk, shift); + return shift; + } + + value = blk_state & SSDFS_BLK_STATE_MASK; + value <<= shift; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cache_type >= SSDFS_SEARCH_TYPE_MAX); + + SSDFS_DBG("value %lx, cache %lx\n", + value, + blk_bmap->last_search[cache_type].cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + cached_value = &blk_bmap->last_search[cache_type].cache; + *cached_value &= ~((unsigned long)SSDFS_BLK_STATE_MASK << shift); + *cached_value |= value; + + return 0; +} + +/* + * ssdfs_save_cache_in_storage() - save cached values in storage + * @blk_bmap: pointer on block bitmap + * + * This function saves cached values in storage. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. 
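+ *
+ * Only last_search entries that hold a valid (page_index, offset)
+ * pair are written back; entries whose page_index equals
+ * SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX or whose offset equals U16_MAX
+ * are treated as unused and skipped.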
+ */ +static +int ssdfs_save_cache_in_storage(struct ssdfs_block_bmap *blk_bmap) +{ + struct ssdfs_page_vector *array; + void *kaddr; + int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + SSDFS_DBG("blk_bmap %p\n", blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + int page_index = blk_bmap->last_search[i].page_index; + u16 offset = blk_bmap->last_search[i].offset; + unsigned long cache = blk_bmap->last_search[i].cache; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search_type %d, page_index %d, offset %u\n", + i, page_index, offset); + SSDFS_DBG("last_search.cache %lx\n", cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index == max_capacity || offset == U16_MAX) + continue; + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + array = &blk_bmap->storage.array; + + if (page_index >= ssdfs_page_vector_capacity(array)) { + SSDFS_ERR("block bmap's cache is corrupted: " + "page_index %d, offset %u\n", + page_index, (u32)offset); + return -EINVAL; + } + + while (page_index >= ssdfs_page_vector_count(array)) { + struct page *page; + + page = ssdfs_page_vector_allocate(array); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : + PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_memcpy_to_page(array->pages[page_index], + offset, PAGE_SIZE, + &cache, + 0, sizeof(unsigned long), + sizeof(unsigned long)); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + if (page_index > 0) { + SSDFS_ERR("invalid page_index %d\n", page_index); + return -ERANGE; + } + + kaddr = blk_bmap->storage.buf; + err = ssdfs_memcpy(kaddr, offset, blk_bmap->bytes_count, + &cache, 0, sizeof(unsigned long), + sizeof(unsigned long)); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + blk_bmap->storage.state); + return -ERANGE; + } + } + + return 0; +} + +/* + * is_cache_invalid() - check that cache is invalid for requested state + * @blk_bmap: pointer on block bitmap + * @blk_state: requested block's state + * + * RETURN: + * [true] - cache doesn't been initialized yet. + * [false] - cache is valid. + */ +static inline +bool is_cache_invalid(struct ssdfs_block_bmap *blk_bmap, int blk_state) +{ + struct ssdfs_last_bmap_search *last_search; + int cache_type = SSDFS_GET_CACHE_TYPE(blk_state); + int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + + if (cache_type >= SSDFS_SEARCH_TYPE_MAX) { + SSDFS_ERR("invalid cache type %#x, blk_state %#x\n", + cache_type, blk_state); + return true; + } + + last_search = &blk_bmap->last_search[cache_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_search.cache %lx\n", last_search->cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (last_search->page_index >= max_capacity || + last_search->offset == U16_MAX) + return true; + + return false; +} + +/* + * BYTE_CONTAINS_STATE() - check that provided byte contains state + * @value: pointer on analysed byte + * @blk_state: requested block's state + * + * RETURN: + * [true] - @value contains @blk_state. + * [false] - @value hasn't @blk_state. 
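+ *
+ * For example, BYTE_CONTAINS_STATE(&value, SSDFS_BLK_INVALID) with
+ * *value == 0xD8 returns true, because detect_invalid_blk[0xD8] is
+ * true.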
+ */ +static inline +bool BYTE_CONTAINS_STATE(u8 *value, int blk_state) +{ + switch (blk_state) { + case SSDFS_BLK_FREE: + return detect_free_blk[*value]; + + case SSDFS_BLK_PRE_ALLOCATED: + return detect_pre_allocated_blk[*value]; + + case SSDFS_BLK_VALID: + return detect_valid_blk[*value]; + + case SSDFS_BLK_INVALID: + return detect_invalid_blk[*value]; + }; + + return false; +} + +/* + * ssdfs_block_bmap_find_block_in_cache() - find block for state in cache + * @blk_bmap: pointer on block bitmap + * @start: starting block for search + * @max_blk: upper bound for search + * @blk_state: requested block's state + * @found_blk: pointer on found block for requested state [out] + * + * This function tries to find in block block bitmap with @blk_state + * in range [@start, @max_blk). + * + * RETURN: + * [success] - @found_blk contains found block number for @blk_state. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - requested range [@start, @max_blk) doesn't contain + * any block with @blk_state. + */ +static +int ssdfs_block_bmap_find_block_in_cache(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_blk) +{ + int cache_type = SSDFS_GET_CACHE_TYPE(blk_state); + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + struct ssdfs_last_bmap_search *last_search; + u32 first_cached_blk; + u32 byte_index; + u8 blks_diff; + size_t bytes_per_long = sizeof(unsigned long); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !found_blk); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } + + if (!is_block_state_cached(blk_bmap, start)) { + SSDFS_ERR("cache doesn't contain start %u\n", start); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, " + "state %#x, found_blk %p\n", + blk_bmap, start, max_blk, blk_state, found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cache_type >= SSDFS_SEARCH_TYPE_MAX) { + SSDFS_ERR("invalid cache type %#x, blk_state %#x\n", + cache_type, blk_state); + return -EINVAL; + } + + *found_blk = max_blk; + last_search = &blk_bmap->last_search[cache_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_search.cache %lx\n", last_search->cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + first_cached_blk = SSDFS_FIRST_CACHED_BLOCK(last_search); + blks_diff = start - first_cached_blk; + byte_index = blks_diff / items_per_byte; + blks_diff = blks_diff % items_per_byte; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("first_cached_blk %u, start %u, " + "byte_index %u, bytes_per_long %zu\n", + first_cached_blk, start, + byte_index, bytes_per_long); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (; byte_index < bytes_per_long; byte_index++) { + u8 *value = (u8 *)&last_search->cache + byte_index; + u8 found_off; + + err = FIND_FIRST_ITEM_IN_BYTE(value, blk_state, + SSDFS_BLK_STATE_BITS, + SSDFS_BLK_STATE_MASK, + blks_diff, + BYTE_CONTAINS_STATE, + FIRST_STATE_IN_BYTE, + &found_off); + if (err == -ENODATA) { + blks_diff = 0; + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block in byte: " + "start_off %u, blk_state %#x, err %d\n", + blks_diff, blk_state, err); + return err; + } + + *found_blk = first_cached_blk; + *found_blk += byte_index * items_per_byte; + *found_blk += found_off; + +#ifdef 
CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been found for state %#x\n", + *found_blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + + return -ENODATA; +} + +static inline +int ssdfs_block_bmap_define_start_item(int page_index, + u32 start, + u32 aligned_start, + u32 aligned_end, + u32 *start_byte, + u32 *rest_bytes, + u8 *item_offset) +{ + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + u32 items_per_page = PAGE_SIZE * items_per_byte; + u32 items; + u32 offset; + + if ((page_index * items_per_page) <= aligned_start) + offset = aligned_start % items_per_page; + else + offset = aligned_start; + + *start_byte = offset / items_per_byte; + + items = items_per_page - offset; + + if (aligned_end <= start) { + SSDFS_ERR("page_index %d, start %u, " + "aligned_start %u, aligned_end %u, " + "start_byte %u, rest_bytes %u, item_offset %u\n", + page_index, start, + aligned_start, aligned_end, + *start_byte, *rest_bytes, *item_offset); + SSDFS_WARN("aligned_end %u <= start %u\n", + aligned_end, start); + return -ERANGE; + } else + items = min_t(u32, items, aligned_end); + + *rest_bytes = items + items_per_byte - 1; + *rest_bytes /= items_per_byte; + + *item_offset = (u8)(start - aligned_start); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %d, start %u, " + "aligned_start %u, aligned_end %u, " + "start_byte %u, rest_bytes %u, item_offset %u\n", + page_index, start, + aligned_start, aligned_end, + *start_byte, *rest_bytes, *item_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_find_block_in_memory_range() - find block in memory range + * @kaddr: pointer on memory range + * @blk_state: requested state of searching block + * @byte_index: index of byte in memory range [in|out] + * @search_bytes: upper bound for search + * @start_off: starting bit offset in byte + * @found_off: pointer on found byte's offset [out] + * + * This function searches a block with requested @blk_state + * into memory range. + * + * RETURN: + * [success] - found byte's offset in @found_off. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - block with requested state is not found. 
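+ *
+ * Only the first examined byte honours @start_off; once a byte is
+ * exhausted, the scan restarts at bit offset zero in the next byte.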
+ */ +static +int ssdfs_block_bmap_find_block_in_memory_range(void *kaddr, + int blk_state, + u32 *byte_index, + u32 search_bytes, + u8 start_off, + u8 *found_off) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !byte_index || !found_off); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + SSDFS_DBG("blk_state %#x, byte_index %u, " + "search_bytes %u, start_off %u\n", + blk_state, *byte_index, + search_bytes, start_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (; *byte_index < search_bytes; ++(*byte_index)) { + u8 *value = (u8 *)kaddr + *byte_index; + + err = FIND_FIRST_ITEM_IN_BYTE(value, blk_state, + SSDFS_BLK_STATE_BITS, + SSDFS_BLK_STATE_MASK, + start_off, + BYTE_CONTAINS_STATE, + FIRST_STATE_IN_BYTE, + found_off); + if (err == -ENODATA) { + start_off = 0; + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block in byte: " + "start_off %u, blk_state %#x, " + "err %d\n", + start_off, blk_state, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %u has been found for state %#x, " + "err %d\n", + *found_off, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + + return -ENODATA; +} + +/* + * ssdfs_block_bmap_find_block_in_buffer() - find block in buffer with state + * @blk_bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: requested state of searching block + * @found_blk: pointer on found block number [out] + * + * This function searches a block with requested @blk_state + * from @start till @max_blk (not inclusive) into buffer. + * The found block's number is returned via @found_blk. + * + * RETURN: + * [success] - found block number in @found_blk. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - block with requested state is not found. 
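+ *
+ * This variant operates on the single flat buffer
+ * (blk_bmap->storage.buf), so no page mapping is needed.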
+ */ +static +int ssdfs_block_bmap_find_block_in_buffer(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_blk) +{ + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + u32 aligned_start, aligned_end; + u32 byte_index, search_bytes = U32_MAX; + u32 rest_bytes = U32_MAX; + u8 start_off = U8_MAX; + void *kaddr; + u8 found_off = U8_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !found_blk); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, " + "state %#x, found_blk %p\n", + blk_bmap, start, max_blk, blk_state, found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_blk = max_blk; + + aligned_start = ALIGNED_START_BLK(start); + aligned_end = ALIGNED_END_BLK(max_blk); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_define_start_item(0, + start, + aligned_start, + aligned_end, + &byte_index, + &rest_bytes, + &start_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define start item: " + "blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); + return err; + } + + kaddr = blk_bmap->storage.buf; + search_bytes = byte_index + rest_bytes; + + err = ssdfs_block_bmap_find_block_in_memory_range(kaddr, blk_state, + &byte_index, + search_bytes, + start_off, + &found_off); + if (err == -ENODATA) { + /* no item has been found */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block: " + "start_off %u, blk_state %#x, " + "err %d\n", + start_off, blk_state, err); + return err; + } + + *found_blk = byte_index * items_per_byte; + *found_blk += found_off; + + if (*found_blk >= max_blk) + err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been found for state %#x, " + "err %d\n", + *found_blk, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_block_bmap_find_block_in_pagevec() - find block in pagevec with state + * @blk_bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: requested state of searching block + * @found_blk: pointer on found block number [out] + * + * This function searches a block with requested @blk_state + * from @start till @max_blk (not inclusive) into pagevec. + * The found block's number is returned via @found_blk. + * + * RETURN: + * [success] - found block number in @found_blk. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - block with requested state is not found. 
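+ *
+ * A search for SSDFS_BLK_FREE may grow the page vector: pages that
+ * have not been allocated yet are treated as wholly free space, so
+ * the first block of a newly allocated page can be returned.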
+ */ +static +int ssdfs_block_bmap_find_block_in_pagevec(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_blk) +{ + struct ssdfs_page_vector *array; + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + size_t items_per_page = PAGE_SIZE * items_per_byte; + u32 aligned_start, aligned_end; + struct page *page; + void *kaddr; + int page_index; + u8 found_off = U8_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !found_blk); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, " + "state %#x, found_blk %p\n", + blk_bmap, start, max_blk, blk_state, found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_blk = max_blk; + + array = &blk_bmap->storage.array; + + aligned_start = ALIGNED_START_BLK(start); + aligned_end = ALIGNED_END_BLK(max_blk); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = aligned_start / items_per_page; + + if (page_index >= ssdfs_page_vector_capacity(array)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %d >= capacity %u\n", + page_index, + ssdfs_page_vector_capacity(array)); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + if (page_index >= ssdfs_page_vector_count(array)) { + if (blk_state != SSDFS_BLK_FREE) + return -ENODATA; + + while (page_index >= ssdfs_page_vector_count(array)) { + page = ssdfs_page_vector_allocate(array); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + *found_blk = page_index * items_per_page; + + if (*found_blk >= max_blk) + err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been found for state %#x, " + "err %d\n", + *found_blk, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; + } + + for (; page_index < ssdfs_page_vector_capacity(array); page_index++) { + u32 byte_index, search_bytes = U32_MAX; + u32 rest_bytes = U32_MAX; + u8 start_off = U8_MAX; + + if (page_index == ssdfs_page_vector_count(array)) { + if (blk_state != SSDFS_BLK_FREE) + return -ENODATA; + + page = ssdfs_page_vector_allocate(array); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_blk = page_index * items_per_page; + + if (*found_blk >= max_blk) + err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been found for state %#x, " + "err %d\n", + *found_blk, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; + } + + err = ssdfs_block_bmap_define_start_item(page_index, start, + aligned_start, + aligned_end, + &byte_index, + &rest_bytes, + &start_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define start item: " + "blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); + return err; + } + + search_bytes = byte_index + rest_bytes; + + kaddr = kmap_local_page(array->pages[page_index]); + err = ssdfs_block_bmap_find_block_in_memory_range(kaddr, + blk_state, + &byte_index, + search_bytes, + start_off, + &found_off); + kunmap_local(kaddr); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no item has been found: " + "page_index %d, " + "page_vector_count %u, " + "page_vector_capacity %u\n", + page_index, + ssdfs_page_vector_count(array), + ssdfs_page_vector_capacity(array)); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block: " + "start_off %u, blk_state %#x, " + "err %d\n", + start_off, blk_state, err); + return err; + } + + *found_blk = page_index * items_per_page; + *found_blk += byte_index * items_per_byte; + *found_blk += found_off; + + if (*found_blk >= max_blk) + err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been found for state %#x, " + "err %d\n", + *found_blk, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; + } + + return -ENODATA; +} + +/* + * ssdfs_block_bmap_find_block_in_storage() - find block in storage with state + * @blk_bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: requested state of searching block + * @found_blk: pointer on found block number [out] + * + * This function searches a block with requested @blk_state + * from @start till @max_blk (not inclusive) into storage. + * The found block's number is returned via @found_blk. + * + * RETURN: + * [success] - found block number in @found_blk. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - block with requested state is not found. 
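+ *
+ * This is a dispatcher only: depending on blk_bmap->storage.state,
+ * the search runs over the page vector or over the flat buffer.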
+ */ +static +int ssdfs_block_bmap_find_block_in_storage(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_blk) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !found_blk); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, " + "state %#x, found_blk %p\n", + blk_bmap, start, max_blk, blk_state, found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + err = ssdfs_block_bmap_find_block_in_pagevec(blk_bmap, + start, + max_blk, + blk_state, + found_blk); + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + err = ssdfs_block_bmap_find_block_in_buffer(blk_bmap, + start, + max_blk, + blk_state, + found_blk); + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + blk_bmap->storage.state); + return -ERANGE; + } + + return err; +} + +/* + * ssdfs_block_bmap_find_block() - find block with requested state + * @blk_bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: requested state of searching block + * @found_blk: pointer on found block number [out] + * + * This function searches a block with requested @blk_state + * from @start till @max_blk (not inclusive). The found block's + * number is returned via @found_blk. If @blk_state has + * SSDFS_BLK_STATE_MAX then it needs to get block state + * for @start block number, simply. + * + * RETURN: + * [success] - found block number in @found_blk. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - block with requested state is not found. 
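+ *
+ * A typical lookup of the first free block (sketch):
+ *
+ *	err = ssdfs_block_bmap_find_block(blk_bmap, 0,
+ *					  blk_bmap->items_count,
+ *					  SSDFS_BLK_FREE, &found_blk);
+ *
+ * where -ENODATA signals that no free block is available.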
+ */ +static +int ssdfs_block_bmap_find_block(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, int blk_state, + u32 *found_blk) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !found_blk); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, start %u, max_blk %u, " + "state %#x, found_blk %p\n", + blk_bmap, start, max_blk, blk_state, found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk_state == SSDFS_BLK_STATE_MAX) { + err = ssdfs_cache_block_state(blk_bmap, start, blk_state); + if (unlikely(err)) { + SSDFS_ERR("unable to cache block %u state: err %d\n", + start, err); + return err; + } + + *found_blk = start; + return 0; + } + + *found_blk = max_blk; + max_blk = min_t(u32, max_blk, blk_bmap->items_count); + + if (is_cache_invalid(blk_bmap, blk_state)) { + err = ssdfs_block_bmap_find_block_in_storage(blk_bmap, + start, max_blk, + blk_state, + found_blk); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find block in pagevec: " + "start %u, max_blk %u, state %#x\n", + 0, max_blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block in pagevec: " + "start %u, max_blk %u, state %#x, err %d\n", + 0, max_blk, blk_state, err); + goto fail_find; + } + + err = ssdfs_cache_block_state(blk_bmap, *found_blk, blk_state); + if (unlikely(err)) { + SSDFS_ERR("fail to cache block: " + "found_blk %u, state %#x, err %d\n", + *found_blk, blk_state, err); + goto fail_find; + } + } + + if (*found_blk >= start && *found_blk < max_blk) + goto end_search; + + if (is_block_state_cached(blk_bmap, start)) { + err = ssdfs_block_bmap_find_block_in_cache(blk_bmap, + start, max_blk, + blk_state, + found_blk); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find block in cache: " + "start %u, max_blk %u, state %#x\n", + start, max_blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + /* + * Continue to search in pagevec + */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block in cache: " + "start %u, max_blk %u, state %#x, err %d\n", + start, max_blk, blk_state, err); + goto fail_find; + } else if (*found_blk >= start && *found_blk < max_blk) + goto end_search; + } + + err = ssdfs_block_bmap_find_block_in_storage(blk_bmap, start, max_blk, + blk_state, found_blk); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find block in pagevec: " + "start %u, max_blk %u, state %#x\n", + start, max_blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block in pagevec: " + "start %u, max_blk %u, state %#x, err %d\n", + start, max_blk, blk_state, err); + goto fail_find; + } + + switch (SSDFS_GET_CACHE_TYPE(blk_state)) { + case SSDFS_FREE_BLK_SEARCH: + case SSDFS_OTHER_BLK_SEARCH: + err = ssdfs_cache_block_state(blk_bmap, *found_blk, blk_state); + if (unlikely(err)) { + SSDFS_ERR("fail to cache block: " + "found_blk %u, state %#x, err %d\n", + *found_blk, blk_state, err); + goto fail_find; + } + break; + + default: + /* do nothing */ + break; + } + +end_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u has been found for state %#x\n", + *found_blk, blk_state); +#endif /* 
CONFIG_SSDFS_DEBUG */ + + return 0; + +fail_find: + return err; +} + +/* + * BYTE_CONTAIN_DIVERSE_STATES() - check that byte contains diverse state + * @value: pointer on analysed byte + * @blk_state: requested block's state + * + * RETURN: + * [true] - @value contains diverse states. + * [false] - @value contains @blk_state only. + */ +static inline +bool BYTE_CONTAIN_DIVERSE_STATES(u8 *value, int blk_state) +{ + switch (blk_state) { + case SSDFS_BLK_FREE: + return *value != SSDFS_FREE_STATES_BYTE; + + case SSDFS_BLK_PRE_ALLOCATED: + return *value != SSDFS_PRE_ALLOC_STATES_BYTE; + + case SSDFS_BLK_VALID: + return *value != SSDFS_VALID_STATES_BYTE; + + case SSDFS_BLK_INVALID: + return *value != SSDFS_INVALID_STATES_BYTE; + }; + + return false; +} + +/* + * GET_FIRST_DIFF_STATE() - determine first block offset for different state + * @value: pointer on analysed byte + * @blk_state: requested block's state + * @start_off: starting block offset for analysis beginning + * + * This function tries to determine an item with different that @blk_state in + * @value starting from @start_off. + * + * RETURN: + * [success] - found block offset. + * [failure] - BITS_PER_BYTE. + */ +static inline +u8 GET_FIRST_DIFF_STATE(u8 *value, int blk_state, u8 start_off) +{ + u8 i; + u8 bits_off; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!value); + BUG_ON(start_off >= (BITS_PER_BYTE / SSDFS_BLK_STATE_BITS)); +#endif /* CONFIG_SSDFS_DEBUG */ + + bits_off = start_off * SSDFS_BLK_STATE_BITS; + + for (i = bits_off; i < BITS_PER_BYTE; i += SSDFS_BLK_STATE_BITS) { + if (((*value >> i) & SSDFS_BLK_STATE_MASK) != blk_state) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_state %#x, start_off %u, blk_off %u\n", + blk_state, start_off, i); +#endif /* CONFIG_SSDFS_DEBUG */ + return i / SSDFS_BLK_STATE_BITS; + } + } + + return BITS_PER_BYTE; +} + +/* + * ssdfs_find_state_area_end_in_byte() - find end block for state area in byte + * @value: pointer on analysed byte + * @blk_state: requested block's state + * @start_off: starting block offset for search + * @found_off: pointer on found end block [out] + * + * RETURN: + * [success] - @found_off contains found end offset. + * [failure] - error code: + * + * %-ENODATA - analyzed @value contains @blk_state only. + */ +static inline +int ssdfs_find_state_area_end_in_byte(u8 *value, int blk_state, + u8 start_off, u8 *found_off) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("value %p, blk_state %#x, " + "start_off %u, found_off %p\n", + value, blk_state, start_off, found_off); + + BUG_ON(!value || !found_off); + BUG_ON(start_off >= (BITS_PER_BYTE / SSDFS_BLK_STATE_BITS)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_off = BITS_PER_BYTE; + + if (BYTE_CONTAIN_DIVERSE_STATES(value, blk_state)) { + u8 blk_offset = GET_FIRST_DIFF_STATE(value, blk_state, + start_off); + + if (blk_offset < BITS_PER_BYTE) { + *found_off = blk_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block offset %u for *NOT* state %#x\n", + *found_off, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + } + + return -ENODATA; +} + +/* + * ssdfs_block_bmap_find_state_area_end_in_memory() - find state area end + * @kaddr: pointer on memory range + * @blk_state: requested state of searching block + * @byte_index: index of byte in memory range [in|out] + * @search_bytes: upper bound for search + * @start_off: starting bit offset in byte + * @found_off: pointer on found end block [out] + * + * This function tries to find @blk_state area end + * in range [@start, @max_blk). 
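+ * The reported end is the position of the first block whose state
+ * differs from @blk_state; if every scanned block has @blk_state,
+ * -ENODATA is returned.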
+ * + * RETURN: + * [success] - found byte's offset in @found_off. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENODATA - nothing has been found. + */ +static +int ssdfs_block_bmap_find_state_area_end_in_memory(void *kaddr, + int blk_state, + u32 *byte_index, + u32 search_bytes, + u8 start_off, + u8 *found_off) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !byte_index || !found_off); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + for (; *byte_index < search_bytes; ++(*byte_index)) { + u8 *value = (u8 *)kaddr + *byte_index; + + err = ssdfs_find_state_area_end_in_byte(value, + blk_state, + start_off, + found_off); + if (err == -ENODATA) { + start_off = 0; + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find state area's end: " + "start_off %u, blk_state %#x, " + "err %d\n", + start_off, blk_state, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %u has been found for state %#x, " + "err %d\n", + *found_off, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + + return -ENODATA; +} + +/* + * ssdfs_block_bmap_find_state_area_end_in_buffer() - find state area end + * @bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: area state + * @found_end: pointer on found end block [out] + * + * This function tries to find @blk_state area end + * in range [@start, @max_blk). + * + * RETURN: + * [success] - @found_end contains found end block. + * [failure] - items count in block bitmap or error: + * + * %-EINVAL - invalid input value. + */ +static int +ssdfs_block_bmap_find_state_area_end_in_buffer(struct ssdfs_block_bmap *bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_end) +{ + u32 aligned_start, aligned_end; + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + u32 byte_index, search_bytes = U32_MAX; + u32 rest_bytes = U32_MAX; + u8 start_off = U8_MAX; + void *kaddr; + u8 found_off = U8_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, max_blk %u, blk_state %#x\n", + start, max_blk, blk_state); + + BUG_ON(!bmap || !found_end); + + if (start >= bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_end = U32_MAX; + + aligned_start = ALIGNED_START_BLK(start); + aligned_end = ALIGNED_END_BLK(max_blk); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_define_start_item(0, + start, + aligned_start, + aligned_end, + &byte_index, + &rest_bytes, + &start_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define start item: " + "blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); + return err; + } + + kaddr = bmap->storage.buf; + search_bytes = byte_index + rest_bytes; + + err = ssdfs_block_bmap_find_state_area_end_in_memory(kaddr, blk_state, + &byte_index, + search_bytes, + start_off, + &found_off); + if (err == -ENODATA) { + *found_end = max_blk; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area end %u has been found for state %#x\n", + 
*found_end, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find state area's end: " + "start_off %u, blk_state %#x, " + "err %d\n", + start_off, blk_state, err); + return err; + } + + *found_end = byte_index * items_per_byte; + *found_end += found_off; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, aligned_start %u, " + "aligned_end %u, byte_index %u, " + "items_per_byte %u, start_off %u, " + "found_off %u\n", + start, aligned_start, aligned_end, byte_index, + items_per_byte, start_off, found_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*found_end > max_blk) + *found_end = max_blk; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area end %u has been found for state %#x\n", + *found_end, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_find_state_area_end_in_pagevec() - find state area end + * @bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: area state + * @found_end: pointer on found end block [out] + * + * This function tries to find @blk_state area end + * in range [@start, @max_blk). + * + * RETURN: + * [success] - @found_end contains found end block. + * [failure] - items count in block bitmap or error: + * + * %-EINVAL - invalid input value. + */ +static int +ssdfs_block_bmap_find_state_area_end_in_pagevec(struct ssdfs_block_bmap *bmap, + u32 start, u32 max_blk, + int blk_state, u32 *found_end) +{ + struct ssdfs_page_vector *array; + u32 aligned_start, aligned_end; + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + size_t items_per_page = PAGE_SIZE * items_per_byte; + void *kaddr; + int page_index; + u8 found_off = U8_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, max_blk %u, blk_state %#x\n", + start, max_blk, blk_state); + + BUG_ON(!bmap || !found_end); + + if (start >= bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_end = U32_MAX; + + array = &bmap->storage.array; + + aligned_start = ALIGNED_START_BLK(start); + aligned_end = ALIGNED_END_BLK(max_blk); + + page_index = aligned_start / items_per_page; + + if (page_index >= ssdfs_page_vector_count(array)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %d >= count %u\n", + page_index, + ssdfs_page_vector_count(array)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_end = max_blk; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area end %u has been found for state %#x\n", + *found_end, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + + for (; page_index < ssdfs_page_vector_count(array); page_index++) { + u32 byte_index, search_bytes = U32_MAX; + u32 rest_bytes = U32_MAX; + u8 start_off = U8_MAX; + + err = ssdfs_block_bmap_define_start_item(page_index, start, + aligned_start, + aligned_end, + &byte_index, + &rest_bytes, + &start_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define start item: " + "blk_state %#x, start %u, max_blk %u, " + "aligned_start %u, aligned_end %u\n", + blk_state, start, max_blk, + aligned_start, aligned_end); + return err; + } + + search_bytes = byte_index + rest_bytes; + + kaddr = kmap_local_page(array->pages[page_index]); + err = ssdfs_block_bmap_find_state_area_end_in_memory(kaddr, + blk_state, + &byte_index, + search_bytes, + start_off, + &found_off); + kunmap_local(kaddr); + + if (err == 
-ENODATA) { + /* nothing has been found */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find state area's end: " + "start_off %u, blk_state %#x, " + "err %d\n", + start_off, blk_state, err); + return err; + } + + *found_end = page_index * items_per_page; + *found_end += byte_index * items_per_byte; + *found_end += found_off; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, aligned_start %u, " + "aligned_end %u, " + "page_index %d, items_per_page %zu, " + "byte_index %u, " + "items_per_byte %u, start_off %u, " + "found_off %u\n", + start, aligned_start, aligned_end, + page_index, items_per_page, byte_index, + items_per_byte, start_off, found_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*found_end > max_blk) + *found_end = max_blk; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area end %u has been found for state %#x\n", + *found_end, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + + *found_end = max_blk; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area end %u has been found for state %#x\n", + *found_end, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_find_state_area_end() - find state area end + * @blk_bmap: pointer on block bitmap + * @start: start position for search + * @max_blk: upper bound for search + * @blk_state: area state + * @found_end: pointer on found end block [out] + * + * This function tries to find @blk_state area end + * in range [@start, @max_blk). + * + * RETURN: + * [success] - @found_end contains found end block. + * [failure] - items count in block bitmap or error: + * + * %-EINVAL - invalid input value. + */ +static +int ssdfs_block_bmap_find_state_area_end(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 max_blk, int blk_state, + u32 *found_end) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, max_blk %u, blk_state %#x\n", + start, max_blk, blk_state); + + BUG_ON(!blk_bmap || !found_end); + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + if (start > max_blk) { + SSDFS_ERR("start %u > max_blk %u\n", start, max_blk); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk_state == SSDFS_BLK_FREE) { + *found_end = blk_bmap->items_count; + return 0; + } + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + err = ssdfs_block_bmap_find_state_area_end_in_pagevec(blk_bmap, + start, + max_blk, + blk_state, + found_end); + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + err = ssdfs_block_bmap_find_state_area_end_in_buffer(blk_bmap, + start, + max_blk, + blk_state, + found_end); + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + blk_bmap->storage.state); + return -ERANGE; + } + + return err; +} + +/* + * range_corrupted() - check that range is corrupted + * @blk_bmap: pointer on block bitmap + * @range: range for check + * + * RETURN: + * [true] - range is invalid + * [false] - range is valid + */ +static inline +bool range_corrupted(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range) +{ + if (range->len > blk_bmap->items_count) + return true; + if (range->start > (blk_bmap->items_count - range->len)) + return true; + return false; +} + +/* + * is_whole_range_cached() - check that cache contains requested range + * @blk_bmap: pointer on block bitmap + * @range: range for check + * + * RETURN: + * [true] - cache contains the whole range + * [false] - cache doesn't include the whole range + */ +static +bool is_whole_range_cached(struct 
ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_block_bmap_range cached_range; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n", + blk_bmap, range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + struct ssdfs_last_bmap_search *last_search; + int cmp; + + last_search = &blk_bmap->last_search[i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_search.cache %lx\n", last_search->cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + cached_range.start = SSDFS_FIRST_CACHED_BLOCK(last_search); + cached_range.len = SSDFS_ITEMS_PER_LONG(SSDFS_BLK_STATE_BITS); + + cmp = compare_block_bmap_ranges(&cached_range, range); + + if (cmp >= 0) + return true; + else if (ranges_have_intersection(&cached_range, range)) + return false; + } + + return false; +} + +/* + * ssdfs_set_range_in_cache() - set small range in cache + * @blk_bmap: pointer on block bitmap + * @range: requested range + * @blk_state: state for set + * + * This function sets small range in cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + */ +static +int ssdfs_set_range_in_cache(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range, + int blk_state) +{ + u32 blk, index; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (blk_state >= SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n", + blk_bmap, range->start, range->len, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (index = 0; index < range->len; index++) { + blk = range->start + index; + err = ssdfs_set_block_state_in_cache(blk_bmap, blk, blk_state); + if (unlikely(err)) { + SSDFS_ERR("fail to set block %u in cache: err %d\n", + blk, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_set_uncached_tiny_range() - set tiny uncached range by state + * @blk_bmap: pointer on block bitmap + * @range: range for set + * @blk_state: state for set + * + * This function caches @range, to set @range in cache by @blk_state + * and to save the cache in pagevec. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. 
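+ *
+ * Note (assuming two bits per block state): a "tiny" range is one that
+ * fits into a single bitmap byte, i.e. at most
+ * SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS) == 4 blocks; longer
+ * ranges are rejected with %-EINVAL before anything is modified.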
+ */ +static +int ssdfs_set_uncached_tiny_range(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range, + int blk_state) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + + if (blk_state >= SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + + if (range->len > SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS)) { + SSDFS_ERR("range (start %u, len %u) is not tiny\n", + range->start, range->len); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n", + blk_bmap, range->start, range->len, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_cache_block_state(blk_bmap, range->start, blk_state); + if (unlikely(err)) { + SSDFS_ERR("fail to cache block %u: err %d\n", + range->start, err); + return err; + } + + err = ssdfs_set_range_in_cache(blk_bmap, range, blk_state); + if (unlikely(err)) { + SSDFS_ERR("fail to set (start %u, len %u): err %d\n", + range->start, range->len, err); + return err; + } + + err = ssdfs_save_cache_in_storage(blk_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to save cache in pagevec: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * __ssdfs_set_range_in_memory() - set range of bits in memory + * @blk_bmap: pointer on block bitmap + * @page_index: index of memory page + * @byte_offset: offset in bytes from the page's beginning + * @byte_value: byte value for setting + * @init_size: size in bytes for setting + * + * This function sets range of bits in memory. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - internal error. + */ +static +int __ssdfs_set_range_in_memory(struct ssdfs_block_bmap *blk_bmap, + int page_index, u32 byte_offset, + int byte_value, size_t init_size) +{ + struct ssdfs_page_vector *array; + void *kaddr; + int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + SSDFS_DBG("blk_bmap %p, page_index %d, byte_offset %u, " + "byte_value %#x, init_size %zu\n", + blk_bmap, page_index, byte_offset, + byte_value, init_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (blk_bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + array = &blk_bmap->storage.array; + + if (page_index >= ssdfs_page_vector_count(array)) { + SSDFS_ERR("invalid page index %d, pagevec size %d\n", + page_index, + ssdfs_page_vector_count(array)); + return -EINVAL; + } + + if (page_index >= ssdfs_page_vector_capacity(array)) { + SSDFS_ERR("invalid page index %d, pagevec capacity %d\n", + page_index, + ssdfs_page_vector_capacity(array)); + return -EINVAL; + } + + while (page_index >= ssdfs_page_vector_count(array)) { + struct page *page; + + page = ssdfs_page_vector_allocate(array); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate page\n"); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + if ((byte_offset + init_size) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "byte_offset %u, init_size %zu\n", + byte_offset, init_size); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memset_page(array->pages[page_index], + byte_offset, PAGE_SIZE, + byte_value, init_size); + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + if (page_index != 0) { + SSDFS_ERR("invalid page index %d\n", + page_index); + return -EINVAL; + } + +#ifdef CONFIG_SSDFS_DEBUG + if ((byte_offset + init_size) > blk_bmap->bytes_count) { + SSDFS_WARN("invalid offset: " + "byte_offset %u, init_size %zu\n", + byte_offset, init_size); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = blk_bmap->storage.buf; + memset((u8 *)kaddr + byte_offset, byte_value, init_size); + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + blk_bmap->storage.state); + return -ERANGE; + } + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + blk_bmap->last_search[i].page_index = max_capacity; + blk_bmap->last_search[i].offset = U16_MAX; + blk_bmap->last_search[i].cache = 0; + } + + return 0; +} + +/* + * ssdfs_set_range_in_storage() - set range in storage by state + * @blk_bmap: pointer on block bitmap + * @range: range for set + * @blk_state: state for set + * + * This function sets @range in storage by @blk_state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + */ +static +int ssdfs_set_range_in_storage(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range, + int blk_state) +{ + u32 aligned_start, aligned_end; + size_t items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS); + int byte_value; + size_t rest_items, items_per_page; + u32 blk; + int page_index; + u32 item_offset, byte_offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + + if (blk_state >= SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n", + blk_bmap, range->start, range->len, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + aligned_start = range->start + items_per_byte - 1; + aligned_start >>= SSDFS_BLK_STATE_BITS; + aligned_start <<= SSDFS_BLK_STATE_BITS; + + aligned_end = range->start + range->len; + aligned_end >>= SSDFS_BLK_STATE_BITS; + aligned_end <<= SSDFS_BLK_STATE_BITS; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("aligned_start %u, aligned_end %u\n", + aligned_start, aligned_end); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range->start != aligned_start) { + struct ssdfs_block_bmap_range unaligned; + + unaligned.start = range->start; + unaligned.len = aligned_start - range->start; + + err = ssdfs_set_uncached_tiny_range(blk_bmap, &unaligned, + blk_state); + if (unlikely(err)) { + SSDFS_ERR("fail to set (start %u, len %u): err %d\n", + unaligned.start, unaligned.len, err); + return err; + } + } + + byte_value = SSDFS_BLK_BMAP_BYTE(blk_state); + items_per_page = PAGE_SIZE * items_per_byte; + rest_items = aligned_end - aligned_start; + page_index = aligned_start / items_per_page; + item_offset 
= aligned_start % items_per_page;
+	byte_offset = item_offset / items_per_byte;
+
+	blk = aligned_start;
+	while (blk < aligned_end) {
+		size_t iter_items, init_size;
+
+		if (rest_items == 0) {
+			SSDFS_WARN("unexpected items absence: blk %u\n",
+				   blk);
+			break;
+		}
+
+		if (byte_offset >= PAGE_SIZE) {
+			SSDFS_ERR("invalid byte offset %u\n", byte_offset);
+			return -EINVAL;
+		}
+
+		iter_items = items_per_page - item_offset;
+		iter_items = min_t(size_t, iter_items, rest_items);
+		if (iter_items < items_per_page) {
+			init_size = iter_items + items_per_byte - 1;
+			init_size /= items_per_byte;
+		} else {
+			init_size = PAGE_SIZE;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("items_per_page %zu, item_offset %u, "
+			  "rest_items %zu, iter_items %zu, "
+			  "init_size %zu\n",
+			  items_per_page, item_offset,
+			  rest_items, iter_items,
+			  init_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = __ssdfs_set_range_in_memory(blk_bmap, page_index,
+						  byte_offset, byte_value,
+						  init_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set range in memory: "
+				  "page_index %d, byte_offset %u, "
+				  "byte_value %#x, init_size %zu, "
+				  "err %d\n",
+				  page_index, byte_offset,
+				  byte_value, init_size,
+				  err);
+			return err;
+		}
+
+		item_offset = 0;
+		byte_offset = 0;
+		page_index++;
+		blk += iter_items;
+		rest_items -= iter_items;
+	}
+
+	if (aligned_end != range->start + range->len) {
+		struct ssdfs_block_bmap_range unaligned;
+
+		unaligned.start = aligned_end;
+		unaligned.len = (range->start + range->len) - aligned_end;
+
+		err = ssdfs_set_uncached_tiny_range(blk_bmap, &unaligned,
+						    blk_state);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set (start %u, len %u): err %d\n",
+				  unaligned.start, unaligned.len, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_find_range() - find range of blocks with requested state
+ * @blk_bmap: pointer on block bitmap
+ * @start: start block for search
+ * @len: requested length of range
+ * @max_blk: upper bound for search
+ * @blk_state: requested state of blocks in range
+ * @range: found range [out]
+ *
+ * This function searches for a @range of blocks with the requested
+ * @blk_state. If @blk_state has the SSDFS_BLK_STATE_MAX value then it
+ * instead gathers a continuous @range of same-state blocks beginning
+ * from @start, in order to detect that range's state.
+ *
+ * RETURN:
+ * [success] - @range of found blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
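+ *
+ * Usage sketch (hypothetical caller, illustrative values only):
+ *
+ *	struct ssdfs_block_bmap_range range;
+ *	int err;
+ *
+ *	err = ssdfs_block_bmap_find_range(blk_bmap, 0, 8, 128,
+ *					  SSDFS_BLK_FREE, &range);
+ *	if (!err)
+ *		SSDFS_DBG("free range: start %u, len %u\n",
+ *			  range.start, range.len);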
+ */ +static +int ssdfs_block_bmap_find_range(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 len, u32 max_blk, + int blk_state, + struct ssdfs_block_bmap_range *range) +{ + u32 found_start, found_end; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + + if (blk_state > SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (start >= blk_bmap->items_count) { + SSDFS_ERR("invalid start block %u\n", start); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, start %u, len %u, max_blk %u, " + "state %#x, range %p\n", + blk_bmap, start, len, max_blk, blk_state, range); +#endif /* CONFIG_SSDFS_DEBUG */ + + range->start = U32_MAX; + range->len = 0; + + if (start >= max_blk) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u >= max_blk %u\n", start, max_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + err = ssdfs_block_bmap_find_block(blk_bmap, start, max_blk, + blk_state, &found_start); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find block: " + "start %u, max_blk %u, state %#x, err %d\n", + start, max_blk, blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find block: " + "start %u, max_blk %u, state %#x, err %d\n", + start, max_blk, blk_state, err); + return err; + } + + if (found_start >= blk_bmap->items_count) { + SSDFS_ERR("invalid found start %u, items count %zu\n", + found_start, blk_bmap->items_count); + return -EINVAL; + } + + err = ssdfs_block_bmap_find_state_area_end(blk_bmap, found_start, + found_start + len, + blk_state, + &found_end); + if (unlikely(err)) { + SSDFS_ERR("fail to find block: " + "start %u, max_blk %u, state %#x, err %d\n", + start, max_blk, blk_state, err); + return err; + } + + if (found_end <= found_start) { + SSDFS_ERR("invalid found (start %u, end %u), items count %zu\n", + found_start, found_end, blk_bmap->items_count); + return -EINVAL; + } + + if (found_end > blk_bmap->items_count) + found_end = blk_bmap->items_count; + + range->start = found_start; + range->len = min_t(u32, len, found_end - found_start); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_start %u, found_end %u, len %u, " + "range (start %u, len %u)\n", + found_start, found_end, len, + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_set_block_state() - set state of block + * @blk_bmap: pointer on block bitmap + * @blk: segment's block + * @blk_state: state for set + * + * This function sets @blk by @blk_state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. 
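+ *
+ * Minimal sketch of the intended call pattern (hypothetical caller on
+ * a code path that already holds the block bitmap's mutex):
+ *
+ *	err = ssdfs_block_bmap_set_block_state(blk_bmap, blk,
+ *					       SSDFS_BLK_VALID);
+ *	if (unlikely(err))
+ *		return err;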
+ */ +static +int ssdfs_block_bmap_set_block_state(struct ssdfs_block_bmap *blk_bmap, + u32 blk, int blk_state) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (blk_state >= SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (blk >= blk_bmap->items_count) { + SSDFS_ERR("invalid block %u\n", blk); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, block %u, state %#x\n", + blk_bmap, blk, blk_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_state_cached(blk_bmap, blk)) { + err = ssdfs_cache_block_state(blk_bmap, blk, blk_state); + if (unlikely(err)) { + SSDFS_ERR("unable to cache block %u state: err %d\n", + blk, err); + return err; + } + } + + err = ssdfs_set_block_state_in_cache(blk_bmap, blk, blk_state); + if (unlikely(err)) { + SSDFS_ERR("unable to set block %u state in cache: err %d\n", + blk, err); + return err; + } + + err = ssdfs_save_cache_in_storage(blk_bmap); + if (unlikely(err)) { + SSDFS_ERR("unable to save the cache in storage: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_block_bmap_set_range() - set state of blocks' range + * @blk_bmap: pointer on block bitmap + * @range: requested range + * @blk_state: state for set + * + * This function sets blocks' @range by @blk_state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + */ +static +int ssdfs_block_bmap_set_range(struct ssdfs_block_bmap *blk_bmap, + struct ssdfs_block_bmap_range *range, + int blk_state) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + + if (blk_state >= SSDFS_BLK_STATE_MAX) { + SSDFS_ERR("invalid block state %#x\n", blk_state); + return -EINVAL; + } + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n", + blk_bmap, range->start, range->len, blk_state); + + ssdfs_debug_block_bitmap(blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range->len == 1) { + err = ssdfs_block_bmap_set_block_state(blk_bmap, range->start, + blk_state); + if (err) { + SSDFS_ERR("fail to set (start %u, len %u) state %#x: " + "err %d\n", + range->start, range->len, blk_state, err); + return err; + } + } else if (is_whole_range_cached(blk_bmap, range)) { + err = ssdfs_set_range_in_cache(blk_bmap, range, blk_state); + if (unlikely(err)) { + SSDFS_ERR("unable to set (start %u, len %u) state %#x " + "in cache: err %d\n", + range->start, range->len, blk_state, err); + return err; + } + + err = ssdfs_save_cache_in_storage(blk_bmap); + if (unlikely(err)) { + SSDFS_ERR("unable to save the cache in storage: " + "err %d\n", err); + return err; + } + } else { + u32 next_blk; + + err = ssdfs_set_range_in_storage(blk_bmap, range, blk_state); + if (unlikely(err)) { + SSDFS_ERR("unable to set (start %u, len %u) state %#x " + "in storage: err %d\n", + range->start, range->len, blk_state, err); + return err; + } + + next_blk = range->start + range->len; + if (next_blk == blk_bmap->items_count) + next_blk--; + + err = ssdfs_cache_block_state(blk_bmap, next_blk, blk_state); + if (unlikely(err)) { + SSDFS_ERR("unable to cache block %u state: err %d\n", + next_blk, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + ssdfs_debug_block_bitmap(blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_test_block() - check state of block + * 
@blk_bmap: pointer on block bitmap
+ * @blk: segment's block
+ * @blk_state: checked state
+ *
+ * This function checks that requested @blk has @blk_state.
+ *
+ * RETURN:
+ * [true]  - requested @blk has @blk_state
+ * [false] - requested @blk doesn't have @blk_state or some failure
+ *           took place during the check.
+ */
+bool ssdfs_block_bmap_test_block(struct ssdfs_block_bmap *blk_bmap,
+				 u32 blk, int blk_state)
+{
+	u32 found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return false;
+	}
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return false;
+	}
+
+	BUG_ON(!mutex_is_locked(&blk_bmap->lock));
+
+	SSDFS_DBG("blk_bmap %p, block %u, state %#x\n",
+		  blk_bmap, blk, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	BUG_ON(!is_block_bmap_initialized(blk_bmap));
+
+	err = ssdfs_block_bmap_find_block(blk_bmap, blk, blk + 1, blk_state,
+					  &found);
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find block %u, state %#x, err %d\n",
+			  blk, blk_state, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return found == blk;
+}
+
+/*
+ * ssdfs_block_bmap_test_range() - check state of blocks' range
+ * @blk_bmap: pointer on block bitmap
+ * @range: segment's blocks' range
+ * @blk_state: checked state
+ *
+ * This function checks that all blocks in requested @range have
+ * @blk_state.
+ *
+ * RETURN:
+ * [true]  - all blocks in requested @range have @blk_state
+ * [false] - requested @range contains blocks with various states or
+ *           some failure took place during the check.
+ */
+bool ssdfs_block_bmap_test_range(struct ssdfs_block_bmap *blk_bmap,
+				 struct ssdfs_block_bmap_range *range,
+				 int blk_state)
+{
+	struct ssdfs_block_bmap_range found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (blk_state >= SSDFS_BLK_STATE_MAX) {
+		SSDFS_ERR("invalid block state %#x\n", blk_state);
+		return false;
+	}
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return false;
+	}
+
+	BUG_ON(!mutex_is_locked(&blk_bmap->lock));
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u), state %#x\n",
+		  blk_bmap, range->start, range->len, blk_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	BUG_ON(!is_block_bmap_initialized(blk_bmap));
+
+	err = ssdfs_block_bmap_find_range(blk_bmap, range->start, range->len,
+					  range->start + range->len,
+					  blk_state, &found);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find range: err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return compare_block_bmap_ranges(&found, range) == 0;
+}
+
+/*
+ * ssdfs_get_block_state() - detect state of block
+ * @blk_bmap: pointer on block bitmap
+ * @blk: segment's block
+ *
+ * This function retrieves the state of @blk from the block bitmap.
+ *
+ * RETURN:
+ * [success] - state of block
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ENODATA - requested @blk hasn't been found.
+ * %-ENOENT - block bitmap isn't initialized.
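+ *
+ * Sketch of the return convention (hypothetical caller):
+ *
+ *	int state = ssdfs_get_block_state(blk_bmap, blk);
+ *
+ *	if (state < 0)
+ *		return state;
+ *	else if (state == SSDFS_BLK_VALID)
+ *		do_something();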
+ */
+int ssdfs_get_block_state(struct ssdfs_block_bmap *blk_bmap, u32 blk)
+{
+	u32 found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (blk >= blk_bmap->items_count) {
+		SSDFS_ERR("invalid block %u\n", blk);
+		return -EINVAL;
+	}
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, block %u\n", blk_bmap, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_find_block(blk_bmap, blk, blk + 1,
+					  SSDFS_BLK_STATE_MAX,
+					  &found);
+	if (err) {
+		SSDFS_ERR("fail to find block %u, err %d\n",
+			  blk, err);
+		return err;
+	}
+
+	if (found != blk) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("found (%u) != blk (%u)\n", found, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	return ssdfs_get_block_state_from_cache(blk_bmap, blk);
+}
+
+/*
+ * ssdfs_get_range_state() - detect state of blocks' range
+ * @blk_bmap: pointer on block bitmap
+ * @range: pointer on blocks' range
+ *
+ * This function retrieves the state of @range from the block bitmap.
+ *
+ * RETURN:
+ * [success] - state of blocks' range
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-EOPNOTSUPP - requested @range contains blocks in various states.
+ * %-ENOENT - block bitmap isn't initialized.
+ */
+int ssdfs_get_range_state(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_block_bmap_range *range)
+{
+	struct ssdfs_block_bmap_range found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range: start %u, len %u; items count %zu\n",
+			  range->start, range->len,
+			  blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+		  blk_bmap, range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	err = ssdfs_block_bmap_find_range(blk_bmap, range->start, range->len,
+					  range->start + range->len,
+					  SSDFS_BLK_STATE_MAX, &found);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find range: err %d\n", err);
+		return err;
+	}
+
+	if (compare_block_bmap_ranges(&found, range) != 0) {
+		SSDFS_ERR("range contains blocks in various states\n");
+		return -EOPNOTSUPP;
+	}
+
+	err = ssdfs_cache_block_state(blk_bmap, range->start,
+				      SSDFS_BLK_STATE_MAX);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to cache block %u: err %d\n",
+			  range->start, err);
+		return err;
+	}
+
+	return ssdfs_get_block_state_from_cache(blk_bmap, range->start);
+}
+
+/*
+ * ssdfs_block_bmap_reserve_metadata_pages() - reserve metadata pages
+ * @blk_bmap: pointer on block bitmap
+ * @count: count of reserved metadata pages
+ *
+ * This function tries to reserve @count metadata pages in the
+ * block bitmap's space.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ENOSPC - not enough free blocks for the reservation.
+ * %-ENOENT - block bitmap isn't initialized.
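+ *
+ * Worked example (illustrative numbers): with items_count == 1024,
+ * used_blks == 900, invalid_blks == 50 and metadata_items == 50, only
+ * 1024 - 900 - 50 - 50 == 24 blocks remain free; a request for
+ * count == 30 is refused with %-ENOSPC, while count == 24 fits exactly
+ * and bumps metadata_items to 74.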
+ */ +int ssdfs_block_bmap_reserve_metadata_pages(struct ssdfs_block_bmap *blk_bmap, + u32 count) +{ + u32 reserved_items; + u32 calculated_items; + int free_pages = 0; + int used_pages = 0; + int invalid_pages = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, count %u\n", + blk_bmap, count); + SSDFS_DBG("items_count %zu, used_blks %u, " + "metadata_items %u\n", + blk_bmap->items_count, + blk_bmap->used_blks, + blk_bmap->metadata_items); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + err = ssdfs_block_bmap_get_free_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get free pages: err %d\n", err); + return err; + } else { + free_pages = err; + err = 0; + } + + if (free_pages < count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to reserve metadata pages: " + "free_pages %d, count %u\n", + free_pages, count); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + err = ssdfs_block_bmap_get_used_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get used pages: err %d\n", err); + return err; + } else { + used_pages = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get invalid pages: err %d\n", err); + return err; + } else { + invalid_pages = err; + err = 0; + } + + reserved_items = blk_bmap->metadata_items + count; + calculated_items = used_pages + invalid_pages + reserved_items; + if (calculated_items > blk_bmap->items_count) { + SSDFS_ERR("fail to reserve metadata pages: " + "used_pages %d, invalid_pages %d, " + "metadata_items %u, " + "count %u, items_count %zu\n", + used_pages, invalid_pages, + blk_bmap->metadata_items, + count, blk_bmap->items_count); + return -EINVAL; + } + + blk_bmap->metadata_items += count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(blk_bmap->metadata_items == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_free_metadata_pages() - free metadata pages + * @blk_bmap: pointer on block bitmap + * @count: count of metadata pages for freeing + * + * This function tries to free @count of metadata pages in + * block bitmap's space. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENOENT - block bitmap doesn't initialized. + * %-ERANGE - internal error. 
+ */ +int ssdfs_block_bmap_free_metadata_pages(struct ssdfs_block_bmap *blk_bmap, + u32 count) +{ + u32 metadata_items; + u32 freed_items; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p, count %u\n", + blk_bmap, count); + SSDFS_DBG("items_count %zu, used_blks %u, " + "metadata_items %u\n", + blk_bmap->items_count, + blk_bmap->used_blks, + blk_bmap->metadata_items); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + metadata_items = blk_bmap->metadata_items; + freed_items = count; + + if (blk_bmap->metadata_items < count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("correct value: metadata_items %u < count %u\n", + blk_bmap->metadata_items, count); +#endif /* CONFIG_SSDFS_DEBUG */ + freed_items = blk_bmap->metadata_items; + } + + blk_bmap->metadata_items -= freed_items; + +#ifdef CONFIG_SSDFS_DEBUG + if (blk_bmap->metadata_items == 0) { + SSDFS_ERR("BEFORE: metadata_items %u, count %u, " + "items_count %zu, used_blks %u, " + "invalid_blks %u\n", + metadata_items, count, + blk_bmap->items_count, + blk_bmap->used_blks, + blk_bmap->invalid_blks); + SSDFS_ERR("AFTER: metadata_items %u, freed_items %u\n", + blk_bmap->metadata_items, freed_items); + } + BUG_ON(blk_bmap->metadata_items == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_block_bmap_get_free_pages() - determine current free pages count + * @blk_bmap: pointer on block bitmap + * + * This function tries to detect current free pages count + * in block bitmap. + * + * RETURN: + * [success] - count of free pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + * %-ENOENT - block bitmap doesn't initialized. 
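+ *
+ * The returned value follows directly from the bookkeeping counters:
+ *
+ *	free_blks = items_count - used_blks - metadata_items -
+ *			invalid_blks;
+ *
+ * e.g. (illustrative numbers) 1024 - 900 - 50 - 50 == 24 free blocks.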
+ */ +int ssdfs_block_bmap_get_free_pages(struct ssdfs_block_bmap *blk_bmap) +{ + u32 found_blk; + u32 used_blks; + u32 metadata_items; + u32 invalid_blks; + int free_blks; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p\n", blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + if (is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache for free states is invalid!!!\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_find_block(blk_bmap, + 0, blk_bmap->items_count, + SSDFS_BLK_FREE, &found_blk); + } else + err = ssdfs_define_last_free_page(blk_bmap, &found_blk); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find last free block: " + "found_blk %u\n", + found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find last free block: err %d\n", + err); + return err; + } + + used_blks = blk_bmap->used_blks; + metadata_items = blk_bmap->metadata_items; + invalid_blks = blk_bmap->invalid_blks; + + free_blks = blk_bmap->items_count; + free_blks -= used_blks + metadata_items + invalid_blks; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %zu, used_blks %u, " + "invalid_blks %u, " + "metadata_items %u, free_blks %d\n", + blk_bmap->items_count, + used_blks, invalid_blks, metadata_items, + free_blks); + + if (unlikely(found_blk > blk_bmap->items_count)) { + SSDFS_ERR("found_blk %u > items_count %zu\n", + found_blk, blk_bmap->items_count); + return -ERANGE; + } + + WARN_ON(INT_MAX < (blk_bmap->items_count - found_blk)); + + if (unlikely((used_blks + metadata_items) > blk_bmap->items_count)) { + SSDFS_ERR("used_blks %u, metadata_items %u, " + "items_count %zu\n", + used_blks, metadata_items, + blk_bmap->items_count); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_blks < 0) { + SSDFS_ERR("items_count %zu, used_blks %u, " + "invalid_blks %u, " + "metadata_items %u, free_blks %d\n", + blk_bmap->items_count, + used_blks, invalid_blks, metadata_items, + free_blks); + return -ERANGE; + } + + return free_blks; +} + +/* + * ssdfs_block_bmap_get_used_pages() - determine current used pages count + * @blk_bmap: pointer on block bitmap + * + * This function tries to detect current used pages count + * in block bitmap. + * + * RETURN: + * [success] - count of used pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + * %-ENOENT - block bitmap doesn't initialized. 
+ */ +int ssdfs_block_bmap_get_used_pages(struct ssdfs_block_bmap *blk_bmap) +{ + u32 found_blk; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p\n", blk_bmap); + SSDFS_DBG("items_count %zu, used_blks %u, " + "metadata_items %u, invalid_blks %u\n", + blk_bmap->items_count, + blk_bmap->used_blks, + blk_bmap->metadata_items, + blk_bmap->invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + err = ssdfs_define_last_free_page(blk_bmap, &found_blk); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find last free block: " + "found_blk %u\n", + found_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find last free block: err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (unlikely(found_blk > blk_bmap->items_count)) { + SSDFS_ERR("found_blk %u > items_count %zu\n", + found_blk, blk_bmap->items_count); + return -ERANGE; + } + + if (unlikely(blk_bmap->used_blks > blk_bmap->items_count)) { + SSDFS_ERR("used_blks %u > items_count %zu\n", + blk_bmap->used_blks, + blk_bmap->items_count); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return blk_bmap->used_blks; +} + +/* + * ssdfs_block_bmap_get_invalid_pages() - determine current invalid pages count + * @blk_bmap: pointer on block bitmap + * + * This function tries to detect current invalid pages count + * in block bitmap. + * + * RETURN: + * [success] - count of invalid pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENOENT - block bitmap doesn't initialized. 
+ */
+int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap);
+
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p\n", blk_bmap);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+	return blk_bmap->invalid_blks;
+}

From patchwork Sat Feb 25 01:08:23 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151915
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 12/76] ssdfs: block bitmap modification operations
 implementation
Date: Fri, 24 Feb 2023 17:08:23 -0800
Message-Id: <20230225010927.813929-13-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

This patch implements the block bitmap's modification operations:
pre_allocate - pre-allocate a logical block or range of blocks
allocate - allocate a logical block or range of blocks
invalidate - invalidate a logical block or range of blocks
collect_garbage - get a contiguous range of blocks in the requested state
clean - convert the whole block bitmap into the clean state

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/block_bitmap.c | 703 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 703 insertions(+)

diff --git a/fs/ssdfs/block_bitmap.c b/fs/ssdfs/block_bitmap.c
index 3e3ddb6ff745..258d3b3856e1 100644
--- a/fs/ssdfs/block_bitmap.c
+++ b/fs/ssdfs/block_bitmap.c
@@ -4608,3 +4608,706 @@ int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap)
 
 	return blk_bmap->invalid_blks;
 }
+
+/*
+ * ssdfs_block_bmap_pre_allocate() - pre-allocate segment's range of blocks
+ * @blk_bmap: pointer on block bitmap
+ * @start: starting block for search
+ * @len: pointer on variable with requested length of range
+ * @range: pointer on blocks' range [in | out]
+ *
+ * This function tries to find a contiguous range of free blocks and
+ * to set the found range into the pre-allocated state.
+ *
+ * If pointer @len is NULL then it needs:
+ * (1) check that requested range contains free blocks only;
+ * (2) set the requested range of blocks in pre-allocated state.
+ *
+ * Otherwise, if pointer @len != NULL then it needs:
+ * (1) find the range of free blocks of requested length or lesser;
+ * (2) set the found range of blocks in pre-allocated state.
+ *
+ * RETURN:
+ * [success] - @range of pre-allocated blocks.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ENOENT - block bitmap isn't initialized.
+ * %-ENOSPC - block bitmap has no free blocks.
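+ *
+ * The two calling conventions, sketched (hypothetical callers, with
+ * illustrative values; the bitmap mutex is held in both cases).
+ * With @len the function searches for up to *len free blocks:
+ *
+ *	u32 len = 8;
+ *	err = ssdfs_block_bmap_pre_allocate(bmap, 0, &len, &range);
+ *
+ * With @len == NULL the caller's @range (say, start 10, len 4) must be
+ * wholly free and is pre-allocated as-is:
+ *
+ *	err = ssdfs_block_bmap_pre_allocate(bmap, 0, NULL, &range);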
+ */ +int ssdfs_block_bmap_pre_allocate(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 *len, + struct ssdfs_block_bmap_range *range) +{ + int free_pages; + u32 used_blks = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("start %u, len %p\n", + start, len); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (len) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p, start %u, len %u\n", + blk_bmap, start, *len); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n", + blk_bmap, range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range: start %u, len %u; " + "items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %zu, used_blks %u, " + "metadata_items %u, invalid_blks %u\n", + blk_bmap->items_count, + blk_bmap->used_blks, + blk_bmap->metadata_items, + blk_bmap->invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + err = ssdfs_block_bmap_get_free_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get free pages: err %d\n", err); + return err; + } else { + free_pages = err; + err = 0; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %d\n", free_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (len) { + u32 max_blk = blk_bmap->items_count - blk_bmap->metadata_items; + u32 start_blk = 0; + + if (free_pages < *len) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to pre_allocate: " + "free_pages %d, count %u\n", + free_pages, *len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + if (!is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) { + err = ssdfs_define_last_free_page(blk_bmap, &start_blk); + if (err) { + SSDFS_ERR("fail to define start block: " + "err %d\n", + err); + return err; + } + } + + start_blk = max_t(u32, start_blk, start); + + err = ssdfs_block_bmap_find_range(blk_bmap, start_blk, *len, + max_blk, + SSDFS_BLK_FREE, range); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find free blocks: " + "start_blk %u, max_blk %u, len %u\n", + start_blk, max_blk, *len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } else if (err) { + SSDFS_ERR("fail to find free blocks: err %d\n", err); + return err; + } + } else { + if (free_pages < range->len) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to pre_allocate: " + "free_pages %d, count %u\n", + free_pages, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + if (!is_range_free(blk_bmap, range)) { + SSDFS_ERR("range (start %u, len %u) is not free\n", + range->start, range->len); + return -EINVAL; + } + } + + used_blks = (u32)blk_bmap->used_blks + range->len; + + if (used_blks > blk_bmap->items_count) { + SSDFS_ERR("invalid used blocks count: " + "used_blks %u, items_count %zu\n", + used_blks, + blk_bmap->items_count); + return -ERANGE; + } + + err = ssdfs_block_bmap_set_range(blk_bmap, range, + SSDFS_BLK_PRE_ALLOCATED); + if (unlikely(err)) { + SSDFS_ERR("fail to set range (start %u, len %u): err %d\n", + range->start, range->len, err); + return err; + } + + blk_bmap->used_blks 
+= range->len; + + set_block_bmap_dirty(blk_bmap); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("range (start %u, len %u) has been pre-allocated\n", + range->start, range->len); +#else + SSDFS_DBG("range (start %u, len %u) has been pre-allocated\n", + range->start, range->len); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_block_bmap_allocate() - allocate segment's range of blocks + * @blk_bmap: pointer on block bitmap + * @start: starting block for search + * @len: pointer on variable with requested length of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to find contiguous range of free + * (or pre-allocated) blocks and to set the found range in + * valid state. + * + * If pointer @len is NULL then it needs: + * (1) check that requested range contains free or pre-allocated blocks; + * (2) set the requested range of blocks in valid state. + * + * Otherwise, if pointer @len != NULL then it needs: + * (1) find the range of free blocks of requested length or lesser; + * (2) set the found range of blocks in valid state. + * + * RETURN: + * [success] - @range of valid blocks. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENOENT - block bitmap doesn't initialized. + * %-ENOSPC - block bitmap hasn't free blocks. + */ +int ssdfs_block_bmap_allocate(struct ssdfs_block_bmap *blk_bmap, + u32 start, u32 *len, + struct ssdfs_block_bmap_range *range) +{ + int state = SSDFS_BLK_FREE; + int free_pages; + u32 used_blks = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap || !range); + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("start %u, len %p\n", + start, len); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (len) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p, start %u, len %u\n", + blk_bmap, start, *len); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n", + blk_bmap, range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range_corrupted(blk_bmap, range)) { + SSDFS_ERR("invalid range: start %u, len %u; " + "items count %zu\n", + range->start, range->len, + blk_bmap->items_count); + return -EINVAL; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %zu, used_blks %u, " + "metadata_items %u\n", + blk_bmap->items_count, + blk_bmap->used_blks, + blk_bmap->metadata_items); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + err = ssdfs_block_bmap_get_free_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get free pages: err %d\n", err); + return err; + } else { + free_pages = err; + err = 0; + } + + if (len) { + u32 max_blk = blk_bmap->items_count - blk_bmap->metadata_items; + u32 start_blk = 0; + + if (free_pages < *len) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to allocate: " + "free_pages %d, count %u\n", + free_pages, *len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + if (!is_cache_invalid(blk_bmap, SSDFS_BLK_FREE)) { + err = ssdfs_define_last_free_page(blk_bmap, &start_blk); + if (err) { + SSDFS_ERR("fail to define start block: " + "err %d\n", + err); + return err; + } + } + + start_blk = max_t(u32, start_blk, start); + + err = ssdfs_block_bmap_find_range(blk_bmap, start_blk, *len, + 
max_blk, SSDFS_BLK_FREE, + range); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find free blocks: " + "start_blk %u, max_blk %u, len %u\n", + start_blk, max_blk, *len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } else if (err) { + SSDFS_ERR("fail to find free blocks: err %d\n", err); + return err; + } + } else { + state = ssdfs_get_range_state(blk_bmap, range); + + if (state < 0) { + SSDFS_ERR("fail to get range " + "(start %u, len %u) state: err %d\n", + range->start, range->len, state); + return state; + } + + if (state != SSDFS_BLK_FREE && + state != SSDFS_BLK_PRE_ALLOCATED) { + SSDFS_ERR("range (start %u, len %u), state %#x, " + "can't be allocated\n", + range->start, range->len, state); + return -EINVAL; + } + } + + err = ssdfs_block_bmap_set_range(blk_bmap, range, + SSDFS_BLK_VALID); + if (unlikely(err)) { + SSDFS_ERR("fail to set range (start %u, len %u): " + "err %d\n", + range->start, range->len, err); + return err; + } + + if (state == SSDFS_BLK_FREE) { + used_blks = (u32)blk_bmap->used_blks + range->len; + + if (used_blks > blk_bmap->items_count) { + SSDFS_ERR("invalid used blocks count: " + "used_blks %u, items_count %zu\n", + used_blks, + blk_bmap->items_count); + return -ERANGE; + } + + blk_bmap->used_blks += range->len; + } + + set_block_bmap_dirty(blk_bmap); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("range (start %u, len %u) has been allocated\n", + range->start, range->len); +#else + SSDFS_DBG("range (start %u, len %u) has been allocated\n", + range->start, range->len); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_block_bmap_invalidate() - invalidate segment's range of blocks + * @blk_bmap: pointer on block bitmap + * @len: pointer on variable with requested length of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to set the requested range of blocks in + * invalid state. At first, it checks that requested range contains + * valid blocks only. And, then, it sets the requested range of blocks + * in invalid state. + * + * RETURN: + * [success] - @range of invalid blocks. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ENOENT - block bitmap doesn't initialized. 
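+ *
+ * Accounting effect, in short: on success the whole @range migrates
+ * from the used to the invalid counter, i.e. used_blks -= range->len
+ * and invalid_blks += range->len, and the bitmap is marked dirty.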
+ */
+int ssdfs_block_bmap_invalidate(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, range (start %u, len %u)\n",
+		  blk_bmap, range->start, range->len);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u)\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range_corrupted(blk_bmap, range)) {
+		SSDFS_ERR("invalid range (start %u, len %u); items count %zu\n",
+			  range->start, range->len, blk_bmap->items_count);
+		return -EINVAL;
+	}
+
+	if (!is_range_valid(blk_bmap, range) &&
+	    !is_range_pre_allocated(blk_bmap, range)) {
+		SSDFS_ERR("range (start %u, len %u) contains blocks that are "
+			  "neither valid nor pre-allocated\n",
+			  range->start, range->len);
+		return -EINVAL;
+	}
+
+	err = ssdfs_block_bmap_set_range(blk_bmap, range,
+					 SSDFS_BLK_INVALID);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set range (start %u, len %u): err %d\n",
+			  range->start, range->len, err);
+		return err;
+	}
+
+	blk_bmap->invalid_blks += range->len;
+
+	if (range->len > blk_bmap->used_blks) {
+		SSDFS_ERR("invalid range len: "
+			  "range_len %u, used_blks %u, items_count %zu\n",
+			  range->len,
+			  blk_bmap->used_blks,
+			  blk_bmap->items_count);
+		return -ERANGE;
+	} else {
+		blk_bmap->used_blks -= range->len;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	set_block_bmap_dirty(blk_bmap);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u) has been invalidated\n",
+		  range->start, range->len);
+#else
+	SSDFS_DBG("range (start %u, len %u) has been invalidated\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_collect_garbage() - find range of valid blocks for GC
+ * @blk_bmap: pointer on block bitmap
+ * @start: starting position for search
+ * @max_len: maximum requested length of valid blocks' range
+ * @blk_state: requested block state (pre-allocated or valid)
+ * @range: pointer on blocks' range [out]
+ *
+ * This function tries to find a range of valid blocks for GC.
+ * The length of the requested range is limited by @max_len.
+ *
+ * RETURN:
+ * [success] - @range of found blocks for GC.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ENOENT - block bitmap isn't initialized.
+ * %-ENODATA - requested range has no valid blocks.
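+ *
+ * GC-side usage sketch (hypothetical loop; migrate_range() stands in
+ * for whatever the caller does with the found blocks):
+ *
+ *	u32 start = 0;
+ *
+ *	while (!ssdfs_block_bmap_collect_garbage(bmap, start, 16,
+ *						 SSDFS_BLK_VALID, &range)) {
+ *		migrate_range(&range);
+ *		start = range.start + range.len;
+ *	}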
+ */
+int ssdfs_block_bmap_collect_garbage(struct ssdfs_block_bmap *blk_bmap,
+				     u32 start, u32 max_len,
+				     int blk_state,
+				     struct ssdfs_block_bmap_range *range)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!blk_bmap || !range);
+	if (!mutex_is_locked(&blk_bmap->lock)) {
+		SSDFS_WARN("block bitmap mutex should be locked\n");
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("blk_bmap %p, start %u, max_len %u\n",
+		  blk_bmap, start, max_len);
+	SSDFS_DBG("items_count %zu, used_blks %u, "
+		  "metadata_items %u, invalid_blks %u\n",
+		  blk_bmap->items_count,
+		  blk_bmap->used_blks,
+		  blk_bmap->metadata_items,
+		  blk_bmap->invalid_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("start %u, max_len %u, blk_state %#x\n",
+		  start, max_len, blk_state);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (!is_block_bmap_initialized(blk_bmap)) {
+		SSDFS_WARN("block bitmap hasn't been initialized\n");
+		return -ENOENT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	ssdfs_debug_block_bitmap(blk_bmap);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (blk_state) {
+	case SSDFS_BLK_PRE_ALLOCATED:
+	case SSDFS_BLK_VALID:
+		/* valid block state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid block state: %#x\n",
+			  blk_state);
+		return -EINVAL;
+	}
+
+	err = ssdfs_block_bmap_find_range(blk_bmap, start, max_len, max_len,
+					  blk_state, range);
+	if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("range (start %u, len %u) has no valid blocks\n",
+			  start, max_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	} else if (err) {
+		SSDFS_ERR("fail to find valid blocks: err %d\n", err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("range (start %u, len %u) has been collected as garbage\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("range (start %u, len %u) has been collected as garbage\n",
+		  range->start, range->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_block_bmap_clean() - set all blocks as free/clean
+ * @blk_bmap: pointer on block bitmap
+ *
+ * This function tries to clean the whole bitmap.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ENOENT - block bitmap isn't initialized.
+ */ +int ssdfs_block_bmap_clean(struct ssdfs_block_bmap *blk_bmap) +{ + struct ssdfs_block_bmap_range range; + int max_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!blk_bmap); + if (!mutex_is_locked(&blk_bmap->lock)) { + SSDFS_WARN("block bitmap mutex should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("blk_bmap %p\n", blk_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_block_bmap_initialized(blk_bmap)) { + SSDFS_WARN("block bitmap hasn't been initialized\n"); + return -ENOENT; + } + + blk_bmap->metadata_items = 0; + blk_bmap->used_blks = 0; + blk_bmap->invalid_blks = 0; + + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + blk_bmap->last_search[i].page_index = max_capacity; + blk_bmap->last_search[i].offset = U16_MAX; + blk_bmap->last_search[i].cache = 0; + } + + range.start = 0; + range.len = blk_bmap->items_count; + + err = ssdfs_set_range_in_storage(blk_bmap, &range, SSDFS_BLK_FREE); + if (unlikely(err)) { + SSDFS_ERR("fail to clean block bmap: " + "range (start %u, len %u), " + "err %d\n", + range.start, range.len, err); + return err; + } + + err = ssdfs_cache_block_state(blk_bmap, 0, SSDFS_BLK_FREE); + if (unlikely(err)) { + SSDFS_ERR("fail to cache last free page: err %d\n", + err); + return err; + } + + return 0; +} + +#ifdef CONFIG_SSDFS_DEBUG +static +void ssdfs_debug_block_bitmap(struct ssdfs_block_bmap *bmap) +{ + struct ssdfs_page_vector *array; + struct page *page; + void *kaddr; + int i; + + BUG_ON(!bmap); + + SSDFS_DBG("BLOCK BITMAP: bytes_count %zu, items_count %zu, " + "metadata_items %u, used_blks %u, invalid_blks %u, " + "flags %#x\n", + bmap->bytes_count, + bmap->items_count, + bmap->metadata_items, + bmap->used_blks, + bmap->invalid_blks, + atomic_read(&bmap->flags)); + + SSDFS_DBG("LAST SEARCH:\n"); + for (i = 0; i < SSDFS_SEARCH_TYPE_MAX; i++) { + SSDFS_DBG("TYPE %d: page_index %d, offset %u, cache %lx\n", + i, + bmap->last_search[i].page_index, + bmap->last_search[i].offset, + bmap->last_search[i].cache); + } + + switch (bmap->storage.state) { + case SSDFS_BLOCK_BMAP_STORAGE_PAGE_VEC: + array = &bmap->storage.array; + + for (i = 0; i < ssdfs_page_vector_count(array); i++) { + page = array->pages[i]; + + if (!page) { + SSDFS_WARN("page %d is NULL\n", i); + continue; + } + + kaddr = kmap_local_page(page); + SSDFS_DBG("BMAP CONTENT: page %d\n", i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + kunmap_local(kaddr); + } + break; + + case SSDFS_BLOCK_BMAP_STORAGE_BUFFER: + SSDFS_DBG("BMAP CONTENT:\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + bmap->storage.buf, + bmap->bytes_count); + break; + } +} +#endif /* CONFIG_SSDFS_DEBUG */ From patchwork Sat Feb 25 01:08:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CECF9C64ED8 for ; Sat, 25 Feb 2023 01:16:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229648AbjBYBQb (ORCPT ); Fri, 24 Feb 2023 20:16:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48570 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229561AbjBYBQJ (ORCPT ); Fri, 24 Feb 2023 20:16:09 -0500 Received: from 
mail-oi1-x234.google.com (mail-oi1-x234.google.com [IPv6:2607:f8b0:4864:20::234]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DADBC15C88 for ; Fri, 24 Feb 2023 17:16:04 -0800 (PST) Received: by mail-oi1-x234.google.com with SMTP id q15so779634oiw.11 for ; Fri, 24 Feb 2023 17:16:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3Vp8yFd/CK3RAMFUu4dR8uwEv3Xl03K9pf0agNYOIQs=; b=YUEi/ymScdXiaXJQD86T4HbabiqTCsxD+dMDproQii0yGRZg1/8yLZz8uC6/+2/A0K vWYAkGReSj1Zw6zkaXtGwcCJY42kzkRDwBdXo5uGz25OBa86aEYgBWdGSjfH4L86vrHg Jb1h8BNBBlFGLTwkcrZHe6xRLmySmHGt4wHJ0gQ1W0uTl7pxCZDE7WVA9vhtQXUxO6Gn SZ8XN5EQ/gNwZeS6FZxlJoAmXMwmJMlFyQusYuj+3O+9HILsEaWPjDGZNyiyGt0uBYXq Pr1haM5pDUdoaLyZtmkpsIHZMNsT2xWnhU5MtsX81PR/rBuGPax/5IAFQrcCSPQ1dk0q p1JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3Vp8yFd/CK3RAMFUu4dR8uwEv3Xl03K9pf0agNYOIQs=; b=xhFGbks9Ej3AmiFzjdyQAVILtr+JnfqggkHJoIoy6+sAYY+FWHPERZS3px1S35Ekwc IKajoNnTOQbHc3AG8rDzNLyFEVUppmakq71UQDJ4JSLyhA96magBYuxjSLwndwRDGaUn AEffIsdIErcf5bXwFZUBNjWDA+RfTWJtJjNwl3IgBPJBhLiNv+yt3TLj+92WVmbCSQTX RDN7nDFdq5Syerdnnf4ALQdoXvqBN/L0DucEqfM/lMABROt8ys8i83TI5DFEZlH105iK r9F9ieAHPyFxzi/ZrJKvvDgNsYLFmgaZQqvC8pU5Tda0GS8W5Xc5FdrxpM/hMKz3ZceW V0gw== X-Gm-Message-State: AO0yUKUY7+UCSvZ2uY+zAPPIGLIqyXUKkAq2qcttrCVmJdAJPj0UXQXK Sl3orq2nSIXGh/hfo+7Hce/CGvFFXoVTmQev X-Google-Smtp-Source: AK7set/wYPoEgLgmV+EAIleOUPl40wwfwfNwTStEc5blh2xccjAnqWNNJehARs5L+Y7oErC89lcR2A== X-Received: by 2002:a05:6808:642:b0:383:b777:8518 with SMTP id z2-20020a056808064200b00383b7778518mr6771379oih.24.1677287761959; Fri, 24 Feb 2023 17:16:01 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.16.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:16:01 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 13/76] ssdfs: introduce PEB block bitmap Date: Fri, 24 Feb 2023 17:08:24 -0800 Message-Id: <20230225010927.813929-14-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org SSDFS implements a migration scheme. Migration scheme is a fundamental technique of GC overhead management. The key responsibility of the migration scheme is to guarantee the presence of data in the same segment for any update operations. Generally speaking, the migration scheme’s model is implemented on the basis of association an exhausted "Physical" Erase Block (PEB) with a clean one. The goal such association of two PEBs is to implement the gradual migration of data by means of the update operations in the initial (exhausted) PEB. 
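To make the model concrete, below is a minimal, self-contained user-space
sketch of associating an exhausted PEB with a clean one. It is illustrative
only: the toy_* names, the three block states, and the fixed PEB size are
assumptions made for this example, not SSDFS structures or API.

#include <stdbool.h>
#include <stdio.h>

#define TOY_BLKS_PER_PEB	8

enum toy_blk_state {
	TOY_BLK_FREE,
	TOY_BLK_VALID,
	TOY_BLK_INVALID,
};

struct toy_peb {
	enum toy_blk_state blk[TOY_BLKS_PER_PEB];
};

/* Exhausted source PEB associated with a clean destination PEB. */
struct toy_peb_pair {
	struct toy_peb *src;
	struct toy_peb *dst;
};

/*
 * An update never rewrites a block in place: the new data lands in
 * the destination PEB and the old copy in the source PEB is simply
 * invalidated. Regular updates thereby migrate the data gradually.
 */
static void toy_update_block(struct toy_peb_pair *pair, int blk)
{
	pair->src->blk[blk] = TOY_BLK_INVALID;
	pair->dst->blk[blk] = TOY_BLK_VALID;
}

/* The source PEB becomes erasable once it holds no valid blocks. */
static bool toy_src_is_erasable(const struct toy_peb_pair *pair)
{
	int i;

	for (i = 0; i < TOY_BLKS_PER_PEB; i++) {
		if (pair->src->blk[i] == TOY_BLK_VALID)
			return false;
	}
	return true;
}

int main(void)
{
	struct toy_peb src = { { TOY_BLK_FREE } };
	struct toy_peb dst = { { TOY_BLK_FREE } };
	struct toy_peb_pair pair = { .src = &src, .dst = &dst };
	int i;

	/* The source PEB is exhausted: every block holds live data. */
	for (i = 0; i < TOY_BLKS_PER_PEB; i++)
		src.blk[i] = TOY_BLK_VALID;

	/* Ordinary updates drain the source PEB block by block. */
	for (i = 0; i < TOY_BLKS_PER_PEB; i++)
		toy_update_block(&pair, i);

	printf("source PEB erasable: %s\n",
	       toy_src_is_erasable(&pair) ? "yes" : "no");
	return 0;
}

The point of the sketch is the design choice itself: every block is moved at
most once, and only as a side effect of an update that had to be written
anyway, so data drains out of the source PEB without a separate
read-move-write GC pass.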
As a result, the old, exhausted PEB becomes invalidated after complete
data migration, and the erase operation can then be applied to convert
it into the clean state. Moreover, the destination PEB of the
association replaces the initial PEB at its index in the segment and,
finally, becomes the only PEB for this position. It is this technique
that implements the concept of a logical extent, with the goal of
decreasing write amplification and managing the GC overhead: the
logical extent concept excludes the necessity to update the metadata
that tracks the position of user data on the file system's volume.
Generally speaking, the migration scheme is capable of decreasing the
GC activity significantly, both by excluding the necessity to update
metadata and by means of the self-migration of data between PEBs that
is triggered by regular update operations.

To implement the migration scheme concept, SSDFS introduces the PEB
container that includes source and destination erase blocks. As a
result, the PEB block bitmap object represents the corresponding
aggregation of the source PEB's block bitmap and the destination PEB's
block bitmap.

The PEB block bitmap implements the following API:
(1) create - create PEB block bitmap
(2) destroy - destroy PEB block bitmap
(3) init - initialize PEB block bitmap by metadata from a log
(4) get_free_pages - get free pages in aggregation of block bitmaps
(5) get_used_pages - get used pages in aggregation of block bitmaps
(6) get_invalid_pages - get invalid pages in aggregation of block bitmaps
(7) pre_allocate - pre-allocate page/range in aggregation of block bitmaps
(8) allocate - allocate page/range in aggregation of block bitmaps
(9) invalidate - invalidate page/range in aggregation of block bitmaps
(10) update_range - change the state of range in aggregation of block bitmaps
(11) collect_garbage - find contiguous range for requested state
(12) start_migration - prepare PEB's environment for migration
(13) migrate - move range from source block bitmap into destination one
(14) finish_migration - clean source block bitmap and swap block bitmaps

Signed-off-by: Viacheslav Dubeyko 
CC: Viacheslav Dubeyko 
CC: Luka Perkov 
CC: Bruno Banelli 
---
 fs/ssdfs/peb_block_bitmap.c | 1540 +++++++++++++++++++++++++++++++++++
 fs/ssdfs/peb_block_bitmap.h |  165 ++++
 2 files changed, 1705 insertions(+)
 create mode 100644 fs/ssdfs/peb_block_bitmap.c
 create mode 100644 fs/ssdfs/peb_block_bitmap.h

diff --git a/fs/ssdfs/peb_block_bitmap.c b/fs/ssdfs/peb_block_bitmap.c
new file mode 100644
index 000000000000..0011ed7dc306
--- /dev/null
+++ b/fs/ssdfs/peb_block_bitmap.c
@@ -0,0 +1,1540 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_block_bitmap.c - PEB's block bitmap implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_vector.h" +#include "peb_block_bitmap.h" +#include "segment_block_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" + +#define SSDFS_PEB_BLK_BMAP_STATE_FNS(value, name) \ +static inline \ +bool is_peb_block_bmap_##name(struct ssdfs_peb_blk_bmap *bmap) \ +{ \ + return atomic_read(&bmap->state) == SSDFS_PEB_BLK_BMAP_##value; \ +} \ +static inline \ +void set_peb_block_bmap_##name(struct ssdfs_peb_blk_bmap *bmap) \ +{ \ + atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_##value); \ +} \ + +/* + * is_peb_block_bmap_created() + * set_peb_block_bmap_created() + */ +SSDFS_PEB_BLK_BMAP_STATE_FNS(CREATED, created) + +/* + * is_peb_block_bmap_initialized() + * set_peb_block_bmap_initialized() + */ +SSDFS_PEB_BLK_BMAP_STATE_FNS(INITIALIZED, initialized) + +bool ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *ptr) +{ + return is_peb_block_bmap_initialized(ptr); +} + +/* + * ssdfs_peb_blk_bmap_create() - construct PEB's block bitmap + * @parent: parent segment's block bitmap + * @peb_index: PEB's index in segment's array + * @items_count: count of described items + * @flag: define necessity to allocate memory + * @init_flag: definition of block bitmap's creation state + * @init_state: block state is used during initialization + * + * This function tries to create the source and destination block + * bitmap objects. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_peb_blk_bmap_create(struct ssdfs_segment_blk_bmap *parent, + u16 peb_index, u32 items_count, + int init_flag, int init_state) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_blk_bmap *bmap; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent || !parent->peb); + BUG_ON(peb_index >= parent->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("parent %p, peb_index %u, " + "items_count %u, init_flag %#x, init_state %#x\n", + parent, peb_index, + items_count, init_flag, init_state); +#else + SSDFS_DBG("parent %p, peb_index %u, " + "items_count %u, init_flag %#x, init_state %#x\n", + parent, peb_index, + items_count, init_flag, init_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = parent->parent_si->fsi; + si = parent->parent_si; + bmap = &parent->peb[peb_index]; + atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u\n", + si->seg_id, bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (items_count > parent->pages_per_peb) { + SSDFS_ERR("items_count %u > pages_per_peb %u\n", + items_count, parent->pages_per_peb); + return -ERANGE; + } + + bmap->parent = parent; + bmap->peb_index = peb_index; + bmap->pages_per_peb = parent->pages_per_peb; + + init_rwsem(&bmap->modification_lock); + atomic_set(&bmap->peb_valid_blks, 0); + atomic_set(&bmap->peb_invalid_blks, 0); + atomic_set(&bmap->peb_free_blks, 0); + + atomic_set(&bmap->buffers_state, SSDFS_PEB_BMAP_BUFFERS_EMPTY); + init_rwsem(&bmap->lock); + bmap->init_cno = U64_MAX; + + err = ssdfs_block_bmap_create(fsi, + &bmap->buffer[SSDFS_PEB_BLK_BMAP1], + items_count, init_flag, init_state); + if (unlikely(err)) { + SSDFS_ERR("fail to create source block bitmap: " + "peb_index %u, items_count %u, " + "init_flag %#x, init_state %#x\n", + peb_index, items_count, + init_flag, init_state); + goto fail_create_peb_bmap; + } + + err = ssdfs_block_bmap_create(fsi, + &bmap->buffer[SSDFS_PEB_BLK_BMAP2], + items_count, + SSDFS_BLK_BMAP_CREATE, + SSDFS_BLK_FREE); + if (unlikely(err)) { + SSDFS_ERR("fail to create destination block bitmap: " + "peb_index %u, items_count %u\n", + peb_index, items_count); + goto fail_create_peb_bmap; + } + + if (init_flag == SSDFS_BLK_BMAP_CREATE) { + atomic_set(&bmap->peb_free_blks, items_count); + atomic_add(items_count, &parent->seg_free_blks); + } + + bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1]; + bmap->dst = NULL; + + init_completion(&bmap->init_end); + + atomic_set(&bmap->buffers_state, SSDFS_PEB_BMAP1_SRC); + atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_CREATED); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +fail_create_peb_bmap: + ssdfs_peb_blk_bmap_destroy(bmap); + return err; +} + +/* + * ssdfs_peb_blk_bmap_destroy() - destroy PEB's block bitmap + * @ptr: PEB's block bitmap object + * + * This function tries to destroy PEB's block bitmap object. 
+ */ +void ssdfs_peb_blk_bmap_destroy(struct ssdfs_peb_blk_bmap *ptr) +{ + if (!ptr) + return; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(rwsem_is_locked(&ptr->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("ptr %p, peb_index %u, " + "state %#x, valid_logical_blks %d, " + "invalid_logical_blks %d, " + "free_logical_blks %d\n", + ptr, ptr->peb_index, + atomic_read(&ptr->state), + atomic_read(&ptr->peb_valid_blks), + atomic_read(&ptr->peb_invalid_blks), + atomic_read(&ptr->peb_free_blks)); +#else + SSDFS_DBG("ptr %p, peb_index %u, " + "state %#x, valid_logical_blks %d, " + "invalid_logical_blks %d, " + "free_logical_blks %d\n", + ptr, ptr->peb_index, + atomic_read(&ptr->state), + atomic_read(&ptr->peb_valid_blks), + atomic_read(&ptr->peb_invalid_blks), + atomic_read(&ptr->peb_free_blks)); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!is_peb_block_bmap_initialized(ptr)) + SSDFS_WARN("PEB's block bitmap hasn't been initialized\n"); + + atomic_set(&ptr->peb_valid_blks, 0); + atomic_set(&ptr->peb_invalid_blks, 0); + atomic_set(&ptr->peb_free_blks, 0); + + ptr->src = NULL; + ptr->dst = NULL; + atomic_set(&ptr->buffers_state, SSDFS_PEB_BMAP_BUFFERS_EMPTY); + + ssdfs_block_bmap_destroy(&ptr->buffer[SSDFS_PEB_BLK_BMAP1]); + ssdfs_block_bmap_destroy(&ptr->buffer[SSDFS_PEB_BLK_BMAP2]); + + atomic_set(&ptr->state, SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_peb_blk_bmap_init() - init PEB's block bitmap + * @bmap: pointer on PEB's block bitmap object + * @source: pointer on pagevec with bitmap state + * @hdr: header of block bitmap fragment + * @cno: log's checkpoint + * + * This function tries to init PEB's block bitmap. + * + * RETURN: + * [success] - count of free pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_peb_blk_bmap_init(struct ssdfs_peb_blk_bmap *bmap, + struct ssdfs_page_vector *source, + struct ssdfs_block_bitmap_fragment *hdr, + u64 cno) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + struct ssdfs_block_bmap *blk_bmap = NULL; + int bmap_state = SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN; + bool is_dst_peb_clean = false; + u8 flags; + u8 type; + bool under_migration = false; + bool has_ext_ptr = false; + bool has_relation = false; + u64 old_cno = U64_MAX; + u32 last_free_blk; + u32 metadata_blks; + u32 free_blks; + u32 used_blks; + u32 invalid_blks; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si); + BUG_ON(!bmap->parent->parent_si->peb_array); + BUG_ON(!source || !hdr); + BUG_ON(ssdfs_page_vector_count(source) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = bmap->parent->parent_si->fsi; + si = bmap->parent->parent_si; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u, cno %llu\n", + si->seg_id, bmap->peb_index, cno); +#else + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu\n", + si->seg_id, bmap->peb_index, cno); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + bmap_state = atomic_read(&bmap->state); + switch (bmap_state) { + case SSDFS_PEB_BLK_BMAP_CREATED: + /* regular init */ + break; + + case SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST: + /* + * PEB container is under migration. + * But the destination PEB is clean. + * It means that destination PEB doesn't need + * in init operation. 
+ */ + is_dst_peb_clean = true; + break; + + default: + SSDFS_ERR("invalid PEB block bitmap state %#x\n", + atomic_read(&bmap->state)); + return -ERANGE; + } + + if (bmap->peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + bmap->peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[bmap->peb_index]; + + flags = hdr->flags; + type = hdr->type; + + if (flags & ~SSDFS_FRAG_BLK_BMAP_FLAG_MASK) { + SSDFS_ERR("invalid flags set: %#x\n", flags); + return -EIO; + } + + if (type >= SSDFS_FRAG_BLK_BMAP_TYPE_MAX) { + SSDFS_ERR("invalid type: %#x\n", type); + return -EIO; + } + + if (is_dst_peb_clean) { + under_migration = true; + has_relation = true; + } else { + under_migration = flags & SSDFS_MIGRATING_BLK_BMAP; + has_ext_ptr = flags & SSDFS_PEB_HAS_EXT_PTR; + has_relation = flags & SSDFS_PEB_HAS_RELATION; + } + + if (type == SSDFS_SRC_BLK_BMAP && (has_ext_ptr && has_relation)) { + SSDFS_ERR("invalid flags set: %#x\n", flags); + return -EIO; + } + + down_write(&bmap->lock); + + old_cno = bmap->init_cno; + if (bmap->init_cno == U64_MAX) + bmap->init_cno = cno; + else if (bmap->init_cno != cno) { + err = -ERANGE; + SSDFS_ERR("invalid bmap state: " + "bmap->init_cno %llu, cno %llu\n", + bmap->init_cno, cno); + goto fail_init_blk_bmap; + } + + switch (type) { + case SSDFS_SRC_BLK_BMAP: + if (under_migration && has_relation) { + if (is_dst_peb_clean) + bmap->dst = &bmap->buffer[SSDFS_PEB_BLK_BMAP2]; + bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1]; + blk_bmap = bmap->src; + atomic_set(&bmap->buffers_state, + SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST); + } else if (under_migration && has_ext_ptr) { + bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1]; + blk_bmap = bmap->src; + atomic_set(&bmap->buffers_state, + SSDFS_PEB_BMAP1_SRC); + } else if (under_migration) { + err = -EIO; + SSDFS_ERR("invalid flags set: %#x\n", flags); + goto fail_init_blk_bmap; + } else { + bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1]; + blk_bmap = bmap->src; + atomic_set(&bmap->buffers_state, + SSDFS_PEB_BMAP1_SRC); + } + break; + + case SSDFS_DST_BLK_BMAP: + if (under_migration && has_relation) { + bmap->dst = &bmap->buffer[SSDFS_PEB_BLK_BMAP2]; + blk_bmap = bmap->dst; + atomic_set(&bmap->buffers_state, + SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST); + } else if (under_migration && has_ext_ptr) { + bmap->src = &bmap->buffer[SSDFS_PEB_BLK_BMAP1]; + blk_bmap = bmap->src; + atomic_set(&bmap->buffers_state, + SSDFS_PEB_BMAP1_SRC); + } else { + err = -EIO; + SSDFS_ERR("invalid flags set: %#x\n", flags); + goto fail_init_blk_bmap; + } + break; + + default: + BUG(); + } + + last_free_blk = le32_to_cpu(hdr->last_free_blk); + metadata_blks = le32_to_cpu(hdr->metadata_blks); + invalid_blks = le32_to_cpu(hdr->invalid_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, " + "last_free_blk %u, metadata_blks %u, invalid_blks %u\n", + si->seg_id, bmap->peb_index, cno, + last_free_blk, metadata_blks, invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_lock(blk_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock bitmap: err %d\n", err); + goto fail_init_blk_bmap; + } + + err = ssdfs_block_bmap_init(blk_bmap, source, last_free_blk, + metadata_blks, invalid_blks); + if (unlikely(err)) { + SSDFS_ERR("fail to initialize block bitmap: " + "err %d\n", err); + goto fail_define_pages_count; + } + + err = ssdfs_block_bmap_get_free_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get free pages: err %d\n", err); + goto fail_define_pages_count; + } 
else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get used pages: err %d\n", err); + goto fail_define_pages_count; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(blk_bmap); + if (unlikely(err < 0)) { + SSDFS_ERR("fail to get invalid pages: err %d\n", err); + goto fail_define_pages_count; + } else { + invalid_blks = err; + err = 0; + } + +fail_define_pages_count: + ssdfs_block_bmap_unlock(blk_bmap); + + if (unlikely(err)) + goto fail_init_blk_bmap; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, " + "type %#x, under_migration %#x, has_relation %#x, " + "last_free_blk %u, metadata_blks %u, " + "free_blks %u, used_blks %u, " + "invalid_blks %u, shared_free_dst_blks %d\n", + si->seg_id, bmap->peb_index, cno, + type, under_migration, has_relation, + last_free_blk, metadata_blks, + free_blks, used_blks, invalid_blks, + atomic_read(&pebc->shared_free_dst_blks)); + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, " + "free_blks %d, valid_blks %d, invalid_blks %d\n", + si->seg_id, bmap->peb_index, cno, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (type) { + case SSDFS_SRC_BLK_BMAP: + if (is_dst_peb_clean && !(flags & SSDFS_MIGRATING_BLK_BMAP)) { + down_write(&bmap->modification_lock); + atomic_set(&bmap->peb_valid_blks, used_blks); + atomic_add(fsi->pages_per_peb - used_blks, + &bmap->peb_free_blks); + up_write(&bmap->modification_lock); + + atomic_set(&pebc->shared_free_dst_blks, + fsi->pages_per_peb - used_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SRC: seg_id %llu, peb_index %u, cno %llu, " + "pages_per_peb %u, used_blks %u, " + "shared_free_dst_blks %d\n", + si->seg_id, bmap->peb_index, cno, + fsi->pages_per_peb, used_blks, + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&bmap->parent->modification_lock); + atomic_add(atomic_read(&bmap->peb_valid_blks), + &bmap->parent->seg_valid_blks); + atomic_add(atomic_read(&bmap->peb_free_blks), + &bmap->parent->seg_free_blks); + up_write(&bmap->parent->modification_lock); + } else if (under_migration && has_relation) { + int current_free_blks = + atomic_read(&bmap->peb_free_blks); + + if (used_blks > current_free_blks) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("used_blks %u > free_blks %d\n", + used_blks, current_free_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&bmap->modification_lock); + atomic_set(&bmap->peb_free_blks, 0); + atomic_add(used_blks, &bmap->peb_valid_blks); + up_write(&bmap->modification_lock); + + atomic_set(&pebc->shared_free_dst_blks, 0); + + down_write(&bmap->parent->modification_lock); + atomic_sub(current_free_blks, + &bmap->parent->seg_free_blks); + atomic_add(used_blks, + &bmap->parent->seg_valid_blks); + up_write(&bmap->parent->modification_lock); + } else { + down_write(&bmap->modification_lock); + atomic_sub(used_blks, &bmap->peb_free_blks); + atomic_add(used_blks, &bmap->peb_valid_blks); + up_write(&bmap->modification_lock); + + atomic_sub(used_blks, + &pebc->shared_free_dst_blks); + + down_write(&bmap->parent->modification_lock); + atomic_sub(used_blks, + &bmap->parent->seg_free_blks); + atomic_add(used_blks, + &bmap->parent->seg_valid_blks); + up_write(&bmap->parent->modification_lock); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("shared_free_dst_blks %d\n", + 
atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (under_migration && has_ext_ptr) { + down_write(&bmap->modification_lock); + atomic_add(used_blks, &bmap->peb_valid_blks); + atomic_add(invalid_blks, &bmap->peb_invalid_blks); + atomic_add(free_blks, &bmap->peb_free_blks); + up_write(&bmap->modification_lock); + } else if (under_migration) { + err = -EIO; + SSDFS_ERR("invalid flags set: %#x\n", flags); + goto fail_init_blk_bmap; + } else { + down_write(&bmap->modification_lock); + atomic_set(&bmap->peb_valid_blks, used_blks); + atomic_set(&bmap->peb_invalid_blks, invalid_blks); + atomic_set(&bmap->peb_free_blks, free_blks); + up_write(&bmap->modification_lock); + + down_write(&bmap->parent->modification_lock); + atomic_add(atomic_read(&bmap->peb_valid_blks), + &bmap->parent->seg_valid_blks); + atomic_add(atomic_read(&bmap->peb_invalid_blks), + &bmap->parent->seg_invalid_blks); + atomic_add(atomic_read(&bmap->peb_free_blks), + &bmap->parent->seg_free_blks); + up_write(&bmap->parent->modification_lock); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("SRC: seg_id %llu, peb_index %u, cno %llu, " + "free_blks %d, valid_blks %d, invalid_blks %d, " + "parent (used_blks %d, free_blks %d, invalid_blks %d)\n", + si->seg_id, bmap->peb_index, cno, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_DST_BLK_BMAP: + if (under_migration) { + down_write(&bmap->modification_lock); + atomic_add(used_blks, &bmap->peb_valid_blks); + atomic_add(invalid_blks, &bmap->peb_invalid_blks); + atomic_add(free_blks, &bmap->peb_free_blks); + up_write(&bmap->modification_lock); + + atomic_add(free_blks, &pebc->shared_free_dst_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("DST: seg_id %llu, peb_index %u, cno %llu, " + "free_blks %u, " + "shared_free_dst_blks %d\n", + si->seg_id, bmap->peb_index, cno, + free_blks, + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&bmap->parent->modification_lock); + atomic_add(used_blks, + &bmap->parent->seg_valid_blks); + atomic_add(invalid_blks, + &bmap->parent->seg_invalid_blks); + atomic_add(free_blks, + &bmap->parent->seg_free_blks); + up_write(&bmap->parent->modification_lock); + } else { + err = -EIO; + SSDFS_ERR("invalid flags set: %#x\n", flags); + goto fail_init_blk_bmap; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("DST: seg_id %llu, peb_index %u, cno %llu, " + "free_blks %d, valid_blks %d, invalid_blks %d, " + "parent (used_blks %d, free_blks %d, invalid_blks %d)\n", + si->seg_id, bmap->peb_index, cno, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + BUG(); + } + + switch (type) { + case SSDFS_SRC_BLK_BMAP: + if (under_migration && has_relation) { + if (!bmap->dst) + goto finish_init_blk_bmap; + else if (!ssdfs_block_bmap_initialized(bmap->dst)) + goto finish_init_blk_bmap; + } + break; + + case SSDFS_DST_BLK_BMAP: + if (under_migration && has_relation) { + if (!bmap->src) + goto finish_init_blk_bmap; + else if (!ssdfs_block_bmap_initialized(bmap->src)) + goto finish_init_blk_bmap; + } 
+ break; + + default: + BUG(); + } + + if (atomic_read(&pebc->shared_free_dst_blks) < 0) { + SSDFS_WARN("type %#x, under_migration %#x, has_relation %#x, " + "last_free_blk %u, metadata_blks %u, " + "free_blks %u, used_blks %u, " + "invalid_blks %u, shared_free_dst_blks %d\n", + type, under_migration, has_relation, + last_free_blk, metadata_blks, + free_blks, used_blks, invalid_blks, + atomic_read(&pebc->shared_free_dst_blks)); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, " + "free_blks %d, used_blks %d, invalid_blks %d, " + "shared_free_dst_blks %d\n", + si->seg_id, bmap->peb_index, cno, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + atomic_read(&pebc->shared_free_dst_blks)); + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu, " + "parent (used_blks %d, free_blks %d, invalid_blks %d)\n", + si->seg_id, bmap->peb_index, cno, + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_set(&bmap->state, SSDFS_PEB_BLK_BMAP_INITIALIZED); + complete_all(&bmap->init_end); + +fail_init_blk_bmap: + if (unlikely(err)) { + bmap->init_cno = old_cno; + complete_all(&bmap->init_end); + } + +finish_init_blk_bmap: + up_write(&bmap->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_blk_bmap_init_failed() - process failure of block bitmap init + * @bmap: pointer on PEB's block bitmap object + */ +void ssdfs_peb_blk_bmap_init_failed(struct ssdfs_peb_blk_bmap *bmap) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + complete_all(&bmap->init_end); +} + +/* + * is_ssdfs_peb_blk_bmap_dirty() - check that PEB block bitmap is dirty + * @bmap: pointer on PEB's block bitmap object + */ +bool is_ssdfs_peb_blk_bmap_dirty(struct ssdfs_peb_blk_bmap *bmap) +{ + bool is_src_dirty = false; + bool is_dst_dirty = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) + return false; + + down_read(&bmap->lock); + if (bmap->src != NULL) + is_src_dirty = ssdfs_block_bmap_dirtied(bmap->src); + if (bmap->dst != NULL) + is_dst_dirty = ssdfs_block_bmap_dirtied(bmap->dst); + up_read(&bmap->lock); + + return is_src_dirty || is_dst_dirty; +} + +/* + * ssdfs_peb_define_reserved_pages_per_log() - estimate reserved pages per log + * @bmap: pointer on PEB's block bitmap object + */ +int ssdfs_peb_define_reserved_pages_per_log(struct ssdfs_peb_blk_bmap *bmap) +{ + struct ssdfs_segment_blk_bmap *parent = bmap->parent; + struct ssdfs_segment_info *si = parent->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u32 page_size = fsi->pagesize; + u32 pages_per_peb = parent->pages_per_peb; + u32 pebs_per_seg = fsi->pebs_per_seg; + u16 log_pages = si->log_pages; + bool is_migrating = false; + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + is_migrating = true; + break; + + default: + is_migrating = false; + break; + } + + return ssdfs_peb_estimate_reserved_metapages(page_size, + pages_per_peb, + log_pages, + pebs_per_seg, + is_migrating); +} + +bool has_ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *bmap) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || 
!bmap->parent || !bmap->parent->parent_si); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_peb_blk_bmap_initialized(bmap); +} + +int ssdfs_peb_blk_bmap_wait_init_end(struct ssdfs_peb_blk_bmap *bmap) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ssdfs_peb_blk_bmap_initialized(bmap)) + return 0; + else { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_peb_blk_bmap_get_free_pages() - determine PEB's free pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect PEB's free pages count. + * + * RETURN: + * [success] - count of free pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_peb_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int free_pages; + int log_pages; + int created_logs; + int reserved_pages_per_log; + int used_pages; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->parent || !bmap->parent->parent_si); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + SSDFS_ERR("seg_id %llu, free_logical_blks %u, " + "valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if (bmap->src) { + SSDFS_ERR("SRC BLOCK BITMAP: bytes_count %zu, items_count %zu, " + "metadata_items %u, used_blks %u, invalid_blks %u, " + "flags %#x\n", + bmap->src->bytes_count, + bmap->src->items_count, + bmap->src->metadata_items, + bmap->src->used_blks, + bmap->src->invalid_blks, + atomic_read(&bmap->src->flags)); + } + + if (bmap->dst) { + SSDFS_ERR("DST BLOCK BITMAP: bytes_count %zu, items_count %zu, " + "metadata_items %u, used_blks %u, invalid_blks %u, " + "flags %#x\n", + bmap->dst->bytes_count, + bmap->dst->items_count, + bmap->dst->metadata_items, + bmap->dst->used_blks, + bmap->dst->invalid_blks, + atomic_read(&bmap->dst->flags)); + } + + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, free_logical_blks %u, " + "valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + 
bmap->pages_per_peb); + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > bmap->pages_per_peb) { + SSDFS_WARN("seg_id %llu, peb_index %u, " + "free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = bmap->parent->parent_si->log_pages; + reserved_pages_per_log = ssdfs_peb_define_reserved_pages_per_log(bmap); + free_pages = atomic_read(&bmap->peb_free_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_pages %d, reserved_pages_per_log %d, " + "free_pages %d\n", + log_pages, reserved_pages_per_log, free_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_pages > 0) { + int upper_threshold, lower_threshold; + + created_logs = (bmap->pages_per_peb - free_pages) / log_pages; + used_pages = bmap->pages_per_peb - free_pages; + + if (created_logs == 0) { + upper_threshold = log_pages; + lower_threshold = reserved_pages_per_log; + } else { + upper_threshold = (created_logs + 1) * log_pages; + lower_threshold = ((created_logs - 1) * log_pages) + + reserved_pages_per_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("created_logs %d, used_pages %d, " + "upper_threshold %d, lower_threshold %d\n", + created_logs, used_pages, + upper_threshold, lower_threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + BUG_ON(used_pages > upper_threshold); + + if (used_pages == upper_threshold) + free_pages -= reserved_pages_per_log; + else if (used_pages < lower_threshold) + free_pages -= (lower_threshold - used_pages); + + if (free_pages < 0) + free_pages = 0; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %d\n", free_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + return free_pages; +} + +/* + * ssdfs_peb_blk_bmap_get_used_pages() - determine PEB's used data pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect PEB's used data pages count. + * + * RETURN: + * [success] - count of used data pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. 
+ */ +int ssdfs_peb_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > bmap->pages_per_peb) { + SSDFS_WARN("seg_id %llu, peb_index %u, " + "free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return atomic_read(&bmap->peb_valid_blks); +} + +/* + * ssdfs_peb_blk_bmap_get_invalid_pages() - determine PEB's invalid pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect PEB's invalid pages count. + * + * RETURN: + * [success] - count of invalid pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_peb_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > bmap->pages_per_peb) { + SSDFS_WARN("seg_id %llu, peb_index %u, " + "free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return atomic_read(&bmap->peb_invalid_blks); +} + +/* + * ssdfs_src_blk_bmap_get_free_pages() - determine free pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect the free pages count + * in the source bitmap. 
+ * + * RETURN: + * [success] - count of free pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_src_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + if (bmap->src == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_get_src_free_pages; + } + + err = ssdfs_block_bmap_lock(bmap->src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_get_src_free_pages; + } + + err = ssdfs_block_bmap_get_free_pages(bmap->src); + ssdfs_block_bmap_unlock(bmap->src); + +finish_get_src_free_pages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_src_blk_bmap_get_used_pages() - determine used pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect the used pages count + * in the source bitmap. + * + * RETURN: + * [success] - count of used pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_src_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + if (bmap->src == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_get_src_used_pages; + } + + err = ssdfs_block_bmap_lock(bmap->src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_get_src_used_pages; + } + + err = ssdfs_block_bmap_get_used_pages(bmap->src); + ssdfs_block_bmap_unlock(bmap->src); + +finish_get_src_used_pages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_src_blk_bmap_get_invalid_pages() - determine invalid pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect the invalid pages count + * in the source bitmap. + * + * RETURN: + * [success] - count of invalid pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. 
+ */ +int ssdfs_src_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + if (bmap->src == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_get_src_invalid_pages; + } + + err = ssdfs_block_bmap_lock(bmap->src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_get_src_invalid_pages; + } + + err = ssdfs_block_bmap_get_invalid_pages(bmap->src); + ssdfs_block_bmap_unlock(bmap->src); + +finish_get_src_invalid_pages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_dst_blk_bmap_get_free_pages() - determine free pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect the free pages count + * in the destination bitmap. + * + * RETURN: + * [success] - count of free pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_dst_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + if (bmap->dst == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_get_dst_free_pages; + } + + err = ssdfs_block_bmap_lock(bmap->dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_get_dst_free_pages; + } + + err = ssdfs_block_bmap_get_free_pages(bmap->dst); + ssdfs_block_bmap_unlock(bmap->dst); + +finish_get_dst_free_pages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_dst_blk_bmap_get_used_pages() - determine used pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect the used pages count + * in the destination bitmap. + * + * RETURN: + * [success] - count of used pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. 
+ */ +int ssdfs_dst_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + if (bmap->dst == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_get_dst_used_pages; + } + + err = ssdfs_block_bmap_lock(bmap->dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_get_dst_used_pages; + } + + err = ssdfs_block_bmap_get_used_pages(bmap->dst); + ssdfs_block_bmap_unlock(bmap->dst); + +finish_get_dst_used_pages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_dst_blk_bmap_get_invalid_pages() - determine invalid pages count + * @bmap: pointer on PEB's block bitmap object + * + * This function tries to detect the invalid pages count + * in the destination bitmap. + * + * RETURN: + * [success] - count of invalid pages. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - invalid internal calculations. + */ +int ssdfs_dst_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("peb_index %u\n", bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + if (bmap->dst == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_get_dst_invalid_pages; + } + + err = ssdfs_block_bmap_lock(bmap->dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_get_dst_invalid_pages; + } + + err = ssdfs_block_bmap_get_invalid_pages(bmap->dst); + ssdfs_block_bmap_unlock(bmap->dst); + +finish_get_dst_invalid_pages: + up_read(&bmap->lock); + + return err; +} diff --git a/fs/ssdfs/peb_block_bitmap.h b/fs/ssdfs/peb_block_bitmap.h new file mode 100644 index 000000000000..7cbeebe1a59e --- /dev/null +++ b/fs/ssdfs/peb_block_bitmap.h @@ -0,0 +1,165 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb_block_bitmap.h - PEB's block bitmap declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_PEB_BLOCK_BITMAP_H +#define _SSDFS_PEB_BLOCK_BITMAP_H + +#include "block_bitmap.h" + +/* PEB's block bitmap indexes */ +enum { + SSDFS_PEB_BLK_BMAP1, + SSDFS_PEB_BLK_BMAP2, + SSDFS_PEB_BLK_BMAP_ITEMS_MAX +}; + +/* + * struct ssdfs_peb_blk_bmap - PEB container's block bitmap object + * @state: PEB container's block bitmap's state + * @peb_index: PEB index in array + * @pages_per_peb: pages per physical erase block + * @modification_lock: lock for modification operations + * @peb_valid_blks: PEB container's valid logical blocks count + * @peb_invalid_blks: PEB container's invalid logical blocks count + * @peb_free_blks: PEB container's free logical blocks count + * @buffers_state: buffers state + * @lock: buffers lock + * @init_cno: initialization checkpoint + * @src: source PEB's block bitmap object's pointer + * @dst: destination PEB's block bitmap object's pointer + * @buffers: block bitmap buffers + * @init_end: wait of init ending + * @parent: pointer on parent segment block bitmap + */ +struct ssdfs_peb_blk_bmap { + atomic_t state; + + u16 peb_index; + u32 pages_per_peb; + + struct rw_semaphore modification_lock; + atomic_t peb_valid_blks; + atomic_t peb_invalid_blks; + atomic_t peb_free_blks; + + atomic_t buffers_state; + struct rw_semaphore lock; + u64 init_cno; + struct ssdfs_block_bmap *src; + struct ssdfs_block_bmap *dst; + struct ssdfs_block_bmap buffer[SSDFS_PEB_BLK_BMAP_ITEMS_MAX]; + struct completion init_end; + + struct ssdfs_segment_blk_bmap *parent; +}; + +/* PEB container's block bitmap's possible states */ +enum { + SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN, + SSDFS_PEB_BLK_BMAP_CREATED, + SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST, + SSDFS_PEB_BLK_BMAP_INITIALIZED, + SSDFS_PEB_BLK_BMAP_STATE_MAX, +}; + +/* PEB's buffer array possible states */ +enum { + SSDFS_PEB_BMAP_BUFFERS_EMPTY, + SSDFS_PEB_BMAP1_SRC, + SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST, + SSDFS_PEB_BMAP2_SRC, + SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST, + SSDFS_PEB_BMAP_BUFFERS_STATE_MAX +}; + +/* PEB's block bitmap operation destination */ +enum { + SSDFS_PEB_BLK_BMAP_SOURCE, + SSDFS_PEB_BLK_BMAP_DESTINATION, + SSDFS_PEB_BLK_BMAP_INDEX_MAX +}; + +/* + * PEB block bitmap API + */ +int ssdfs_peb_blk_bmap_create(struct ssdfs_segment_blk_bmap *parent, + u16 peb_index, u32 items_count, + int init_flag, int init_state); +void ssdfs_peb_blk_bmap_destroy(struct ssdfs_peb_blk_bmap *ptr); +int ssdfs_peb_blk_bmap_init(struct ssdfs_peb_blk_bmap *bmap, + struct ssdfs_page_vector *source, + struct ssdfs_block_bitmap_fragment *hdr, + u64 cno); +void ssdfs_peb_blk_bmap_init_failed(struct ssdfs_peb_blk_bmap *bmap); + +bool has_ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *bmap); +int ssdfs_peb_blk_bmap_wait_init_end(struct ssdfs_peb_blk_bmap *bmap); + +bool ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *ptr); +bool is_ssdfs_peb_blk_bmap_dirty(struct ssdfs_peb_blk_bmap *ptr); + +int ssdfs_peb_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr); +int ssdfs_peb_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr); +int ssdfs_peb_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr); + +int ssdfs_peb_define_reserved_pages_per_log(struct ssdfs_peb_blk_bmap *bmap); +int ssdfs_peb_blk_bmap_reserve_metapages(struct ssdfs_peb_blk_bmap *bmap, + int bmap_index, + u32 count); +int ssdfs_peb_blk_bmap_free_metapages(struct ssdfs_peb_blk_bmap 
*bmap,
+				      int bmap_index,
+				      u32 count);
+int ssdfs_peb_blk_bmap_pre_allocate(struct ssdfs_peb_blk_bmap *bmap,
+				    int bmap_index,
+				    u32 *len,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_allocate(struct ssdfs_peb_blk_bmap *bmap,
+				int bmap_index,
+				u32 *len,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_invalidate(struct ssdfs_peb_blk_bmap *bmap,
+				  int bmap_index,
+				  struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_update_range(struct ssdfs_peb_blk_bmap *bmap,
+				    int bmap_index,
+				    int new_range_state,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_collect_garbage(struct ssdfs_peb_blk_bmap *bmap,
+				       u32 start, u32 max_len,
+				       int blk_state,
+				       struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_start_migration(struct ssdfs_peb_blk_bmap *bmap);
+int ssdfs_peb_blk_bmap_migrate(struct ssdfs_peb_blk_bmap *bmap,
+			       int new_range_state,
+			       struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_finish_migration(struct ssdfs_peb_blk_bmap *bmap);
+
+/*
+ * PEB block bitmap internal API
+ */
+int ssdfs_src_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_src_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_src_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr);
+
+#endif /* _SSDFS_PEB_BLOCK_BITMAP_H */

From patchwork Sat Feb 25 01:08:25 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151919
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 14/76] ssdfs: PEB block bitmap modification operations
Date: Fri, 24 Feb 2023 17:08:25 -0800
Message-Id: <20230225010927.813929-15-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This patch implements the PEB block bitmap modification operations:

pre_allocate - pre-allocate a page/range in the aggregation of block bitmaps
allocate - allocate a page/range in the aggregation of block bitmaps
invalidate - invalidate a page/range in the aggregation of block bitmaps
update_range - change the state of a range in the aggregation of block bitmaps
collect_garbage - find a contiguous range in the requested state
start_migration - prepare the PEB's environment for migration
migrate - move a range from the source block bitmap into the destination one
finish_migration - clean the source block bitmap and swap the block bitmaps

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_block_bitmap.c | 2418 +++++++++++++++++++++++++++++++++++
 1 file changed, 2418 insertions(+)

diff --git a/fs/ssdfs/peb_block_bitmap.c b/fs/ssdfs/peb_block_bitmap.c
index 0011ed7dc306..1938e1ccc02a 100644
--- a/fs/ssdfs/peb_block_bitmap.c
+++ b/fs/ssdfs/peb_block_bitmap.c
@@ -1538,3 +1538,2421 @@ int ssdfs_dst_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *bmap)
 	return err;
 }
+
+/*
+ * ssdfs_peb_blk_bmap_reserve_metapages() - reserve metadata pages
+ * @bmap: PEB's block bitmap object
+ * @bmap_index: source or destination block bitmap?
+ * @count: number of metadata pages
+ *
+ * This function tries to reserve the requested number of metadata pages.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - unable to reserve metapages.
+ * %-ERANGE - internal error.
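+ *
+ * A minimal usage sketch; the caller context and the value of
+ * @count are assumed here for illustration only:
+ *
+ *	u32 metapages = 2;
+ *
+ *	err = ssdfs_peb_blk_bmap_reserve_metapages(bmap,
+ *					SSDFS_PEB_BLK_BMAP_SOURCE,
+ *					metapages);
+ *
+ * Note that when free space is short the function can reserve only
+ * a part of the request, so the caller is expected to handle
+ * %-ENOSPC by shrinking the planned log or choosing another PEB.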
+ */
+int ssdfs_peb_blk_bmap_reserve_metapages(struct ssdfs_peb_blk_bmap *bmap,
+					 int bmap_index,
+					 u32 count)
+{
+	struct ssdfs_block_bmap *cur_bmap = NULL;
+	int reserving_blks = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("seg %llu, bmap %p, bmap_index %u, count %u, "
+		  "free_logical_blks %u, valid_logical_blks %u, "
+		  "invalid_logical_blks %u\n",
+		  bmap->parent->parent_si->seg_id,
+		  bmap, bmap_index, count,
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+		err = SSDFS_WAIT_COMPLETION(&bmap->init_end);
+		if (unlikely(err)) {
+init_failed:
+			SSDFS_ERR("PEB block bitmap init failed: "
+				  "seg_id %llu, peb_index %u, "
+				  "err %d\n",
+				  bmap->parent->parent_si->seg_id,
+				  bmap->peb_index, err);
+			return err;
+		}
+
+		if (!ssdfs_peb_blk_bmap_initialized(bmap)) {
+			err = -ERANGE;
+			goto init_failed;
+		}
+	}
+
+	if (bmap_index < 0 || bmap_index >= SSDFS_PEB_BLK_BMAP_INDEX_MAX) {
+		SSDFS_WARN("invalid bmap_index %u\n",
+			   bmap_index);
+		return -ERANGE;
+	}
+
+	down_read(&bmap->lock);
+
+	down_write(&bmap->parent->modification_lock);
+	down_write(&bmap->modification_lock);
+
+	reserving_blks = min_t(int, (int)count,
+				atomic_read(&bmap->peb_free_blks));
+	reserving_blks = min_t(int, reserving_blks,
+				atomic_read(&bmap->parent->seg_free_blks));
+
+	if (count > atomic_read(&bmap->peb_free_blks) ||
+	    count > atomic_read(&bmap->parent->seg_free_blks)) {
+		err = -ENOSPC;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to reserve: "
+			  "count %u, free_logical_blks %d, "
+			  "parent->free_logical_blks %d\n",
+			  count,
+			  atomic_read(&bmap->peb_free_blks),
+			  atomic_read(&bmap->parent->seg_free_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (reserving_blks > 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("try to reserve: "
+				  "reserving_blks %d\n",
+				  reserving_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else
+			goto finish_calculate_reserving_blks;
+	}
+
+	atomic_sub(reserving_blks, &bmap->peb_free_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, "
+		  "invalid_logical_blks %u, pages_per_peb %u\n",
+		  atomic_read(&bmap->peb_free_blks),
+		  atomic_read(&bmap->peb_valid_blks),
+		  atomic_read(&bmap->peb_invalid_blks),
+		  bmap->pages_per_peb);
+
+	if (atomic_read(&bmap->peb_free_blks) < 0) {
+		SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, "
+			   "invalid_logical_blks %u, pages_per_peb %u\n",
+			   atomic_read(&bmap->peb_free_blks),
+			   atomic_read(&bmap->peb_valid_blks),
+			   atomic_read(&bmap->peb_invalid_blks),
+			   bmap->pages_per_peb);
+	}
+
+	if ((atomic_read(&bmap->peb_free_blks) +
+	    atomic_read(&bmap->peb_valid_blks) +
+	    atomic_read(&bmap->peb_invalid_blks)) >
+					bmap->pages_per_peb) {
+		SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, "
+			   "invalid_logical_blks %u, pages_per_peb %u\n",
+			   atomic_read(&bmap->peb_free_blks),
+			   atomic_read(&bmap->peb_valid_blks),
+			   atomic_read(&bmap->peb_invalid_blks),
+			   bmap->pages_per_peb);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	atomic_sub(reserving_blks, &bmap->parent->seg_free_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("parent->free_logical_blks %u, "
+		  "parent->valid_logical_blks %u, "
+		  "parent->invalid_logical_blks %u, "
+		  "pages_per_peb %u\n",
+		  atomic_read(&bmap->parent->seg_free_blks),
+		  atomic_read(&bmap->parent->seg_valid_blks),
+		  atomic_read(&bmap->parent->seg_invalid_blks),
+		  bmap->parent->pages_per_peb);
+
+	if (atomic_read(&bmap->peb_free_blks) < 0) {
+		SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, "
+			   "invalid_logical_blks %u, pages_per_peb %u\n",
+			   atomic_read(&bmap->peb_free_blks),
+
atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_calculate_reserving_blks: + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + + if (reserving_blks <= 0 && err) + goto finish_reserve_metapages; + + if (bmap_index == SSDFS_PEB_BLK_BMAP_SOURCE) + cur_bmap = bmap->src; + else if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) + cur_bmap = bmap->dst; + else + cur_bmap = NULL; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_reserve_metapages; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_reserve_metapages; + } + + err = ssdfs_block_bmap_reserve_metadata_pages(cur_bmap, + reserving_blks); + ssdfs_block_bmap_unlock(cur_bmap); + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to reserve metadata pages: " + "reserving_blks %d\n", + reserving_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + atomic_add(reserving_blks, &bmap->peb_free_blks); + atomic_add(reserving_blks, &bmap->parent->seg_free_blks); + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + + goto finish_reserve_metapages; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve metadata pages: " + "reserving_blks %d, err %d\n", + reserving_blks, err); + + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + atomic_add(reserving_blks, &bmap->peb_free_blks); + atomic_add(reserving_blks, &bmap->parent->seg_free_blks); + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + + goto finish_reserve_metapages; + } + +finish_reserve_metapages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_peb_blk_bmap_free_metapages() - free metadata pages + * @bmap: PEB's block bitmap object + * @bmap_index: source or destination block bitmap? + * @count: amount of metadata pages + * + * This function tries to free some amount of metadata pages. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
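+ *
+ * Expected pairing with ssdfs_peb_blk_bmap_reserve_metapages()
+ * (a sketch; the counters below are assumed for illustration):
+ *
+ *	u32 unused = reserved_metapages - used_metapages;
+ *
+ *	err = ssdfs_peb_blk_bmap_free_metapages(bmap,
+ *					SSDFS_PEB_BLK_BMAP_SOURCE,
+ *					unused);
+ *
+ * Freeing returns the pages to the free block counters of both
+ * the PEB and its parent segment.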
+ */ +int ssdfs_peb_blk_bmap_free_metapages(struct ssdfs_peb_blk_bmap *bmap, + int bmap_index, + u32 count) +{ + struct ssdfs_block_bmap *cur_bmap = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap); + + SSDFS_DBG("seg %llu, bmap %p, bmap_index %u, count %u, " + "free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u\n", + bmap->parent->parent_si->seg_id, + bmap, bmap_index, count, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + if (bmap_index < 0 || bmap_index >= SSDFS_PEB_BLK_BMAP_INDEX_MAX) { + SSDFS_WARN("invalid bmap_index %u\n", + bmap_index); + return -ERANGE; + } + + down_read(&bmap->lock); + + if (bmap_index == SSDFS_PEB_BLK_BMAP_SOURCE) + cur_bmap = bmap->src; + else if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) + cur_bmap = bmap->dst; + else + cur_bmap = NULL; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_free_metapages; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_free_metapages; + } + + err = ssdfs_block_bmap_free_metadata_pages(cur_bmap, count); + ssdfs_block_bmap_unlock(cur_bmap); + + if (unlikely(err)) { + SSDFS_ERR("fail to free metadata pages: " + "count %u, err %d\n", + count, err); + goto finish_free_metapages; + } + + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + + atomic_add(count, &bmap->peb_free_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_add(count, &bmap->parent->seg_free_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("parent->free_logical_blks %u, " + "parent->valid_logical_blks %u, " + "parent->invalid_logical_blks %u, " + "pages_per_peb %u\n", + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_invalid_blks), + bmap->parent->pages_per_peb); + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + 
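/* Unlock in the reverse of the acquisition order:
+	 * the PEB's modification lock first, then the parent
+	 * segment's one.
+	 */
+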
up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + +finish_free_metapages: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_peb_blk_bmap_pre_allocate() - pre-allocate a range of blocks + * @bmap: PEB's block bitmap object + * @bmap_index: source or destination block bitmap? + * @len: pointer on variable with requested length of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to find contiguous range of free blocks and + * to set the found range in pre-allocated state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_peb_blk_bmap_pre_allocate(struct ssdfs_peb_blk_bmap *bmap, + int bmap_index, + u32 *len, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + struct ssdfs_block_bmap *cur_bmap = NULL; + bool is_migrating = false; + int src_used_blks = 0; + int src_invalid_blks = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !range || !bmap->src); + + SSDFS_DBG("bmap %p, bmap_index %u, len %p\n", + bmap, bmap_index, len); + SSDFS_DBG("seg %llu, free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + if (bmap_index < 0 || bmap_index >= SSDFS_PEB_BLK_BMAP_INDEX_MAX) { + SSDFS_WARN("invalid bmap_index %u\n", + bmap_index); + return -ERANGE; + } + + si = bmap->parent->parent_si; + + if (bmap->peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + bmap->peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[bmap->peb_index]; + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + BUG_ON(bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION); + break; + + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + is_migrating = true; + break; + + default: + SSDFS_WARN("invalid buffers_state %#x\n", + atomic_read(&bmap->buffers_state)); + return -ERANGE; + } + + down_read(&bmap->lock); + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + + if (bmap_index == SSDFS_PEB_BLK_BMAP_SOURCE) { + cur_bmap = bmap->src; + is_migrating = false; + } else if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) { + cur_bmap = bmap->src; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_pre_allocate; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", + err); + goto finish_pre_allocate; + } + + src_used_blks = ssdfs_block_bmap_get_used_pages(cur_bmap); + if (src_used_blks < 0) { + err = src_used_blks; + SSDFS_ERR("fail to get SRC used blocks: err %d\n", + err); + goto finish_check_src_bmap; + } + + src_invalid_blks = ssdfs_block_bmap_get_invalid_pages(cur_bmap); + if (src_invalid_blks < 0) { 
+ err = src_invalid_blks; + SSDFS_ERR("fail to get SRC invalid blocks: err %d\n", + err); + goto finish_check_src_bmap; + } + +finish_check_src_bmap: + ssdfs_block_bmap_unlock(cur_bmap); + + if (unlikely(err)) + goto finish_pre_allocate; + + cur_bmap = bmap->dst; + } else + cur_bmap = NULL; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_pre_allocate; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_pre_allocate; + } + + if (is_migrating) { + int start_blk = src_used_blks + src_invalid_blks; + + start_blk = max_t(int, start_blk, + atomic_read(&bmap->peb_valid_blks)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_used_blks %d, src_invalid_blks %d, " + "valid_blks %d, start_blk %d\n", + src_used_blks, src_invalid_blks, + atomic_read(&bmap->peb_valid_blks), + start_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_pre_allocate(cur_bmap, start_blk, + len, range); + } else + err = ssdfs_block_bmap_pre_allocate(cur_bmap, 0, len, range); + + ssdfs_block_bmap_unlock(cur_bmap); + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to pre-allocate blocks: " + "len %u, err %d\n", + *len, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_pre_allocate; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-allocate blocks: " + "len %u, err %d\n", + *len, err); + goto finish_pre_allocate; + } + + if (!is_migrating) { + if (range->len > atomic_read(&bmap->peb_free_blks)) { + err = -ERANGE; + SSDFS_ERR("range %u > free_logical_blks %d\n", + range->len, + atomic_read(&bmap->peb_free_blks)); + goto finish_pre_allocate; + } + + atomic_sub(range->len, &bmap->peb_free_blks); + atomic_add(range->len, &bmap->peb_valid_blks); + atomic_add(range->len, &bmap->parent->seg_valid_blks); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if (atomic_read(&bmap->peb_free_blks) < 0) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) { + int shared_free_blks; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range->len %u, shared_free_dst_blks %d\n", + range->len, + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + shared_free_blks = + atomic_sub_return(range->len, + &pebc->shared_free_dst_blks); + if (shared_free_blks < 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range->len %u, shared_free_dst_blks %d\n", + range->len, + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("parent->free_logical_blks %u, " + "parent->valid_logical_blks %u, 
" + "parent->invalid_logical_blks %u, " + "pages_per_peb %u\n", + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_invalid_blks), + bmap->parent->pages_per_peb); + + if (atomic_read(&bmap->peb_free_blks) < 0) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_pre_allocate: + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + up_read(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PRE-ALLOCATED: range (start %u, len %u), err %d\n", + range->start, range->len, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_blk_bmap_allocate() - allocate a range of blocks + * @bmap: PEB's block bitmap object + * @bmap_index: source or destination block bitmap? + * @len: pointer on variable with requested length of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to find contiguous range of free blocks and + * to set the found range in allocated state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_peb_blk_bmap_allocate(struct ssdfs_peb_blk_bmap *bmap, + int bmap_index, + u32 *len, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + struct ssdfs_block_bmap *cur_bmap = NULL; + bool is_migrating = false; + int src_used_blks = 0; + int src_invalid_blks = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !range || !bmap->src); + + SSDFS_DBG("bmap %p, bmap_index %u, len %p\n", + bmap, bmap_index, len); + SSDFS_DBG("seg %llu, free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + bmap->parent->parent_si->seg_id, + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + if (bmap_index < 0 || bmap_index >= SSDFS_PEB_BLK_BMAP_INDEX_MAX) { + SSDFS_WARN("invalid bmap_index %u\n", + bmap_index); + return -ERANGE; + } + + si = bmap->parent->parent_si; + + if (bmap->peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + bmap->peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[bmap->peb_index]; + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + BUG_ON(bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION); + break; + + case 
SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + is_migrating = true; + break; + + default: + SSDFS_WARN("invalid buffers_state %#x\n", + atomic_read(&bmap->buffers_state)); + return -ERANGE; + } + + down_read(&bmap->lock); + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + + if (bmap_index == SSDFS_PEB_BLK_BMAP_SOURCE) { + cur_bmap = bmap->src; + is_migrating = false; + } else if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) { + cur_bmap = bmap->src; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_allocate; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", + err); + goto finish_allocate; + } + + src_used_blks = ssdfs_block_bmap_get_used_pages(cur_bmap); + if (src_used_blks < 0) { + err = src_used_blks; + SSDFS_ERR("fail to get SRC used blocks: err %d\n", + err); + goto finish_check_src_bmap; + } + + src_invalid_blks = ssdfs_block_bmap_get_invalid_pages(cur_bmap); + if (src_invalid_blks < 0) { + err = src_invalid_blks; + SSDFS_ERR("fail to get SRC invalid blocks: err %d\n", + err); + goto finish_check_src_bmap; + } + +finish_check_src_bmap: + ssdfs_block_bmap_unlock(cur_bmap); + + if (unlikely(err)) + goto finish_allocate; + + cur_bmap = bmap->dst; + } else + cur_bmap = NULL; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_allocate; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_allocate; + } + + if (is_migrating) { + int start_blk = src_used_blks + src_invalid_blks; + + start_blk = max_t(int, start_blk, + atomic_read(&bmap->peb_valid_blks)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_used_blks %d, src_invalid_blks %d, " + "valid_blks %d, start_blk %d\n", + src_used_blks, src_invalid_blks, + atomic_read(&bmap->peb_valid_blks), + start_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_allocate(cur_bmap, start_blk, + len, range); + } else + err = ssdfs_block_bmap_allocate(cur_bmap, 0, len, range); + + ssdfs_block_bmap_unlock(cur_bmap); + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to allocate blocks: " + "len %u, err %d\n", + *len, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_allocate; + } else if (unlikely(err)) { + SSDFS_ERR("fail to allocate blocks: " + "len %u, err %d\n", + *len, err); + goto finish_allocate; + } + + if (!is_migrating) { + if (range->len > atomic_read(&bmap->peb_free_blks)) { + err = -ERANGE; + SSDFS_ERR("range %u > free_logical_blks %d\n", + range->len, + atomic_read(&bmap->peb_free_blks)); + goto finish_allocate; + } + + atomic_sub(range->len, &bmap->peb_free_blks); + atomic_add(range->len, &bmap->peb_valid_blks); + atomic_add(range->len, &bmap->parent->seg_valid_blks); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if (atomic_read(&bmap->peb_free_blks) < 0) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if 
((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) { + int shared_free_blks; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range->len %u, shared_free_dst_blks %d\n", + range->len, + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + shared_free_blks = + atomic_sub_return(range->len, + &pebc->shared_free_dst_blks); + if (shared_free_blks < 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range->len %u, shared_free_dst_blks %d\n", + range->len, + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("parent->free_logical_blks %u, " + "parent->valid_logical_blks %u, " + "parent->invalid_logical_blks %u, " + "pages_per_peb %u\n", + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_invalid_blks), + bmap->parent->pages_per_peb); + + if (atomic_read(&bmap->peb_free_blks) < 0) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_allocate: + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + up_read(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ALLOCATED: range (start %u, len %u), err %d\n", + range->start, range->len, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_blk_bmap_invalidate() - invalidate a range of blocks + * @bmap: PEB's block bitmap object + * @bmap_index: source or destination block bitmap? + * @range: pointer on blocks' range [in | out] + * + * This function tries to set the requested range of blocks in + * invalid state. At first, it checks that requested range contains + * valid blocks only. And, then, it sets the requested range of blocks + * in invalid state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
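+ *
+ * Illustrative call (the range values are assumed):
+ *
+ *	struct ssdfs_block_bmap_range range = {
+ *		.start = 0,
+ *		.len = 8,
+ *	};
+ *
+ *	err = ssdfs_peb_blk_bmap_invalidate(bmap,
+ *					SSDFS_PEB_BLK_BMAP_SOURCE,
+ *					&range);
+ *
+ * For a non-migrating PEB the range moves from the valid to the
+ * invalid counters of both the PEB and its parent segment.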
+ */ +int ssdfs_peb_blk_bmap_invalidate(struct ssdfs_peb_blk_bmap *bmap, + int bmap_index, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_block_bmap *cur_bmap = NULL; + bool is_migrating = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !range || !bmap->src); + + SSDFS_DBG("seg %llu, bmap %p, bmap_index %u, " + "range (start %u, len %u)\n", + bmap->parent->parent_si->seg_id, + bmap, bmap_index, range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + if (bmap_index < 0 || bmap_index >= SSDFS_PEB_BLK_BMAP_INDEX_MAX) { + SSDFS_WARN("invalid bmap_index %u\n", + bmap_index); + return -ERANGE; + } + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + BUG_ON(bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION); + break; + + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + is_migrating = true; + break; + + default: + SSDFS_WARN("invalid buffers_state %#x\n", + atomic_read(&bmap->buffers_state)); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u, " + "is_migrating %#x\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb, is_migrating); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&bmap->lock); + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + + if (bmap_index == SSDFS_PEB_BLK_BMAP_SOURCE) + cur_bmap = bmap->src; + else if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) + cur_bmap = bmap->dst; + else + cur_bmap = NULL; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_invalidate; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_invalidate; + } + + err = ssdfs_block_bmap_invalidate(cur_bmap, range); + + ssdfs_block_bmap_unlock(cur_bmap); + + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate blocks: " + "len %u, err %d\n", + range->len, err); + goto finish_invalidate; + } + + if (!is_migrating) { + if (range->len > atomic_read(&bmap->peb_valid_blks)) { + err = -ERANGE; + SSDFS_ERR("range %u > valid_logical_blks %d\n", + range->len, + atomic_read(&bmap->peb_valid_blks)); + goto finish_invalidate; + } + + atomic_sub(range->len, &bmap->peb_valid_blks); + atomic_add(range->len, &bmap->peb_invalid_blks); + + atomic_sub(range->len, &bmap->parent->seg_valid_blks); + atomic_add(range->len, &bmap->parent->seg_invalid_blks); + } else if (is_migrating && + bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) { + if (range->len > atomic_read(&bmap->peb_valid_blks)) { + err = -ERANGE; + SSDFS_ERR("range %u > valid_logical_blks %d\n", + range->len, + atomic_read(&bmap->peb_valid_blks)); + goto finish_invalidate; + } + + atomic_sub(range->len, &bmap->peb_valid_blks); + atomic_add(range->len, &bmap->peb_invalid_blks); + + atomic_sub(range->len, &bmap->parent->seg_valid_blks); + atomic_add(range->len, 
&bmap->parent->seg_invalid_blks); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + + if (atomic_read(&bmap->peb_free_blks) < 0) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + SSDFS_DBG("parent->free_logical_blks %u, " + "parent->valid_logical_blks %u, " + "parent->invalid_logical_blks %u, " + "pages_per_peb %u\n", + atomic_read(&bmap->parent->seg_free_blks), + atomic_read(&bmap->parent->seg_valid_blks), + atomic_read(&bmap->parent->seg_invalid_blks), + bmap->parent->pages_per_peb); + + if (atomic_read(&bmap->peb_free_blks) < 0) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } + + if ((atomic_read(&bmap->peb_free_blks) + + atomic_read(&bmap->peb_valid_blks) + + atomic_read(&bmap->peb_invalid_blks)) > + bmap->pages_per_peb) { + SSDFS_WARN("free_logical_blks %u, valid_logical_blks %u, " + "invalid_logical_blks %u, pages_per_peb %u\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks), + bmap->pages_per_peb); + } +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_invalidate: + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + up_read(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("INVALIDATED: seg %llu, " + "range (start %u, len %u), err %d\n", + bmap->parent->parent_si->seg_id, + range->start, range->len, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_blk_bmap_update_range() - update a range of blocks' state + * @bmap: PEB's block bitmap object + * @bmap_index: source or destination block bitmap? + * @new_range_state: new state of the range + * @range: pointer on blocks' range [in | out] + * + * This function tries to change a range of blocks' state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
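+ *
+ * For example (a sketch; the range is assumed to contain
+ * pre-allocated blocks), promoting a range to the valid state:
+ *
+ *	err = ssdfs_peb_blk_bmap_update_range(bmap,
+ *					SSDFS_PEB_BLK_BMAP_SOURCE,
+ *					SSDFS_BLK_VALID, &range);
+ *
+ * The reverse transition (valid -> pre-allocated) is rejected as
+ * an internal error.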
+ */ +int ssdfs_peb_blk_bmap_update_range(struct ssdfs_peb_blk_bmap *bmap, + int bmap_index, + int new_range_state, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_block_bmap *cur_bmap = NULL; + int range_state; +#ifdef CONFIG_SSDFS_DEBUG + int free_blks, used_blks, invalid_blks; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !range); + BUG_ON(!(new_range_state == SSDFS_BLK_PRE_ALLOCATED || + new_range_state == SSDFS_BLK_VALID)); + + SSDFS_DBG("bmap %p, peb_index %u, state %#x, " + "new_range_state %#x, " + "range (start %u, len %u)\n", + bmap, bmap->peb_index, + atomic_read(&bmap->state), + new_range_state, + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + if (bmap_index < 0 || bmap_index >= SSDFS_PEB_BLK_BMAP_INDEX_MAX) { + SSDFS_WARN("invalid bmap_index %u\n", + bmap_index); + return -ERANGE; + } + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + BUG_ON(bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION); + break; + + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + break; + + default: + SSDFS_WARN("invalid buffers_state %#x\n", + atomic_read(&bmap->buffers_state)); + return -ERANGE; + } + + down_read(&bmap->lock); + + if (bmap_index == SSDFS_PEB_BLK_BMAP_SOURCE) + cur_bmap = bmap->src; + else if (bmap_index == SSDFS_PEB_BLK_BMAP_DESTINATION) + cur_bmap = bmap->dst; + else + cur_bmap = NULL; + + if (cur_bmap == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_update_range; + } + + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_update_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + err = ssdfs_block_bmap_get_free_pages(cur_bmap); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto finish_process_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(cur_bmap); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto finish_process_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(cur_bmap); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto finish_process_bmap; + } else { + invalid_blks = err; + err = 0; + } + + if (unlikely(err)) + goto finish_process_bmap; + + SSDFS_DBG("BEFORE: free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + range_state = ssdfs_get_range_state(cur_bmap, range); + if (range_state < 0) { + err = range_state; + SSDFS_ERR("fail to detect range state: " + "range (start %u, len %u), err %d\n", + range->start, range->len, err); + goto finish_process_bmap; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current range_state %#x\n", + range_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (range_state) { + case SSDFS_BLK_FREE: + /* valid 
block state */ + break; + + case SSDFS_BLK_PRE_ALLOCATED: + if (new_range_state == SSDFS_BLK_PRE_ALLOCATED) { + /* do nothing */ + goto finish_process_bmap; + } + break; + + case SSDFS_BLK_VALID: + if (new_range_state == SSDFS_BLK_PRE_ALLOCATED) { + err = -ERANGE; + SSDFS_WARN("fail to change state: " + "range_state %#x, " + "new_range_state %#x\n", + range_state, new_range_state); + goto finish_process_bmap; + } else if (new_range_state == SSDFS_BLK_VALID) { + /* do nothing */ + goto finish_process_bmap; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid range state: %#x\n", + range_state); + goto finish_process_bmap; + }; + + if (new_range_state == SSDFS_BLK_PRE_ALLOCATED) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to pre-allocate: " + "range (start %u, len %u)\n", + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_block_bmap_pre_allocate(cur_bmap, 0, NULL, range); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to allocate: " + "range (start %u, len %u)\n", + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_block_bmap_allocate(cur_bmap, 0, NULL, range); + } + +finish_process_bmap: + ssdfs_block_bmap_unlock(cur_bmap); + + if (unlikely(err)) { + SSDFS_ERR("fail to update range: " + "range (start %u, len %u), " + "new_range_state %#x, err %d\n", + range->start, range->len, + new_range_state, err); + goto finish_update_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + err = ssdfs_block_bmap_lock(cur_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_update_range; + } + + err = ssdfs_block_bmap_get_free_pages(cur_bmap); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(cur_bmap); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(cur_bmap); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_bmap: + ssdfs_block_bmap_unlock(cur_bmap); + + if (unlikely(err)) + goto finish_update_range; + + SSDFS_DBG("AFTER: free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_update_range: + up_read(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("UPDATED: range (start %u, len %u), " + "new_range_state %#x, err %d\n", + range->start, range->len, + new_range_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_blk_bmap_collect_garbage() - find range of valid blocks for GC + * @bmap: PEB's block bitmap object + * @start: starting position for search + * @max_len: maximum requested length of valid blocks' range + * @blk_state: requested block state (pre-allocated or valid) + * @range: pointer on blocks' range [out] + * + * This function tries to find range of valid or pre_allocated blocks + * for GC in source block bitmap. The length of requested range is + * limited by @max_len. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
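+ *
+ * Hypothetical GC scan (a sketch only; the window of 32 blocks
+ * is an assumed value):
+ *
+ *	struct ssdfs_block_bmap_range range;
+ *	u32 start = 0;
+ *
+ *	while (start < bmap->pages_per_peb) {
+ *		err = ssdfs_peb_blk_bmap_collect_garbage(bmap, start, 32,
+ *						SSDFS_BLK_VALID, &range);
+ *		if (err == -ENODATA)
+ *			break;
+ *		else if (err)
+ *			return err;
+ *		migrate the found range here;
+ *		start = range.start + range.len;
+ *	}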
+ */ +int ssdfs_peb_blk_bmap_collect_garbage(struct ssdfs_peb_blk_bmap *bmap, + u32 start, u32 max_len, + int blk_state, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_block_bmap *src = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !range || !bmap->src); + + SSDFS_DBG("bmap %p, start %u, max_len %u\n", + bmap, start, max_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + break; + + default: + SSDFS_WARN("invalid buffers_state %#x\n", + atomic_read(&bmap->buffers_state)); + return -ERANGE; + } + + down_read(&bmap->lock); + + src = bmap->src; + + if (src == NULL) { + err = -ERANGE; + SSDFS_WARN("bmap pointer is empty\n"); + goto finish_collect_garbage; + } + + err = ssdfs_block_bmap_lock(src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_collect_garbage; + } + + err = ssdfs_block_bmap_collect_garbage(src, start, max_len, + blk_state, range); + + ssdfs_block_bmap_unlock(src); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range (start %u, len %u) hasn't valid blocks\n", + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_collect_garbage; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find valid blocks: " + "len %u, err %d\n", + range->len, err); + goto finish_collect_garbage; + } + +finish_collect_garbage: + up_read(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("GARBAGE: range (start %u, len %u), " + "blk_state %#x, err %d\n", + range->start, range->len, + blk_state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_blk_bmap_start_migration() - prepare migration environment + * @bmap: PEB's block bitmap object + * + * This method tries to prepare PEB's environment for migration. + * The destination block bitmap is cleaned in buffer and pointer + * is set. Also valid/invalid/free block counters are prepared + * for migration operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
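+ *
+ * The expected calling sequence (a sketch with error handling
+ * elided):
+ *
+ *	err = ssdfs_peb_blk_bmap_start_migration(bmap);
+ *	... ssdfs_peb_blk_bmap_migrate() for every live range ...
+ *	err = ssdfs_peb_blk_bmap_finish_migration(bmap);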
+ */ +int ssdfs_peb_blk_bmap_start_migration(struct ssdfs_peb_blk_bmap *bmap) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + int buffers_state, new_buffers_state; + int buffer_index; + int free_blks = 0; + int invalid_blks; + int used_blks; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->src); + + SSDFS_DBG("bmap %p, peb_index %u, state %#x\n", + bmap, bmap->peb_index, + atomic_read(&bmap->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blocks %d, valid_logical_block %d, " + "invalid_logical_block %d\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = bmap->parent->parent_si; + + if (bmap->peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + bmap->peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[bmap->peb_index]; + + down_write(&bmap->lock); + down_write(&bmap->parent->modification_lock); + down_write(&bmap->modification_lock); + + buffers_state = atomic_read(&bmap->buffers_state); + + switch (buffers_state) { + case SSDFS_PEB_BMAP1_SRC: + new_buffers_state = SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST; + buffer_index = SSDFS_PEB_BLK_BMAP2; + break; + + case SSDFS_PEB_BMAP2_SRC: + new_buffers_state = SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST; + buffer_index = SSDFS_PEB_BLK_BMAP1; + break; + + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + err = -ENOENT; + SSDFS_WARN("bmap is under migration: " + "peb_index %u, state %#x\n", + bmap->peb_index, buffers_state); + goto finish_migration_start; + + default: + err = -ERANGE; + SSDFS_WARN("fail to start migration: " + "buffers_state %#x\n", + buffers_state); + goto finish_migration_start; + } + + err = ssdfs_block_bmap_lock(&bmap->buffer[buffer_index]); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migration_start; + } + + switch (atomic_read(&bmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC: +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(buffers_state != SSDFS_PEB_BMAP1_SRC); + BUG_ON(!bmap->src || bmap->dst); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_PEB_BMAP2_SRC: +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(buffers_state != SSDFS_PEB_BMAP2_SRC); + BUG_ON(!bmap->src || bmap->dst); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block bitmap has been prepared: " + "peb_index %u\n", + bmap->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_block_bitmap_preparation; + } + + err = ssdfs_block_bmap_clean(&bmap->buffer[buffer_index]); + if (unlikely(err == -ENOENT)) { + err = -ERANGE; + SSDFS_WARN("unable to clean block bitmap: " + "peb_index %u\n", + bmap->peb_index); + goto finish_block_bitmap_preparation; + } else if (unlikely(err)) { + SSDFS_ERR("fail to clean block bitmap: " + "peb_index %u\n", + bmap->peb_index); + goto finish_block_bitmap_preparation; + } + + bmap->dst = &bmap->buffer[buffer_index]; + atomic_set(&bmap->buffers_state, new_buffers_state); + + 
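/* From this point the cleaned buffer plays the destination
+	 * role. Zero the PEB's free counter and convert the source's
+	 * invalid blocks into destination free (shared) space.
+	 */
+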
free_blks = atomic_read(&bmap->peb_free_blks); + atomic_sub(free_blks, &bmap->peb_free_blks); + atomic_sub(free_blks, &bmap->parent->seg_free_blks); + + invalid_blks = atomic_xchg(&bmap->peb_invalid_blks, 0); + atomic_sub(invalid_blks, &bmap->parent->seg_invalid_blks); + atomic_add(invalid_blks, &bmap->peb_free_blks); + atomic_set(&pebc->shared_free_dst_blks, invalid_blks); + atomic_add(invalid_blks, &bmap->parent->seg_free_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("shared_free_dst_blks %d\n", + atomic_read(&pebc->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + used_blks = atomic_read(&bmap->peb_valid_blks); + + err = ssdfs_block_bmap_get_free_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto finish_block_bitmap_preparation; + } else { + free_blks = err; + err = 0; + } + + if (free_blks < (invalid_blks + used_blks)) { + err = -ERANGE; + SSDFS_ERR("free_blks %d < (invalid_blks %d + used_blks %d)\n", + free_blks, invalid_blks, used_blks); + goto finish_block_bitmap_preparation; + } + + free_blks -= invalid_blks + used_blks; + + atomic_add(free_blks, &bmap->peb_free_blks); + atomic_add(free_blks, &bmap->parent->seg_free_blks); + +finish_block_bitmap_preparation: + ssdfs_block_bmap_unlock(&bmap->buffer[buffer_index]); + + if (unlikely(err)) + goto finish_migration_start; + +#ifdef CONFIG_SSDFS_DEBUG + err = ssdfs_block_bmap_lock(bmap->dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migration_start; + } + + err = ssdfs_block_bmap_get_free_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_dst_bmap: + ssdfs_block_bmap_unlock(bmap->dst); + + if (unlikely(err)) + goto finish_migration_start; + + SSDFS_DBG("DST: free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); + + err = ssdfs_block_bmap_lock(bmap->src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migration_start; + } + + err = ssdfs_block_bmap_get_free_pages(bmap->src); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(bmap->src); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(bmap->src); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_src_bmap: + ssdfs_block_bmap_unlock(bmap->src); + + if (unlikely(err)) + goto finish_migration_start; + + SSDFS_DBG("SRC: free_blks %d, 
used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_migration_start: + up_write(&bmap->modification_lock); + up_write(&bmap->parent->modification_lock); + up_write(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_logical_blocks %d, valid_logical_block %d, " + "invalid_logical_block %d\n", + atomic_read(&bmap->peb_free_blks), + atomic_read(&bmap->peb_valid_blks), + atomic_read(&bmap->peb_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -ENOENT) + return 0; + else if (unlikely(err)) + return err; + + return 0; +} + +/* + * ssdfs_peb_blk_bmap_migrate() - migrate valid blocks + * @bmap: PEB's block bitmap object + * @new_range_state: new state of range + * @range: pointer on blocks' range + * + * This method tries to move @range of blocks from source + * block bitmap into destination block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_peb_blk_bmap_migrate(struct ssdfs_peb_blk_bmap *bmap, + int new_range_state, + struct ssdfs_block_bmap_range *range) +{ + int buffers_state; + int range_state; + struct ssdfs_block_bmap *src; + struct ssdfs_block_bmap *dst; + int free_blks; +#ifdef CONFIG_SSDFS_DEBUG + int used_blks, invalid_blks; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !range); + BUG_ON(!(new_range_state == SSDFS_BLK_PRE_ALLOCATED || + new_range_state == SSDFS_BLK_VALID)); + + SSDFS_DBG("bmap %p, peb_index %u, state %#x, " + "new_range_state %#x, range (start %u, len %u)\n", + bmap, bmap->peb_index, + atomic_read(&bmap->state), + new_range_state, + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + down_read(&bmap->lock); + + buffers_state = atomic_read(&bmap->buffers_state); + + switch (buffers_state) { + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + src = bmap->src; + dst = bmap->dst; + break; + + default: + err = -ERANGE; + SSDFS_WARN("fail to migrate: " + "buffers_state %#x, " + "range (start %u, len %u)\n", + buffers_state, + range->start, range->len); + goto finish_migrate; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (!src || !dst) { + err = -ERANGE; + SSDFS_WARN("empty pointers\n"); + goto finish_migrate; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_block_bmap_lock(src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migrate; + } + + range_state = ssdfs_get_range_state(src, range); + if (range_state < 0) { + err = range_state; + SSDFS_ERR("fail to detect range state: " + "range (start %u, len %u), err %d\n", + range->start, range->len, err); + goto finish_process_source_bmap; + } + + switch (range_state) { + case SSDFS_BLK_PRE_ALLOCATED: + /* valid block state */ + err = ssdfs_block_bmap_invalidate(src, range); + break; + + case SSDFS_BLK_VALID: + if (new_range_state == SSDFS_BLK_PRE_ALLOCATED) { + err = -ERANGE; + SSDFS_WARN("fail to change state: " + "range_state %#x, " + "new_range_state %#x\n", + range_state, new_range_state); + goto finish_process_source_bmap; + } + + err = 
ssdfs_block_bmap_invalidate(src, range); + break; + + case SSDFS_BLK_INVALID: + /* range was invalidated already */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid range state: %#x\n", + range_state); + goto finish_process_source_bmap; + }; + +finish_process_source_bmap: + ssdfs_block_bmap_unlock(src); + + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate blocks: " + "start %u, len %u, err %d\n", + range->start, range->len, err); + goto finish_migrate; + } + + err = ssdfs_block_bmap_lock(dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migrate; + } + + err = ssdfs_block_bmap_get_free_pages(dst); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto do_bmap_unlock; + } else { + free_blks = err; + err = 0; + } + + if (free_blks < range->len) { + u32 freed_metapages = range->len - free_blks; + + err = ssdfs_block_bmap_free_metadata_pages(dst, + freed_metapages); + if (unlikely(err)) { + SSDFS_ERR("fail to free metadata pages: err %d\n", + err); + goto do_bmap_unlock; + } + } + + if (new_range_state == SSDFS_BLK_PRE_ALLOCATED) + err = ssdfs_block_bmap_pre_allocate(dst, 0, NULL, range); + else + err = ssdfs_block_bmap_allocate(dst, 0, NULL, range); + +do_bmap_unlock: + ssdfs_block_bmap_unlock(dst); + + if (unlikely(err)) { + SSDFS_ERR("fail to allocate blocks: " + "start %u, len %u, err %d\n", + range->start, range->len, err); + goto finish_migrate; + } + +#ifdef CONFIG_SSDFS_DEBUG + err = ssdfs_block_bmap_lock(src); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migrate; + } + + err = ssdfs_block_bmap_get_free_pages(src); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(src); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(src); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_src_bmap: + ssdfs_block_bmap_unlock(src); + + if (unlikely(err)) + goto finish_migrate; + + SSDFS_DBG("SRC: free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); + + err = ssdfs_block_bmap_lock(dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migrate; + } + + err = ssdfs_block_bmap_get_free_pages(dst); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(dst); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(dst); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_dst_bmap: + ssdfs_block_bmap_unlock(dst); + + if (unlikely(err)) + goto finish_migrate; + + SSDFS_DBG("DST: 
free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_migrate: + up_read(&bmap->lock); + + return err; +} + +/* + * ssdfs_peb_blk_bmap_finish_migration() - stop migration + * @bmap: PEB's block bitmap object + * + * This method tries to make destination block bitmap as + * source and to forget about old source block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_peb_blk_bmap_finish_migration(struct ssdfs_peb_blk_bmap *bmap) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + int buffers_state, new_buffers_state; + int buffer_index; +#ifdef CONFIG_SSDFS_DEBUG + int free_blks, used_blks, invalid_blks; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->src); + + SSDFS_DBG("bmap %p, peb_index %u, state %#x\n", + bmap, bmap->peb_index, + atomic_read(&bmap->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = SSDFS_WAIT_COMPLETION(&bmap->init_end); + if (unlikely(err)) { +init_failed: + SSDFS_ERR("PEB block bitmap init failed: " + "seg_id %llu, peb_index %u, " + "err %d\n", + bmap->parent->parent_si->seg_id, + bmap->peb_index, err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(bmap)) { + err = -ERANGE; + goto init_failed; + } + } + + si = bmap->parent->parent_si; + + if (bmap->peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + bmap->peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[bmap->peb_index]; + + down_write(&bmap->lock); + + buffers_state = atomic_read(&bmap->buffers_state); + + switch (buffers_state) { + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap->src || !bmap->dst); +#endif /* CONFIG_SSDFS_DEBUG */ + new_buffers_state = SSDFS_PEB_BMAP2_SRC; + buffer_index = SSDFS_PEB_BLK_BMAP2; + break; + + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap->src || !bmap->dst); +#endif /* CONFIG_SSDFS_DEBUG */ + new_buffers_state = SSDFS_PEB_BMAP1_SRC; + buffer_index = SSDFS_PEB_BLK_BMAP1; + break; + + default: + err = -ERANGE; + SSDFS_WARN("fail to start migration: " + "buffers_state %#x\n", + buffers_state); + goto finish_migration_stop; + } + +#ifdef CONFIG_SSDFS_DEBUG + err = ssdfs_block_bmap_lock(bmap->dst); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migration_stop; + } + + err = ssdfs_block_bmap_get_free_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(bmap->dst); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_dst_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_dst_bmap: + ssdfs_block_bmap_unlock(bmap->dst); + + if (unlikely(err)) + goto finish_migration_stop; + + SSDFS_DBG("DST: free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); + + err = ssdfs_block_bmap_lock(bmap->src); + if 
(unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_migration_stop; + } + + err = ssdfs_block_bmap_get_free_pages(bmap->src); + if (err < 0) { + SSDFS_ERR("fail to get free pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + free_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_used_pages(bmap->src); + if (err < 0) { + SSDFS_ERR("fail to get used pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + used_blks = err; + err = 0; + } + + err = ssdfs_block_bmap_get_invalid_pages(bmap->src); + if (err < 0) { + SSDFS_ERR("fail to get invalid pages count: " + "peb_index %u, err %d\n", + bmap->peb_index, err); + goto unlock_src_bmap; + } else { + invalid_blks = err; + err = 0; + } + +unlock_src_bmap: + ssdfs_block_bmap_unlock(bmap->src); + + if (unlikely(err)) + goto finish_migration_stop; + + SSDFS_DBG("SRC: free_blks %d, used_blks %d, invalid_blks %d\n", + free_blks, used_blks, invalid_blks); + + if ((free_blks + used_blks + invalid_blks) > bmap->pages_per_peb) { + SSDFS_WARN("free_blks %d, used_blks %d, " + "invalid_blks %d, pages_per_peb %u\n", + free_blks, used_blks, invalid_blks, + bmap->pages_per_peb); + err = -ERANGE; + goto finish_migration_stop; + } + + if (used_blks != 0) { + SSDFS_ERR("PEB contains valid blocks %d\n", + used_blks); + err = -ERANGE; + goto finish_migration_stop; + } + + SSDFS_DBG("shared_free_dst_blks %d, pages_per_peb %u\n", + atomic_read(&pebc->shared_free_dst_blks), + bmap->pages_per_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_block_bmap_clear_dirty_state(bmap->src); + + bmap->src = &bmap->buffer[buffer_index]; + bmap->dst = NULL; + atomic_set(&bmap->buffers_state, new_buffers_state); + +finish_migration_stop: + up_write(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} From patchwork Sat Feb 25 01:08:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F04EC6FA8E for ; Sat, 25 Feb 2023 01:16:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229706AbjBYBQs (ORCPT ); Fri, 24 Feb 2023 20:16:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229685AbjBYBQ1 (ORCPT ); Fri, 24 Feb 2023 20:16:27 -0500 Received: from mail-oi1-x235.google.com (mail-oi1-x235.google.com [IPv6:2607:f8b0:4864:20::235]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BEA7312864 for ; Fri, 24 Feb 2023 17:16:07 -0800 (PST) Received: by mail-oi1-x235.google.com with SMTP id bl7so858685oib.0 for ; Fri, 24 Feb 2023 17:16:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/eI+flnUK3bNkddtmqhNy5+pOxr2iizZbF1weqywGzc=; b=U0/He9X0tTdwHSYp2/QXT6L2aH9QXHHviFoYDBwPpQtcLWCES+fns0nOrLqgSIFh7+ A70OD+SspO+L/l4UCya3PJDVV9qCAYemA7MXahQGi8AkHMyQtqhz4OSl/IU5oCG2fGFL 
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 15/76] ssdfs: introduce segment block bitmap Date: Fri, 24 Feb 2023 17:08:26 -0800 Message-Id: <20230225010927.813929-16-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org SSDFS splits a partition/volume into a sequence of fixed-sized segments. Every segment can include one or several Logical Erase Blocks (LEBs). A LEB can be mapped into a "Physical" Erase Block (PEB). The PEB block bitmap object aggregates the source PEB's block bitmap with the destination PEB's block bitmap (the destination exists only while the PEB is under migration). Finally, the segment block bitmap implements an array of PEB block bitmaps.
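A minimal, hypothetical usage sketch of the API summarized below may help; the four ssdfs_segment_blk_bmap_*() calls are the ones added by this patch, while the wrapper function name, the extent length of 4, and the -ENOSPC fallback policy are illustrative assumptions, not SSDFS code:

static int example_reserve_data_blocks(struct ssdfs_segment_info *si)
{
	/* the segment block bitmap is embedded into the segment object */
	struct ssdfs_segment_blk_bmap *bmap = &si->blk_bmap;
	int free_blks;
	int err;

	/* per-segment free space is tracked by an atomic counter */
	free_blks = ssdfs_segment_blk_bmap_get_free_pages(bmap);
	if (free_blks < 4)
		return -ENOSPC;

	/*
	 * -E2BIG means the segment ran out of free blocks anyway
	 * (for example, a concurrent reservation won the race),
	 * so the caller should try another segment.
	 */
	err = ssdfs_segment_blk_bmap_reserve_extent(bmap, 4);
	if (err == -E2BIG)
		return -ENOSPC;

	return err;
}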
The segment block bitmap has the following API: (1) create - create segment block bitmap (2) destroy - destroy segment block bitmap (3) partial_init - initialize by the state of one PEB block bitmap (4) get_free_pages - get free pages in segment block bitmap (5) get_used_pages - get used pages in segment block bitmap (6) get_invalid_pages - get invalid pages in segment block bitmap (7) reserve_block - reserve a free block (8) reserve_extent - reserve some number of free blocks (9) pre_allocate - pre-allocate page/range in segment block bitmap (10) allocate - allocate page/range in segment block bitmap (11) update_range - change the state of a range in segment block bitmap Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/segment_block_bitmap.c | 1425 +++++++++++++++++++++++++++++++ fs/ssdfs/segment_block_bitmap.h | 205 +++++ 2 files changed, 1630 insertions(+) create mode 100644 fs/ssdfs/segment_block_bitmap.c create mode 100644 fs/ssdfs/segment_block_bitmap.h diff --git a/fs/ssdfs/segment_block_bitmap.c b/fs/ssdfs/segment_block_bitmap.c new file mode 100644 index 000000000000..824c3d4fd31d --- /dev/null +++ b/fs/ssdfs/segment_block_bitmap.c @@ -0,0 +1,1425 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/segment_block_bitmap.c - segment's block bitmap implementation. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_vector.h" +#include "peb_block_bitmap.h" +#include "segment_block_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_seg_blk_page_leaks; +atomic64_t ssdfs_seg_blk_memory_leaks; +atomic64_t ssdfs_seg_blk_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_seg_blk_cache_leaks_increment(void *kaddr) + * void ssdfs_seg_blk_cache_leaks_decrement(void *kaddr) + * void *ssdfs_seg_blk_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_seg_blk_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_seg_blk_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_seg_blk_kfree(void *kaddr) + * struct page *ssdfs_seg_blk_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_seg_blk_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_seg_blk_free_page(struct page *page) + * void ssdfs_seg_blk_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(seg_blk) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(seg_blk) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_seg_blk_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_seg_blk_page_leaks, 0); + atomic64_set(&ssdfs_seg_blk_memory_leaks, 0); + atomic64_set(&ssdfs_seg_blk_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_seg_blk_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if
(atomic64_read(&ssdfs_seg_blk_page_leaks) != 0) { + SSDFS_ERR("SEGMENT BLOCK BITMAP: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_seg_blk_page_leaks)); + } + + if (atomic64_read(&ssdfs_seg_blk_memory_leaks) != 0) { + SSDFS_ERR("SEGMENT BLOCK BITMAP: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_seg_blk_memory_leaks)); + } + + if (atomic64_read(&ssdfs_seg_blk_cache_leaks) != 0) { + SSDFS_ERR("SEGMENT BLOCK BITMAP: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_seg_blk_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +#define SSDFS_SEG_BLK_BMAP_STATE_FNS(value, name) \ +static inline \ +bool is_seg_block_bmap_##name(struct ssdfs_segment_blk_bmap *bmap) \ +{ \ + return atomic_read(&bmap->state) == SSDFS_SEG_BLK_BMAP_##value; \ +} \ +static inline \ +void set_seg_block_bmap_##name(struct ssdfs_segment_blk_bmap *bmap) \ +{ \ + atomic_set(&bmap->state, SSDFS_SEG_BLK_BMAP_##value); \ +} \ + +/* + * is_seg_block_bmap_created() + * set_seg_block_bmap_created() + */ +SSDFS_SEG_BLK_BMAP_STATE_FNS(CREATED, created) + +/* + * ssdfs_segment_blk_bmap_create() - create segment block bitmap + * @si: segment object + * @init_flag: definition of block bitmap's creation state + * @init_state: block state is used during initialization + * + * This method tries to create segment block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +int ssdfs_segment_blk_bmap_create(struct ssdfs_segment_info *si, + int init_flag, int init_state) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_blk_bmap *bmap; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("si %p, seg_id %llu, " + "init_flag %#x, init_state %#x\n", + si, si->seg_id, + init_flag, init_state); +#else + SSDFS_DBG("si %p, seg_id %llu, " + "init_flag %#x, init_state %#x\n", + si, si->seg_id, + init_flag, init_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = si->fsi; + bmap = &si->blk_bmap; + + bmap->parent_si = si; + atomic_set(&bmap->state, SSDFS_SEG_BLK_BMAP_STATE_UNKNOWN); + + bmap->pages_per_peb = fsi->pages_per_peb; + bmap->pages_per_seg = fsi->pages_per_seg; + + init_rwsem(&bmap->modification_lock); + atomic_set(&bmap->seg_valid_blks, 0); + atomic_set(&bmap->seg_invalid_blks, 0); + atomic_set(&bmap->seg_free_blks, 0); + + bmap->pebs_count = si->pebs_count; + + bmap->peb = ssdfs_seg_blk_kcalloc(bmap->pebs_count, + sizeof(struct ssdfs_peb_blk_bmap), + GFP_KERNEL); + if (!bmap->peb) { + SSDFS_ERR("fail to allocate PEBs' block bitmaps\n"); + return -ENOMEM; + } + + for (i = 0; i < bmap->pebs_count; i++) { + err = ssdfs_peb_blk_bmap_create(bmap, i, fsi->pages_per_peb, + init_flag, init_state); + if (unlikely(err)) { + SSDFS_ERR("fail to create PEB's block bitmap: " + "peb_index %u, err %d\n", + i, err); + goto fail_create_seg_blk_bmap; + } + } + + set_seg_block_bmap_created(bmap); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +fail_create_seg_blk_bmap: + ssdfs_segment_blk_bmap_destroy(bmap); + return err; +} + +/* + * ssdfs_segment_blk_bmap_destroy() - destroy segment block bitmap + * @ptr: segment block bitmap pointer + */ +void ssdfs_segment_blk_bmap_destroy(struct ssdfs_segment_blk_bmap *ptr) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + 
BUG_ON(!ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ptr->parent_si) { + /* object is not created yet */ + return; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, state %#x\n", + ptr->parent_si->seg_id, + atomic_read(&ptr->state)); +#else + SSDFS_DBG("seg_id %llu, state %#x\n", + ptr->parent_si->seg_id, + atomic_read(&ptr->state)); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + atomic_set(&ptr->seg_valid_blks, 0); + atomic_set(&ptr->seg_invalid_blks, 0); + atomic_set(&ptr->seg_free_blks, 0); + + for (i = 0; i < ptr->pebs_count; i++) + ssdfs_peb_blk_bmap_destroy(&ptr->peb[i]); + + ssdfs_seg_blk_kfree(ptr->peb); + ptr->peb = NULL; + + atomic_set(&ptr->state, SSDFS_SEG_BLK_BMAP_STATE_UNKNOWN); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_segment_blk_bmap_partial_init() - partial init of segment bitmap + * @bmap: pointer on segment block bitmap + * @peb_index: PEB's index + * @source: pointer on pagevec with bitmap state + * @hdr: header of block bitmap fragment + * @cno: log's checkpoint + */ +int ssdfs_segment_blk_bmap_partial_init(struct ssdfs_segment_blk_bmap *bmap, + u16 peb_index, + struct ssdfs_page_vector *source, + struct ssdfs_block_bitmap_fragment *hdr, + u64 cno) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->peb || !bmap->parent_si); + BUG_ON(!source || !hdr); + BUG_ON(ssdfs_page_vector_count(source) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u, cno %llu\n", + bmap->parent_si->seg_id, peb_index, cno); +#else + SSDFS_DBG("seg_id %llu, peb_index %u, cno %llu\n", + bmap->parent_si->seg_id, peb_index, cno); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (atomic_read(&bmap->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&bmap->state)); + return -ERANGE; + } + + if (peb_index >= bmap->pebs_count) { + SSDFS_ERR("peb_index %u >= seg_blkbmap->pebs_count %u\n", + peb_index, bmap->pebs_count); + return -ERANGE; + } + + err = ssdfs_peb_blk_bmap_init(&bmap->peb[peb_index], + source, hdr, cno); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_segment_blk_bmap_init_failed() - process failure of segment bitmap init + * @bmap: pointer on segment block bitmap + * @peb_index: PEB's index + */ +void ssdfs_segment_blk_bmap_init_failed(struct ssdfs_segment_blk_bmap *bmap, + u16 peb_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= bmap->pebs_count) { + SSDFS_WARN("peb_index %u >= seg_blkbmap->pebs_count %u\n", + peb_index, bmap->pebs_count); + return; + } + + ssdfs_peb_blk_bmap_init_failed(&bmap->peb[peb_index]); +} + +/* + * is_ssdfs_segment_blk_bmap_dirty() - check that PEB block bitmap is dirty + * @bmap: pointer on segment block bitmap + * @peb_index: PEB's index + */ +bool is_ssdfs_segment_blk_bmap_dirty(struct ssdfs_segment_blk_bmap *bmap, + u16 peb_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= bmap->pebs_count) { + SSDFS_WARN("peb_index %u >= seg_blkbmap->pebs_count %u\n", + peb_index, bmap->pebs_count); + return false; + } + + return is_ssdfs_peb_blk_bmap_dirty(&bmap->peb[peb_index]); +} + +/* + * 
ssdfs_define_bmap_index() - define block bitmap for operation + * @pebc: pointer on PEB container + * @bmap_index: pointer on block bitmap index value [out] + * @peb_index: pointer on PEB's index [out] + * + * This method tries to define bitmap index and PEB's index + * for operation with block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_define_bmap_index(struct ssdfs_peb_container *pebc, + int *bmap_index, u16 *peb_index) +{ + struct ssdfs_segment_info *si; + int migration_state, items_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!bmap_index || !peb_index); + BUG_ON(!rwsem_is_locked(&pebc->lock)); + BUG_ON(!mutex_is_locked(&pebc->migration_lock)); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + *bmap_index = -1; + *peb_index = U16_MAX; + +try_define_bmap_index: + migration_state = atomic_read(&pebc->migration_state); + items_state = atomic_read(&pebc->items_state); + switch (migration_state) { + case SSDFS_PEB_NOT_MIGRATING: + *bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->src_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + *peb_index = pebc->src_peb->peb_index; + break; + + case SSDFS_PEB_UNDER_MIGRATION: + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + *bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + switch (atomic_read(&pebc->migration_phase)) { + case SSDFS_SRC_PEB_NOT_EXHAUSTED: + *bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; + break; + + default: + *bmap_index = SSDFS_PEB_BLK_BMAP_DESTINATION; + break; + } + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid items_state %#x\n", + items_state); + goto finish_define_bmap_index; + }; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + *peb_index = pebc->dst_peb->peb_index; + break; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: +#ifdef CONFIG_SSDFS_DEBUG + /* unexpected situation */ + SSDFS_WARN("unexpected situation\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -EAGAIN; + goto finish_define_bmap_index; + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid migration_state %#x\n", + migration_state); + goto finish_define_bmap_index; + } + +finish_define_bmap_index: + if (err == -EAGAIN) { + DEFINE_WAIT(wait); + + err = 0; + + mutex_unlock(&pebc->migration_lock); + up_read(&pebc->lock); + prepare_to_wait(&pebc->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&pebc->migration_wq, &wait); + down_read(&pebc->lock); + mutex_lock(&pebc->migration_lock); + goto try_define_bmap_index; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define bmap_index: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + return 0; +} + +bool has_ssdfs_segment_blk_bmap_initialized(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + u16 peb_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si || !pebc); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + ptr->parent_si->seg_id, + 
pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return false; + } + + down_read(&pebc->lock); + if (pebc->dst_peb) + peb_index = pebc->dst_peb->peb_index; + else + peb_index = pebc->src_peb->peb_index; + up_read(&pebc->lock); + + if (peb_index >= ptr->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, ptr->pebs_count); + return false; + } + + peb_blkbmap = &ptr->peb[peb_index]; + + return has_ssdfs_peb_blk_bmap_initialized(peb_blkbmap); +} + +int ssdfs_segment_blk_bmap_wait_init_end(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + u16 peb_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si || !pebc); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + ptr->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return -ERANGE; + } + + down_read(&pebc->lock); + if (pebc->dst_peb) + peb_index = pebc->dst_peb->peb_index; + else + peb_index = pebc->src_peb->peb_index; + up_read(&pebc->lock); + + if (peb_index >= ptr->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, ptr->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &ptr->peb[peb_index]; + + return ssdfs_peb_blk_bmap_wait_init_end(peb_blkbmap); +} + +/* + * ssdfs_segment_blk_bmap_reserve_metapages() - reserve metapages + * @ptr: segment block bitmap object + * @pebc: pointer on PEB container + * @count: amount of metadata pages for reservation + * + * This method tries to reserve @count metadata pages into + * block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
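+ * + * Note: the target PEB block bitmap (source or destination) is chosen via ssdfs_define_bmap_index() according to the PEB container's migration state.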
+ */ +int ssdfs_segment_blk_bmap_reserve_metapages(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 count) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int bmap_index = SSDFS_PEB_BLK_BMAP_INDEX_MAX; + u16 peb_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si || !pebc); + BUG_ON(!rwsem_is_locked(&pebc->lock)); + + SSDFS_DBG("seg_id %llu, peb_index %u, count %u\n", + ptr->parent_si->seg_id, + pebc->peb_index, count); + SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + atomic_read(&ptr->seg_free_blks), + atomic_read(&ptr->seg_valid_blks), + atomic_read(&ptr->seg_invalid_blks), + ptr->pages_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return -ERANGE; + } + + err = ssdfs_define_bmap_index(pebc, &bmap_index, &peb_index); + if (unlikely(err)) { + SSDFS_ERR("fail to define bmap_index: " + "seg %llu, peb_index %u, err %d\n", + ptr->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + if (peb_index >= ptr->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, ptr->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &ptr->peb[peb_index]; + + return ssdfs_peb_blk_bmap_reserve_metapages(peb_blkbmap, + bmap_index, + count); +} + +/* + * ssdfs_segment_blk_bmap_free_metapages() - free metapages + * @ptr: segment block bitmap object + * @pebc: pointer on PEB container + * @count: amount of metadata pages for freeing + * + * This method tries to free @count metadata pages into + * block bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_blk_bmap_free_metapages(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 count) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int bmap_index = SSDFS_PEB_BLK_BMAP_INDEX_MAX; + u16 peb_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si || !pebc); + BUG_ON(!rwsem_is_locked(&pebc->lock)); + + SSDFS_DBG("seg_id %llu, peb_index %u, count %u\n", + ptr->parent_si->seg_id, + pebc->peb_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return -ERANGE; + } + + err = ssdfs_define_bmap_index(pebc, &bmap_index, &peb_index); + if (unlikely(err)) { + SSDFS_ERR("fail to define bmap_index: " + "seg %llu, peb_index %u, err %d\n", + ptr->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + if (peb_index >= ptr->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, ptr->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &ptr->peb[peb_index]; + + return ssdfs_peb_blk_bmap_free_metapages(peb_blkbmap, + bmap_index, + count); +} + +/* + * ssdfs_segment_blk_bmap_reserve_extent() - reserve free extent + * @ptr: segment block bitmap object + * @count: number of logical blocks + * + * This function tries to reserve some number of free blocks. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-E2BIG - segment hasn't enough free space. 
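+ * + * Worked example (illustrative numbers, matching the accounting below): with seg_free_blks == 75, reserving count == 25 succeeds and leaves seg_free_blks == 50; reserving count == 80 fails with -E2BIG and resets seg_free_blks to 0.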
+ */ +int ssdfs_segment_blk_bmap_reserve_extent(struct ssdfs_segment_blk_bmap *ptr, + u32 count) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + int free_blks; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si); + + SSDFS_DBG("seg_id %llu\n", + ptr->parent_si->seg_id); + SSDFS_DBG("BEFORE: free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + atomic_read(&ptr->seg_free_blks), + atomic_read(&ptr->seg_valid_blks), + atomic_read(&ptr->seg_invalid_blks), + ptr->pages_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return -ERANGE; + } + + down_read(&ptr->modification_lock); + + free_blks = atomic_read(&ptr->seg_free_blks); + + if (free_blks < count) { + err = -E2BIG; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu hasn't enough free pages: " + "free_pages %u, requested_pages %u\n", + ptr->parent_si->seg_id, free_blks, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_set(&ptr->seg_free_blks, 0); + } else { + atomic_sub(count, &ptr->seg_free_blks); + } + + up_read(&ptr->modification_lock); + + if (err) + goto finish_reserve_extent; + + si = ptr->parent_si; + fsi = si->fsi; + + if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) { + u64 reserved = 0; + u32 pending = 0; + + spin_lock(&fsi->volume_state_lock); + reserved = fsi->reserved_new_user_data_pages; + if (fsi->reserved_new_user_data_pages >= count) { + fsi->reserved_new_user_data_pages -= count; + } else + err = -ERANGE; + spin_unlock(&fsi->volume_state_lock); + + if (err) { + SSDFS_ERR("count %u is bigger than reserved %llu\n", + count, reserved); + goto finish_reserve_extent; + } + + spin_lock(&si->pending_lock); + si->pending_new_user_data_pages += count; + pending = si->pending_new_user_data_pages; + spin_unlock(&si->pending_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, pending %u\n", + si->seg_id, pending); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("AFTER: free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + atomic_read(&ptr->seg_free_blks), + atomic_read(&ptr->seg_valid_blks), + atomic_read(&ptr->seg_invalid_blks), + ptr->pages_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_reserve_extent: + return err; +} + +/* + * ssdfs_segment_blk_bmap_reserve_block() - reserve free block + * @ptr: segment block bitmap object + * @count: number of logical blocks + * + * This function tries to reserve a free block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-E2BIG - segment hasn't enough free space. + */ +int ssdfs_segment_blk_bmap_reserve_block(struct ssdfs_segment_blk_bmap *ptr) +{ + return ssdfs_segment_blk_bmap_reserve_extent(ptr, 1); +} + +/* + * ssdfs_segment_blk_bmap_pre_allocate() - pre-allocate range of blocks + * @ptr: segment block bitmap object + * @pebc: pointer on PEB container + * @len: pointer on variable with requested length of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to find contiguous range of free blocks and + * to set the found range in pre-allocated state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_blk_bmap_pre_allocate(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 *len, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int bmap_index = SSDFS_PEB_BLK_BMAP_INDEX_MAX; + u16 peb_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si || !pebc); + BUG_ON(!rwsem_is_locked(&pebc->lock)); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + ptr->parent_si->seg_id, + pebc->peb_index); + SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + atomic_read(&ptr->seg_free_blks), + atomic_read(&ptr->seg_valid_blks), + atomic_read(&ptr->seg_invalid_blks), + ptr->pages_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return -ERANGE; + } + + err = ssdfs_define_bmap_index(pebc, &bmap_index, &peb_index); + if (unlikely(err)) { + SSDFS_ERR("fail to define bmap_index: " + "seg %llu, peb_index %u, err %d\n", + ptr->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + if (peb_index >= ptr->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, ptr->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &ptr->peb[peb_index]; + + return ssdfs_peb_blk_bmap_pre_allocate(peb_blkbmap, bmap_index, + len, range); +} + +/* + * ssdfs_segment_blk_bmap_allocate() - allocate range of blocks + * @ptr: segment block bitmap object + * @pebc: pointer on PEB container + * @len: pointer on variable with requested length of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to find contiguous range of free blocks and + * to set the found range in allocated state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_blk_bmap_allocate(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 *len, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int bmap_index = SSDFS_PEB_BLK_BMAP_INDEX_MAX; + u16 peb_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->peb || !ptr->parent_si || !pebc); + BUG_ON(!rwsem_is_locked(&pebc->lock)); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + ptr->parent_si->seg_id, + pebc->peb_index); + SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + atomic_read(&ptr->seg_free_blks), + atomic_read(&ptr->seg_valid_blks), + atomic_read(&ptr->seg_invalid_blks), + ptr->pages_per_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->state) != SSDFS_SEG_BLK_BMAP_CREATED) { + SSDFS_ERR("invalid segment block bitmap state %#x\n", + atomic_read(&ptr->state)); + return -ERANGE; + } + + err = ssdfs_define_bmap_index(pebc, &bmap_index, &peb_index); + if (unlikely(err)) { + SSDFS_ERR("fail to define bmap_index: " + "seg %llu, peb_index %u, err %d\n", + ptr->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + if (peb_index >= ptr->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, ptr->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &ptr->peb[peb_index]; + + return ssdfs_peb_blk_bmap_allocate(peb_blkbmap, bmap_index, + len, range); +} + +/* + * ssdfs_segment_blk_bmap_update_range() - update range of blocks' state + * @ptr: segment block bitmap object + * @pebc: pointer on PEB container + * @peb_migration_id: migration_id of PEB + * @range_state: new state of range + * @range: pointer on blocks' range [in | out] + * + * This function tries to change state of @range. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
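+ * + * Note: depending on the PEB container's migration state and the PEB migration IDs, this call either updates the range in place, migrates it into the destination PEB's block bitmap, or moves it there via (pre-)allocation (see the need_migrate/need_move logic below).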
+ */ +int ssdfs_segment_blk_bmap_update_range(struct ssdfs_segment_blk_bmap *bmap, + struct ssdfs_peb_container *pebc, + u8 peb_migration_id, + int range_state, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *dst_pebc; + struct ssdfs_peb_blk_bmap *dst_blkbmap; + int bmap_index = SSDFS_PEB_BLK_BMAP_INDEX_MAX; + u16 peb_index; + int migration_state, migration_phase, items_state; + bool need_migrate = false; + bool need_move = false; + int src_migration_id = -1, dst_migration_id = -1; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!bmap || !bmap->peb || !bmap->parent_si); + BUG_ON(!pebc || !range); + BUG_ON(!rwsem_is_locked(&pebc->lock)); + BUG_ON(!mutex_is_locked(&pebc->migration_lock)); + + SSDFS_DBG("seg_id %llu, peb_index %u, peb_migration_id %u, " + "range (start %u, len %u)\n", + bmap->parent_si->seg_id, pebc->peb_index, + peb_migration_id, range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + +try_define_bmap_index: + migration_state = atomic_read(&pebc->migration_state); + migration_phase = atomic_read(&pebc->migration_phase); + items_state = atomic_read(&pebc->items_state); + switch (migration_state) { + case SSDFS_PEB_NOT_MIGRATING: + need_migrate = false; + need_move = false; + bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->src_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + src_migration_id = + ssdfs_get_peb_migration_id_checked(pebc->src_peb); + if (unlikely(src_migration_id < 0)) { + err = src_migration_id; + SSDFS_ERR("invalid peb_migration_id: " + "err %d\n", + err); + goto finish_define_bmap_index; + } + + if (peb_migration_id > src_migration_id) { + err = -ERANGE; + SSDFS_ERR("migration_id %u > src_migration_id %u\n", + peb_migration_id, + src_migration_id); + goto finish_define_bmap_index; + } + peb_index = pebc->src_peb->peb_index; + break; + + case SSDFS_PEB_UNDER_MIGRATION: + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + need_migrate = false; + need_move = false; + bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + dst_migration_id = + ssdfs_get_peb_migration_id_checked(pebc->dst_peb); + if (unlikely(dst_migration_id < 0)) { + err = dst_migration_id; + SSDFS_ERR("invalid peb_migration_id: " + "err %d\n", + err); + goto finish_define_bmap_index; + } + + if (peb_migration_id != dst_migration_id) { + err = -ERANGE; + SSDFS_ERR("migration_id %u != " + "dst_migration_id %u\n", + peb_migration_id, + dst_migration_id); + goto finish_define_bmap_index; + } + peb_index = pebc->dst_peb->peb_index; + break; + + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->src_peb || !pebc->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + src_migration_id = + ssdfs_get_peb_migration_id_checked(pebc->src_peb); + if (unlikely(src_migration_id < 0)) { + err = src_migration_id; + SSDFS_ERR("invalid peb_migration_id: " + "err %d\n", + err); + goto finish_define_bmap_index; + } + + dst_migration_id = + ssdfs_get_peb_migration_id_checked(pebc->dst_peb); + if (unlikely(dst_migration_id < 0)) { + err = dst_migration_id; + SSDFS_ERR("invalid peb_migration_id: " + "err %d\n", + err); + goto finish_define_bmap_index; + } + + if (src_migration_id == dst_migration_id) { + err = -ERANGE; + SSDFS_ERR("src_migration_id %u == " + "dst_migration_id %u\n", + src_migration_id, + 
dst_migration_id); + goto finish_define_bmap_index; + } + + if (peb_migration_id == src_migration_id) { + int state; + + need_migrate = true; + need_move = false; + + dst_pebc = pebc->dst_peb->pebc; + state = atomic_read(&dst_pebc->items_state); + switch (state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + bmap_index = + SSDFS_PEB_BLK_BMAP_DESTINATION; + break; + + default: + BUG(); + } + + peb_index = U16_MAX; + } else if (peb_migration_id == dst_migration_id) { + err = -ERANGE; + SSDFS_WARN("invalid request: " + "peb_migration_id %u, " + "dst_migration_id %u\n", + peb_migration_id, + dst_migration_id); + goto finish_define_bmap_index; + } else { + err = -ERANGE; + SSDFS_ERR("fail to select PEB: " + "peb_migration_id %u, " + "src_migration_id %u, " + "dst_migration_id %u\n", + peb_migration_id, + src_migration_id, + dst_migration_id); + goto finish_define_bmap_index; + } + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->src_peb || !pebc->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + src_migration_id = + ssdfs_get_peb_migration_id_checked(pebc->src_peb); + if (unlikely(src_migration_id < 0)) { + err = src_migration_id; + SSDFS_ERR("invalid peb_migration_id: " + "err %d\n", + err); + goto finish_define_bmap_index; + } + + dst_migration_id = + ssdfs_get_peb_migration_id_checked(pebc->dst_peb); + if (unlikely(dst_migration_id < 0)) { + err = dst_migration_id; + SSDFS_ERR("invalid peb_migration_id: " + "err %d\n", + err); + goto finish_define_bmap_index; + } + + if (src_migration_id == dst_migration_id) { + err = -ERANGE; + SSDFS_ERR("src_migration_id %u == " + "dst_migration_id %u\n", + src_migration_id, + dst_migration_id); + goto finish_define_bmap_index; + } + + if (peb_migration_id == src_migration_id) { + switch (migration_phase) { + case SSDFS_SRC_PEB_NOT_EXHAUSTED: + need_migrate = false; + need_move = false; + bmap_index = + SSDFS_PEB_BLK_BMAP_SOURCE; + peb_index = pebc->src_peb->peb_index; + break; + + default: + need_migrate = true; + need_move = false; + bmap_index = + SSDFS_PEB_BLK_BMAP_INDEX_MAX; + peb_index = pebc->src_peb->peb_index; + break; + } + } else if (peb_migration_id == dst_migration_id) { + need_migrate = false; + need_move = false; + bmap_index = SSDFS_PEB_BLK_BMAP_DESTINATION; + peb_index = pebc->dst_peb->peb_index; + } else if ((peb_migration_id + 1) == src_migration_id) { + switch (migration_phase) { + case SSDFS_SRC_PEB_NOT_EXHAUSTED: + need_migrate = false; + need_move = false; + bmap_index = + SSDFS_PEB_BLK_BMAP_SOURCE; + peb_index = pebc->src_peb->peb_index; + break; + + default: + need_migrate = false; + need_move = true; + bmap_index = + SSDFS_PEB_BLK_BMAP_DESTINATION; + peb_index = pebc->dst_peb->peb_index; + break; + } + } else { + err = -ERANGE; + SSDFS_ERR("fail to select PEB: " + "peb_migration_id %u, " + "src_migration_id %u, " + "dst_migration_id %u\n", + peb_migration_id, + src_migration_id, + dst_migration_id); + goto finish_define_bmap_index; + } + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid items_state %#x\n", + items_state); + goto finish_define_bmap_index; + }; + break; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: +#ifdef CONFIG_SSDFS_DEBUG + /* unexpected situation */ + SSDFS_WARN("unexpected situation\n"); 
+#endif /* CONFIG_SSDFS_DEBUG */ + err = -EAGAIN; + goto finish_define_bmap_index; + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid migration_state %#x\n", + migration_state); + goto finish_define_bmap_index; + } + +finish_define_bmap_index: + if (err == -EAGAIN) { + DEFINE_WAIT(wait); + + err = 0; + mutex_unlock(&pebc->migration_lock); + up_read(&pebc->lock); + prepare_to_wait(&pebc->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&pebc->migration_wq, &wait); + down_read(&pebc->lock); + mutex_lock(&pebc->migration_lock); + goto try_define_bmap_index; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define bmap_index: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(need_migrate && need_move); + + SSDFS_DBG("seg_id %llu, migration_state %#x, items_state %#x, " + "peb_migration_id %u, src_migration_id %d, " + "dst_migration_id %d, migration_phase %#x\n", + si->seg_id, migration_state, items_state, + peb_migration_id, src_migration_id, + dst_migration_id, migration_phase); + SSDFS_DBG("seg_id %llu, need_migrate %#x, need_move %#x\n", + si->seg_id, need_migrate, need_move); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_migrate) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= bmap->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, bmap->pebs_count); + return -ERANGE; + } + + dst_blkbmap = &bmap->peb[peb_index]; + + err = ssdfs_peb_blk_bmap_migrate(dst_blkbmap, + range_state, + range); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate: " + "range (start %u, len %u), " + "range_state %#x, " + "err %d\n", + range->start, range->len, + range_state, err); + SSDFS_ERR("seg_id %llu, peb_index %u, " + "peb_migration_id %u, " + "range (start %u, len %u)\n", + bmap->parent_si->seg_id, + pebc->peb_index, + peb_migration_id, + range->start, range->len); + SSDFS_ERR("seg_id %llu, migration_state %#x, " + "items_state %#x, " + "peb_migration_id %u, src_migration_id %d, " + "dst_migration_id %d, migration_phase %#x\n", + si->seg_id, migration_state, items_state, + peb_migration_id, src_migration_id, + dst_migration_id, migration_phase); + SSDFS_ERR("seg_id %llu, need_migrate %#x, " + "need_move %#x\n", + si->seg_id, need_migrate, need_move); + return err; + } + } else if (need_move) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_index = pebc->dst_peb->peb_index; + + if (peb_index >= bmap->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, bmap->pebs_count); + return -ERANGE; + } + + dst_blkbmap = &bmap->peb[peb_index]; + + if (range_state == SSDFS_BLK_PRE_ALLOCATED) { + err = ssdfs_peb_blk_bmap_pre_allocate(dst_blkbmap, + bmap_index, + NULL, + range); + } else { + err = ssdfs_peb_blk_bmap_allocate(dst_blkbmap, + bmap_index, + NULL, + range); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to move: " + "range (start %u, len %u), " + "range_state %#x, " + "err %d\n", + range->start, range->len, + range_state, err); + SSDFS_ERR("seg_id %llu, peb_index %u, " + "peb_migration_id %u, " + "range (start %u, len %u)\n", + bmap->parent_si->seg_id, + pebc->peb_index, + peb_migration_id, + range->start, range->len); + SSDFS_ERR("seg_id %llu, migration_state %#x, " + "items_state %#x, " + "peb_migration_id %u, src_migration_id %d, " + "dst_migration_id %d, migration_phase %#x\n", + si->seg_id, migration_state, items_state, 
+ peb_migration_id, src_migration_id, + dst_migration_id, migration_phase); + SSDFS_ERR("seg_id %llu, need_migrate %#x, " + "need_move %#x\n", + si->seg_id, need_migrate, need_move); + return err; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= bmap->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, bmap->pebs_count); + return -ERANGE; + } + + dst_blkbmap = &bmap->peb[peb_index]; + + err = ssdfs_peb_blk_bmap_update_range(dst_blkbmap, + bmap_index, + range_state, + range); + if (unlikely(err)) { + SSDFS_ERR("fail to update range: " + "range (start %u, len %u), " + "range_state %#x, " + "err %d\n", + range->start, range->len, + range_state, err); + SSDFS_ERR("seg_id %llu, peb_index %u, " + "peb_migration_id %u, " + "range (start %u, len %u)\n", + bmap->parent_si->seg_id, + pebc->peb_index, + peb_migration_id, + range->start, range->len); + SSDFS_ERR("seg_id %llu, migration_state %#x, " + "items_state %#x, " + "peb_migration_id %u, src_migration_id %d, " + "dst_migration_id %d, migration_phase %#x\n", + si->seg_id, migration_state, items_state, + peb_migration_id, src_migration_id, + dst_migration_id, migration_phase); + SSDFS_ERR("seg_id %llu, need_migrate %#x, " + "need_move %#x\n", + si->seg_id, need_migrate, need_move); + return err; + } + } + + return 0; +} diff --git a/fs/ssdfs/segment_block_bitmap.h b/fs/ssdfs/segment_block_bitmap.h new file mode 100644 index 000000000000..899e34a4343a --- /dev/null +++ b/fs/ssdfs/segment_block_bitmap.h @@ -0,0 +1,205 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/segment_block_bitmap.h - segment's block bitmap declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_SEGMENT_BLOCK_BITMAP_H +#define _SSDFS_SEGMENT_BLOCK_BITMAP_H + +#include "peb_block_bitmap.h" + +/* + * struct ssdfs_segment_blk_bmap - segment block bitmap object + * @state: segment block bitmap's state + * @pages_per_peb: pages per physical erase block + * @pages_per_seg: pages per segment + * @modification_lock: lock for modification operations + * @seg_valid_blks: segment's valid logical blocks count + * @seg_invalid_blks: segment's invalid logical blocks count + * @seg_free_blks: segment's free logical blocks count + * @peb: array of PEB block bitmap objects + * @pebs_count: PEBs count in segment + * @parent_si: pointer on parent segment object + */ +struct ssdfs_segment_blk_bmap { + atomic_t state; + + u32 pages_per_peb; + u32 pages_per_seg; + + struct rw_semaphore modification_lock; + atomic_t seg_valid_blks; + atomic_t seg_invalid_blks; + atomic_t seg_free_blks; + + struct ssdfs_peb_blk_bmap *peb; + u16 pebs_count; + + struct ssdfs_segment_info *parent_si; +}; + +/* Segment block bitmap's possible states */ +enum { + SSDFS_SEG_BLK_BMAP_STATE_UNKNOWN, + SSDFS_SEG_BLK_BMAP_CREATED, + SSDFS_SEG_BLK_BMAP_STATE_MAX, +}; + +/* + * Segment block bitmap API + */ +int ssdfs_segment_blk_bmap_create(struct ssdfs_segment_info *si, + int init_flag, int init_state); +void ssdfs_segment_blk_bmap_destroy(struct ssdfs_segment_blk_bmap *ptr); +int ssdfs_segment_blk_bmap_partial_init(struct ssdfs_segment_blk_bmap *bmap, + u16 peb_index, + struct ssdfs_page_vector *source, + struct ssdfs_block_bitmap_fragment *hdr, + u64 cno); +void ssdfs_segment_blk_bmap_init_failed(struct ssdfs_segment_blk_bmap *bmap, + u16 peb_index); + +bool is_ssdfs_segment_blk_bmap_dirty(struct ssdfs_segment_blk_bmap *bmap, + u16 peb_index); + +bool has_ssdfs_segment_blk_bmap_initialized(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc); +int ssdfs_segment_blk_bmap_wait_init_end(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc); + +int ssdfs_segment_blk_bmap_reserve_metapages(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 count); +int ssdfs_segment_blk_bmap_free_metapages(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 count); +int ssdfs_segment_blk_bmap_reserve_block(struct ssdfs_segment_blk_bmap *ptr); +int ssdfs_segment_blk_bmap_reserve_extent(struct ssdfs_segment_blk_bmap *ptr, + u32 count); +int ssdfs_segment_blk_bmap_pre_allocate(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 *len, + struct ssdfs_block_bmap_range *range); +int ssdfs_segment_blk_bmap_allocate(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u32 *len, + struct ssdfs_block_bmap_range *range); +int ssdfs_segment_blk_bmap_update_range(struct ssdfs_segment_blk_bmap *ptr, + struct ssdfs_peb_container *pebc, + u8 peb_migration_id, + int range_state, + struct ssdfs_block_bmap_range *range); + +static inline +int ssdfs_segment_blk_bmap_get_free_pages(struct ssdfs_segment_blk_bmap *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + int free_blks; + int valid_blks; + int invalid_blks; + int calculated; + + BUG_ON(!ptr); + + free_blks = atomic_read(&ptr->seg_free_blks); + valid_blks = atomic_read(&ptr->seg_valid_blks); + invalid_blks = atomic_read(&ptr->seg_invalid_blks); + calculated = free_blks + valid_blks + invalid_blks; + + 
SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + free_blks, valid_blks, invalid_blks, + ptr->pages_per_seg); + + if (calculated > ptr->pages_per_seg) { + SSDFS_WARN("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, calculated %d, " + "pages_per_seg %u\n", + free_blks, valid_blks, invalid_blks, + calculated, ptr->pages_per_seg); + } +#endif /* CONFIG_SSDFS_DEBUG */ + return atomic_read(&ptr->seg_free_blks); +} + +static inline +int ssdfs_segment_blk_bmap_get_used_pages(struct ssdfs_segment_blk_bmap *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + int free_blks; + int valid_blks; + int invalid_blks; + int calculated; + + BUG_ON(!ptr); + + free_blks = atomic_read(&ptr->seg_free_blks); + valid_blks = atomic_read(&ptr->seg_valid_blks); + invalid_blks = atomic_read(&ptr->seg_invalid_blks); + calculated = free_blks + valid_blks + invalid_blks; + + SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + free_blks, valid_blks, invalid_blks, + ptr->pages_per_seg); + + if (calculated > ptr->pages_per_seg) { + SSDFS_WARN("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, calculated %d, " + "pages_per_seg %u\n", + free_blks, valid_blks, invalid_blks, + calculated, ptr->pages_per_seg); + } +#endif /* CONFIG_SSDFS_DEBUG */ + return atomic_read(&ptr->seg_valid_blks); +} + +static inline +int ssdfs_segment_blk_bmap_get_invalid_pages(struct ssdfs_segment_blk_bmap *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + int free_blks; + int valid_blks; + int invalid_blks; + int calculated; + + BUG_ON(!ptr); + + free_blks = atomic_read(&ptr->seg_free_blks); + valid_blks = atomic_read(&ptr->seg_valid_blks); + invalid_blks = atomic_read(&ptr->seg_invalid_blks); + calculated = free_blks + valid_blks + invalid_blks; + + SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, pages_per_seg %u\n", + free_blks, valid_blks, invalid_blks, + ptr->pages_per_seg); + + if (calculated > ptr->pages_per_seg) { + SSDFS_WARN("free_logical_blks %d, valid_logical_blks %d, " + "invalid_logical_blks %d, calculated %d, " + "pages_per_seg %u\n", + free_blks, valid_blks, invalid_blks, + calculated, ptr->pages_per_seg); + } +#endif /* CONFIG_SSDFS_DEBUG */ + return atomic_read(&ptr->seg_invalid_blks); +} + +#endif /* _SSDFS_SEGMENT_BLOCK_BITMAP_H */ From patchwork Sat Feb 25 01:08:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151920 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40BE1C6FA8E for ; Sat, 25 Feb 2023 01:16:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229600AbjBYBQd (ORCPT ); Fri, 24 Feb 2023 20:16:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229683AbjBYBQ1 (ORCPT ); Fri, 24 Feb 2023 20:16:27 -0500 Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 940C114EBA for ; Fri, 24 Feb 2023 17:16:09 -0800 (PST) Received: by mail-oi1-x22e.google.com with SMTP id bk32so762466oib.10 for ; Fri, 24 Feb 2023 17:16:09 -0800 
(PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rEtIMF7jvR+q3tDMJmdQuHQ1cOBS0qnrBNVMhJsDmig=; b=X3iyCXxucZJkxPcK6qbUWCCSdWXilxfoYYOhHnjDhpyQZnGcSHcM0oorbsEYtLvZYo w98l6d1Glgxue/LPAPSuJjd82y1k/d4h/RG1UA+uGzW4S4s8PT/BDiWBYsIHo8HTqC9y q3poXNM7DqQIkgCkj916iQmabEcfghUUlkuYfp0y0IWIuQTfT01KXaA4INy/tXBwpWMg XMHPySWd5lHdJMeOaJnPls71nH5b8Rx0fkTYP4Be2pIg6m6kqeh7XgQDNm9yOP+43COY 8eQB7X8DYetYmfMT9ioVm3K/45h3HWkfI5md1rgBjLAI7dagEuGuzRTXgnf6iv0pa6kF ReuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rEtIMF7jvR+q3tDMJmdQuHQ1cOBS0qnrBNVMhJsDmig=; b=QpdWIos1ArgcLzHSnExejdxI6tC37XoxDJsjYVZIL2dV6sbwLi9wPzDAVLLBHhN9yt Yh0cy9d1dawNBTamWyIPwDlQhUDyG5woQzzwTfGrqti/aNBARWOG7Mz3NCP67P55C2Bn RDHPm3/cQeiKE1yHAxJ7aEGUKr/CQ+D+OCqy3XF9WtC3GCYVGTldWlESwBVUUR7w01kk 8JVNfUOuCI/DzVDGGWisDp2FGReGskZ+kzMIwho+aEAUuQw2b8HLi/qjhs4yAynaoNYi wISF0CwSlDQGwQ9Fcv/fs/FKDEGxIl13BHKOg/5JSwn3mtjvNt0XfIJpxHCU1Xm+v9uW YCFg== X-Gm-Message-State: AO0yUKX9OiJe/sfnt2TRQkG3koONK/7OW47h/+/Ft6dojXZuySHs6gyN KKe/fAGDDgAIpJV8/WcWjIpnBKH0NpWrtOoH X-Google-Smtp-Source: AK7set8LpXTa9elhrOEEehfwu97q7TqRG7UNNY1Ci+sFfztQE+7Ne44MdzV3sNfxsq/nwPHgrdfbjg== X-Received: by 2002:a54:4103:0:b0:371:c6e:f45a with SMTP id l3-20020a544103000000b003710c6ef45amr7553995oic.34.1677287768332; Fri, 24 Feb 2023 17:16:08 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.16.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:16:07 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 16/76] ssdfs: introduce segment request queue Date: Fri, 24 Feb 2023 17:08:27 -0800 Message-Id: <20230225010927.813929-17-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org SSDFS implements a current segment concept. This concept implies that current segment object receives requests for adding new data or metadata. Also, PEB container object can receive requests for update of existing data or metadata. Segment request queue implements concept of such add new data or update requests. Segment request object defines: (1) logical extent (inode ID, logical offset from file's beginning in bytes, valid bytes count in request); (2) volume extent (segment ID, logical block ID, length in logical blocks); (3) request class (read, create, update, and so on); (4) request command (read page, create block, update block, and so on); (5) request type (synchronous, asynchronous, and so on). Current segment has create queue and every PEB container object has update queue. Caller needs to allocate segment request object, initialize it, and add into particular request queue. PEB's dedicated thread takes requests from the queue and executes requested action. 
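For illustration only, here is a minimal sketch of that calling
sequence, composed from the API this patch introduces. The create queue
pointer and the identifiers (ino, logical_offset, seg_id, blk) are
assumptions of the example, not code from the patch:

static int sketch_submit_create_block(struct ssdfs_requests_queue *create_rq,
				      u64 ino, u64 logical_offset,
				      u64 seg_id, u16 blk)
{
	struct ssdfs_segment_request *req;

	/* allocate the request object from the dedicated kmem cache */
	req = ssdfs_request_alloc();
	if (IS_ERR_OR_NULL(req))
		return (req == NULL ? -ENOMEM : PTR_ERR(req));

	ssdfs_request_init(req);
	ssdfs_get_request(req);

	/* (1) logical extent: inode + offset + valid bytes */
	ssdfs_request_prepare_logical_extent(ino, logical_offset,
					     PAGE_SIZE, 0, 0, req);

	/* (2) volume extent: segment + (start block, length) */
	ssdfs_request_define_segment(seg_id, req);
	ssdfs_request_define_volume_extent(blk, 1, req);

	/* (3)-(5) class, command, and type of the request */
	ssdfs_request_prepare_internal_data(SSDFS_PEB_CREATE_DATA_REQ,
					    SSDFS_CREATE_BLOCK,
					    SSDFS_REQ_SYNC, req);

	/* hand the request over to the PEB's dedicated thread */
	ssdfs_requests_queue_add_tail(create_rq, req);
	return 0;
}

An asynchronous submission would pass SSDFS_REQ_ASYNC instead; in that
case the executing side frees the request object after completion
(unlike SSDFS_REQ_ASYNC_NO_FREE, where the caller keeps ownership).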
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/request_queue.c | 1240 ++++++++++++++++++++++++++++++++++++++ fs/ssdfs/request_queue.h | 417 +++++++++++++ 2 files changed, 1657 insertions(+) create mode 100644 fs/ssdfs/request_queue.c create mode 100644 fs/ssdfs/request_queue.h diff --git a/fs/ssdfs/request_queue.c b/fs/ssdfs/request_queue.c new file mode 100644 index 000000000000..985adfe31bb3 --- /dev/null +++ b/fs/ssdfs/request_queue.c @@ -0,0 +1,1240 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/request_queue.c - request queue implementation. + * + * Copyright (c) 2014-2019, HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "request_queue.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "snapshots_tree.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_req_queue_page_leaks; +atomic64_t ssdfs_req_queue_memory_leaks; +atomic64_t ssdfs_req_queue_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_req_queue_cache_leaks_increment(void *kaddr) + * void ssdfs_req_queue_cache_leaks_decrement(void *kaddr) + * void *ssdfs_req_queue_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_req_queue_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_req_queue_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_req_queue_kfree(void *kaddr) + * struct page *ssdfs_req_queue_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_req_queue_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_req_queue_free_page(struct page *page) + * void ssdfs_req_queue_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(req_queue) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(req_queue) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_req_queue_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_req_queue_page_leaks, 0); + atomic64_set(&ssdfs_req_queue_memory_leaks, 0); + atomic64_set(&ssdfs_req_queue_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_req_queue_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_req_queue_page_leaks) != 0) { + SSDFS_ERR("REQUESTS QUEUE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_req_queue_page_leaks)); + } + + if (atomic64_read(&ssdfs_req_queue_memory_leaks) != 0) { + SSDFS_ERR("REQUESTS QUEUE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_req_queue_memory_leaks)); + } + + if (atomic64_read(&ssdfs_req_queue_cache_leaks) != 0) { + SSDFS_ERR("REQUESTS QUEUE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_req_queue_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static struct kmem_cache 
*ssdfs_seg_req_obj_cachep; + +void ssdfs_zero_seg_req_obj_cache_ptr(void) +{ + ssdfs_seg_req_obj_cachep = NULL; +} + +static +void ssdfs_init_seg_req_object_once(void *obj) +{ + struct ssdfs_segment_request *req_obj = obj; + + memset(req_obj, 0, sizeof(struct ssdfs_segment_request)); +} + +void ssdfs_shrink_seg_req_obj_cache(void) +{ + if (ssdfs_seg_req_obj_cachep) + kmem_cache_shrink(ssdfs_seg_req_obj_cachep); +} + +void ssdfs_destroy_seg_req_obj_cache(void) +{ + if (ssdfs_seg_req_obj_cachep) + kmem_cache_destroy(ssdfs_seg_req_obj_cachep); +} + +int ssdfs_init_seg_req_obj_cache(void) +{ + ssdfs_seg_req_obj_cachep = kmem_cache_create("ssdfs_seg_req_obj_cache", + sizeof(struct ssdfs_segment_request), 0, + SLAB_RECLAIM_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, + ssdfs_init_seg_req_object_once); + if (!ssdfs_seg_req_obj_cachep) { + SSDFS_ERR("unable to create segment request objects cache\n"); + return -ENOMEM; + } + + return 0; +} + +/* + * ssdfs_requests_queue_init() - initialize request queue + * @rq: initialized request queue + */ +void ssdfs_requests_queue_init(struct ssdfs_requests_queue *rq) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rq); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock_init(&rq->lock); + INIT_LIST_HEAD(&rq->list); +} + +/* + * is_ssdfs_requests_queue_empty() - check that requests queue is empty + * @rq: requests queue + */ +bool is_ssdfs_requests_queue_empty(struct ssdfs_requests_queue *rq) +{ + bool is_empty; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rq); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&rq->lock); + is_empty = list_empty_careful(&rq->list); + spin_unlock(&rq->lock); + + return is_empty; +} + +/* + * ssdfs_requests_queue_add_head() - add request at the head of queue + * @rq: requests queue + * @req: request + */ +void ssdfs_requests_queue_add_head(struct ssdfs_requests_queue *rq, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rq || !req); + + SSDFS_DBG("seg_id %llu, class %#x, cmd %#x\n", + req->place.start.seg_id, + req->private.class, + req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&rq->lock); + list_add(&req->list, &rq->list); + spin_unlock(&rq->lock); +} + +/* + * ssdfs_requests_queue_add_head_inc() - add request at the head of queue + * @fsi: pointer on shared file system object + * @rq: requests queue + * @req: request + */ +void ssdfs_requests_queue_add_head_inc(struct ssdfs_fs_info *fsi, + struct ssdfs_requests_queue *rq, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !rq || !req); + + SSDFS_DBG("seg_id %llu, class %#x, cmd %#x\n", + req->place.start.seg_id, + req->private.class, + req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_requests_queue_add_head(rq, req); + atomic64_inc(&fsi->flush_reqs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("flush_reqs %lld\n", + atomic64_read(&fsi->flush_reqs)); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_requests_queue_add_tail() - add request at the tail of queue + * @rq: requests queue + * @req: request + */ +void ssdfs_requests_queue_add_tail(struct ssdfs_requests_queue *rq, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rq || !req); + + SSDFS_DBG("seg_id %llu, class %#x, cmd %#x\n", + req->place.start.seg_id, + req->private.class, + req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&rq->lock); + list_add_tail(&req->list, &rq->list); + spin_unlock(&rq->lock); +} + +/* + * ssdfs_requests_queue_add_tail_inc() - add request at the tail of queue + * @fsi: pointer on 
shared file system object
+ * @rq: requests queue
+ * @req: request
+ */
+void ssdfs_requests_queue_add_tail_inc(struct ssdfs_fs_info *fsi,
+					struct ssdfs_requests_queue *rq,
+					struct ssdfs_segment_request *req)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !rq || !req);
+
+	SSDFS_DBG("seg_id %llu, class %#x, cmd %#x\n",
+		  req->place.start.seg_id,
+		  req->private.class,
+		  req->private.cmd);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_requests_queue_add_tail(rq, req);
+	atomic64_inc(&fsi->flush_reqs);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("flush_reqs %lld\n",
+		  atomic64_read(&fsi->flush_reqs));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * is_request_command_valid() - check request's command validity
+ * @class: request's class
+ * @cmd: request's command
+ */
+static inline
+bool is_request_command_valid(int class, int cmd)
+{
+	bool is_valid = false;
+
+	switch (class) {
+	case SSDFS_PEB_READ_REQ:
+		is_valid = cmd > SSDFS_UNKNOWN_CMD &&
+				cmd < SSDFS_READ_CMD_MAX;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ:
+	case SSDFS_PEB_CREATE_DATA_REQ:
+	case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ:
+	case SSDFS_PEB_CREATE_LNODE_REQ:
+	case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ:
+	case SSDFS_PEB_CREATE_HNODE_REQ:
+	case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ:
+	case SSDFS_PEB_CREATE_IDXNODE_REQ:
+	case SSDFS_ZONE_USER_DATA_MIGRATE_REQ:
+		is_valid = cmd > SSDFS_READ_CMD_MAX &&
+				cmd < SSDFS_CREATE_CMD_MAX;
+		break;
+
+	case SSDFS_PEB_UPDATE_REQ:
+	case SSDFS_PEB_PRE_ALLOC_UPDATE_REQ:
+		is_valid = cmd > SSDFS_CREATE_CMD_MAX &&
+				cmd < SSDFS_UPDATE_CMD_MAX;
+		break;
+
+	case SSDFS_PEB_DIFF_ON_WRITE_REQ:
+		is_valid = cmd > SSDFS_UPDATE_CMD_MAX &&
+				cmd < SSDFS_DIFF_ON_WRITE_MAX;
+		break;
+
+	case SSDFS_PEB_COLLECT_GARBAGE_REQ:
+		is_valid = cmd > SSDFS_DIFF_ON_WRITE_MAX &&
+				cmd < SSDFS_COLLECT_GARBAGE_CMD_MAX;
+		break;
+
+	default:
+		is_valid = false;
+	}
+
+	return is_valid;
+}
+
+/*
+ * ssdfs_requests_queue_remove_first() - get request and remove from queue
+ * @rq: requests queue
+ * @req: first request [out]
+ *
+ * This function gets the first request in @rq, removes it from the
+ * queue, and returns it as @req.
+ *
+ * RETURN:
+ * [success] - @req contains pointer on request.
+ * [failure] - error code:
+ *
+ * %-ENODATA - queue is empty.
+ * %-ENOENT - first entry is NULL.
+ */
+int ssdfs_requests_queue_remove_first(struct ssdfs_requests_queue *rq,
+				      struct ssdfs_segment_request **req)
+{
+	bool is_empty;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!rq || !req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&rq->lock);
+	is_empty = list_empty_careful(&rq->list);
+	if (!is_empty) {
+		*req = list_first_entry_or_null(&rq->list,
+						struct ssdfs_segment_request,
+						list);
+		if (!*req) {
+			SSDFS_WARN("first entry is NULL\n");
+			err = -ENOENT;
+		} else
+			list_del(&(*req)->list);
+	}
+	spin_unlock(&rq->lock);
+
+	if (is_empty) {
+		SSDFS_WARN("requests queue is empty\n");
+		return -ENODATA;
+	} else if (err)
+		return err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!is_request_command_valid((*req)->private.class,
+					 (*req)->private.cmd));
+	BUG_ON((*req)->private.type >= SSDFS_REQ_TYPE_MAX);
+
+	SSDFS_DBG("seg_id %llu, class %#x, cmd %#x\n",
+		  (*req)->place.start.seg_id,
+		  (*req)->private.class,
+		  (*req)->private.cmd);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_requests_queue_remove_all() - remove all requests from queue
+ * @rq: requests queue
+ * @err: error code
+ *
+ * This function removes all requests from the queue.
+ */ +void ssdfs_requests_queue_remove_all(struct ssdfs_requests_queue *rq, + int err) +{ + bool is_empty; + LIST_HEAD(tmp_list); + struct list_head *this, *next; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rq); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&rq->lock); + is_empty = list_empty_careful(&rq->list); + if (!is_empty) + list_replace_init(&rq->list, &tmp_list); + spin_unlock(&rq->lock); + + if (is_empty) + return; + + list_for_each_safe(this, next, &tmp_list) { + struct ssdfs_segment_request *req; + unsigned int i; + + req = list_entry(this, struct ssdfs_segment_request, list); + + if (!req) { + SSDFS_WARN("empty request ptr\n"); + continue; + } + + list_del(&req->list); + + SSDFS_WARN("delete request: " + "class %#x, cmd %#x, type %#x, refs_count %u, " + "seg %llu, extent (start %u, len %u)\n", + req->private.class, req->private.cmd, + req->private.type, + atomic_read(&req->private.refs_count), + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len); + + atomic_set(&req->result.state, SSDFS_REQ_FAILED); + + switch (req->private.type) { + case SSDFS_REQ_SYNC: + req->result.err = err; + complete(&req->result.wait); + wake_up_all(&req->private.wait_queue); + break; + + case SSDFS_REQ_ASYNC: + complete(&req->result.wait); + wake_up_all(&req->private.wait_queue); + + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + struct page *page = req->result.pvec.pages[i]; + + if (!page) { + SSDFS_WARN("empty page ptr: index %u\n", i); + continue; + } + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(!PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ClearPageMappedToDisk(page); + ssdfs_clear_dirty_page(page); + ssdfs_unlock_page(page); + end_page_writeback(page); + } + + ssdfs_put_request(req); + ssdfs_request_free(req); + break; + + case SSDFS_REQ_ASYNC_NO_FREE: + complete(&req->result.wait); + wake_up_all(&req->private.wait_queue); + + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + struct page *page = req->result.pvec.pages[i]; + + if (!page) { + SSDFS_WARN("empty page ptr: index %u\n", i); + continue; + } + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(!PageLocked(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ClearPageMappedToDisk(page); + ssdfs_clear_dirty_page(page); + ssdfs_unlock_page(page); + end_page_writeback(page); + } + + ssdfs_put_request(req); + break; + + default: + BUG(); + }; + } +} + +/* + * ssdfs_request_alloc() - allocate memory for segment request object + */ +struct ssdfs_segment_request *ssdfs_request_alloc(void) +{ + struct ssdfs_segment_request *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_seg_req_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = kmem_cache_alloc(ssdfs_seg_req_obj_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for request\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_req_queue_cache_leaks_increment(ptr); + + return ptr; +} + +/* + * ssdfs_request_free() - free memory for segment request object + */ +void ssdfs_request_free(struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_seg_req_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!req) + return; + + ssdfs_req_queue_cache_leaks_decrement(req); + kmem_cache_free(ssdfs_seg_req_obj_cachep, req); +} + +/* + * ssdfs_request_init() - common request initialization + * @req: request [out] + */ +void ssdfs_request_init(struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + 
BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(req, 0, sizeof(struct ssdfs_segment_request)); + + INIT_LIST_HEAD(&req->list); + atomic_set(&req->private.refs_count, 0); + init_waitqueue_head(&req->private.wait_queue); + pagevec_init(&req->result.pvec); + pagevec_init(&req->result.diffs); + atomic_set(&req->result.state, SSDFS_REQ_CREATED); + init_completion(&req->result.wait); + req->result.err = 0; +} + +/* + * ssdfs_get_request() - increment reference counter + * @req: request + */ +void ssdfs_get_request(struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + WARN_ON(atomic_inc_return(&req->private.refs_count) <= 0); +} + +/* + * ssdfs_put_request() - decrement reference counter + * @req: request + */ +void ssdfs_put_request(struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_dec_return(&req->private.refs_count) < 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("request's reference count %d\n", + atomic_read(&req->private.refs_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + } +} + +/* + * ssdfs_request_add_page() - add memory page into segment request + * @page: memory page + * @req: segment request [out] + */ +int ssdfs_request_add_page(struct page *page, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.pvec) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return -E2BIG; + } + + pagevec_add(&req->result.pvec, page); + return 0; +} + +/* + * ssdfs_request_add_diff_page() - add diff page into segment request + * @page: memory page + * @req: segment request [out] + */ +int ssdfs_request_add_diff_page(struct page *page, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.diffs) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return -E2BIG; + } + + pagevec_add(&req->result.diffs, page); + return 0; +} + +/* + * ssdfs_request_allocate_and_add_page() - allocate and add page into request + * @req: segment request [out] + */ +struct page * +ssdfs_request_allocate_and_add_page(struct ssdfs_segment_request *req) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("pagevec count %d\n", + pagevec_count(&req->result.pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.pvec) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return ERR_PTR(-E2BIG); + } + + page = ssdfs_req_queue_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + pagevec_add(&req->result.pvec, page); + return page; +} + +/* + * ssdfs_request_allocate_and_add_diff_page() - allocate and add diff page + * @req: segment request [out] + */ +struct page * +ssdfs_request_allocate_and_add_diff_page(struct ssdfs_segment_request *req) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("pagevec count %d\n", + pagevec_count(&req->result.diffs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.diffs) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return ERR_PTR(-E2BIG); + } + + page = ssdfs_req_queue_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + pagevec_add(&req->result.diffs, page); + return page; +} + +/* + * ssdfs_request_allocate_and_add_old_state_page() - allocate+add old state page + * @req: segment request [out] + */ +struct page * +ssdfs_request_allocate_and_add_old_state_page(struct ssdfs_segment_request *req) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("pagevec count %d\n", + pagevec_count(&req->result.old_state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.old_state) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return ERR_PTR(-E2BIG); + } + + page = ssdfs_req_queue_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + pagevec_add(&req->result.old_state, page); + return page; +} + +/* + * ssdfs_request_allocate_locked_page() - allocate and add locked page + * @req: segment request [out] + * @page_index: index of the page + */ +struct page * +ssdfs_request_allocate_locked_page(struct ssdfs_segment_request *req, + int page_index) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("pagevec count %d\n", + pagevec_count(&req->result.pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.pvec) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return ERR_PTR(-E2BIG); + } + + if (page_index >= PAGEVEC_SIZE) { + SSDFS_ERR("invalid page index %d\n", + page_index); + return ERR_PTR(-EINVAL); + } + + page = req->result.pvec.pages[page_index]; + + if (page) { + SSDFS_ERR("page already exists: index %d\n", + page_index); + return ERR_PTR(-EINVAL); + } + + page = ssdfs_req_queue_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + req->result.pvec.pages[page_index] = page; + + if ((page_index + 1) > req->result.pvec.nr) + req->result.pvec.nr = page_index + 1; + + ssdfs_lock_page(page); + + return page; +} + +/* + * ssdfs_request_allocate_locked_diff_page() - allocate locked diff page + * @req: segment request [out] + * @page_index: index of the page + */ +struct page * +ssdfs_request_allocate_locked_diff_page(struct ssdfs_segment_request *req, + int page_index) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("pagevec count %d\n", + pagevec_count(&req->result.diffs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_space(&req->result.diffs) == 0) { + SSDFS_WARN("request's pagevec is full\n"); + return ERR_PTR(-E2BIG); + } + + if (page_index >= PAGEVEC_SIZE) { + SSDFS_ERR("invalid page index %d\n", + page_index); + return ERR_PTR(-EINVAL); + } + + page = req->result.diffs.pages[page_index]; + + if (page) { + SSDFS_ERR("page already exists: index %d\n", + page_index); + return ERR_PTR(-EINVAL); + } + + page = ssdfs_req_queue_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + req->result.diffs.pages[page_index] = page; + + if ((page_index + 1) > req->result.diffs.nr) + req->result.diffs.nr = page_index + 1; + + ssdfs_lock_page(page); + + return page; +} + +/* + * ssdfs_request_add_allocated_page_locked() - allocate, add and lock page + * @req: segment request [out] + */ +int ssdfs_request_add_allocated_page_locked(struct ssdfs_segment_request *req) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_request_allocate_and_add_page(req); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("fail to allocate page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + return 0; +} + +/* + * ssdfs_request_add_allocated_diff_locked() - allocate, add and lock page + * @req: segment request [out] + */ +int ssdfs_request_add_allocated_diff_locked(struct ssdfs_segment_request *req) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_request_allocate_and_add_diff_page(req); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("fail to allocate page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + return 0; +} + +/* + * ssdfs_request_add_old_state_page_locked() - allocate, add and lock page + * @req: segment request [out] + */ +int ssdfs_request_add_old_state_page_locked(struct ssdfs_segment_request *req) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_request_allocate_and_add_old_state_page(req); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? 
-ENOMEM : PTR_ERR(page)); + SSDFS_ERR("fail to allocate page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + return 0; +} + +/* + * ssdfs_request_unlock_and_remove_pages() - unlock and remove pages + * @req: segment request [out] + */ +void ssdfs_request_unlock_and_remove_pages(struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_unlock_and_remove_old_state(req); + ssdfs_request_unlock_and_remove_update(req); + ssdfs_request_unlock_and_remove_diffs(req); +} + +/* + * ssdfs_request_unlock_and_remove_update() - unlock and remove update pages + * @req: segment request [out] + */ +void ssdfs_request_unlock_and_remove_update(struct ssdfs_segment_request *req) +{ + unsigned count; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = pagevec_count(&req->result.pvec); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("result: pages count %u\n", + count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < count; i++) { + struct page *page = req->result.pvec.pages[i]; + + if (!page) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %d is NULL\n", i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + ssdfs_unlock_page(page); + } + + ssdfs_req_queue_pagevec_release(&req->result.pvec); +} + +/* + * ssdfs_request_unlock_and_remove_diffs() - unlock and remove diffs + * @req: segment request [out] + */ +void ssdfs_request_unlock_and_remove_diffs(struct ssdfs_segment_request *req) +{ + unsigned count; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = pagevec_count(&req->result.diffs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("diff: pages count %u\n", + count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < count; i++) { + struct page *page = req->result.diffs.pages[i]; + + if (!page) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %d is NULL\n", i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + ssdfs_unlock_page(page); + } + + ssdfs_req_queue_pagevec_release(&req->result.diffs); +} + +/* + * ssdfs_request_unlock_and_remove_old_state() - unlock and remove old state + * @req: segment request [out] + */ +void ssdfs_request_unlock_and_remove_old_state(struct ssdfs_segment_request *req) +{ + unsigned count; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = pagevec_count(&req->result.old_state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_state: pages count %u\n", + count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < count; i++) { + struct page *page = req->result.old_state.pages[i]; + + if (!page) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %d is NULL\n", i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + ssdfs_unlock_page(page); + } + + ssdfs_req_queue_pagevec_release(&req->result.old_state); +} + +/* + * ssdfs_request_switch_update_on_diff() - switch block update on diff page + * @fsi: shared file system info object + * @diff_page: page with prepared delta + * @req: segment request [out] + */ +int ssdfs_request_switch_update_on_diff(struct ssdfs_fs_info *fsi, + struct page *diff_page, + struct ssdfs_segment_request *req) +{ + struct page *page; + u32 mem_pages_per_block; + int page_index; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + + mem_pages_per_block = fsi->pagesize / PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(mem_pages_per_block == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
ssdfs_request_unlock_and_remove_old_state(req);
+
+	for (i = 0; i < mem_pages_per_block; i++) {
+		page_index = (req->result.processed_blks *
+				mem_pages_per_block) + i;
+
+		if (page_index >= pagevec_count(&req->result.pvec)) {
+			SSDFS_ERR("page_index %d >= pvec_size %u\n",
+				  page_index,
+				  pagevec_count(&req->result.pvec));
+			return -ERANGE;
+		}
+
+		page = req->result.pvec.pages[page_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		clear_page_new(page);
+		SetPageUptodate(page);
+		ssdfs_clear_dirty_page(page);
+
+		ssdfs_unlock_page(page);
+		end_page_writeback(page);
+
+		if (!(req->private.flags & SSDFS_REQ_DONT_FREE_PAGES))
+			ssdfs_req_queue_forget_page(page);
+
+		req->result.pvec.pages[page_index] = NULL;
+	}
+
+	page_index = req->result.processed_blks * mem_pages_per_block;
+	set_page_new(diff_page);
+	req->result.pvec.pages[page_index] = diff_page;
+	req->result.diffs.pages[0] = NULL;
+
+	if (pagevec_count(&req->result.diffs) > 1) {
+		SSDFS_WARN("diff pagevec contains several pages %u\n",
+			   pagevec_count(&req->result.diffs));
+		ssdfs_req_queue_pagevec_release(&req->result.diffs);
+	} else
+		pagevec_reinit(&req->result.diffs);
+
+	return 0;
+}
+
+/*
+ * ssdfs_request_unlock_and_remove_page() - unlock and remove page
+ * @req: segment request [in|out]
+ * @page_index: page index
+ */
+void ssdfs_request_unlock_and_remove_page(struct ssdfs_segment_request *req,
+					  int page_index)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (page_index >= pagevec_count(&req->result.pvec)) {
+		SSDFS_ERR("page_index %d >= pagevec_count %u\n",
+			  page_index,
+			  pagevec_count(&req->result.pvec));
+		return;
+	}
+
+	if (!req->result.pvec.pages[page_index]) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %d is NULL\n", page_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return;
+	}
+
+	ssdfs_unlock_page(req->result.pvec.pages[page_index]);
+	ssdfs_req_queue_forget_page(req->result.pvec.pages[page_index]);
+	req->result.pvec.pages[page_index] = NULL;
+}
+
+/*
+ * ssdfs_free_flush_request_pages() - unlock and remove flush request's pages
+ * @req: segment request [out]
+ */
+void ssdfs_free_flush_request_pages(struct ssdfs_segment_request *req)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < pagevec_count(&req->result.pvec); i++) {
+		struct page *page = req->result.pvec.pages[i];
+		bool need_free_page = false;
+
+		if (!page) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %d is NULL\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		if (need_add_block(page)) {
+			clear_page_new(page);
+
+			if (req->private.flags & SSDFS_REQ_PREPARE_DIFF)
+				need_free_page = true;
+		}
+
+		if (PageWriteback(page))
+			end_page_writeback(page);
+		else {
+			SSDFS_WARN("page %d is not under writeback: "
+				   "cmd %#x, type %#x\n",
+				   i, req->private.cmd,
+				   req->private.type);
+		}
+
+		if (PageLocked(page))
+			ssdfs_unlock_page(page);
+		else {
+			SSDFS_WARN("page %d is not locked: "
+				   "cmd %#x, type %#x\n",
+				   i, req->private.cmd,
+				   req->private.type);
+		}
+
+		req->result.pvec.pages[i] = NULL;
+
+		if (need_free_page)
+			ssdfs_req_queue_free_page(page);
+		else if (!(req->private.flags & SSDFS_REQ_DONT_FREE_PAGES))
+			ssdfs_req_queue_free_page(page);
+	}
+
+	if (req->private.flags & SSDFS_REQ_DONT_FREE_PAGES) {
+		/*
+		 * Do nothing
+		 */
+	} else
+		pagevec_reinit(&req->result.pvec);
+}
+
+/*
+ * ssdfs_peb_extent_length() - determine extent length in pagevec
+ * @si: segment object
+ * @pvec: page vector
+ */
+u8 ssdfs_peb_extent_length(struct ssdfs_segment_info *si,
+			   struct pagevec *pvec)
+{
+	u32 len;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si || !si->fsi || !pvec);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->fsi->pagesize < PAGE_SIZE) {
+		BUG_ON(PAGE_SIZE % si->fsi->pagesize);
+		len = PAGE_SIZE / si->fsi->pagesize;
+		len *= pagevec_count(pvec);
+		BUG_ON(len == 0);
+	} else {
+		len = pagevec_count(pvec) * PAGE_SIZE;
+		BUG_ON(len == 0);
+		BUG_ON(len % si->fsi->pagesize);
+		len /= si->fsi->pagesize;
+		BUG_ON(len == 0);
+	}
+
+	BUG_ON(len >= U8_MAX);
+	return (u8)len;
+}
diff --git a/fs/ssdfs/request_queue.h b/fs/ssdfs/request_queue.h
new file mode 100644
index 000000000000..7287182a3875
--- /dev/null
+++ b/fs/ssdfs/request_queue.h
@@ -0,0 +1,417 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/request_queue.h - request queue declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_REQUEST_QUEUE_H
+#define _SSDFS_REQUEST_QUEUE_H
+
+#include
+
+/*
+ * struct ssdfs_requests_queue - requests queue descriptor
+ * @lock: requests queue's lock
+ * @list: requests queue's list
+ */
+struct ssdfs_requests_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * Request classes
+ */
+enum {
+	SSDFS_UNKNOWN_REQ_CLASS,		/* 0x00 */
+	SSDFS_PEB_READ_REQ,			/* 0x01 */
+	SSDFS_PEB_PRE_ALLOCATE_DATA_REQ,	/* 0x02 */
+	SSDFS_PEB_CREATE_DATA_REQ,		/* 0x03 */
+	SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ,	/* 0x04 */
+	SSDFS_PEB_CREATE_LNODE_REQ,		/* 0x05 */
+	SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ,	/* 0x06 */
+	SSDFS_PEB_CREATE_HNODE_REQ,		/* 0x07 */
+	SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ,	/* 0x08 */
+	SSDFS_PEB_CREATE_IDXNODE_REQ,		/* 0x09 */
+	SSDFS_PEB_UPDATE_REQ,			/* 0x0A */
+	SSDFS_PEB_PRE_ALLOC_UPDATE_REQ,		/* 0x0B */
+	SSDFS_PEB_DIFF_ON_WRITE_REQ,		/* 0x0C */
+	SSDFS_PEB_COLLECT_GARBAGE_REQ,		/* 0x0D */
+	SSDFS_ZONE_USER_DATA_MIGRATE_REQ,	/* 0x0E */
+	SSDFS_PEB_REQ_CLASS_MAX,		/* 0x0F */
+};
+
+/*
+ * Request commands
+ */
+enum {
+	SSDFS_UNKNOWN_CMD,			/* 0x00 */
+	SSDFS_READ_PAGE,			/* 0x01 */
+	SSDFS_READ_PAGES_READAHEAD,		/* 0x02 */
+	SSDFS_READ_SRC_ALL_LOG_HEADERS,		/* 0x03 */
+	SSDFS_READ_DST_ALL_LOG_HEADERS,		/* 0x04 */
+	SSDFS_READ_BLK_BMAP_SRC_USING_PEB,	/* 0x05 */
+	SSDFS_READ_BLK_BMAP_DST_USING_PEB,	/* 0x06 */
+	SSDFS_READ_BLK_BMAP_SRC_USED_PEB,	/* 0x07 */
+	SSDFS_READ_BLK_BMAP_DST_USED_PEB,	/* 0x08 */
+	SSDFS_READ_BLK2OFF_TABLE_SRC_PEB,	/* 0x09 */
+	SSDFS_READ_BLK2OFF_TABLE_DST_PEB,	/* 0x0A */
+	SSDFS_READ_INIT_SEGBMAP,		/* 0x0B */
+	SSDFS_READ_INIT_MAPTBL,			/* 0x0C */
+	SSDFS_READ_SRC_LAST_LOG_FOOTER,		/* 0x0D */
+	SSDFS_READ_DST_LAST_LOG_FOOTER,		/* 0x0E */
+	SSDFS_READ_CMD_MAX,			/* 0x0F */
+	SSDFS_CREATE_BLOCK,			/* 0x10 */
+	SSDFS_CREATE_EXTENT,			/* 0x11 */
+	SSDFS_MIGRATE_ZONE_USER_BLOCK,		/* 0x12 */
+	SSDFS_MIGRATE_ZONE_USER_EXTENT,		/* 0x13 */
+	SSDFS_CREATE_CMD_MAX,			/* 0x14 */
+	SSDFS_UPDATE_BLOCK,			/* 0x15 */
+	SSDFS_UPDATE_PRE_ALLOC_BLOCK,		/* 0x16 */
+	SSDFS_UPDATE_EXTENT,			/* 0x17 */
+	SSDFS_UPDATE_PRE_ALLOC_EXTENT,		/* 0x18 */
+	SSDFS_COMMIT_LOG_NOW,			/* 0x19 */
+	SSDFS_START_MIGRATION_NOW,		/* 0x1A */
+	SSDFS_EXTENT_WAS_INVALIDATED,		/* 0x1B */
+	SSDFS_UPDATE_CMD_MAX,			/* 0x1C */
+	SSDFS_BTREE_NODE_DIFF,			/* 0x1D */
+	SSDFS_USER_DATA_DIFF,			/* 0x1E */
+	SSDFS_DIFF_ON_WRITE_MAX,		/* 0x1F */
+	SSDFS_COPY_PAGE,			/* 0x20 */
+	SSDFS_COPY_PRE_ALLOC_PAGE,		/* 0x21 */
+	SSDFS_MIGRATE_RANGE,			/* 0x22 */
+	SSDFS_MIGRATE_PRE_ALLOC_PAGE,		/* 0x23 */
+	SSDFS_MIGRATE_FRAGMENT,			/* 0x24 */
+	SSDFS_COLLECT_GARBAGE_CMD_MAX,		/* 0x25 */
+	SSDFS_KNOWN_CMD_MAX,			/* 0x26 */
+};
+
+/*
+ * Request types
+ */
+enum {
+	SSDFS_UNKNOWN_REQ_TYPE,
+	SSDFS_REQ_SYNC,
+	SSDFS_REQ_ASYNC,
+	SSDFS_REQ_ASYNC_NO_FREE,
+	SSDFS_REQ_TYPE_MAX,
+};
+
+/*
+ * Request flags
+ */
+#define SSDFS_REQ_DONT_FREE_PAGES	(1 << 0)
+#define SSDFS_REQ_READ_ONLY_CACHE	(1 << 1)
+#define SSDFS_REQ_PREPARE_DIFF		(1 << 2)
+#define SSDFS_REQ_FLAGS_MASK		0x7
+
+/*
+ * Result states
+ */
+enum {
+	SSDFS_UNKNOWN_REQ_RESULT,
+	SSDFS_REQ_CREATED,
+	SSDFS_REQ_STARTED,
+	SSDFS_REQ_FINISHED,
+	SSDFS_REQ_FAILED,
+	SSDFS_REQ_RESULT_MAX
+};
+
+/*
+ * struct ssdfs_logical_extent - logical extent descriptor
+ * @ino: inode identification number
+ * @logical_offset: logical offset from file's beginning in bytes
+ * @data_bytes: valid bytes count in request
+ * @cno: checkpoint
+ * @parent_snapshot: parent snapshot
+ */
+struct ssdfs_logical_extent {
+	u64 ino;
+	u64 logical_offset;
+	u32 data_bytes;
+	u64 cno;
+	u64 parent_snapshot;
+};
+
+/*
+ * struct ssdfs_request_internal_data - private request data
+ * @class: request class
+ * @cmd: request command
+ * @type: request type
+ * @refs_count: reference counter
+ * @flags: request flags
+ * @wait_queue: queue for result waiting
+ */
+struct ssdfs_request_internal_data {
+	int class;
+	int cmd;
+	int type;
+	atomic_t refs_count;
+	u32 flags;
+	wait_queue_head_t wait_queue;
+};
+
+/*
+ * struct ssdfs_request_result - request result
+ * @pvec: array of memory pages
+ * @old_state: array of memory pages with initial state
+ * @diffs: array of diffs
+ * @processed_blks: count of processed physical pages
+ * @state: result's state
+ * @wait: wait-for-completion of operation
+ * @err: code of error
+ */
+struct ssdfs_request_result {
+	struct pagevec pvec;
+	struct pagevec old_state;
+	struct pagevec diffs;
+	int processed_blks;
+	atomic_t state;
+	struct completion wait;
+	int err;
+};
+
+/*
+ * struct ssdfs_segment_request - segment I/O request
+ * @list: requests queue list
+ * @extent: logical extent descriptor
+ * @place: logical blocks placement in segment
+ * @private: internal data of request
+ * @result: request result description
+ */
+struct ssdfs_segment_request {
+	struct list_head list;
+	struct ssdfs_logical_extent extent;
+	struct ssdfs_volume_extent place;
+	struct ssdfs_request_internal_data private;
+	struct ssdfs_request_result result;
+};
+
+/*
+ * struct ssdfs_peb_phys_offset - PEB's physical offset
+ * @peb_index: PEB's index
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @peb_page: PEB's page index
+ * @log_area: identification number of log area
+ * @byte_offset: offset in bytes from area's beginning
+ */
+struct ssdfs_peb_phys_offset {
+	u16 peb_index;
+	u8 peb_migration_id;
+	u16 peb_page;
+	u8 log_area;
+	u32 byte_offset;
+};
+
+struct ssdfs_segment_info;
+
+/*
+ * struct ssdfs_seg2req_pair - segment/request pair
+ * @si: pointer on segment object
+ * @req: pointer on request object
+ */
+struct ssdfs_seg2req_pair {
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_request *req;
+};
+
+/*
+ * Request's inline functions
+ */
+
+/*
+ * ssdfs_request_prepare_logical_extent() - prepare logical extent
+ * @ino: inode id
+ * @logical_offset: logical offset in bytes from file's beginning
+ * @data_bytes: extent length in bytes
+ * @cno: checkpoint number
+ * @parent_snapshot: parent snapshot number
+ * @req: segment request [out]
+ */
+static inline
+void ssdfs_request_prepare_logical_extent(u64 ino,
+					  u64 logical_offset,
+					  u32 data_bytes,
+					  u64 cno,
+					  u64 parent_snapshot,
+					  struct ssdfs_segment_request *req)
+{
+	req->extent.ino = ino;
+	req->extent.logical_offset = logical_offset;
+	req->extent.data_bytes = data_bytes;
+	req->extent.cno = cno;
+	req->extent.parent_snapshot = parent_snapshot;
+}
+
+/*
+ * ssdfs_request_prepare_internal_data() - prepare request's internal data
+ * @class: request class
+ * @cmd: request command
+ * @type: request type
+ * @req: segment request [out]
+ */
+static inline
+void ssdfs_request_prepare_internal_data(int class, int cmd, int type,
+					 struct ssdfs_segment_request *req)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+	BUG_ON(class <= SSDFS_UNKNOWN_REQ_CLASS ||
+		class >= SSDFS_PEB_REQ_CLASS_MAX);
+	BUG_ON(cmd <= SSDFS_UNKNOWN_CMD || cmd >= SSDFS_KNOWN_CMD_MAX);
+	BUG_ON(type <= SSDFS_UNKNOWN_REQ_TYPE ||
+		type >= SSDFS_REQ_TYPE_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	req->private.class = class;
+	req->private.cmd = cmd;
+	req->private.type = type;
+}
+
+/*
+ * ssdfs_request_define_segment() - define segment number
+ * @seg_id: segment number
+ * @req: segment request [out]
+ */
+static inline
+void ssdfs_request_define_segment(u64 seg_id,
+				  struct ssdfs_segment_request *req)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+	BUG_ON(seg_id == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	req->place.start.seg_id = seg_id;
+}
+
+/*
+ * ssdfs_request_define_volume_extent() - define logical volume extent
+ * @start: starting logical block number
+ * @len: count of logical blocks in the extent
+ * @req: segment request [out]
+ */
+static inline
+void ssdfs_request_define_volume_extent(u16 start, u16 len,
+					struct ssdfs_segment_request *req)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+	BUG_ON(start == U16_MAX);
+	BUG_ON(len == U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	req->place.start.blk_index = start;
+	req->place.len = len;
+}
+
+/*
+ * has_request_been_executed() - check that request has been executed
+ * @req: segment request
+ */
+static inline
+bool has_request_been_executed(struct ssdfs_segment_request *req)
+{
+	bool has_been_executed = false;
+
+	switch (atomic_read(&req->result.state)) {
+	case SSDFS_REQ_CREATED:
+	case SSDFS_REQ_STARTED:
+		has_been_executed = false;
+		break;
+
+	case SSDFS_REQ_FINISHED:
+	case SSDFS_REQ_FAILED:
+		has_been_executed = true;
+		break;
+
+	default:
+		SSDFS_ERR("invalid result's state %#x\n",
+			  atomic_read(&req->result.state));
+		has_been_executed = true;
+	}
+
+	return has_been_executed;
+}
+
+/*
+ * Request queue's API
+ */
+void ssdfs_requests_queue_init(struct ssdfs_requests_queue *rq);
+bool is_ssdfs_requests_queue_empty(struct ssdfs_requests_queue *rq);
+void ssdfs_requests_queue_add_tail(struct ssdfs_requests_queue *rq,
+				   struct ssdfs_segment_request *req);
+void ssdfs_requests_queue_add_tail_inc(struct ssdfs_fs_info *fsi,
+				       struct ssdfs_requests_queue *rq,
+				       struct ssdfs_segment_request *req);
+void ssdfs_requests_queue_add_head(struct ssdfs_requests_queue *rq,
+				   struct ssdfs_segment_request *req);
+void ssdfs_requests_queue_add_head_inc(struct ssdfs_fs_info *fsi,
+				       struct ssdfs_requests_queue *rq,
+				       struct ssdfs_segment_request *req);
+int ssdfs_requests_queue_remove_first(struct ssdfs_requests_queue *rq,
+				      struct ssdfs_segment_request **req);
+void ssdfs_requests_queue_remove_all(struct ssdfs_requests_queue *rq,
+				     int err);
+
+/*
+ * Request's API
+ */
+void ssdfs_zero_seg_req_obj_cache_ptr(void);
+int ssdfs_init_seg_req_obj_cache(void);
+void ssdfs_shrink_seg_req_obj_cache(void);
+void ssdfs_destroy_seg_req_obj_cache(void);
+
+struct ssdfs_segment_request *ssdfs_request_alloc(void);
+void ssdfs_request_free(struct ssdfs_segment_request *req);
+void ssdfs_request_init(struct ssdfs_segment_request *req);
+void ssdfs_get_request(struct ssdfs_segment_request *req);
+void ssdfs_put_request(struct ssdfs_segment_request *req);
+int ssdfs_request_add_page(struct page *page,
+			   struct ssdfs_segment_request *req);
+int ssdfs_request_add_diff_page(struct page *page,
+				struct ssdfs_segment_request *req);
+struct page *
+ssdfs_request_allocate_and_add_page(struct ssdfs_segment_request *req);
+struct page *
+ssdfs_request_allocate_and_add_diff_page(struct ssdfs_segment_request *req);
+struct page *
+ssdfs_request_allocate_and_add_old_state_page(struct ssdfs_segment_request *req);
+struct page *
+ssdfs_request_allocate_locked_page(struct ssdfs_segment_request *req,
+				   int page_index);
+struct page *
+ssdfs_request_allocate_locked_diff_page(struct ssdfs_segment_request *req,
+					int page_index);
+int ssdfs_request_add_allocated_page_locked(struct ssdfs_segment_request *req);
+int ssdfs_request_add_allocated_diff_locked(struct ssdfs_segment_request *req);
+int ssdfs_request_add_old_state_page_locked(struct ssdfs_segment_request *req);
+void ssdfs_request_unlock_and_remove_page(struct ssdfs_segment_request *req,
+					  int page_index);
+void ssdfs_request_unlock_and_remove_pages(struct ssdfs_segment_request *req);
+void ssdfs_request_unlock_and_remove_update(struct ssdfs_segment_request *req);
+void ssdfs_request_unlock_and_remove_diffs(struct ssdfs_segment_request *req);
+void ssdfs_request_unlock_and_remove_old_state(struct ssdfs_segment_request *req);
+int ssdfs_request_switch_update_on_diff(struct ssdfs_fs_info *fsi,
+					struct page *diff_page,
+					struct ssdfs_segment_request *req);
+void ssdfs_free_flush_request_pages(struct ssdfs_segment_request *req);
+u8 ssdfs_peb_extent_length(struct ssdfs_segment_info *si,
+			   struct pagevec *pvec);
+
+#endif /* _SSDFS_REQUEST_QUEUE_H */

From patchwork Sat Feb 25 01:08:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151922
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 17/76] ssdfs: introduce offset translation table
Date: Fri, 24 Feb 2023 17:08:28 -0800
Message-Id: <20230225010927.813929-18-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
Precedence: bulk
List-ID: 
X-Mailing-List: linux-fsdevel@vger.kernel.org

One of the key goals of SSDFS is to decrease the write amplification
factor, and the logical extent concept is the fundamental technique for
achieving it. A logical extent describes any volume extent on the basis
of segment ID, logical block ID, and length. The migration scheme
guarantees that the segment ID always stays the same for any logical
extent. The offset translation table provides the way to convert a
logical block ID into an offset inside the log of a particular
"Physical" Erase Block (PEB). As a result, the extents b-tree never
needs to be updated, because a logical extent never changes until it is
intentionally moved from one segment into another.

The offset translation table is a metadata structure that is stored in
every log. It keeps the knowledge of which particular logical blocks
are stored in the log's payload and which offset in the payload should
be used to access and retrieve the content of every logical block.
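As a rough, self-contained illustration of why the extents b-tree can
stay immutable (a toy userspace model; every name in it is invented for
the example and nothing here is SSDFS code):

#include <stdint.h>
#include <stdio.h>

/* toy analogue of a per-log translation entry */
struct toy_phys_offset {
	uint16_t peb_index;	/* erase block inside the segment */
	uint32_t byte_offset;	/* offset inside that PEB's log payload */
};

/* one entry per logical block of a (tiny) segment */
static struct toy_phys_offset blk2off[4];

int main(void)
{
	uint16_t logical_blk = 2;

	/* first log commit: block 2 lives in PEB 0 at offset 8192 */
	blk2off[logical_blk] = (struct toy_phys_offset){ 0, 8192 };

	/* later log commit: the block migrated inside the same segment */
	blk2off[logical_blk] = (struct toy_phys_offset){ 1, 4096 };

	/*
	 * The extent recorded in the extents b-tree still says
	 * (segment ID, logical block 2, length 1); only this
	 * per-log table changed.
	 */
	printf("blk %u -> PEB %u, offset %u\n", logical_blk,
	       blk2off[logical_blk].peb_index,
	       blk2off[logical_blk].byte_offset);
	return 0;
}

The real metadata behind this idea (struct ssdfs_blk2off_table_header
and struct ssdfs_phys_offset_table_header) is declared in
fs/ssdfs/offset_translation_table.h, which this patch adds.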
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/offset_translation_table.c | 2914 +++++++++++++++++++++++++++ fs/ssdfs/offset_translation_table.h | 446 ++++ 2 files changed, 3360 insertions(+) create mode 100644 fs/ssdfs/offset_translation_table.c create mode 100644 fs/ssdfs/offset_translation_table.h diff --git a/fs/ssdfs/offset_translation_table.c b/fs/ssdfs/offset_translation_table.c new file mode 100644 index 000000000000..169f8106c5be --- /dev/null +++ b/fs/ssdfs/offset_translation_table.c @@ -0,0 +1,2914 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/offset_translation_table.c - offset translation table functionality. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + * Cong Wang + */ + +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_blk2off_page_leaks; +atomic64_t ssdfs_blk2off_memory_leaks; +atomic64_t ssdfs_blk2off_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_blk2off_cache_leaks_increment(void *kaddr) + * void ssdfs_blk2off_cache_leaks_decrement(void *kaddr) + * void *ssdfs_blk2off_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_blk2off_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_blk2off_kvzalloc(size_t size, gfp_t flags) + * void *ssdfs_blk2off_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_blk2off_kfree(void *kaddr) + * void ssdfs_blk2off_kvfree(void *kaddr) + * struct page *ssdfs_blk2off_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_blk2off_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_blk2off_free_page(struct page *page) + * void ssdfs_blk2off_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(blk2off) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(blk2off) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_blk2off_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_blk2off_page_leaks, 0); + atomic64_set(&ssdfs_blk2off_memory_leaks, 0); + atomic64_set(&ssdfs_blk2off_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_blk2off_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_blk2off_page_leaks) != 0) { + SSDFS_ERR("BLK2OFF TABLE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_blk2off_page_leaks)); + } + + if (atomic64_read(&ssdfs_blk2off_memory_leaks) != 0) { + SSDFS_ERR("BLK2OFF TABLE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_blk2off_memory_leaks)); + } + + if (atomic64_read(&ssdfs_blk2off_cache_leaks) != 0) { + SSDFS_ERR("BLK2OFF TABLE: " + "caches suffers from %lld 
leaks\n", + atomic64_read(&ssdfs_blk2off_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * BLK2OFF TABLE CACHE * + ******************************************************************************/ + +static struct kmem_cache *ssdfs_blk2off_frag_obj_cachep; + +void ssdfs_zero_blk2off_frag_obj_cache_ptr(void) +{ + ssdfs_blk2off_frag_obj_cachep = NULL; +} + +static void ssdfs_init_blk2off_frag_object_once(void *obj) +{ + struct ssdfs_phys_offset_table_fragment *frag_obj = obj; + + memset(frag_obj, 0, sizeof(struct ssdfs_phys_offset_table_fragment)); +} + +void ssdfs_shrink_blk2off_frag_obj_cache(void) +{ + if (ssdfs_blk2off_frag_obj_cachep) + kmem_cache_shrink(ssdfs_blk2off_frag_obj_cachep); +} + +void ssdfs_destroy_blk2off_frag_obj_cache(void) +{ + if (ssdfs_blk2off_frag_obj_cachep) + kmem_cache_destroy(ssdfs_blk2off_frag_obj_cachep); +} + +int ssdfs_init_blk2off_frag_obj_cache(void) +{ + size_t obj_size = sizeof(struct ssdfs_phys_offset_table_fragment); + + ssdfs_blk2off_frag_obj_cachep = + kmem_cache_create("ssdfs_blk2off_frag_obj_cache", + obj_size, 0, + SLAB_RECLAIM_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, + ssdfs_init_blk2off_frag_object_once); + if (!ssdfs_blk2off_frag_obj_cachep) { + SSDFS_ERR("unable to create blk2off fragments cache\n"); + return -ENOMEM; + } + + return 0; +} + +/* + * ssdfs_blk2off_frag_alloc() - allocate memory for blk2off fragment + */ +static +struct ssdfs_phys_offset_table_fragment *ssdfs_blk2off_frag_alloc(void) +{ + struct ssdfs_phys_offset_table_fragment *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_blk2off_frag_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = kmem_cache_alloc(ssdfs_blk2off_frag_obj_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for blk2off fragment\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_blk2off_cache_leaks_increment(ptr); + + return ptr; +} + +/* + * ssdfs_blk2off_frag_free() - free memory for blk2off fragment + */ +static +void ssdfs_blk2off_frag_free(void *ptr) +{ + struct ssdfs_phys_offset_table_fragment *frag; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_blk2off_frag_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ptr) + return; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ptr %p\n", ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + frag = (struct ssdfs_phys_offset_table_fragment *)ptr; + + WARN_ON(atomic_read(&frag->state) == SSDFS_BLK2OFF_FRAG_DIRTY); + + if (frag->buf) { + ssdfs_blk2off_kfree(frag->buf); + frag->buf = NULL; + } + + ssdfs_blk2off_cache_leaks_decrement(frag); + kmem_cache_free(ssdfs_blk2off_frag_obj_cachep, frag); +} + +/****************************************************************************** + * BLK2OFF TABLE OBJECT FUNCTIONALITY * + ******************************************************************************/ + +/* + * struct ssdfs_blk2off_init - initialization environment + * @table: pointer on translation table object + * @blk2off_pvec: blk2off table fragment + * @blk_desc_pvec: blk desc table fragment + * @peb_index: PEB's index + * @cno: checkpoint + * @fragments_count: count of fragments in portion + * @capacity: maximum amount of items + * @tbl_hdr: portion header + * @tbl_hdr_off: portion header's offset + * @pot_hdr: fragment header + * @pot_hdr_off: fragment header's offset + * @bmap: temporary bitmap + * @bmap_bytes: bytes in temporaray bitmap + * @extent_array: translation extents temporary array + * @extents_count: count of extents in array + */ 
+struct ssdfs_blk2off_init { + struct ssdfs_blk2off_table *table; + struct pagevec *blk2off_pvec; + struct pagevec *blk_desc_pvec; + u16 peb_index; + u64 cno; + u32 fragments_count; + u16 capacity; + + struct ssdfs_blk2off_table_header tbl_hdr; + u32 tbl_hdr_off; + struct ssdfs_phys_offset_table_header pot_hdr; + u32 pot_hdr_off; + + unsigned long *bmap; + u32 bmap_bytes; + + struct ssdfs_translation_extent *extent_array; + u16 extents_count; +}; + +static +void ssdfs_debug_blk2off_table_object(struct ssdfs_blk2off_table *tbl); + +/* + * ssdfs_blk2off_table_init_fragment() - init PEB's fragment + * @ptr: fragment pointer + * @sequence_id: fragment's sequence ID + * @start_id: fragment's start ID + * @pages_per_peb: PEB's pages count + * @state: fragment state after initialization + * @buf_size: pointer on buffer size + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + */ +static int +ssdfs_blk2off_table_init_fragment(struct ssdfs_phys_offset_table_fragment *ptr, + u16 sequence_id, u16 start_id, + u32 pages_per_peb, int state, + size_t *buf_size) +{ + size_t blk2off_tbl_hdr_size = sizeof(struct ssdfs_blk2off_table_header); + size_t hdr_size = sizeof(struct ssdfs_phys_offset_table_header); + size_t off_size = sizeof(struct ssdfs_phys_offset_descriptor); + size_t fragment_size = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ptr %p, sequence_id %u, start_id %u, " + "pages_per_peb %u, state %#x, buf_size %p\n", + ptr, sequence_id, start_id, pages_per_peb, + state, buf_size); + + BUG_ON(!ptr); + BUG_ON(sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD); + BUG_ON(state < SSDFS_BLK2OFF_FRAG_CREATED || + state >= SSDFS_BLK2OFF_FRAG_STATE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + init_rwsem(&ptr->lock); + + down_write(&ptr->lock); + + if (buf_size) { + fragment_size = min_t(size_t, *buf_size, PAGE_SIZE); + } else { + fragment_size += blk2off_tbl_hdr_size; + fragment_size += hdr_size + (off_size * pages_per_peb); + fragment_size = min_t(size_t, fragment_size, PAGE_SIZE); + } + + ptr->buf_size = fragment_size; + ptr->buf = ssdfs_blk2off_kzalloc(ptr->buf_size, GFP_KERNEL); + if (!ptr->buf) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate table buffer\n"); + goto finish_fragment_init; + } + + ptr->start_id = start_id; + ptr->sequence_id = sequence_id; + atomic_set(&ptr->id_count, 0); + + ptr->hdr = SSDFS_POFFTH(ptr->buf); + ptr->phys_offs = SSDFS_PHYSOFFD(ptr->buf + hdr_size); + + atomic_set(&ptr->state, state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FRAGMENT: sequence_id %u, start_id %u, id_count %d\n", + sequence_id, start_id, atomic_read(&ptr->id_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_init: + up_write(&ptr->lock); + return err; +} + +/* + * ssdfs_get_migrating_block() - get pointer on migrating block + * @table: pointer on translation table object + * @logical_blk: logical block ID + * @need_allocate: should the descriptor be allocated? + * + * This method tries to return the pointer on migrating block's + * descriptor. If @need_allocate is true, the descriptor + * will be allocated; otherwise, the existing descriptor is looked up. + * + * RETURN: + * [success] - pointer on migrating block's descriptor. + * [failure] - error code: + * + * %-EINVAL - invalid value. + * %-ENOMEM - fail to allocate memory.
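+ * + * Typical call pattern (illustrative only): the descriptor is + * allocated when a block starts migrating and looked up afterwards: + * + *   blk = ssdfs_get_migrating_block(table, logical_blk, true); + *   ... + *   blk = ssdfs_get_migrating_block(table, logical_blk, false);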
+ */ +static +struct ssdfs_migrating_block * +ssdfs_get_migrating_block(struct ssdfs_blk2off_table *table, + u16 logical_blk, + bool need_allocate) +{ + struct ssdfs_migrating_block *migrating_blk = NULL; + void *kaddr; + size_t blk_desc_size = sizeof(struct ssdfs_migrating_block); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + BUG_ON(logical_blk >= table->lblk2off_capacity); + + SSDFS_DBG("logical_blk %u, need_allocate %#x\n", + logical_blk, need_allocate); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_allocate) { + migrating_blk = ssdfs_blk2off_kzalloc(blk_desc_size, + GFP_KERNEL); + if (!migrating_blk) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate migrating block desc\n"); + goto fail_get_migrating_blk; + } + + err = ssdfs_dynamic_array_set(&table->migrating_blks, + logical_blk, &migrating_blk); + if (unlikely(err)) { + ssdfs_blk2off_kfree(migrating_blk); + SSDFS_ERR("fail to store migrating block in array: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto fail_get_migrating_blk; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u descriptor has been allocated\n", + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + kaddr = ssdfs_dynamic_array_get_locked(&table->migrating_blks, + logical_blk); + if (IS_ERR_OR_NULL(kaddr)) { + err = (kaddr == NULL ? -ENOENT : PTR_ERR(kaddr)); + SSDFS_ERR("fail to get migrating block: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto fail_get_migrating_blk; + } + + migrating_blk = SSDFS_MIGRATING_BLK(*(u8 **)kaddr); + + err = ssdfs_dynamic_array_release(&table->migrating_blks, + logical_blk, kaddr); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto fail_get_migrating_blk; + } + } + + if (migrating_blk) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, state %#x\n", + logical_blk, migrating_blk->state); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return migrating_blk; + +fail_get_migrating_blk: + return ERR_PTR(err); +} + +/* + * ssdfs_destroy_migrating_blocks_array() - destroy descriptors array + * @table: pointer on translation table object + * + * This method tries to free the memory of the migrating block + * descriptors array. + */ +static +void ssdfs_destroy_migrating_blocks_array(struct ssdfs_blk2off_table *table) +{ + struct ssdfs_migrating_block *migrating_blk = NULL; + void *kaddr; + u32 items_count; + u32 i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (table->last_allocated_blk >= U16_MAX) { + /* no blocks have been allocated yet */ + items_count = 0; + } else + items_count = table->last_allocated_blk + 1; + + for (i = 0; i < items_count; i++) { + kaddr = ssdfs_dynamic_array_get_locked(&table->migrating_blks, + i); + if (IS_ERR_OR_NULL(kaddr)) + continue; + + migrating_blk = SSDFS_MIGRATING_BLK(*(u8 **)kaddr); + + if (migrating_blk) + ssdfs_blk2off_kfree(migrating_blk); + + ssdfs_dynamic_array_release(&table->migrating_blks, + i, kaddr); + } + + ssdfs_dynamic_array_destroy(&table->migrating_blks); +} + +/* + * ssdfs_blk2off_table_create() - create translation table object + * @fsi: pointer on shared file system object + * @items_count: table's capacity + * @type: table's type + * @state: initial state of object + * + * This method tries to create translation table object. + * + * RETURN: + * [success] - pointer on created object. + * [failure] - error code: + * + * %-EINVAL - invalid value. + * %-ENOMEM - fail to allocate memory.
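+ * + * Illustrative usage (assuming @type is a value strictly between + * SSDFS_UNKNOWN_OFF_TABLE_TYPE and SSDFS_OFF_TABLE_MAX_TYPE): + * + *   table = ssdfs_blk2off_table_create(fsi, fsi->pages_per_seg, type, + *                                      SSDFS_BLK2OFF_OBJECT_CREATED); + *   if (IS_ERR(table)) + *           return PTR_ERR(table);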
+ */ +struct ssdfs_blk2off_table * +ssdfs_blk2off_table_create(struct ssdfs_fs_info *fsi, + u16 items_count, u8 type, + int state) +{ + struct ssdfs_blk2off_table *ptr; + size_t table_size = sizeof(struct ssdfs_blk2off_table); + size_t off_pos_size = sizeof(struct ssdfs_offset_position); + size_t ptr_size = sizeof(struct ssdfs_migrating_block *); + u32 bytes; + u32 bits_count; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(state <= SSDFS_BLK2OFF_OBJECT_UNKNOWN || + state >= SSDFS_BLK2OFF_OBJECT_STATE_MAX); + BUG_ON(items_count > (2 * fsi->pages_per_seg)); + BUG_ON(type <= SSDFS_UNKNOWN_OFF_TABLE_TYPE || + type >= SSDFS_OFF_TABLE_MAX_TYPE); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, items_count %u, type %u, state %#x\n", + fsi, items_count, type, state); +#else + SSDFS_DBG("fsi %p, items_count %u, type %u, state %#x\n", + fsi, items_count, type, state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ptr = (struct ssdfs_blk2off_table *)ssdfs_blk2off_kzalloc(table_size, + GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate translation table\n"); + return ERR_PTR(-ENOMEM); + } + + ptr->fsi = fsi; + + atomic_set(&ptr->flags, 0); + atomic_set(&ptr->state, SSDFS_BLK2OFF_OBJECT_UNKNOWN); + + ptr->pages_per_peb = fsi->pages_per_peb; + ptr->pages_per_seg = fsi->pages_per_seg; + ptr->type = type; + + init_rwsem(&ptr->translation_lock); + init_waitqueue_head(&ptr->wait_queue); + + ptr->init_cno = U64_MAX; + ptr->used_logical_blks = 0; + ptr->free_logical_blks = items_count; + ptr->last_allocated_blk = U16_MAX; + + bytes = ssdfs_blk2off_table_bmap_bytes(items_count); + bytes = min_t(u32, bytes, PAGE_SIZE); + bits_count = bytes * BITS_PER_BYTE; + + ptr->lbmap.bits_count = bits_count; + ptr->lbmap.bytes_count = bytes; + + for (i = 0; i < SSDFS_LBMAP_ARRAY_MAX; i++) { + ptr->lbmap.array[i] = + (unsigned long *)ssdfs_blk2off_kvzalloc(bytes, + GFP_KERNEL); + if (!ptr->lbmap.array[i]) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate bitmaps\n"); + goto free_bmap; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("init_bmap %lx, state_bmap %lx, modification_bmap %lx\n", + *ptr->lbmap.array[SSDFS_LBMAP_INIT_INDEX], + *ptr->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + *ptr->lbmap.array[SSDFS_LBMAP_MODIFICATION_INDEX]); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr->lblk2off_capacity = items_count; + + err = ssdfs_dynamic_array_create(&ptr->lblk2off, + ptr->lblk2off_capacity, + off_pos_size, + 0xFF); + if (unlikely(err)) { + SSDFS_ERR("fail to create translation array: " + "off_pos_size %zu, items_count %u\n", + off_pos_size, + ptr->lblk2off_capacity); + goto free_bmap; + } + + err = ssdfs_dynamic_array_create(&ptr->migrating_blks, + ptr->lblk2off_capacity, + ptr_size, + 0); + if (unlikely(err)) { + SSDFS_ERR("fail to create migrating blocks array: " + "ptr_size %zu, items_count %u\n", + ptr_size, + ptr->lblk2off_capacity); + goto free_bmap; + } + + ptr->pebs_count = fsi->pebs_per_seg; + + ptr->peb = ssdfs_blk2off_kcalloc(ptr->pebs_count, + sizeof(struct ssdfs_phys_offset_table_array), + GFP_KERNEL); + if (!ptr->peb) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate phys offsets array\n"); + goto free_translation_array; + } + + for (i = 0; i < ptr->pebs_count; i++) { + struct ssdfs_phys_offset_table_array *table = &ptr->peb[i]; + struct ssdfs_sequence_array *seq_ptr = NULL; + u32 threshold = SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD; + + seq_ptr = ssdfs_create_sequence_array(threshold); + if (IS_ERR_OR_NULL(seq_ptr)) { + err = (seq_ptr == NULL ? 
-ENOMEM : PTR_ERR(seq_ptr)); + SSDFS_ERR("fail to allocate sequence: " + "err %d\n", err); + goto free_phys_offs_array; + } else + table->sequence = seq_ptr; + + if (state == SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT) { + struct ssdfs_phys_offset_table_fragment *fragment; + u16 start_id = i * fsi->pages_per_peb; + u32 pages_per_peb = fsi->pages_per_peb; + int fragment_state = SSDFS_BLK2OFF_FRAG_INITIALIZED; + + atomic_set(&table->fragment_count, 1); + + fragment = ssdfs_blk2off_frag_alloc(); + if (IS_ERR_OR_NULL(fragment)) { + err = (fragment == NULL ? -ENOMEM : + PTR_ERR(fragment)); + SSDFS_ERR("fail to allocate fragment: " + "err %d\n", err); + goto free_phys_offs_array; + } + + err = ssdfs_sequence_array_init_item(table->sequence, + 0, fragment); + if (unlikely(err)) { + ssdfs_blk2off_frag_free(fragment); + SSDFS_ERR("fail to init fragment: " + "err %d\n", err); + goto free_phys_offs_array; + } + + err = ssdfs_blk2off_table_init_fragment(fragment, 0, + start_id, + pages_per_peb, + fragment_state, + NULL); + if (unlikely(err)) { + SSDFS_ERR("fail to init fragment: " + "fragment_index %d, err %d\n", + i, err); + goto free_phys_offs_array; + } + + atomic_set(&table->state, + SSDFS_BLK2OFF_TABLE_COMPLETE_INIT); + } else if (state == SSDFS_BLK2OFF_OBJECT_CREATED) { + atomic_set(&table->fragment_count, 0); + atomic_set(&table->state, + SSDFS_BLK2OFF_TABLE_CREATED); + } else + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("init_bmap %lx, state_bmap %lx, modification_bmap %lx\n", + *ptr->lbmap.array[SSDFS_LBMAP_INIT_INDEX], + *ptr->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + *ptr->lbmap.array[SSDFS_LBMAP_MODIFICATION_INDEX]); +#endif /* CONFIG_SSDFS_DEBUG */ + + init_completion(&ptr->partial_init_end); + init_completion(&ptr->full_init_end); + + atomic_set(&ptr->state, state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return ptr; + +free_phys_offs_array: + for (i = 0; i < ptr->pebs_count; i++) { + struct ssdfs_sequence_array *sequence; + + sequence = ptr->peb[i].sequence; + ssdfs_destroy_sequence_array(sequence, ssdfs_blk2off_frag_free); + ptr->peb[i].sequence = NULL; + } + + ssdfs_blk2off_kfree(ptr->peb); + +free_translation_array: + ssdfs_dynamic_array_destroy(&ptr->lblk2off); + +free_bmap: + for (i = 0; i < SSDFS_LBMAP_ARRAY_MAX; i++) { + ssdfs_blk2off_kvfree(ptr->lbmap.array[i]); + ptr->lbmap.array[i] = NULL; + } + + ptr->lbmap.bits_count = 0; + ptr->lbmap.bytes_count = 0; + + ssdfs_blk2off_kfree(ptr); + + return ERR_PTR(err); +} + +/* + * ssdfs_blk2off_table_destroy() - destroy translation table object + * @table: pointer on translation table object + */ +void ssdfs_blk2off_table_destroy(struct ssdfs_blk2off_table *table) +{ +#ifdef CONFIG_SSDFS_DEBUG + int migrating_blks = -1; +#endif /* CONFIG_SSDFS_DEBUG */ + int state; + int i; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("table %p\n", table); +#else + SSDFS_DBG("table %p\n", table); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!table) { + WARN_ON(!table); + return; + } + + if (table->peb) { + for (i = 0; i < table->pebs_count; i++) { + struct ssdfs_sequence_array *sequence; + + sequence = table->peb[i].sequence; + ssdfs_destroy_sequence_array(sequence, + ssdfs_blk2off_frag_free); + table->peb[i].sequence = NULL; + + state = atomic_read(&table->peb[i].state); + + switch (state) { + case SSDFS_BLK2OFF_TABLE_DIRTY: + case SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT: + SSDFS_WARN("unexpected table state %#x\n", + state); + break; + + default: + /* do nothing */ + break; 
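+ + /* + * A dirty state here means that the table is being + * destroyed with unflushed updates; the warning above + * only reports it, the fragments are freed anyway. + */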
+ } + } + + ssdfs_blk2off_kfree(table->peb); + table->peb = NULL; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (table->last_allocated_blk >= U16_MAX) + migrating_blks = 0; + else + migrating_blks = table->last_allocated_blk + 1; + + for (i = 0; i < migrating_blks; i++) { + struct ssdfs_migrating_block *blk = + ssdfs_get_migrating_block(table, i, false); + + if (IS_ERR_OR_NULL(blk)) + continue; + + switch (blk->state) { + case SSDFS_LBLOCK_UNDER_MIGRATION: + case SSDFS_LBLOCK_UNDER_COMMIT: + SSDFS_ERR("logical blk %d is under migration\n", i); + ssdfs_blk2off_pagevec_release(&blk->pvec); + break; + } + } +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_dynamic_array_destroy(&table->lblk2off); + + ssdfs_destroy_migrating_blocks_array(table); + + for (i = 0; i < SSDFS_LBMAP_ARRAY_MAX; i++) { + ssdfs_blk2off_kvfree(table->lbmap.array[i]); + table->lbmap.array[i] = NULL; + } + + table->lbmap.bits_count = 0; + table->lbmap.bytes_count = 0; + + ssdfs_blk2off_kfree(table); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_blk2off_table_resize_bitmap_array() - resize bitmap array + * @lbmap: bitmap pointer + * @logical_blk: new threshold + */ +static inline +int ssdfs_blk2off_table_resize_bitmap_array(struct ssdfs_bitmap_array *lbmap, + u16 logical_blk) +{ + unsigned long *bmap_ptr; + u32 new_bits_count; + u32 new_bytes_count; + u32 bits_per_page; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap); + + SSDFS_DBG("lbmap %p, logical_blk %u\n", + lbmap, logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + bits_per_page = PAGE_SIZE * BITS_PER_BYTE; + + /* round up to a whole page so that bit @logical_blk itself fits */ + new_bits_count = logical_blk + bits_per_page; + new_bits_count /= bits_per_page; + new_bits_count *= bits_per_page; + + new_bytes_count = ssdfs_blk2off_table_bmap_bytes(new_bits_count); + + for (i = 0; i < SSDFS_LBMAP_ARRAY_MAX; i++) { + bmap_ptr = kvrealloc(lbmap->array[i], + lbmap->bytes_count, + new_bytes_count, + GFP_KERNEL | __GFP_ZERO); + if (!bmap_ptr) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate bitmaps\n"); + goto finish_bitmap_array_resize; + } else + lbmap->array[i] = (unsigned long *)bmap_ptr; + } + + lbmap->bits_count = new_bits_count; + lbmap->bytes_count = new_bytes_count; + +finish_bitmap_array_resize: + return err; +} + +/* + * ssdfs_blk2off_table_bmap_set() - set bit for logical block + * @lbmap: bitmap pointer + * @bitmap_index: index of bitmap + * @logical_blk: logical block number + */ +static inline +int ssdfs_blk2off_table_bmap_set(struct ssdfs_bitmap_array *lbmap, + int bitmap_index, u16 logical_blk) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap); + + SSDFS_DBG("lbmap %p, bitmap_index %d, logical_blk %u\n", + lbmap, bitmap_index, logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bitmap_index >= SSDFS_LBMAP_ARRAY_MAX) { + SSDFS_ERR("invalid bitmap index %d\n", + bitmap_index); + return -EINVAL; + } + + if (logical_blk >= lbmap->bits_count) { + err = ssdfs_blk2off_table_resize_bitmap_array(lbmap, + logical_blk); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc bitmap array: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap->array[bitmap_index]); +#endif /* CONFIG_SSDFS_DEBUG */ + + bitmap_set(lbmap->array[bitmap_index], logical_blk, 1); + + return 0; +} + +/* + * ssdfs_blk2off_table_bmap_clear() - clear bit for logical block + * @lbmap: bitmap pointer + * @bitmap_index: index of bitmap + * @logical_blk: logical block number + */ +static inline +int
ssdfs_blk2off_table_bmap_clear(struct ssdfs_bitmap_array *lbmap, + int bitmap_index, u16 logical_blk) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap); + + SSDFS_DBG("lbmap %p, bitmap_index %d, logical_blk %u\n", + lbmap, bitmap_index, logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bitmap_index >= SSDFS_LBMAP_ARRAY_MAX) { + SSDFS_ERR("invalid bitmap index %d\n", + bitmap_index); + return -EINVAL; + } + + if (logical_blk >= lbmap->bits_count) { + err = ssdfs_blk2off_table_resize_bitmap_array(lbmap, + logical_blk); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc bitmap array: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap->array[bitmap_index]); +#endif /* CONFIG_SSDFS_DEBUG */ + + bitmap_clear(lbmap->array[bitmap_index], logical_blk, 1); + + return 0; +} + +/* + * ssdfs_blk2off_table_bmap_vacant() - check bit for logical block + * @lbmap: bitmap pointer + * @bitmap_index: index of bitmap + * @lbmap_bits: count of bits in bitmap + * @logical_blk: logical block number + */ +static inline +bool ssdfs_blk2off_table_bmap_vacant(struct ssdfs_bitmap_array *lbmap, + int bitmap_index, + u16 lbmap_bits, + u16 logical_blk) +{ + unsigned long found; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap); + + SSDFS_DBG("lbmap %p, bitmap_index %d, " + "lbmap_bits %u, logical_blk %u\n", + lbmap, bitmap_index, + lbmap_bits, logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bitmap_index >= SSDFS_LBMAP_ARRAY_MAX) { + SSDFS_ERR("invalid bitmap index %d\n", + bitmap_index); + return false; + } + + if (logical_blk >= lbmap->bits_count) + return true; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap->array[bitmap_index]); +#endif /* CONFIG_SSDFS_DEBUG */ + + found = find_next_zero_bit(lbmap->array[bitmap_index], + lbmap_bits, logical_blk); + + return found == logical_blk; +} + +/* + * ssdfs_blk2off_table_extent_vacant() - check extent vacancy + * @lbmap: bitmap pointer + * @bitmap_index: index of bitmap + * @lbmap_bits: count of bits in bitmap + * @extent: pointer on extent + */ +static inline +bool ssdfs_blk2off_table_extent_vacant(struct ssdfs_bitmap_array *lbmap, + int bitmap_index, + u16 lbmap_bits, + struct ssdfs_blk2off_range *extent) +{ + unsigned long start, end; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap || !extent); + + SSDFS_DBG("lbmap %p, bitmap_index %d, " + "lbmap_bits %u, extent (start %u, len %u)\n", + lbmap, bitmap_index, lbmap_bits, + extent->start_lblk, extent->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bitmap_index >= SSDFS_LBMAP_ARRAY_MAX) { + SSDFS_ERR("invalid bitmap index %d\n", + bitmap_index); + return false; + } + + if (extent->start_lblk >= lbmap_bits) { + SSDFS_ERR("invalid extent start %u\n", + extent->start_lblk); + return false; + } + + if (extent->len == 0 || extent->len >= U16_MAX) { + SSDFS_ERR("invalid extent length\n"); + return false; + } + + if (extent->start_lblk >= lbmap->bits_count) + return true; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!lbmap->array[bitmap_index]); +#endif /* CONFIG_SSDFS_DEBUG */ + + start = find_next_zero_bit(lbmap->array[bitmap_index], + lbmap_bits, extent->start_lblk); + + if (start != extent->start_lblk) + return false; + else if (extent->len == 1) + return true; + + end = find_next_bit(lbmap->array[bitmap_index], lbmap_bits, start); + + if ((end - start) == extent->len) + return true; + + return false; +} + +/* + * is_ssdfs_table_header_magic_valid() - check table header's magic + * @hdr: table header + */ +bool is_ssdfs_table_header_magic_valid(struct
ssdfs_blk2off_table_header *hdr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + return le16_to_cpu(hdr->magic.key) == SSDFS_BLK2OFF_TABLE_HDR_MAGIC; +} + +/* + * ssdfs_check_table_header() - check table header + * @hdr: table header + * @size: size of header + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - header is invalid. + */ +static +int ssdfs_check_table_header(struct ssdfs_blk2off_table_header *hdr, + size_t size) +{ + u16 extents_off = offsetof(struct ssdfs_blk2off_table_header, + sequence); + size_t extent_size = sizeof(struct ssdfs_translation_extent); + size_t extent_area; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("hdr %p, size %zu\n", hdr, size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_ssdfs_magic_valid(&hdr->magic) || + !is_ssdfs_table_header_magic_valid(hdr)) { + SSDFS_ERR("invalid table magic\n"); + return -EIO; + } + + if (!is_csum_valid(&hdr->check, hdr, size)) { + SSDFS_ERR("invalid checksum\n"); + return -EIO; + } + + if (extents_off != le16_to_cpu(hdr->extents_off)) { + SSDFS_ERR("invalid extents offset %u\n", + le16_to_cpu(hdr->extents_off)); + return -EIO; + } + + extent_area = extent_size * le16_to_cpu(hdr->extents_count); + if (le16_to_cpu(hdr->offset_table_off) != (extents_off + extent_area)) { + SSDFS_ERR("invalid table offset: extents_off %u, " + "extents_count %u, offset_table_off %u\n", + le16_to_cpu(hdr->extents_off), + le16_to_cpu(hdr->extents_count), + le16_to_cpu(hdr->offset_table_off)); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_check_fragment() - check table's fragment + * @table: pointer on table object + * @peb_index: PEB's index + * @hdr: fragment's header + * @fragment_size: size of fragment in bytes + * + * Method tries to check fragment validity. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - corrupted fragment. + * %-ERANGE - internal error. 
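+ * + * The checksum is validated by temporarily zeroing hdr->checksum, + * recomputing CRC32 over the first byte_size bytes of the fragment, + * and restoring the stored value before comparison (see the + * csum1/csum2 logic below).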
+ */ +static +int ssdfs_check_fragment(struct ssdfs_blk2off_table *table, + u16 peb_index, + struct ssdfs_phys_offset_table_header *hdr, + size_t fragment_size) +{ + u16 start_id; + u16 sequence_id; + u16 id_count; + u32 byte_size; + u32 items_size; + __le32 csum1, csum2; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !hdr); + BUG_ON(peb_index >= table->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_id = le16_to_cpu(hdr->start_id); + id_count = le16_to_cpu(hdr->id_count); + byte_size = le32_to_cpu(hdr->byte_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("table %p, peb_index %u, start_id %u, " + "id_count %u, byte_size %u, " + "fragment_id %u\n", + table, peb_index, + start_id, id_count, byte_size, + hdr->sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (le32_to_cpu(hdr->magic) != SSDFS_PHYS_OFF_TABLE_MAGIC) { + SSDFS_ERR("invalid magic %#x\n", + le32_to_cpu(hdr->magic)); + return -EIO; + } + + if (byte_size > fragment_size) { + SSDFS_ERR("byte_size %u > fragment_size %zu\n", + byte_size, fragment_size); + return -ERANGE; + } + + csum1 = hdr->checksum; + hdr->checksum = 0; + csum2 = ssdfs_crc32_le(hdr, byte_size); + hdr->checksum = csum1; + + if (csum1 != csum2) { + SSDFS_ERR("csum1 %#x != csum2 %#x\n", + le32_to_cpu(csum1), + le32_to_cpu(csum2)); + return -EIO; + } + + if (le16_to_cpu(hdr->peb_index) != peb_index) { + SSDFS_ERR("invalid peb_index %u\n", + le16_to_cpu(hdr->peb_index)); + return -EIO; + } + + if (start_id == U16_MAX) { + SSDFS_ERR("invalid start_id %u for peb_index %u\n", + start_id, peb_index); + return -EIO; + } + + if (id_count == 0 || id_count > table->pages_per_peb) { + SSDFS_ERR("invalid id_count %u for peb_index %u\n", + le16_to_cpu(hdr->id_count), + peb_index); + return -EIO; + } + + items_size = (u32)id_count * + sizeof(struct ssdfs_phys_offset_descriptor); + + if (byte_size < items_size) { + SSDFS_ERR("invalid byte_size %u for peb_index %u\n", + le32_to_cpu(hdr->byte_size), + peb_index); + return -EIO; + } + + sequence_id = le16_to_cpu(hdr->sequence_id); + if (sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("invalid sequence_id %u for peb_index %u\n", + sequence_id, peb_index); + return -EIO; + } + + if (le16_to_cpu(hdr->type) == SSDFS_UNKNOWN_OFF_TABLE_TYPE || + le16_to_cpu(hdr->type) >= SSDFS_OFF_TABLE_MAX_TYPE) { + SSDFS_ERR("invalid type %#x for peb_index %u\n", + le16_to_cpu(hdr->type), peb_index); + return -EIO; + } + + if (le16_to_cpu(hdr->flags) & ~SSDFS_OFF_TABLE_FLAGS_MASK) { + SSDFS_ERR("invalid flags set %#x for peb_index %u\n", + le16_to_cpu(hdr->flags), peb_index); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_get_checked_table_header() - get and check table header + * @portion: pointer on portion init environment [out] + */ +static +int ssdfs_get_checked_table_header(struct ssdfs_blk2off_init *portion) +{ + size_t hdr_size = sizeof(struct ssdfs_blk2off_table_header); + struct page *page; + int page_index; + u32 page_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !portion->blk2off_pvec); + + SSDFS_DBG("source %p, offset %u\n", + portion->blk2off_pvec, portion->tbl_hdr_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = portion->tbl_hdr_off >> PAGE_SHIFT; + if (portion->tbl_hdr_off >= PAGE_SIZE) + page_off = portion->tbl_hdr_off % PAGE_SIZE; + else + page_off = portion->tbl_hdr_off; + + if (page_index >= pagevec_count(portion->blk2off_pvec)) { + SSDFS_ERR("invalid page index %d: " + "offset %u, pagevec_count %u\n", + page_index, portion->tbl_hdr_off, + pagevec_count(portion->blk2off_pvec)); + 
return -EINVAL; + } + + page = portion->blk2off_pvec->pages[page_index]; + + ssdfs_lock_page(page); + err = ssdfs_memcpy_from_page(&portion->tbl_hdr, 0, hdr_size, + page, page_off, PAGE_SIZE, + hdr_size); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "page_off %u, hdr_size %zu\n", + page_off, hdr_size); + return err; + } + + err = ssdfs_check_table_header(&portion->tbl_hdr, hdr_size); + if (err) { + SSDFS_ERR("invalid table header\n"); + return err; + } + + portion->fragments_count = + le16_to_cpu(portion->tbl_hdr.fragments_count); + + return 0; +} + +/* + * ssdfs_blk2off_prepare_temp_bmap() - prepare temporary bitmap + * @portion: initialization environment [in | out] + */ +static inline +int ssdfs_blk2off_prepare_temp_bmap(struct ssdfs_blk2off_init *portion) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || portion->bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + portion->bmap_bytes = ssdfs_blk2off_table_bmap_bytes(portion->capacity); + portion->bmap = ssdfs_blk2off_kvzalloc(portion->bmap_bytes, + GFP_KERNEL); + if (unlikely(!portion->bmap)) { + SSDFS_ERR("fail to allocate memory\n"); + return -ENOMEM; + } + + return 0; +} + +/* + * ssdfs_blk2off_prepare_extent_array() - prepare extents array + * @portion: initialization environment [in | out] + */ +static +int ssdfs_blk2off_prepare_extent_array(struct ssdfs_blk2off_init *portion) +{ + size_t extent_size = sizeof(struct ssdfs_translation_extent); + u32 extents_off, table_off; + size_t ext_array_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !portion->blk2off_pvec || portion->extent_array); +#endif /* CONFIG_SSDFS_DEBUG */ + + extents_off = offsetof(struct ssdfs_blk2off_table_header, sequence); + if (extents_off != le16_to_cpu(portion->tbl_hdr.extents_off)) { + SSDFS_ERR("invalid extents offset %u\n", + le16_to_cpu(portion->tbl_hdr.extents_off)); + return -EIO; + } + + portion->extents_count = le16_to_cpu(portion->tbl_hdr.extents_count); + ext_array_size = extent_size * portion->extents_count; + table_off = le16_to_cpu(portion->tbl_hdr.offset_table_off); + + if (ext_array_size == 0 || + (extents_off + ext_array_size) != table_off) { + SSDFS_ERR("invalid table header: " + "extents_off %u, extents_count %u, " + "offset_table_off %u\n", + extents_off, portion->extents_count, table_off); + return -EIO; + } + + if (ext_array_size > 0) { + u32 array_size = ext_array_size; + u32 read_bytes = 0; + int page_index; + u32 page_off; +#ifdef CONFIG_SSDFS_DEBUG + int i; +#endif /* CONFIG_SSDFS_DEBUG */ + + portion->extent_array = ssdfs_blk2off_kzalloc(ext_array_size, + GFP_KERNEL); + if (unlikely(!portion->extent_array)) { + SSDFS_ERR("fail to allocate memory\n"); + return -ENOMEM; + } + + extents_off = offsetof(struct ssdfs_blk2off_table_header, + sequence); + page_index = extents_off >> PAGE_SHIFT; + page_off = extents_off % PAGE_SIZE; + + while (array_size > 0) { + u32 size; + struct page *page; + + if (page_index >= pagevec_count(portion->blk2off_pvec)) { + SSDFS_ERR("invalid request: " + "page_index %d, pagevec_size %u\n", + page_index, + pagevec_count(portion->blk2off_pvec)); + return -ERANGE; + } + + size = min_t(u32, PAGE_SIZE - page_off, + array_size); + page = portion->blk2off_pvec->pages[page_index]; + + ssdfs_lock_page(page); + err = ssdfs_memcpy_from_page(portion->extent_array, + read_bytes, ext_array_size, + page, + page_off, PAGE_SIZE, + size); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", + err); + return err; + } + + read_bytes += size; + 
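/* the extent array can cross page boundaries: advance to the next page */ +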
array_size -= size; + extents_off += size; + + page_index = extents_off >> PAGE_SHIFT; + page_off = extents_off % PAGE_SIZE; + }; + +#ifdef CONFIG_SSDFS_DEBUG + for (i = 0; i < portion->extents_count; i++) { + struct ssdfs_translation_extent *extent; + extent = &portion->extent_array[i]; + + SSDFS_DBG("index %d, logical_blk %u, offset_id %u, " + "len %u, sequence_id %u, state %u\n", + i, + le16_to_cpu(extent->logical_blk), + le16_to_cpu(extent->offset_id), + le16_to_cpu(extent->len), + extent->sequence_id, + extent->state); + } +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return 0; +} + +/* + * ssdfs_get_fragment_header() - get fragment header + * @portion: initialization environment [in | out] + * @offset: header offset in bytes + */ +static +int ssdfs_get_fragment_header(struct ssdfs_blk2off_init *portion, + u32 offset) +{ + size_t hdr_size = sizeof(struct ssdfs_phys_offset_table_header); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !portion->blk2off_pvec); + + SSDFS_DBG("source %p, offset %u\n", + portion->blk2off_pvec, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_unaligned_read_pagevec(portion->blk2off_pvec, + offset, + hdr_size, + &portion->pot_hdr); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_get_checked_fragment() - get checked table's fragment + * @portion: initialization environment [in | out] + * + * This method tries to get and to check fragment validity. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EIO - corrupted fragment. + * %-EEXIST - has been initialized already. + */ +static +int ssdfs_get_checked_fragment(struct ssdfs_blk2off_init *portion) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + struct ssdfs_phys_offset_table_fragment *fragment; + struct page *page; + void *kaddr; + int page_index; + u32 page_off; + size_t fragment_size; + u16 start_id; + u16 sequence_id; + int state; + u32 read_bytes; +#ifdef CONFIG_SSDFS_DEBUG + int i; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !portion->table || !portion->blk2off_pvec); + + SSDFS_DBG("table %p, peb_index %u, source %p, offset %u\n", + portion->table, portion->peb_index, + portion->blk2off_pvec, portion->pot_hdr_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fragment_size = le32_to_cpu(portion->pot_hdr.byte_size); + start_id = le16_to_cpu(portion->pot_hdr.start_id); + sequence_id = le16_to_cpu(portion->pot_hdr.sequence_id); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sequence_id %u\n", sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (fragment_size > PAGE_SIZE) { + SSDFS_ERR("invalid fragment_size %zu\n", + fragment_size); + +#ifdef CONFIG_SSDFS_DEBUG + for (i = 0; i < pagevec_count(portion->blk2off_pvec); i++) { + page = portion->blk2off_pvec->pages[i]; + + kaddr = kmap_local_page(page); + SSDFS_DBG("PAGE DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return -EIO; + } + + if (sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("invalid sequence_id %u\n", + sequence_id); + return -EIO; + } + + phys_off_table = &portion->table->peb[portion->peb_index]; + + kaddr = ssdfs_sequence_array_get_item(phys_off_table->sequence, + sequence_id); + if (IS_ERR_OR_NULL(kaddr)) { + /* expected state -> continue logic */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u is absent\n", + 
sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u has been initialized already\n", + sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EEXIST; + } + + fragment = ssdfs_blk2off_frag_alloc(); + if (IS_ERR_OR_NULL(fragment)) { + err = (fragment == NULL ? -ENOMEM : PTR_ERR(fragment)); + SSDFS_ERR("fail to allocate fragment: " + "err %d\n", err); + return err; + } + + err = ssdfs_sequence_array_init_item(phys_off_table->sequence, + sequence_id, + fragment); + if (unlikely(err)) { + ssdfs_blk2off_frag_free(fragment); + SSDFS_ERR("fail to init fragment: " + "err %d\n", err); + return err; + } + + state = SSDFS_BLK2OFF_FRAG_CREATED; + err = ssdfs_blk2off_table_init_fragment(fragment, + sequence_id, + start_id, + portion->table->pages_per_peb, + state, + &fragment_size); + if (unlikely(err)) { + SSDFS_ERR("fail to initialize fragment: err %d\n", + err); + return err; + } + + page_index = portion->pot_hdr_off >> PAGE_SHIFT; + if (portion->pot_hdr_off >= PAGE_SIZE) + page_off = portion->pot_hdr_off % PAGE_SIZE; + else + page_off = portion->pot_hdr_off; + + down_write(&fragment->lock); + + read_bytes = 0; + while (fragment_size > 0) { + u32 size; + + size = min_t(u32, PAGE_SIZE - page_off, fragment_size); + + if (page_index >= pagevec_count(portion->blk2off_pvec)) { + err = -ERANGE; + SSDFS_ERR("invalid request: " + "page_index %d, pvec_size %u\n", + page_index, + pagevec_count(portion->blk2off_pvec)); + goto finish_fragment_read; + } + + page = portion->blk2off_pvec->pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("read_bytes %u, fragment->buf_size %zu, " + "page_off %u, size %u\n", + read_bytes, fragment->buf_size, page_off, size); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + err = ssdfs_memcpy_from_page(fragment->buf, + read_bytes, fragment->buf_size, + page, page_off, PAGE_SIZE, + size); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_fragment_read; + } + + read_bytes += size; + fragment_size -= size; + portion->pot_hdr_off += size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("read_bytes %u, fragment_size %zu, " + "pot_hdr_off %u\n", + read_bytes, fragment_size, + portion->pot_hdr_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = portion->pot_hdr_off >> PAGE_SHIFT; + if (portion->pot_hdr_off >= PAGE_SIZE) + page_off = portion->pot_hdr_off % PAGE_SIZE; + else + page_off = portion->pot_hdr_off; + }; + + err = ssdfs_check_fragment(portion->table, portion->peb_index, + fragment->hdr, + fragment->buf_size); + if (err) + goto finish_fragment_read; + + fragment->start_id = start_id; + atomic_set(&fragment->id_count, + le16_to_cpu(fragment->hdr->id_count)); + atomic_set(&fragment->state, SSDFS_BLK2OFF_FRAG_INITIALIZED); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FRAGMENT: sequence_id %u, start_id %u, id_count %d\n", + sequence_id, start_id, atomic_read(&fragment->id_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_read: + up_write(&fragment->lock); + + if (err) { + SSDFS_ERR("corrupted fragment: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * is_ssdfs_offset_position_older() - is position checkpoint older?
+ * @pos: position offset + * @cno: checkpoint number for comparison + */ +static inline +bool is_ssdfs_offset_position_older(struct ssdfs_offset_position *pos, + u64 cno) +{ + if (pos->cno != SSDFS_INVALID_CNO) + return pos->cno >= cno; + + return false; +} + +/* + * ssdfs_check_translation_extent() - check translation extent + * @extent: pointer on translation extent + * @capacity: logical blocks capacity + * @sequence_id: extent's sequence id + * + * This method tries to check extent's validity. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - corrupted translation extent. + */ +static +int ssdfs_check_translation_extent(struct ssdfs_translation_extent *extent, + u16 capacity, u8 sequence_id) +{ + u16 logical_blk; + u16 offset_id; + u16 len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_blk = le16_to_cpu(extent->logical_blk); + offset_id = le16_to_cpu(extent->offset_id); + len = le16_to_cpu(extent->len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, offset_id %u, len %u, " + "sequence_id %u, state %#x\n", + logical_blk, offset_id, len, + extent->sequence_id, extent->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (extent->state <= SSDFS_LOGICAL_BLK_UNKNOWN_STATE || + extent->state >= SSDFS_LOGICAL_BLK_STATE_MAX) { + SSDFS_ERR("invalid translation extent: " + "unknown state %#x\n", + extent->state); + return -EIO; + } + + if (logical_blk > (U16_MAX - len) || + (logical_blk + len) > capacity) { + SSDFS_ERR("invalid translation extent: " + "logical_blk %u, len %u, capacity %u\n", + logical_blk, len, capacity); + return -EIO; + } + + if (extent->state != SSDFS_LOGICAL_BLK_FREE) { + if (offset_id > (U16_MAX - len)) { + SSDFS_ERR("invalid translation extent: " + "offset_id %u, len %u\n", + offset_id, len); + return -EIO; + } + } + + if (sequence_id != extent->sequence_id) { + SSDFS_ERR("invalid translation extent: " + "sequence_id %u != extent->sequence_id %u\n", + sequence_id, extent->sequence_id); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_process_used_translation_extent() - process used translation extent + * @portion: pointer on portion init environment [in | out] + * @extent_index: index of extent + * + * This method checks the translation extent, sets bits in the bitmap + * for the logical blocks in the extent, and fills the portion of the + * offset position array with physical offset IDs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EIO - corrupted translation extent. + * %-EAGAIN - extent is partially processed in the fragment.
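+ * + * %-EAGAIN is returned when the extent spans beyond the ID range of + * the current fragment (offset_id + len > start_id + id_count): only + * the part that fits is processed, and the caller retries the same + * extent against the next fragment.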
+ */ +static +int ssdfs_process_used_translation_extent(struct ssdfs_blk2off_init *portion, + int *extent_index) +{ + struct ssdfs_sequence_array *sequence = NULL; + struct ssdfs_phys_offset_table_fragment *frag = NULL; + struct ssdfs_phys_offset_descriptor *phys_off = NULL; + struct ssdfs_translation_extent *extent = NULL; + struct ssdfs_dynamic_array *lblk2off; + void *ptr; + u16 peb_index; + u16 sequence_id; + u16 pos_array_items; + u16 start_id; + u16 id_count; + u16 id_diff; + u32 logical_blk; + u16 offset_id; + u16 len; + int phys_off_index; + bool is_partially_processed = false; + struct ssdfs_blk_state_offset *state_off; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !extent_index); + BUG_ON(!portion->bmap || !portion->extent_array); + BUG_ON(portion->cno == SSDFS_INVALID_CNO); + BUG_ON(*extent_index >= portion->extents_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + lblk2off = &portion->table->lblk2off; + + peb_index = portion->peb_index; + sequence_id = le16_to_cpu(portion->pot_hdr.sequence_id); + + sequence = portion->table->peb[peb_index].sequence; + ptr = ssdfs_sequence_array_get_item(sequence, sequence_id); + if (IS_ERR_OR_NULL(ptr)) { + err = (ptr == NULL ? -ENOENT : PTR_ERR(ptr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %u, err %d\n", + sequence_id, err); + return err; + } + frag = (struct ssdfs_phys_offset_table_fragment *)ptr; + + start_id = le16_to_cpu(portion->pot_hdr.start_id); + id_count = le16_to_cpu(portion->pot_hdr.id_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_id %u, id_count %u\n", + start_id, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + extent = &portion->extent_array[*extent_index]; + + err = ssdfs_check_translation_extent(extent, portion->capacity, + *extent_index); + if (err) { + SSDFS_ERR("invalid translation extent: " + "index %u, err %d\n", + *extent_index, err); + return err; + } + + if (*extent_index == 0 && extent->state != SSDFS_LOGICAL_BLK_USED) { + SSDFS_ERR("invalid translation extent state %#x\n", + extent->state); + return -EIO; + } + + logical_blk = le16_to_cpu(extent->logical_blk); + offset_id = le16_to_cpu(extent->offset_id); + len = le16_to_cpu(extent->len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, offset_id %u, len %u, " + "sequence_id %u, state %#x\n", + logical_blk, offset_id, len, + extent->sequence_id, extent->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((start_id + id_count) < offset_id) { + SSDFS_ERR("start_id %u + id_count %u < offset_id %u\n", + start_id, id_count, offset_id); + return -EIO; + } + + if ((offset_id + len) <= start_id) { + SSDFS_ERR("offset_id %u + len %u <= start_id %u\n", + offset_id, len, start_id); + return -EIO; + } + + if (offset_id < start_id) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset_id %u, len %u, " + "start_id %u, id_count %u\n", + offset_id, len, + start_id, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + id_diff = start_id - offset_id; + offset_id += id_diff; + logical_blk += id_diff; + len -= id_diff; + } + + if ((offset_id + len) > (start_id + id_count)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset_id %u, len %u, " + "start_id %u, id_count %u\n", + offset_id, len, + start_id, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + is_partially_processed = true; + + /* correct length */ + len = (start_id + id_count) - offset_id; + } + + pos_array_items = portion->capacity - logical_blk; + + if (pos_array_items < len) { + SSDFS_ERR("array_items %u < len %u\n", + pos_array_items, len); + return -EINVAL; + } + + if (id_count >
atomic_read(&frag->id_count)) { + SSDFS_ERR("id_count %u > frag->id_count %d\n", + id_count, + atomic_read(&frag->id_count)); + return -EIO; + } + + phys_off_index = offset_id - start_id; + + if ((phys_off_index + len) > id_count) { + SSDFS_ERR("phys_off_index %d, len %u, id_count %u\n", + phys_off_index, len, id_count); + return -EIO; + } + + bitmap_clear(portion->bmap, 0, portion->capacity); + + down_read(&frag->lock); + +#ifdef CONFIG_SSDFS_DEBUG + for (j = 0; j < pagevec_count(portion->blk_desc_pvec); j++) { + void *kaddr; + struct page *page = portion->blk_desc_pvec->pages[j]; + + kaddr = kmap_local_page(page); + SSDFS_DBG("PAGE DUMP: index %d\n", + j); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < len; i++) { + size_t area_tbl_size = sizeof(struct ssdfs_area_block_table); + size_t desc_size = sizeof(struct ssdfs_block_descriptor); + struct ssdfs_offset_position *pos; + u16 id = offset_id + i; + u16 cur_blk; + u32 byte_offset; + bool is_invalid = false; + + phys_off = &frag->phys_offs[phys_off_index + i]; + + cur_blk = le16_to_cpu(phys_off->page_desc.logical_blk); + byte_offset = le32_to_cpu(phys_off->blk_state.byte_offset); + + if (byte_offset < area_tbl_size) { + err = -EIO; + SSDFS_ERR("corrupted phys offset: " + "byte_offset %u, area_tbl_size %zu\n", + byte_offset, area_tbl_size); + goto finish_process_fragment; + } + + byte_offset -= area_tbl_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_blk %u, byte_offset %u\n", + cur_blk, byte_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cur_blk >= portion->capacity) { + err = -EIO; + SSDFS_ERR("logical_blk %u >= portion->capacity %u\n", + cur_blk, portion->capacity); + goto finish_process_fragment; + } + + if (cur_blk < logical_blk || cur_blk >= (logical_blk + len)) { + err = -EIO; + SSDFS_ERR("cur_blk %u, logical_blk %u, len %u\n", + cur_blk, logical_blk, len); + goto finish_process_fragment; + } + + pos = SSDFS_OFF_POS(ssdfs_dynamic_array_get_locked(lblk2off, + cur_blk)); + if (IS_ERR_OR_NULL(pos)) { + err = (pos == NULL ? 
-ENOENT : PTR_ERR(pos)); + SSDFS_ERR("fail to get logical block: " + "cur_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("portion->cno %#llx, " + "pos (cno %#llx, id %u, peb_index %u, " + "sequence_id %u, offset_index %u)\n", + portion->cno, + pos->cno, pos->id, pos->peb_index, + pos->sequence_id, pos->offset_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_offset_position_older(pos, portion->cno)) { + /* logical block has been initialized already */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical block %u has been initialized already\n", + cur_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_dynamic_array_release(lblk2off, + cur_blk, pos); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "cur_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } else + continue; + } + + peb_index = portion->peb_index; + + bitmap_set(portion->bmap, cur_blk, 1); + + pos->cno = portion->cno; + pos->id = id; + pos->peb_index = peb_index; + pos->sequence_id = sequence_id; + pos->offset_index = phys_off_index + i; + + err = ssdfs_unaligned_read_pagevec(portion->blk_desc_pvec, + byte_offset, + desc_size, + &pos->blk_desc.buf); + if (err == -E2BIG) { + err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to init block descriptor: " + "logical block %u\n", + cur_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + pos->blk_desc.status = SSDFS_BLK_DESC_BUF_UNKNOWN_STATE; + memset(&pos->blk_desc.buf, 0xFF, + desc_size); + } else if (unlikely(err)) { + SSDFS_ERR("fail to read block descriptor: " + "cur_blk %u, err %d\n", + cur_blk, err); + ssdfs_dynamic_array_release(lblk2off, + cur_blk, pos); + goto finish_process_fragment; + } else + pos->blk_desc.status = SSDFS_BLK_DESC_BUF_INITIALIZED; + + state_off = &pos->blk_desc.buf.state[0]; + + switch (pos->blk_desc.status) { + case SSDFS_BLK_DESC_BUF_INITIALIZED: + is_invalid = + IS_SSDFS_BLK_STATE_OFFSET_INVALID(state_off); + break; + + default: + is_invalid = false; + break; + } + + if (is_invalid) { + err = -ERANGE; + SSDFS_ERR("block state offset invalid\n"); + + SSDFS_ERR("status %#x, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u\n", + pos->blk_desc.status, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page)); + + for (j = 0; j < SSDFS_BLK_STATE_OFF_MAX; j++) { + state_off = &pos->blk_desc.buf.state[j]; + + SSDFS_ERR("BLK STATE OFFSET %d: " + "log_start_page %u, log_area %#x, " + "byte_offset %u, " + "peb_migration_id %u\n", + j, + le16_to_cpu(state_off->log_start_page), + state_off->log_area, + le32_to_cpu(state_off->byte_offset), + state_off->peb_migration_id); + } + + ssdfs_dynamic_array_release(lblk2off, cur_blk, pos); + +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_process_fragment; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("status %#x, ino %llu, " + "logical_offset %u, peb_index %u, peb_page %u\n", + pos->blk_desc.status, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page)); + + for (j = 0; j < SSDFS_BLK_STATE_OFF_MAX; j++) { + state_off = &pos->blk_desc.buf.state[j]; + + SSDFS_DBG("BLK STATE OFFSET %d: " + "log_start_page %u, log_area %#x, " + "byte_offset %u, peb_migration_id %u\n", + j, + le16_to_cpu(state_off->log_start_page), + state_off->log_area, +
le32_to_cpu(state_off->byte_offset), + state_off->peb_migration_id); + } + + SSDFS_DBG("set init bitmap: cur_blk %u\n", + cur_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_dynamic_array_release(lblk2off, cur_blk, pos); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "cur_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + + err = ssdfs_blk2off_table_bmap_set(&portion->table->lbmap, + SSDFS_LBMAP_INIT_INDEX, + cur_blk); + if (unlikely(err)) { + SSDFS_ERR("fail to set init bitmap: " + "logical_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + } + +finish_process_fragment: + up_read(&frag->lock); + + if (unlikely(err)) + return err; + + if (bitmap_intersects(portion->bmap, + portion->table->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + portion->table->lbmap.bits_count)) { + SSDFS_ERR("invalid translation extent: " + "logical_blk %u, offset_id %u, len %u\n", + logical_blk, offset_id, len); + return -EIO; + } + + bitmap_or(portion->table->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + portion->bmap, + portion->table->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + portion->table->lbmap.bits_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("init_bmap %lx, state_bmap %lx, modification_bmap %lx\n", + *portion->table->lbmap.array[SSDFS_LBMAP_INIT_INDEX], + *portion->table->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + *portion->table->lbmap.array[SSDFS_LBMAP_MODIFICATION_INDEX]); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_partially_processed) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extent has been processed partially: " + "index %u\n", *extent_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EAGAIN; + } + + return 0; +} + +/* + * ssdfs_process_free_translation_extent() - process free translation extent + * @portion: pointer on portion init environment [in | out] + * @extent_index: index of extent + * + * This method checks the translation extent, sets bits in the bitmap + * for the logical blocks in the extent, and fills the portion of the + * offset position array with physical offset IDs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EIO - corrupted translation extent. + */ +static +int ssdfs_process_free_translation_extent(struct ssdfs_blk2off_init *portion, + int *extent_index) +{ + struct ssdfs_sequence_array *sequence = NULL; + struct ssdfs_phys_offset_table_fragment *frag = NULL; + struct ssdfs_translation_extent *extent = NULL; + struct ssdfs_dynamic_array *lblk2off; + void *ptr; + u16 peb_index; + u16 sequence_id; + u16 pos_array_items; + size_t pos_size = sizeof(struct ssdfs_offset_position); + u32 logical_blk; + u16 len; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !extent_index); + BUG_ON(!portion->extent_array); + BUG_ON(portion->cno == SSDFS_INVALID_CNO); + BUG_ON(*extent_index >= portion->extents_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + lblk2off = &portion->table->lblk2off; + + peb_index = portion->peb_index; + sequence_id = le16_to_cpu(portion->pot_hdr.sequence_id); + + sequence = portion->table->peb[peb_index].sequence; + ptr = ssdfs_sequence_array_get_item(sequence, sequence_id); + if (IS_ERR_OR_NULL(ptr)) { + err = (ptr == NULL ?
-ENOENT : PTR_ERR(ptr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %u, err %d\n", + sequence_id, err); + return err; + } + frag = (struct ssdfs_phys_offset_table_fragment *)ptr; + + extent = &portion->extent_array[*extent_index]; + logical_blk = le16_to_cpu(extent->logical_blk); + len = le16_to_cpu(extent->len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, len %u, " + "sequence_id %u, state %#x\n", + logical_blk, len, + extent->sequence_id, extent->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + pos_array_items = portion->capacity - logical_blk; + + if (pos_array_items < len) { + SSDFS_ERR("array_items %u < len %u\n", + pos_array_items, len); + return -EINVAL; + } + + err = ssdfs_check_translation_extent(extent, portion->capacity, + *extent_index); + if (err) { + SSDFS_ERR("invalid translation extent: " + "index %u, err %d\n", + *extent_index, err); + return err; + } + + down_read(&frag->lock); + + for (i = 0; i < len; i++) { + struct ssdfs_offset_position *pos; + u32 cur_blk = logical_blk + i; + + pos = SSDFS_OFF_POS(ssdfs_dynamic_array_get_locked(lblk2off, + cur_blk)); + if (IS_ERR_OR_NULL(pos)) { + err = (pos == NULL ? -ENOENT : PTR_ERR(pos)); + SSDFS_ERR("fail to get logical block: " + "cur_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("portion->cno %#llx, " + "pos (cno %#llx, id %u, peb_index %u, " + "sequence_id %u, offset_index %u)\n", + portion->cno, + pos->cno, pos->id, pos->peb_index, + pos->sequence_id, pos->offset_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_offset_position_older(pos, portion->cno)) { + /* logical block has been initialized already */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical block %u has been initialized already\n", + cur_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_dynamic_array_release(lblk2off, + cur_blk, pos); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "cur_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } else + continue; + } + + err = ssdfs_blk2off_table_bmap_clear(&portion->table->lbmap, + SSDFS_LBMAP_STATE_INDEX, + cur_blk); + if (unlikely(err)) { + SSDFS_ERR("fail to clear state bitmap: " + "logical_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + + memset(pos, 0xFF, pos_size); + + pos->cno = portion->cno; + pos->peb_index = portion->peb_index; + + err = ssdfs_dynamic_array_release(lblk2off, cur_blk, pos); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "cur_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("set init bitmap: cur_blk %u\n", + cur_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_blk2off_table_bmap_set(&portion->table->lbmap, + SSDFS_LBMAP_INIT_INDEX, + cur_blk); + if (unlikely(err)) { + SSDFS_ERR("fail to set init bitmap: " + "logical_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + + /* + * A free block needs to be marked as modified so that + * the information about free blocks is not lost in the + * case of PEB migration, because the offsets translation + * table's snapshot needs to contain the information + * about free blocks.
+ */ + err = ssdfs_blk2off_table_bmap_set(&portion->table->lbmap, + SSDFS_LBMAP_MODIFICATION_INDEX, + cur_blk); + if (unlikely(err)) { + SSDFS_ERR("fail to set modification bitmap: " + "logical_blk %u, err %d\n", + cur_blk, err); + goto finish_process_fragment; + } + } + +finish_process_fragment: + up_read(&frag->lock); + + return err; +} + +/* + * ssdfs_blk2off_fragment_init() - initialize portion's fragment + * @portion: pointer on portion init environment [in | out] + * @fragment_index: index of fragment + * @extent_index: pointer on extent index [in | out] + */ +static +int ssdfs_blk2off_fragment_init(struct ssdfs_blk2off_init *portion, + int fragment_index, + int *extent_index) +{ + struct ssdfs_sequence_array *sequence = NULL; + struct ssdfs_translation_extent *extent = NULL; + u16 logical_blk; + u16 offset_id; + u16 len; + u16 start_id; + u16 id_count; + u16 processed_offset_ids = 0; + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!portion || !portion->table || !portion->blk2off_pvec); + BUG_ON(!portion->bmap || !portion->extent_array); + BUG_ON(!extent_index); + BUG_ON(portion->peb_index >= portion->table->pebs_count); + + SSDFS_DBG("peb_index %u, fragment_index %d, " + "extent_index %u, extents_count %u\n", + portion->peb_index, fragment_index, + *extent_index, portion->extents_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (fragment_index == 0) { + portion->pot_hdr_off = portion->tbl_hdr_off + + le16_to_cpu(portion->tbl_hdr.offset_table_off); + err = ssdfs_get_fragment_header(portion, portion->pot_hdr_off); + } else { + portion->pot_hdr_off = portion->tbl_hdr_off + + le16_to_cpu(portion->pot_hdr.next_fragment_off); + err = ssdfs_get_fragment_header(portion, portion->pot_hdr_off); + } + + if (err) { + SSDFS_ERR("fail to get fragment header: err %d\n", + err); + return err; + } + + err = ssdfs_get_checked_fragment(portion); + if (err == -EEXIST) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment has been initialized already: " + "peb_index %u, offset %u\n", + portion->peb_index, + portion->pot_hdr_off); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err) { + SSDFS_ERR("fail to get checked fragment: " + "peb_index %u, offset %u, err %d\n", + portion->peb_index, + portion->pot_hdr_off, err); + return err; + } + + if (*extent_index >= portion->extents_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extent_index %u >= extents_count %u\n", + *extent_index, portion->extents_count); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + start_id = le16_to_cpu(portion->pot_hdr.start_id); + id_count = le16_to_cpu(portion->pot_hdr.id_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_id %u, id_count %u\n", + start_id, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (*extent_index < portion->extents_count) { + extent = &portion->extent_array[*extent_index]; + logical_blk = le16_to_cpu(extent->logical_blk); + offset_id = le16_to_cpu(extent->offset_id); + len = le16_to_cpu(extent->len); + state = extent->state; + + if (processed_offset_ids > id_count) { + SSDFS_ERR("processed_offset_ids %u > id_count %u\n", + processed_offset_ids, id_count); + return -ERANGE; + } else if (processed_offset_ids == id_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment has been processed: " + "processed_offset_ids %u == id_count %u\n", + processed_offset_ids, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_processing; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, len %u, " + "state %#x, extent_index %d\n", + logical_blk, len, state, 
*extent_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (logical_blk >= portion->capacity) { + err = -ERANGE; + SSDFS_ERR("logical_blk %u >= capacity %u\n", + logical_blk, portion->capacity); + return err; + } + + if (state != SSDFS_LOGICAL_BLK_FREE) { + if (offset_id >= (start_id + id_count)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset_id %u, start_id %u, " + "id_count %u\n", + offset_id, start_id, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_processing; + } + } + + if (state == SSDFS_LOGICAL_BLK_USED) { + err = ssdfs_process_used_translation_extent(portion, + extent_index); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extent has been processed partially: " + "sequence_id %u, err %d\n", + *extent_index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("invalid translation extent: " + "sequence_id %u, err %d\n", + *extent_index, err); + return err; + } + } else if (state == SSDFS_LOGICAL_BLK_FREE) { + err = ssdfs_process_free_translation_extent(portion, + extent_index); + if (err) { + SSDFS_ERR("invalid translation extent: " + "sequence_id %u, err %d\n", + *extent_index, err); + return err; + } + } else + BUG(); + + if (err == -EAGAIN) { + SSDFS_DBG("don't increment extent index\n"); + goto finish_fragment_processing; + } else + ++*extent_index; + + if (state != SSDFS_LOGICAL_BLK_FREE) + processed_offset_ids += len; + }; + +finish_fragment_processing: + if (portion->table->init_cno == U64_MAX || + portion->cno >= portion->table->init_cno) { + u16 peb_index = portion->peb_index; + u16 sequence_id = le16_to_cpu(portion->pot_hdr.sequence_id); + + sequence = portion->table->peb[peb_index].sequence; + + if (is_ssdfs_sequence_array_last_id_invalid(sequence) || + ssdfs_sequence_array_last_id(sequence) <= sequence_id) { + portion->table->init_cno = portion->cno; + portion->table->used_logical_blks = + le16_to_cpu(portion->pot_hdr.used_logical_blks); + portion->table->free_logical_blks = + le16_to_cpu(portion->pot_hdr.free_logical_blks); + portion->table->last_allocated_blk = + le16_to_cpu(portion->pot_hdr.last_allocated_blk); + + ssdfs_sequence_array_set_last_id(sequence, sequence_id); + } + } + + atomic_inc(&portion->table->peb[portion->peb_index].fragment_count); + + return err; +} + +/* + * ssdfs_define_peb_table_state() - define PEB's table state + * @table: pointer on translation table object + * @peb_index: PEB's index + */ +static inline +int ssdfs_define_peb_table_state(struct ssdfs_blk2off_table *table, + u16 peb_index) +{ + int state; + u16 last_allocated_blk; + u16 allocated_blks; + int init_bits; + int count; + unsigned long last_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + BUG_ON(peb_index >= table->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = atomic_read(&table->peb[peb_index].fragment_count); + last_id = ssdfs_sequence_array_last_id(table->peb[peb_index].sequence); + last_allocated_blk = table->last_allocated_blk; + + if (last_allocated_blk >= U16_MAX) + allocated_blks = 0; + else + allocated_blks = last_allocated_blk + 1; + + init_bits = bitmap_weight(table->lbmap.array[SSDFS_LBMAP_INIT_INDEX], + allocated_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("table %p, peb_index %u, count %d, last_id %lu, " + "last_allocated_blk %u, init_bits %d, " + "allocated_blks %u\n", + table, peb_index, count, last_id, + last_allocated_blk, init_bits, + allocated_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (init_bits < 0) { + SSDFS_ERR("invalid init bmap: weight %d\n", + init_bits); + return 
-ERANGE; + } + + if (count == 0) { + SSDFS_ERR("fragment_count == 0\n"); + return -ERANGE; + } + + state = atomic_cmpxchg(&table->peb[peb_index].state, + SSDFS_BLK2OFF_TABLE_CREATED, + SSDFS_BLK2OFF_TABLE_PARTIAL_INIT); + if (state <= SSDFS_BLK2OFF_TABLE_UNDEFINED || + state > SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT) { + SSDFS_WARN("unexpected state %#x\n", + state); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("state %#x\n", state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (init_bits > 0) { + if (init_bits >= allocated_blks) { + state = atomic_cmpxchg(&table->peb[peb_index].state, + SSDFS_BLK2OFF_TABLE_PARTIAL_INIT, + SSDFS_BLK2OFF_TABLE_COMPLETE_INIT); + if (state == SSDFS_BLK2OFF_TABLE_PARTIAL_INIT) { + /* table is completely initialized */ + goto finish_define_peb_table_state; + } + + state = atomic_cmpxchg(&table->peb[peb_index].state, + SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT, + SSDFS_BLK2OFF_TABLE_DIRTY); + if (state == SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT) { + /* table is dirty already */ + goto finish_define_peb_table_state; + } + + if (state < SSDFS_BLK2OFF_TABLE_PARTIAL_INIT || + state > SSDFS_BLK2OFF_TABLE_COMPLETE_INIT) { + SSDFS_WARN("unexpected state %#x\n", + state); + return -ERANGE; + } + } + } else { + SSDFS_WARN("init_bits == 0\n"); + return -ERANGE; + } + +finish_define_peb_table_state: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("state %#x\n", atomic_read(&table->peb[peb_index].state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_define_blk2off_table_object_state() - define table object state + * @table: pointer on translation table object + */ +static inline +int ssdfs_define_blk2off_table_object_state(struct ssdfs_blk2off_table *table) +{ + int state; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = SSDFS_BLK2OFF_TABLE_STATE_MAX; + for (i = 0; i < table->pebs_count; i++) { + int peb_tbl_state = atomic_read(&table->peb[i].state); + + if (peb_tbl_state < state) + state = peb_tbl_state; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("table %p, state %#x\n", table, state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (state) { + case SSDFS_BLK2OFF_TABLE_CREATED: + state = atomic_read(&table->state); + if (state != SSDFS_BLK2OFF_OBJECT_CREATED) { + SSDFS_WARN("unexpected state %#x\n", + state); + return -ERANGE; + } + break; + + case SSDFS_BLK2OFF_TABLE_PARTIAL_INIT: + case SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT: + state = atomic_cmpxchg(&table->state, + SSDFS_BLK2OFF_OBJECT_CREATED, + SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT); + complete_all(&table->partial_init_end); + + if (state <= SSDFS_BLK2OFF_OBJECT_UNKNOWN || + state > SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) { + SSDFS_WARN("unexpected state %#x\n", + state); + return -ERANGE; + } + break; + + case SSDFS_BLK2OFF_TABLE_COMPLETE_INIT: + case SSDFS_BLK2OFF_TABLE_DIRTY: + state = atomic_cmpxchg(&table->state, + SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT, + SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT); + if (state == SSDFS_BLK2OFF_OBJECT_CREATED) { + state = atomic_cmpxchg(&table->state, + SSDFS_BLK2OFF_OBJECT_CREATED, + SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT); + } + complete_all(&table->partial_init_end); + complete_all(&table->full_init_end); + + if (state < SSDFS_BLK2OFF_OBJECT_CREATED || + state > SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT) { + SSDFS_WARN("unexpected state %#x\n", + state); + return -ERANGE; + } + break; + + default: + SSDFS_WARN("unexpected state %#x\n", state); + return -ERANGE; + }; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("state %#x\n", 
atomic_read(&table->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_blk2off_table_partial_init() - initialize PEB's table fragment
+ * @table: pointer on translation table object
+ * @blk2off_pvec: blk2off fragment
+ * @blk_desc_pvec: blk desc fragment
+ * @peb_index: PEB's index
+ * @cno: fragment's checkpoint (log's checkpoint)
+ *
+ * This method tries to initialize PEB's table fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL  - invalid input.
+ * %-ERANGE  - internal error.
+ * %-EIO     - corrupted translation extent.
+ */
+int ssdfs_blk2off_table_partial_init(struct ssdfs_blk2off_table *table,
+				     struct pagevec *blk2off_pvec,
+				     struct pagevec *blk_desc_pvec,
+				     u16 peb_index,
+				     u64 cno)
+{
+	struct ssdfs_blk2off_init portion;
+	int extent_index = 0;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !blk2off_pvec || !blk_desc_pvec);
+	BUG_ON(peb_index >= table->pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("table %p, peb_index %u\n",
+		  table, peb_index);
+#else
+	SSDFS_DBG("table %p, peb_index %u\n",
+		  table, peb_index);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	memset(&portion, 0, sizeof(struct ssdfs_blk2off_init));
+
+	if (pagevec_count(blk2off_pvec) == 0) {
+		SSDFS_ERR("fail to init because of empty pagevec\n");
+		return -EINVAL;
+	}
+
+	if (ssdfs_blk2off_table_initialized(table, peb_index)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("PEB's table has been initialized already: "
+			  "peb_index %u\n",
+			  peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	portion.table = table;
+	portion.blk2off_pvec = blk2off_pvec;
+	portion.blk_desc_pvec = blk_desc_pvec;
+	portion.peb_index = peb_index;
+	portion.cno = cno;
+
+	portion.tbl_hdr_off = 0;
+	err = ssdfs_get_checked_table_header(&portion);
+	if (err) {
+		SSDFS_ERR("invalid table header\n");
+		return err;
+	}
+
+	down_write(&table->translation_lock);
+
+	portion.capacity = table->lblk2off_capacity;
+
+	err = ssdfs_blk2off_prepare_temp_bmap(&portion);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate memory\n");
+		goto unlock_translation_table;
+	}
+
+	err = ssdfs_blk2off_prepare_extent_array(&portion);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate memory\n");
+		goto unlock_translation_table;
+	}
+
+	portion.pot_hdr_off = portion.tbl_hdr_off +
+			le16_to_cpu(portion.tbl_hdr.offset_table_off);
+
+	for (i = 0; i < portion.fragments_count; i++) {
+		err = ssdfs_blk2off_fragment_init(&portion,
+						  i,
+						  &extent_index);
+		if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("continue to process extent: "
+				  "fragment %d, extent_index %d\n",
+				  i, extent_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (err == -EEXIST) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment has been initialized already: "
+				  "fragment_index %d, extent_index %d\n",
+				  i, extent_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to initialize fragment: "
+				  "fragment_index %d, extent_index %d, "
+				  "err %d\n",
+				  i, extent_index, err);
+			goto unlock_translation_table;
+		}
+	}
+
+	err = ssdfs_define_peb_table_state(table, peb_index);
+	if (err) {
+		SSDFS_ERR("fail to define PEB's table state: "
+			  "peb_index %u, err %d\n",
+			  peb_index, err);
+		goto unlock_translation_table;
+	}
+
+	err = ssdfs_define_blk2off_table_object_state(table);
+	if (err) {
+		SSDFS_ERR("fail to define table object state: "
+			  "err %d\n",
+			  err);
+		goto unlock_translation_table;
+	}
+
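+	/*
+	 * Common exit path: success and failure converge here to drop
+	 * the translation lock and to free the temporary bitmap and
+	 * extent array allocated above.
+	 */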
+unlock_translation_table:
+	up_write(&table->translation_lock);
+
+	ssdfs_blk2off_kvfree(portion.bmap);
+	portion.bmap = NULL;
+	ssdfs_blk2off_kfree(portion.extent_array);
+	portion.extent_array = NULL;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
diff --git a/fs/ssdfs/offset_translation_table.h b/fs/ssdfs/offset_translation_table.h
new file mode 100644
index 000000000000..a999531dae59
--- /dev/null
+++ b/fs/ssdfs/offset_translation_table.h
@@ -0,0 +1,446 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/offset_translation_table.h - offset table declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ *                  Cong Wang
+ */
+
+#ifndef _SSDFS_OFFSET_TRANSLATION_TABLE_H
+#define _SSDFS_OFFSET_TRANSLATION_TABLE_H
+
+#include
+
+#include "request_queue.h"
+#include "sequence_array.h"
+#include "dynamic_array.h"
+
+/*
+ * struct ssdfs_phys_offset_table_fragment - fragment of phys offsets table
+ * @lock: table fragment lock
+ * @start_id: starting physical offset id number in fragment
+ * @sequence_id: fragment's sequence_id in PEB
+ * @id_count: count of id numbers in sequence
+ * @state: fragment state
+ * @hdr: pointer on fragment's header
+ * @phys_offs: array of physical offsets in fragment
+ * @buf: buffer of fragment
+ * @buf_size: size of buffer in bytes
+ *
+ * One fragment can be used for one PEB's log, but one log can
+ * contain several fragments too. The same number of fragments
+ * exists in memory as on the volume.
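+ *
+ * A fragment typically walks through the states declared below:
+ * CREATED -> INITIALIZED -> DIRTY -> UNDER_COMMIT -> COMMITED.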
+ */ +struct ssdfs_phys_offset_table_fragment { + struct rw_semaphore lock; + u16 start_id; + u16 sequence_id; + atomic_t id_count; + atomic_t state; + + struct ssdfs_phys_offset_table_header *hdr; + struct ssdfs_phys_offset_descriptor *phys_offs; + unsigned char *buf; + size_t buf_size; +}; + +enum { + SSDFS_BLK2OFF_FRAG_UNDEFINED, + SSDFS_BLK2OFF_FRAG_CREATED, + SSDFS_BLK2OFF_FRAG_INITIALIZED, + SSDFS_BLK2OFF_FRAG_DIRTY, + SSDFS_BLK2OFF_FRAG_UNDER_COMMIT, + SSDFS_BLK2OFF_FRAG_COMMITED, + SSDFS_BLK2OFF_FRAG_STATE_MAX, +}; + +#define SSDFS_INVALID_FRAG_ID U16_MAX +#define SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD (U16_MAX - 1) + +/* + * struct ssdfs_phys_offset_table_array - array of log's fragments in PEB + * @state: PEB's translation table state + * @fragment_count: fragments count + * @array: array of fragments + */ +struct ssdfs_phys_offset_table_array { + atomic_t state; + atomic_t fragment_count; + struct ssdfs_sequence_array *sequence; +}; + +enum { + SSDFS_BLK2OFF_TABLE_UNDEFINED, + SSDFS_BLK2OFF_TABLE_CREATED, + SSDFS_BLK2OFF_TABLE_PARTIAL_INIT, + SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT, + SSDFS_BLK2OFF_TABLE_COMPLETE_INIT, + SSDFS_BLK2OFF_TABLE_DIRTY, + SSDFS_BLK2OFF_TABLE_STATE_MAX, +}; + +#define SSDFS_BLK2OFF_TABLE_INVALID_ID U16_MAX + +/* + * struct ssdfs_block_descriptor_state - block descriptor state + * @status: state of block descriptor buffer + * @buf: block descriptor buffer + */ +struct ssdfs_block_descriptor_state { + u32 status; + struct ssdfs_block_descriptor buf; +}; + +/* + * Block descriptor buffer state + */ +enum { + SSDFS_BLK_DESC_BUF_UNKNOWN_STATE, + SSDFS_BLK_DESC_BUF_INITIALIZED, + SSDFS_BLK_DESC_BUF_STATE_MAX, + SSDFS_BLK_DESC_BUF_ALLOCATED = U32_MAX, +}; + +/* + * struct ssdfs_offset_position - defines offset id and position + * @cno: checkpoint of change + * @id: physical offset ID + * @peb_index: PEB's index + * @sequence_id: sequence ID of physical offset table's fragment + * @offset_index: offset index inside of fragment + * @blk_desc: logical block descriptor + */ +struct ssdfs_offset_position { + u64 cno; + u16 id; + u16 peb_index; + u16 sequence_id; + u16 offset_index; + + struct ssdfs_block_descriptor_state blk_desc; +}; + +/* + * struct ssdfs_migrating_block - migrating block state + * @state: logical block's state + * @peb_index: PEB's index + * @pvec: copy of logical block's content (under migration only) + */ +struct ssdfs_migrating_block { + int state; + u16 peb_index; + struct pagevec pvec; +}; + +/* + * Migrating block's states + */ +enum { + SSDFS_LBLOCK_UNKNOWN_STATE, + SSDFS_LBLOCK_UNDER_MIGRATION, + SSDFS_LBLOCK_UNDER_COMMIT, + SSDFS_LBLOCK_STATE_MAX +}; + +enum { + SSDFS_LBMAP_INIT_INDEX, + SSDFS_LBMAP_STATE_INDEX, + SSDFS_LBMAP_MODIFICATION_INDEX, + SSDFS_LBMAP_ARRAY_MAX, +}; + + +/* + * struct ssdfs_bitmap_array - bitmap array + * @bits_count: number of available bits in every bitmap + * @bytes_count: number of allocated bytes in every bitmap + * @array: array of bitmaps + */ +struct ssdfs_bitmap_array { + u32 bits_count; + u32 bytes_count; + unsigned long *array[SSDFS_LBMAP_ARRAY_MAX]; +}; + +/* + * struct ssdfs_blk2off_table - in-core translation table + * @flags: flags of translation table + * @state: translation table object state + * @pages_per_peb: pages per physical erase block + * @pages_per_seg: pages per segment + * @type: translation table type + * @translation_lock: lock of translation operation + * @init_cno: last actual checkpoint + * @used_logical_blks: count of used logical blocks + * @free_logical_blks: count of free logical 
blocks + * @last_allocated_blk: last allocated block (hint for allocation) + * @lbmap: array of block bitmaps + * @lblk2off: array of correspondence between logical numbers and phys off ids + * @migrating_blks: array of migrating blocks + * @lblk2off_capacity: capacity of correspondence array + * @peb: sequence of physical offset arrays + * @pebs_count: count of PEBs in segment + * @partial_init_end: wait of partial init ending + * @full_init_end: wait of full init ending + * @wait_queue: wait queue of blk2off table + * @fsi: pointer on shared file system object + */ +struct ssdfs_blk2off_table { + atomic_t flags; + atomic_t state; + + u32 pages_per_peb; + u32 pages_per_seg; + u8 type; + + struct rw_semaphore translation_lock; + u64 init_cno; + u16 used_logical_blks; + u16 free_logical_blks; + u16 last_allocated_blk; + struct ssdfs_bitmap_array lbmap; + struct ssdfs_dynamic_array lblk2off; + struct ssdfs_dynamic_array migrating_blks; + u16 lblk2off_capacity; + + struct ssdfs_phys_offset_table_array *peb; + u16 pebs_count; + + struct completion partial_init_end; + struct completion full_init_end; + wait_queue_head_t wait_queue; + + struct ssdfs_fs_info *fsi; +}; + +#define SSDFS_OFF_POS(ptr) \ + ((struct ssdfs_offset_position *)(ptr)) +#define SSDFS_MIGRATING_BLK(ptr) \ + ((struct ssdfs_migrating_block *)(ptr)) + +enum { + SSDFS_BLK2OFF_OBJECT_UNKNOWN, + SSDFS_BLK2OFF_OBJECT_CREATED, + SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT, + SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT, + SSDFS_BLK2OFF_OBJECT_STATE_MAX, +}; + +/* + * struct ssdfs_blk2off_table_snapshot - table state snapshot + * @cno: checkpoint of snapshot + * @bmap_copy: copy of modification bitmap + * @tbl_copy: copy of translation table + * @capacity: capacity of table + * @used_logical_blks: count of used logical blocks + * @free_logical_blks: count of free logical blocks + * @last_allocated_blk: last allocated block (hint for allocation) + * @peb_index: PEB index + * @start_sequence_id: sequence ID of the first dirty fragment + * @dirty_fragments: count of dirty fragments + * @fragments_count: total count of fragments + * + * The @bmap_copy and @tbl_copy are allocated during getting + * snapshot inside of called function. Freeing of allocated + * memory SHOULD BE MADE by caller. 
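+ * ssdfs_blk2off_table_free_snapshot() is provided for this purpose.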
+ */ +struct ssdfs_blk2off_table_snapshot { + u64 cno; + + unsigned long *bmap_copy; + struct ssdfs_offset_position *tbl_copy; + u16 capacity; + + u16 used_logical_blks; + u16 free_logical_blks; + u16 last_allocated_blk; + + u16 peb_index; + u16 start_sequence_id; + u16 dirty_fragments; + u32 fragments_count; +}; + +/* + * struct ssdfs_blk2off_range - extent of logical blocks + * @start_lblk: start logical block number + * @len: count of logical blocks in extent + */ +struct ssdfs_blk2off_range { + u16 start_lblk; + u16 len; +}; + +/* + * Inline functions + */ + +/* + * ssdfs_blk2off_table_bmap_bytes() - calculate bmap bytes count + * @items_count: bits count in bitmap + */ +static inline +size_t ssdfs_blk2off_table_bmap_bytes(size_t items_count) +{ + size_t bytes; + + bytes = (items_count + BITS_PER_LONG - 1) / BITS_PER_BYTE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %zu, bmap_bytes %zu\n", + items_count, bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return bytes; +} + +static inline +bool is_ssdfs_logical_block_migrating(int blk_state) +{ + bool is_migrating = false; + + switch (blk_state) { + case SSDFS_LBLOCK_UNDER_MIGRATION: + case SSDFS_LBLOCK_UNDER_COMMIT: + is_migrating = true; + break; + + default: + /* do nothing */ + break; + } + + return is_migrating; +} + +/* Function prototypes */ +struct ssdfs_blk2off_table * +ssdfs_blk2off_table_create(struct ssdfs_fs_info *fsi, + u16 items_count, u8 type, + int state); +void ssdfs_blk2off_table_destroy(struct ssdfs_blk2off_table *table); +int ssdfs_blk2off_table_partial_init(struct ssdfs_blk2off_table *table, + struct pagevec *blk2off_pvec, + struct pagevec *blk2_desc_pvec, + u16 peb_index, + u64 cno); +int ssdfs_blk2off_table_blk_desc_init(struct ssdfs_blk2off_table *table, + u16 logical_blk, + struct ssdfs_offset_position *pos); +int ssdfs_blk2off_table_resize(struct ssdfs_blk2off_table *table, + u16 new_items_count); +int ssdfs_blk2off_table_snapshot(struct ssdfs_blk2off_table *table, + u16 peb_index, + struct ssdfs_blk2off_table_snapshot *snapshot); +void ssdfs_blk2off_table_free_snapshot(struct ssdfs_blk2off_table_snapshot *sp); +int ssdfs_blk2off_table_extract_extents(struct ssdfs_blk2off_table_snapshot *sp, + struct ssdfs_translation_extent *array, + u16 capacity, + u16 *extent_count); +int ssdfs_blk2off_table_prepare_for_commit(struct ssdfs_blk2off_table *table, + u16 peb_index, u16 sequence_id, + u32 *offset_table_off, + struct ssdfs_blk2off_table_snapshot *sp); +int ssdfs_peb_store_offsets_table_header(struct ssdfs_peb_info *pebi, + struct ssdfs_blk2off_table_header *hdr, + pgoff_t *cur_page, + u32 *write_offset); +int +ssdfs_peb_store_offsets_table_extents(struct ssdfs_peb_info *pebi, + struct ssdfs_translation_extent *array, + u16 extent_count, + pgoff_t *cur_page, + u32 *write_offset); +int ssdfs_peb_store_offsets_table_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_blk2off_table *table, + u16 peb_index, u16 sequence_id, + pgoff_t *cur_page, + u32 *write_offset); +int ssdfs_peb_store_offsets_table(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *desc, + pgoff_t *cur_page, + u32 *write_offset); +int +ssdfs_blk2off_table_forget_snapshot(struct ssdfs_blk2off_table *table, + struct ssdfs_blk2off_table_snapshot *sp, + struct ssdfs_translation_extent *array, + u16 extent_count); + +bool ssdfs_blk2off_table_dirtied(struct ssdfs_blk2off_table *table, + u16 peb_index); +bool ssdfs_blk2off_table_initialized(struct ssdfs_blk2off_table *table, + u16 peb_index); + +int 
ssdfs_blk2off_table_get_used_logical_blks(struct ssdfs_blk2off_table *tbl,
+					  u16 *used_blks);
+int ssdfs_blk2off_table_get_offset_position(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    struct ssdfs_offset_position *pos);
+struct ssdfs_phys_offset_descriptor *
+ssdfs_blk2off_table_convert(struct ssdfs_blk2off_table *table,
+			    u16 logical_blk, u16 *peb_index,
+			    int *migration_state,
+			    struct ssdfs_offset_position *pos);
+int ssdfs_blk2off_table_allocate_block(struct ssdfs_blk2off_table *table,
+				       u16 *logical_blk);
+int ssdfs_blk2off_table_allocate_extent(struct ssdfs_blk2off_table *table,
+					u16 len,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_blk2off_table_change_offset(struct ssdfs_blk2off_table *table,
+				      u16 logical_blk,
+				      u16 peb_index,
+				      struct ssdfs_block_descriptor *blk_desc,
+				      struct ssdfs_phys_offset_descriptor *off);
+int ssdfs_blk2off_table_free_block(struct ssdfs_blk2off_table *table,
+				   u16 peb_index, u16 logical_blk);
+int ssdfs_blk2off_table_free_extent(struct ssdfs_blk2off_table *table,
+				    u16 peb_index,
+				    struct ssdfs_blk2off_range *extent);
+
+int ssdfs_blk2off_table_get_block_migration(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    u16 peb_index);
+int ssdfs_blk2off_table_set_block_migration(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    u16 peb_index,
+					    struct ssdfs_segment_request *req);
+int ssdfs_blk2off_table_get_block_state(struct ssdfs_blk2off_table *table,
+					struct ssdfs_segment_request *req);
+int ssdfs_blk2off_table_update_block_state(struct ssdfs_blk2off_table *table,
+					   struct ssdfs_segment_request *req);
+int ssdfs_blk2off_table_set_block_commit(struct ssdfs_blk2off_table *table,
+					 u16 logical_blk,
+					 u16 peb_index);
+int ssdfs_blk2off_table_revert_migration_state(struct ssdfs_blk2off_table *tbl,
+					       u16 peb_index);
+
+#ifdef CONFIG_SSDFS_TESTING
+int ssdfs_blk2off_table_fragment_set_clean(struct ssdfs_blk2off_table *table,
+					   u16 peb_index, u16 sequence_id);
+#else
+static inline
+int ssdfs_blk2off_table_fragment_set_clean(struct ssdfs_blk2off_table *table,
+					   u16 peb_index, u16 sequence_id)
+{
+	SSDFS_ERR("set fragment clean is not supported\n");
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_SSDFS_TESTING */
+
+#endif /* _SSDFS_OFFSET_TRANSLATION_TABLE_H */

From patchwork Sat Feb 25 01:08:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151923
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 18/76] ssdfs: flush offset translation table
Date: Fri, 24 Feb 2023 17:08:29 -0800
Message-Id: <20230225010927.813929-19-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

The offset translation table can be imagined as a sequence of fragments. Every fragment contains an array of physical offset descriptors that provide the way to convert a logical block ID into a physical offset inside the log. The flush logic identifies dirty fragments and stores them as part of the log's metadata during the log commit operation.
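To make the translation path concrete, here is a minimal user-space sketch of the idea (hypothetical, simplified types and names, not the kernel implementation): a logical block's position selects a descriptor inside a fragment, and that descriptor yields the byte offset inside the log.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified model of one offset table fragment. */
struct phys_offset_desc {
	uint32_t byte_offset;		/* block's position inside the log */
};

struct fragment {
	uint16_t start_id;		/* first offset ID covered */
	uint16_t id_count;		/* number of IDs in this fragment */
	struct phys_offset_desc descs[8];
};

/* Per-logical-block position kept by the in-core table. */
struct position {
	uint16_t id;			/* physical offset ID */
	uint16_t offset_index;		/* index inside the owning fragment */
};

/* Resolve a logical block's position to a byte offset inside the log. */
static int32_t resolve(const struct fragment *frag, const struct position *pos)
{
	if (pos->id < frag->start_id ||
	    pos->id >= (uint16_t)(frag->start_id + frag->id_count))
		return -1;	/* the ID is not covered by this fragment */

	return (int32_t)frag->descs[pos->offset_index].byte_offset;
}

int main(void)
{
	struct fragment frag = {
		.start_id = 16,
		.id_count = 3,
		.descs = { { 4096 }, { 8192 }, { 12288 } },
	};
	struct position pos = { .id = 17, .offset_index = 1 };

	/* prints "logical block -> log offset 8192" */
	printf("logical block -> log offset %d\n", resolve(&frag, &pos));
	return 0;
}

The flush path below serializes exactly this kind of array: ssdfs_blk2off_table_prepare_for_commit() packs the fragment header followed by id_count physical offset descriptors for every dirty fragment, so the mount-time init path can rebuild the in-core positions.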
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/offset_translation_table.c | 2250 +++++++++++++++++++++++++++ 1 file changed, 2250 insertions(+) diff --git a/fs/ssdfs/offset_translation_table.c b/fs/ssdfs/offset_translation_table.c index 169f8106c5be..ccdabc3b7f72 100644 --- a/fs/ssdfs/offset_translation_table.c +++ b/fs/ssdfs/offset_translation_table.c @@ -2912,3 +2912,2253 @@ int ssdfs_blk2off_table_partial_init(struct ssdfs_blk2off_table *table, return err; } + +const u16 last_used_blk[U8_MAX + 1] = { +/* 00 - 0x00 */ U16_MAX, 0, 1, 1, +/* 01 - 0x04 */ 2, 2, 2, 2, +/* 02 - 0x08 */ 3, 3, 3, 3, +/* 03 - 0x0C */ 3, 3, 3, 3, +/* 04 - 0x10 */ 4, 4, 4, 4, +/* 05 - 0x14 */ 4, 4, 4, 4, +/* 06 - 0x18 */ 4, 4, 4, 4, +/* 07 - 0x1C */ 4, 4, 4, 4, +/* 08 - 0x20 */ 5, 5, 5, 5, +/* 09 - 0x24 */ 5, 5, 5, 5, +/* 10 - 0x28 */ 5, 5, 5, 5, +/* 11 - 0x2C */ 5, 5, 5, 5, +/* 12 - 0x30 */ 5, 5, 5, 5, +/* 13 - 0x34 */ 5, 5, 5, 5, +/* 14 - 0x38 */ 5, 5, 5, 5, +/* 15 - 0x3C */ 5, 5, 5, 5, +/* 16 - 0x40 */ 6, 6, 6, 6, +/* 17 - 0x44 */ 6, 6, 6, 6, +/* 18 - 0x48 */ 6, 6, 6, 6, +/* 19 - 0x4C */ 6, 6, 6, 6, +/* 20 - 0x50 */ 6, 6, 6, 6, +/* 21 - 0x54 */ 6, 6, 6, 6, +/* 22 - 0x58 */ 6, 6, 6, 6, +/* 23 - 0x5C */ 6, 6, 6, 6, +/* 24 - 0x60 */ 6, 6, 6, 6, +/* 25 - 0x64 */ 6, 6, 6, 6, +/* 26 - 0x68 */ 6, 6, 6, 6, +/* 27 - 0x6C */ 6, 6, 6, 6, +/* 28 - 0x70 */ 6, 6, 6, 6, +/* 29 - 0x74 */ 6, 6, 6, 6, +/* 30 - 0x78 */ 6, 6, 6, 6, +/* 31 - 0x7C */ 6, 6, 6, 6, +/* 32 - 0x80 */ 7, 7, 7, 7, +/* 33 - 0x84 */ 7, 7, 7, 7, +/* 34 - 0x88 */ 7, 7, 7, 7, +/* 35 - 0x8C */ 7, 7, 7, 7, +/* 36 - 0x90 */ 7, 7, 7, 7, +/* 37 - 0x94 */ 7, 7, 7, 7, +/* 38 - 0x98 */ 7, 7, 7, 7, +/* 39 - 0x9C */ 7, 7, 7, 7, +/* 40 - 0xA0 */ 7, 7, 7, 7, +/* 41 - 0xA4 */ 7, 7, 7, 7, +/* 42 - 0xA8 */ 7, 7, 7, 7, +/* 43 - 0xAC */ 7, 7, 7, 7, +/* 44 - 0xB0 */ 7, 7, 7, 7, +/* 45 - 0xB4 */ 7, 7, 7, 7, +/* 46 - 0xB8 */ 7, 7, 7, 7, +/* 47 - 0xBC */ 7, 7, 7, 7, +/* 48 - 0xC0 */ 7, 7, 7, 7, +/* 49 - 0xC4 */ 7, 7, 7, 7, +/* 50 - 0xC8 */ 7, 7, 7, 7, +/* 51 - 0xCC */ 7, 7, 7, 7, +/* 52 - 0xD0 */ 7, 7, 7, 7, +/* 53 - 0xD4 */ 7, 7, 7, 7, +/* 54 - 0xD8 */ 7, 7, 7, 7, +/* 55 - 0xDC */ 7, 7, 7, 7, +/* 56 - 0xE0 */ 7, 7, 7, 7, +/* 57 - 0xE4 */ 7, 7, 7, 7, +/* 58 - 0xE8 */ 7, 7, 7, 7, +/* 59 - 0xEC */ 7, 7, 7, 7, +/* 60 - 0xF0 */ 7, 7, 7, 7, +/* 61 - 0xF4 */ 7, 7, 7, 7, +/* 62 - 0xF8 */ 7, 7, 7, 7, +/* 63 - 0xFC */ 7, 7, 7, 7 +}; + +/* + * ssdfs_blk2off_table_find_last_valid_block() - find last valid block + * @table: pointer on translation table object + * + * RETURN: + * [success] - last valid logical block number. + * [failure] - U16_MAX. 
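+ *
+ * The last_used_blk[] lookup table above maps a byte value to the
+ * index of its most significant set bit (e.g. last_used_blk[0x05] == 2),
+ * so the backward scan can resolve the last used block inside a byte
+ * without testing individual bits.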
+ */
+static
+u16 ssdfs_blk2off_table_find_last_valid_block(struct ssdfs_blk2off_table *table)
+{
+	u16 logical_blk;
+	unsigned long *lbmap;
+	unsigned char *byte;
+	int long_count, byte_count;
+	int i, j;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table);
+	BUG_ON(!rwsem_is_locked(&table->translation_lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	logical_blk = U16_MAX;
+	long_count = BITS_TO_LONGS(table->lbmap.bits_count);
+	lbmap = table->lbmap.array[SSDFS_LBMAP_STATE_INDEX];
+
+	for (i = long_count - 1; i >= 0; i--) {
+		if (lbmap[i] != 0) {
+			byte_count = sizeof(unsigned long);
+			for (j = byte_count - 1; j >= 0; j--) {
+				/* take the address of the word, not its value */
+				byte = (unsigned char *)&lbmap[i] + j;
+				logical_blk = last_used_blk[*byte];
+				if (logical_blk != U16_MAX)
+					break;
+			}
+			goto calculate_logical_blk;
+		}
+	}
+
+calculate_logical_blk:
+	if (logical_blk != U16_MAX) {
+		/* account for the word and the byte inside the word */
+		logical_blk += (i * BITS_PER_LONG) + (j * BITS_PER_BYTE);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("table %p, logical_blk %u\n",
+		  table, logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return logical_blk;
+}
+
+/*
+ * ssdfs_blk2off_table_resize() - resize table
+ * @table: pointer on translation table object
+ * @new_items_count: new table size
+ *
+ * This method tries to grow or to shrink table.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE  - unable to shrink table.
+ * %-ENOMEM  - unable to realloc table.
+ */
+int ssdfs_blk2off_table_resize(struct ssdfs_blk2off_table *table,
+				u16 new_items_count)
+{
+	u16 last_blk;
+	int diff;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table);
+
+	SSDFS_DBG("table %p, lblk2off_capacity %u, new_items_count %u\n",
+		  table, table->lblk2off_capacity, new_items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&table->translation_lock);
+
+	if (new_items_count == table->lblk2off_capacity) {
+		SSDFS_WARN("new_items_count %u == lblk2off_capacity %u\n",
+			   new_items_count, table->lblk2off_capacity);
+		goto finish_table_resize;
+	} else if (new_items_count < table->lblk2off_capacity) {
+		last_blk = ssdfs_blk2off_table_find_last_valid_block(table);
+
+		if (last_blk != U16_MAX && last_blk >= new_items_count) {
+			err = -ERANGE;
+			SSDFS_ERR("unable to shrink bitmap: "
+				  "last_blk %u >= new_items_count %u\n",
+				  last_blk, new_items_count);
+			goto finish_table_resize;
+		}
+	}
+
+	diff = (int)new_items_count - table->lblk2off_capacity;
+
+	table->lblk2off_capacity = new_items_count;
+	table->free_logical_blks += diff;
+
+finish_table_resize:
+	up_write(&table->translation_lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_dirtied() - check that PEB's table is dirty
+ * @table: pointer on translation table object
+ * @peb_index: PEB's index
+ */
+bool ssdfs_blk2off_table_dirtied(struct ssdfs_blk2off_table *table,
+				 u16 peb_index)
+{
+	bool is_dirty = false;
+	struct ssdfs_phys_offset_table_array *phys_off_table;
+	struct ssdfs_sequence_array *sequence;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table);
+	BUG_ON(!table->peb);
+	BUG_ON(peb_index >= table->pebs_count);
+
+	SSDFS_DBG("table %p, peb_index %u\n",
+		  table, peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	phys_off_table = &table->peb[peb_index];
+	sequence = phys_off_table->sequence;
+	is_dirty = has_ssdfs_sequence_array_state(sequence,
+					SSDFS_SEQUENCE_ITEM_DIRTY_TAG);
+
+	switch (atomic_read(&phys_off_table->state)) {
+	case SSDFS_BLK2OFF_TABLE_DIRTY:
+	case SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT:
+		if (!is_dirty) {
+			/* table is dirty without dirty fragments */
+			SSDFS_WARN("table is marked as dirty!\n");
+		}
+		break;
+
+	default:
+		if (is_dirty) {
+			/*
there are dirty fragments but table is clean */ + SSDFS_WARN("table is not dirty\n"); + } + break; + } + + return is_dirty; +} + +/* + * ssdfs_blk2off_table_initialized() - check that PEB's table is initialized + * @table: pointer on translation table object + * @peb_index: PEB's index + */ +bool ssdfs_blk2off_table_initialized(struct ssdfs_blk2off_table *table, + u16 peb_index) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + BUG_ON(peb_index >= table->pebs_count); + + SSDFS_DBG("table %p, peb_index %u\n", + table, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + BUG_ON(!table->peb); + + state = atomic_read(&table->peb[peb_index].state); + + return state >= SSDFS_BLK2OFF_TABLE_COMPLETE_INIT && + state < SSDFS_BLK2OFF_TABLE_STATE_MAX; +} + +static +int ssdfs_change_fragment_state(void *item, int old_state, int new_state) +{ + struct ssdfs_phys_offset_table_fragment *fragment = + (struct ssdfs_phys_offset_table_fragment *)item; + int state; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_state %#x, new_state %#x\n", + old_state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fragment) { + SSDFS_ERR("pointer is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sequence_id %u, state %#x\n", + fragment->sequence_id, + atomic_read(&fragment->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_cmpxchg(&fragment->state, old_state, new_state); + + switch (new_state) { + case SSDFS_BLK2OFF_FRAG_DIRTY: + switch (state) { + case SSDFS_BLK2OFF_FRAG_CREATED: + case SSDFS_BLK2OFF_FRAG_INITIALIZED: + case SSDFS_BLK2OFF_FRAG_DIRTY: + /* expected old state */ + break; + + default: + SSDFS_ERR("invalid old_state %#x\n", + old_state); + return -ERANGE; + } + break; + + default: + if (state != old_state) { + SSDFS_ERR("state %#x != old_state %#x\n", + state, old_state); + return -ERANGE; + } + break; + } + + return 0; +} + +static inline +int ssdfs_calculate_start_sequence_id(u16 last_sequence_id, + u16 dirty_fragments, + u16 *start_sequence_id) +{ + u16 upper_bound; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!start_sequence_id); + + SSDFS_DBG("last_sequence_id %u, dirty_fragments %u\n", + last_sequence_id, dirty_fragments); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_sequence_id = U16_MAX; + + if (last_sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("invalid last_sequence_id %u\n", + last_sequence_id); + return -ERANGE; + } + + if (dirty_fragments > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("invalid dirty_fragments %u\n", + dirty_fragments); + return -ERANGE; + } + + upper_bound = last_sequence_id + 1; + + if (upper_bound >= dirty_fragments) + *start_sequence_id = upper_bound - dirty_fragments; + else { + *start_sequence_id = SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD - + (dirty_fragments - upper_bound); + } + + return 0; +} + +/* + * ssdfs_blk2off_table_snapshot() - get table's snapshot + * @table: pointer on translation table object + * @peb_index: PEB's index + * @snapshot: pointer on table's snapshot object + * + * This method tries to get table's snapshot. The @bmap_copy + * and @tbl_copy fields of snapshot object are allocated during + * getting snapshot by this method. Freeing of allocated + * memory SHOULD BE MADE by caller. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal logic error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - PEB hasn't dirty fragments. 
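+ *
+ * As a side effect, all dirty fragments of the PEB are switched
+ * from the dirty into the under-commit state.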
+ */
+int ssdfs_blk2off_table_snapshot(struct ssdfs_blk2off_table *table,
+				 u16 peb_index,
+				 struct ssdfs_blk2off_table_snapshot *snapshot)
+{
+	struct ssdfs_phys_offset_table_array *pot_table;
+	struct ssdfs_sequence_array *sequence;
+	u32 capacity;
+	size_t bmap_bytes, tbl_bytes;
+	u16 last_sequence_id;
+	unsigned long dirty_fragments;
+	int state;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !snapshot);
+	BUG_ON(peb_index >= table->pebs_count);
+
+	SSDFS_DBG("table %p, peb_index %u, snapshot %p\n",
+		  table, peb_index, snapshot);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(snapshot, 0, sizeof(struct ssdfs_blk2off_table_snapshot));
+	snapshot->bmap_copy = NULL;
+	snapshot->tbl_copy = NULL;
+
+	down_write(&table->translation_lock);
+
+	if (!ssdfs_blk2off_table_dirtied(table, peb_index)) {
+		err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("table isn't dirty for peb_index %u\n",
+			  peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_snapshoting;
+	}
+
+	capacity = ssdfs_dynamic_array_items_count(&table->lblk2off);
+	if (capacity == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid capacity %u\n", capacity);
+		goto finish_snapshoting;
+	}
+
+	bmap_bytes = ssdfs_blk2off_table_bmap_bytes(table->lbmap.bits_count);
+	snapshot->bmap_copy = ssdfs_blk2off_kvzalloc(bmap_bytes, GFP_KERNEL);
+	if (!snapshot->bmap_copy) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate %zu bytes\n",
+			  bmap_bytes);
+		goto finish_snapshoting;
+	}
+
+	tbl_bytes = ssdfs_dynamic_array_allocated_bytes(&table->lblk2off);
+	if (tbl_bytes == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid bytes count %zu\n", tbl_bytes);
+		goto finish_snapshoting;
+	}
+
+	snapshot->tbl_copy = ssdfs_blk2off_kvzalloc(tbl_bytes, GFP_KERNEL);
+	if (!snapshot->tbl_copy) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate %zu bytes\n",
+			  tbl_bytes);
+		goto finish_snapshoting;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("capacity %u, bits_count %u, "
+		  "bmap_bytes %zu, tbl_bytes %zu, "
+		  "last_allocated_blk %u\n",
+		  capacity, table->lbmap.bits_count,
+		  bmap_bytes, tbl_bytes,
+		  table->last_allocated_blk);
+	SSDFS_DBG("init_bmap %lx, state_bmap %lx, bmap_copy %lx\n",
+		  *table->lbmap.array[SSDFS_LBMAP_INIT_INDEX],
+		  *table->lbmap.array[SSDFS_LBMAP_STATE_INDEX],
+		  *snapshot->bmap_copy);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bitmap_or(snapshot->bmap_copy,
+		  snapshot->bmap_copy,
+		  table->lbmap.array[SSDFS_LBMAP_MODIFICATION_INDEX],
+		  table->lbmap.bits_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("modification_bmap %lx, bmap_copy %lx\n",
+		  *table->lbmap.array[SSDFS_LBMAP_MODIFICATION_INDEX],
+		  *snapshot->bmap_copy);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_dynamic_array_copy_content(&table->lblk2off,
+					       snapshot->tbl_copy,
+					       tbl_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy position array: "
+			  "err %d\n", err);
+		goto finish_snapshoting;
+	}
+
+	snapshot->capacity = capacity;
+
+	snapshot->used_logical_blks = table->used_logical_blks;
+	snapshot->free_logical_blks = table->free_logical_blks;
+	snapshot->last_allocated_blk = table->last_allocated_blk;
+
+	snapshot->peb_index = peb_index;
+	snapshot->start_sequence_id = SSDFS_INVALID_FRAG_ID;
+
+	sequence = table->peb[peb_index].sequence;
+	err = ssdfs_sequence_array_change_all_states(sequence,
+					SSDFS_SEQUENCE_ITEM_DIRTY_TAG,
+					SSDFS_SEQUENCE_ITEM_UNDER_COMMIT_TAG,
+					ssdfs_change_fragment_state,
+					SSDFS_BLK2OFF_FRAG_DIRTY,
+					SSDFS_BLK2OFF_FRAG_UNDER_COMMIT,
+					&dirty_fragments);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change from dirty to under_commit: "
+			  "err %d\n",
err);
+		goto finish_snapshoting;
+	} else if (dirty_fragments >= U16_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid dirty_fragments %lu\n",
+			  dirty_fragments);
+		goto finish_snapshoting;
+	}
+
+#ifdef CONFIG_SSDFS_SAVE_WHOLE_BLK2OFF_TBL_IN_EVERY_LOG
+	snapshot->start_sequence_id = 0;
+	snapshot->dirty_fragments = dirty_fragments;
+	snapshot->fragments_count =
+		atomic_read(&table->peb[peb_index].fragment_count);
+#else
+	snapshot->dirty_fragments = dirty_fragments;
+
+	last_sequence_id =
+		ssdfs_sequence_array_last_id(table->peb[peb_index].sequence);
+	err = ssdfs_calculate_start_sequence_id(last_sequence_id,
+						snapshot->dirty_fragments,
+						&snapshot->start_sequence_id);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate start sequence ID: "
+			  "err %d\n", err);
+		goto finish_snapshoting;
+	}
+
+	snapshot->fragments_count =
+		atomic_read(&table->peb[peb_index].fragment_count);
+#endif /* CONFIG_SSDFS_SAVE_WHOLE_BLK2OFF_TBL_IN_EVERY_LOG */
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_sequence_id %u, dirty_fragments %u\n",
+		  snapshot->start_sequence_id,
+		  snapshot->dirty_fragments);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (snapshot->dirty_fragments == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("PEB has no dirty fragments\n");
+		goto finish_snapshoting;
+	}
+
+	snapshot->cno = ssdfs_current_cno(table->fsi->sb);
+
+	pot_table = &table->peb[peb_index];
+	state = atomic_cmpxchg(&pot_table->state,
+				SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT,
+				SSDFS_BLK2OFF_TABLE_PARTIAL_INIT);
+	if (state != SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT) {
+		state = atomic_cmpxchg(&pot_table->state,
+					SSDFS_BLK2OFF_TABLE_DIRTY,
+					SSDFS_BLK2OFF_TABLE_COMPLETE_INIT);
+		if (state != SSDFS_BLK2OFF_TABLE_DIRTY) {
+			err = -ERANGE;
+			SSDFS_ERR("table isn't dirty: "
+				  "state %#x\n",
+				  state);
+			goto finish_snapshoting;
+		}
+	}
+
+finish_snapshoting:
+	up_write(&table->translation_lock);
+
+	if (err) {
+		if (snapshot->bmap_copy) {
+			ssdfs_blk2off_kvfree(snapshot->bmap_copy);
+			snapshot->bmap_copy = NULL;
+		}
+
+		if (snapshot->tbl_copy) {
+			ssdfs_blk2off_kvfree(snapshot->tbl_copy);
+			snapshot->tbl_copy = NULL;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_free_snapshot() - free snapshot's resources
+ * @sp: pointer on table's snapshot
+ */
+void ssdfs_blk2off_table_free_snapshot(struct ssdfs_blk2off_table_snapshot *sp)
+{
+	if (!sp)
+		return;
+
+	if (sp->bmap_copy) {
+		ssdfs_blk2off_kvfree(sp->bmap_copy);
+		sp->bmap_copy = NULL;
+	}
+
+	if (sp->tbl_copy) {
+		ssdfs_blk2off_kvfree(sp->tbl_copy);
+		sp->tbl_copy = NULL;
+	}
+
+	memset(sp, 0, sizeof(struct ssdfs_blk2off_table_snapshot));
+}
+
+/*
+ * ssdfs_find_changed_area() - find changed area
+ * @sp: table's snapshot
+ * @start: starting bit for search
+ * @found: found range of set bits
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE   - internal logic error.
+ * %-ENODATA  - nothing was found.
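+ *
+ * Example: for a bitmap copy 0b00111000 and @start == 0, the found
+ * range is { .start_lblk = 3, .len = 3 }.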
+ */ +static inline +int ssdfs_find_changed_area(struct ssdfs_blk2off_table_snapshot *sp, + unsigned long start, + struct ssdfs_blk2off_range *found) +{ + unsigned long modified_bits; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sp || !found); +#endif /* CONFIG_SSDFS_DEBUG */ + + modified_bits = bitmap_weight(sp->bmap_copy, sp->capacity); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("snapshot %p, peb_index %u, start %lu, found %p\n", + sp, sp->peb_index, start, found); + SSDFS_DBG("modified_bits %lu, capacity %u\n", + modified_bits, sp->capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + start = find_next_bit(sp->bmap_copy, sp->capacity, start); + if (start >= sp->capacity) { + SSDFS_DBG("nothing found\n"); + return -ENODATA; + } + + found->start_lblk = (u16)start; + + start = find_next_zero_bit(sp->bmap_copy, sp->capacity, start); + start = (unsigned long)min_t(u16, (u16)start, sp->capacity); + + found->len = (u16)(start - found->start_lblk); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_start %lu, found_end %lu, len %lu\n", + (unsigned long)found->start_lblk, + start, + (unsigned long)found->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found->len == 0) { + SSDFS_ERR("found empty extent\n"); + return -ERANGE; + } + + return 0; +} + +/* + * struct ssdfs_blk2off_found_range - found range + * @range: range descriptor + * @start_id: starting offset ID + * @state: state of logical blocks in extent (used, free and so on) + */ +struct ssdfs_blk2off_found_range { + struct ssdfs_blk2off_range range; + u16 start_id; + u8 state; +}; + +/* + * ssdfs_translation_extent_init() - init translation extent + * @found: range of changed logical blocks + * @sequence_id: sequence ID of extent + * @extent: pointer on initialized extent [out] + */ +static inline +void ssdfs_translation_extent_init(struct ssdfs_blk2off_found_range *found, + u8 sequence_id, + struct ssdfs_translation_extent *extent) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!found || !extent); + BUG_ON(found->state <= SSDFS_LOGICAL_BLK_UNKNOWN_STATE || + found->state >= SSDFS_LOGICAL_BLK_STATE_MAX); + + SSDFS_DBG("start %u, len %u, id %u, sequence_id %u, state %#x\n", + found->range.start_lblk, found->range.len, + found->start_id, sequence_id, found->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + extent->logical_blk = cpu_to_le16(found->range.start_lblk); + extent->offset_id = cpu_to_le16(found->start_id); + extent->len = cpu_to_le16(found->range.len); + extent->sequence_id = sequence_id; + extent->state = found->state; +} + +/* + * can_translation_extent_be_merged() - check opportunity to merge extents + * @extent: extent for checking + * @found: range of changed logical blocks + */ +static inline +bool can_translation_extent_be_merged(struct ssdfs_translation_extent *extent, + struct ssdfs_blk2off_found_range *found) +{ + u16 logical_blk; + u16 offset_id; + u16 len; + u16 found_blk; + u16 found_len; + u16 found_id; + u8 found_state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !found); + BUG_ON(found->start_id == SSDFS_BLK2OFF_TABLE_INVALID_ID); + BUG_ON(found->state <= SSDFS_LOGICAL_BLK_UNKNOWN_STATE || + found->state >= SSDFS_LOGICAL_BLK_STATE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_blk = le16_to_cpu(extent->logical_blk); + offset_id = le16_to_cpu(extent->offset_id); + len = le16_to_cpu(extent->len); + + found_blk = found->range.start_lblk; + found_len = found->range.len; + found_id = found->start_id; + found_state = found->state; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("EXTENT: logical_blk %u, offset_id %u, len %u, " + "sequence_id %u, state 
%#x; " + "FOUND: logical_blk %u, start_id %u, " + "len %u, state %#x\n", + logical_blk, offset_id, len, + extent->sequence_id, extent->state, + found->range.start_lblk, found->start_id, + found->range.len, found->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (extent->state != found->state) + return false; + + if (found_id == offset_id) { + SSDFS_ERR("start_id %u == offset_id %u\n", + found_id, offset_id); + return false; + } else if (found_id > offset_id && + (offset_id + len) == found_id) { + if ((logical_blk + len) == found_blk) + return true; + else if ((found_blk + found_len) == logical_blk) + return true; + } else if (found_id < offset_id && + (found_id + found_len) == offset_id) { + if ((logical_blk + len) == found_blk) + return true; + else if ((found_blk + found_len) == logical_blk) + return true; + } + + return false; +} + +/* + * ssdfs_merge_translation_extent() - merge translation extents + * @extent: extent for checking + * @found: range of changed logical blocks + */ +static inline +int ssdfs_merge_translation_extent(struct ssdfs_translation_extent *extent, + struct ssdfs_blk2off_found_range *found) +{ + u16 logical_blk; + u16 offset_id; + u16 len; + u16 found_blk; + u16 found_len; + u16 found_id; + u8 found_state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !found); + BUG_ON(found->start_id == SSDFS_BLK2OFF_TABLE_INVALID_ID); + BUG_ON(found->state <= SSDFS_LOGICAL_BLK_UNKNOWN_STATE || + found->state >= SSDFS_LOGICAL_BLK_STATE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_blk = le16_to_cpu(extent->logical_blk); + offset_id = le16_to_cpu(extent->offset_id); + len = le16_to_cpu(extent->len); + + found_blk = found->range.start_lblk; + found_len = found->range.len; + found_id = found->start_id; + found_state = found->state; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("EXTENT: logical_blk %u, offset_id %u, len %u, " + "sequence_id %u, state %#x; " + "FOUND: logical_blk %u, start_id %u, " + "len %u, state %#x\n", + logical_blk, offset_id, len, + extent->sequence_id, extent->state, + found_blk, found_id, found_len, + found_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (extent->state != found_state) { + SSDFS_ERR("extent->state %#x != state %#x\n", + extent->state, found_state); + return -EINVAL; + } + + if (found_id == offset_id) { + SSDFS_ERR("start_id %u == offset_id %u\n", + found_id, offset_id); + return -ERANGE; + } + + if (found_id > offset_id && + (offset_id + len) == found_id) { + if ((logical_blk + len) == found_blk) { + extent->len = cpu_to_le16(len + found_len); + } else if ((found_blk + found_len) == logical_blk) { + extent->logical_blk = cpu_to_le16(found_blk); + extent->len = cpu_to_le16(len + found_len); + } + } else if (found_id < offset_id && + (found_id + found_len) == offset_id) { + if ((logical_blk + len) == found_blk) { + extent->offset_id = cpu_to_le16(found_id); + extent->len = cpu_to_le16(len + found_len); + } else if ((found_blk + found_len) == logical_blk) { + extent->logical_blk = cpu_to_le16(found_blk); + extent->offset_id = cpu_to_le16(found_id); + extent->len = cpu_to_le16(len + found_len); + } + } else { + SSDFS_ERR("fail to merge the translation extent\n"); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_insert_translation_extent() - insert translation extent into the queue + * @found: range of changed logical blocks + * @array: extents array [in|out] + * @capacity: capacity of extents array + * @extent_count: pointer on extents count value [out] + */ +static inline +int ssdfs_insert_translation_extent(struct ssdfs_blk2off_found_range 
*found, + struct ssdfs_translation_extent *array, + u16 capacity, u16 *extent_count) +{ + struct ssdfs_translation_extent *extent; + size_t extent_size = sizeof(struct ssdfs_translation_extent); + size_t array_bytes = extent_size * capacity; + u16 logical_blk; + u16 offset_id; + u16 len; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!found || !extent_count); + BUG_ON(found->state <= SSDFS_LOGICAL_BLK_UNKNOWN_STATE || + found->state >= SSDFS_LOGICAL_BLK_STATE_MAX); + + SSDFS_DBG("start_id %u, state %#x, extent_count %u\n", + found->start_id, found->state, *extent_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + BUG_ON(*extent_count >= capacity); + + if (found->start_id == SSDFS_BLK2OFF_TABLE_INVALID_ID) { + extent = &array[*extent_count]; + ssdfs_translation_extent_init(found, *extent_count, extent); + (*extent_count)++; + + return 0; + } + + for (i = 0; i < *extent_count; i++) { + extent = &array[i]; + + logical_blk = le16_to_cpu(extent->logical_blk); + offset_id = le16_to_cpu(extent->offset_id); + len = le16_to_cpu(extent->len); + + if (offset_id >= SSDFS_BLK2OFF_TABLE_INVALID_ID) + continue; + + if (found->start_id == offset_id) { + SSDFS_ERR("start_id %u == offset_id %u\n", + found->start_id, offset_id); + return -ERANGE; + } else if (found->start_id > offset_id && + can_translation_extent_be_merged(extent, found)) { + err = ssdfs_merge_translation_extent(extent, found); + if (unlikely(err)) { + SSDFS_ERR("fail to merge extent: " + "err %d\n", err); + return err; + } else + return 0; + } else if (found->start_id < offset_id) { + if (can_translation_extent_be_merged(extent, found)) { + err = ssdfs_merge_translation_extent(extent, + found); + if (unlikely(err)) { + SSDFS_ERR("fail to merge extent: " + "err %d\n", err); + return err; + } else + return 0; + } else { + i++; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to merge: index %d\n", i); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + } + } + + if (i < *extent_count) { +#ifdef CONFIG_SSDFS_DEBUG + if (((i + 1) + (*extent_count - i)) > capacity) { + SSDFS_WARN("value is out capacity\n"); + return -ERANGE; + } + + SSDFS_DBG("extent_count %u, index %d, extent_size %zu\n", + *extent_count, i, extent_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_memmove(array, (i + 1) * extent_size, array_bytes, + array, i * extent_size, array_bytes, + (*extent_count - i) * extent_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + for (j = i + 1; j <= *extent_count; j++) { + extent = &array[j]; + extent->sequence_id = j; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extent_count %u, index %d, extent_size %zu\n", + *extent_count, i, extent_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + extent = &array[i]; + ssdfs_translation_extent_init(found, i, extent); + + (*extent_count)++; + +#ifdef CONFIG_SSDFS_DEBUG + for (i = 0; i < *extent_count; i++) { + extent = &array[i]; + + SSDFS_DBG("index %d, logical_blk %u, offset_id %u, " + "len %u, sequence_id %u, state %u\n", + i, + le16_to_cpu(extent->logical_blk), + le16_to_cpu(extent->offset_id), + le16_to_cpu(extent->len), + extent->sequence_id, + extent->state); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +static inline +bool is_found_logical_block_free(struct ssdfs_blk2off_table_snapshot *sp, + u16 blk) +{ + struct ssdfs_offset_position *pos; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sp); + + SSDFS_DBG("blk %u\n", blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + pos = &sp->tbl_copy[blk]; + + return pos->id == SSDFS_BLK2OFF_TABLE_INVALID_ID && + 
pos->offset_index >= U16_MAX; +} + +static inline +bool is_found_extent_ended(struct ssdfs_blk2off_table_snapshot *sp, + u16 blk, + struct ssdfs_blk2off_found_range *found) +{ + struct ssdfs_offset_position *pos; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sp || !found); + + SSDFS_DBG("blk %u\n", blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + pos = &sp->tbl_copy[blk]; + + if (pos->peb_index != sp->peb_index) { + /* changes of another PEB */ + return true; + } else if (pos->id != SSDFS_BLK2OFF_TABLE_INVALID_ID) { + if (found->start_id == SSDFS_BLK2OFF_TABLE_INVALID_ID) + found->start_id = pos->id; + else if ((found->start_id + found->range.len) != pos->id) + return true; + } else if (pos->id == SSDFS_BLK2OFF_TABLE_INVALID_ID && + found->state != SSDFS_LOGICAL_BLK_FREE) { + if (found->range.start_lblk != U16_MAX) { + /* state is changed */ + return true; + } + } + + return false; +} + +/* + * ssdfs_blk2off_table_extract_extents() - extract changed extents + * @sp: table's snapshot + * @array: extents array [in|out] + * @capacity: capacity of extents array + * @extent_count: pointer on extents count value [out] + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal logic error. + */ +int ssdfs_blk2off_table_extract_extents(struct ssdfs_blk2off_table_snapshot *sp, + struct ssdfs_translation_extent *array, + u16 capacity, u16 *extent_count) +{ + unsigned long start = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!sp || !array || !extent_count); + BUG_ON(capacity == 0); + + SSDFS_DBG("snapshot %p, peb_index %u, extents %p, " + "capacity %u, extent_count %p\n", + sp, sp->peb_index, array, + capacity, extent_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + *extent_count = 0; + + do { + struct ssdfs_blk2off_range changed_area = {0}; + struct ssdfs_blk2off_found_range found = { + .range.start_lblk = U16_MAX, + .range.len = 0, + .start_id = SSDFS_BLK2OFF_TABLE_INVALID_ID, + .state = SSDFS_LOGICAL_BLK_UNKNOWN_STATE, + }; + struct ssdfs_offset_position *pos; + + err = ssdfs_find_changed_area(sp, start, &changed_area); + if (err == -ENODATA) { + err = 0; + SSDFS_DBG("nothing found\n"); + goto finish_extract_extents; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find changed area: err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("changed area: start %u, len %u\n", + changed_area.start_lblk, changed_area.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < changed_area.len; i++) { + u16 blk = changed_area.start_lblk + i; + bool is_extent_ended = false; + + pos = &sp->tbl_copy[blk]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cno %llx, id %u, peb_index %u, " + "sequence_id %u, offset_index %u\n", + pos->cno, pos->id, pos->peb_index, + pos->sequence_id, pos->offset_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pos->peb_index == U16_MAX) { + SSDFS_WARN("invalid peb_index: " + "logical_blk %u\n", + blk); + return -ERANGE; + } + + if (is_found_logical_block_free(sp, blk)) { + /* free block */ + + switch (found.state) { + case SSDFS_LOGICAL_BLK_UNKNOWN_STATE: + found.range.start_lblk = blk; + found.range.len = 1; + found.state = SSDFS_LOGICAL_BLK_FREE; + break; + + case SSDFS_LOGICAL_BLK_FREE: + found.range.len++; + break; + + case SSDFS_LOGICAL_BLK_USED: + is_extent_ended = true; + break; + + default: + SSDFS_ERR("unexpected blk state %#x\n", + found.state); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free block: start_lblk %u, " + "len %u, state %#x, " + "is_extent_ended %#x\n", + found.range.start_lblk, + 
found.range.len, + found.state, + is_extent_ended); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + /* used block */ + + switch (found.state) { + case SSDFS_LOGICAL_BLK_UNKNOWN_STATE: + found.range.start_lblk = blk; + found.range.len = 1; + found.start_id = pos->id; + found.state = SSDFS_LOGICAL_BLK_USED; + break; + + case SSDFS_LOGICAL_BLK_USED: + is_extent_ended = + is_found_extent_ended(sp, blk, + &found); + if (!is_extent_ended) + found.range.len++; + break; + + case SSDFS_LOGICAL_BLK_FREE: + is_extent_ended = true; + break; + + default: + SSDFS_ERR("unexpected blk state %#x\n", + found.state); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("used block: start_lblk %u, " + "len %u, state %#x, " + "is_extent_ended %#x\n", + found.range.start_lblk, + found.range.len, + found.state, + is_extent_ended); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (is_extent_ended) { + if (found.range.start_lblk == U16_MAX) { + SSDFS_ERR("invalid start_lblk %u\n", + found.range.start_lblk); + return -ERANGE; + } + + err = ssdfs_insert_translation_extent(&found, + array, + capacity, + extent_count); + if (unlikely(err)) { + SSDFS_ERR("fail to insert extent: " + "start_id %u, state %#x, " + "err %d\n", + found.start_id, found.state, + err); + return err; + } + + pos = &sp->tbl_copy[blk]; + + if (pos->id == SSDFS_BLK2OFF_TABLE_INVALID_ID) + found.state = SSDFS_LOGICAL_BLK_FREE; + else + found.state = SSDFS_LOGICAL_BLK_USED; + + found.range.start_lblk = blk; + found.range.len = 1; + found.start_id = pos->id; + } + } + + if (found.range.start_lblk != U16_MAX) { + err = ssdfs_insert_translation_extent(&found, + array, + capacity, + extent_count); + if (unlikely(err)) { + SSDFS_ERR("fail to insert extent: " + "start_id %u, state %#x, " + "err %d\n", + found.start_id, found.state, err); + return err; + } + + start = found.range.start_lblk + found.range.len; + + found.range.start_lblk = U16_MAX; + found.range.len = 0; + found.state = SSDFS_LOGICAL_BLK_UNKNOWN_STATE; + } else + start = changed_area.start_lblk + changed_area.len; + } while (start < sp->capacity); + +finish_extract_extents: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extents_count %u\n", *extent_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*extent_count == 0) { + err = -ERANGE; + SSDFS_ERR("invalid state of change bitmap\n"); + return err; + } + + return 0; +} + +/* + * ssdfs_blk2off_table_prepare_for_commit() - prepare fragment for commit + * @table: pointer on table object + * @peb_index: PEB's index + * @sequence_id: fragment's sequence ID + * @offset_table_off: pointer on current offset to offset table header [in|out] + * @sp: pointer on snapshot + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal logic error. 
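+ *
+ * A sketch of the intended usage, mirroring the commit flow in
+ * ssdfs_peb_store_offsets_table() below (the local variables are
+ * assumed to be set up by the caller):
+ *
+ *	sequence_id = snapshot.start_sequence_id;
+ *	for (i = 0; i < fragments_count; i++) {
+ *		err = ssdfs_blk2off_table_prepare_for_commit(table,
+ *							peb_index,
+ *							sequence_id,
+ *							&offset_table_off,
+ *							&snapshot);
+ *		if (unlikely(err))
+ *			return err;
+ *		sequence_id = ssdfs_next_sequence_id(sequence_id);
+ *	}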
+ */ +int +ssdfs_blk2off_table_prepare_for_commit(struct ssdfs_blk2off_table *table, + u16 peb_index, u16 sequence_id, + u32 *offset_table_off, + struct ssdfs_blk2off_table_snapshot *sp) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_phys_offset_table_array *pot_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_phys_offset_table_fragment *fragment; + void *ptr; + u16 id_count; + u32 byte_size; + u16 flags = 0; + int last_sequence_id; + bool has_next_fragment = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !sp || !table->fsi || !offset_table_off); + BUG_ON(peb_index >= table->pebs_count); + BUG_ON(peb_index != sp->peb_index); + + SSDFS_DBG("table %p, peb_index %u, sequence_id %u, " + "offset_table_off %p, sp %p\n", + table, peb_index, sequence_id, + offset_table_off, sp); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = table->fsi; + + down_read(&table->translation_lock); + + pot_table = &table->peb[peb_index]; + + sequence = pot_table->sequence; + ptr = ssdfs_sequence_array_get_item(sequence, sequence_id); + if (IS_ERR_OR_NULL(ptr)) { + err = (ptr == NULL ? -ENOENT : PTR_ERR(ptr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %u, err %d\n", + sequence_id, err); + goto finish_prepare_for_commit; + } + fragment = (struct ssdfs_phys_offset_table_fragment *)ptr; + + if (atomic_read(&fragment->state) != SSDFS_BLK2OFF_FRAG_UNDER_COMMIT) { + err = -ERANGE; + SSDFS_ERR("fragment isn't under commit: " + "state %#x\n", + atomic_read(&fragment->state)); + goto finish_prepare_for_commit; + } + + down_write(&fragment->lock); + + fragment->hdr->magic = cpu_to_le32(SSDFS_PHYS_OFF_TABLE_MAGIC); + fragment->hdr->checksum = 0; + + fragment->hdr->start_id = cpu_to_le16(fragment->start_id); + id_count = (u16)atomic_read(&fragment->id_count); + fragment->hdr->id_count = cpu_to_le16(id_count); + byte_size = sizeof(struct ssdfs_phys_offset_table_header); + byte_size += (u32)id_count * sizeof(struct ssdfs_phys_offset_descriptor); + fragment->hdr->byte_size = cpu_to_le32(byte_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hdr_size %zu, id_count %u, " + "desc_size %zu, byte_size %u\n", + sizeof(struct ssdfs_phys_offset_table_header), + id_count, + sizeof(struct ssdfs_phys_offset_descriptor), + byte_size); + SSDFS_DBG("fragment: start_id %u, id_count %u\n", + le16_to_cpu(fragment->hdr->start_id), + le16_to_cpu(fragment->hdr->id_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fragment->hdr->peb_index = cpu_to_le16(peb_index); + fragment->hdr->sequence_id = cpu_to_le16(fragment->sequence_id); + fragment->hdr->type = cpu_to_le16(table->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sequence_id %u, start_sequence_id %u, " + "dirty_fragments %u, fragment->sequence_id %u\n", + sequence_id, sp->start_sequence_id, + sp->dirty_fragments, + fragment->sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + last_sequence_id = ssdfs_sequence_array_last_id(pot_table->sequence); + has_next_fragment = sequence_id != last_sequence_id; + + flags |= SSDFS_OFF_TABLE_HAS_CSUM; + if (has_next_fragment) + flags |= SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT; + + switch (fsi->metadata_options.blk2off_tbl.compression) { + case SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE: + case SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE: + flags |= SSDFS_BLK_DESC_TBL_COMPRESSED; + break; + default: + /* do nothing */ + break; + } + + fragment->hdr->flags = cpu_to_le16(flags); + + fragment->hdr->used_logical_blks = cpu_to_le16(sp->used_logical_blks); + fragment->hdr->free_logical_blks = cpu_to_le16(sp->free_logical_blks); + 
fragment->hdr->last_allocated_blk = cpu_to_le16(sp->last_allocated_blk); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(byte_size >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + *offset_table_off += byte_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset_table_off %u\n", *offset_table_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (has_next_fragment) { + fragment->hdr->next_fragment_off = + cpu_to_le16((u16)byte_size); + } else { + fragment->hdr->next_fragment_off = + cpu_to_le16(U16_MAX); + } + + fragment->hdr->checksum = ssdfs_crc32_le(fragment->hdr, byte_size); + + up_write(&fragment->lock); + +finish_prepare_for_commit: + up_read(&table->translation_lock); + + return err; +} + +/* + * ssdfs_blk2off_table_forget_snapshot() - undirty PEB's table + * @table: pointer on table object + * @sp: pointer on snapshot + * @array: extents array + * @extent_count: count of extents in array + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal logic error. + */ +int +ssdfs_blk2off_table_forget_snapshot(struct ssdfs_blk2off_table *table, + struct ssdfs_blk2off_table_snapshot *sp, + struct ssdfs_translation_extent *array, + u16 extent_count) +{ + struct ssdfs_phys_offset_table_array *pot_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_offset_position *pos; + u16 last_sequence_id; + unsigned long commited_fragments = 0; + int i, j; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !sp || !array); + BUG_ON(sp->peb_index >= table->pebs_count); + BUG_ON(extent_count == 0); + + SSDFS_DBG("table %p, peb_index %u, sp %p, " + "extents %p, extents_count %u\n", + table, sp->peb_index, sp, + array, extent_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&table->translation_lock); + + pot_table = &table->peb[sp->peb_index]; + last_sequence_id = ssdfs_sequence_array_last_id(pot_table->sequence); + + if (sp->dirty_fragments == 0) { + err = -EINVAL; + SSDFS_ERR("dirty_fragments == 0\n"); + goto finish_forget_snapshot; + } + + sequence = table->peb[sp->peb_index].sequence; + err = ssdfs_sequence_array_change_all_states(sequence, + SSDFS_SEQUENCE_ITEM_UNDER_COMMIT_TAG, + SSDFS_SEQUENCE_ITEM_COMMITED_TAG, + ssdfs_change_fragment_state, + SSDFS_BLK2OFF_FRAG_UNDER_COMMIT, + SSDFS_BLK2OFF_FRAG_COMMITED, + &commited_fragments); + if (unlikely(err)) { + SSDFS_ERR("fail to set fragments as commited: " + "err %d\n", err); + goto finish_forget_snapshot; + } + + if (sp->dirty_fragments != commited_fragments) { + err = -ERANGE; + SSDFS_ERR("dirty_fragments %u != commited_fragments %lu\n", + sp->dirty_fragments, commited_fragments); + goto finish_forget_snapshot; + } + + for (i = 0; i < extent_count; i++) { + u16 start_blk = le16_to_cpu(array[i].logical_blk); + u16 len = le16_to_cpu(array[i].len); + + for (j = 0; j < len; j++) { + u16 blk = start_blk + j; + u64 cno1, cno2; + void *kaddr; + + kaddr = ssdfs_dynamic_array_get_locked(&table->lblk2off, + blk); + if (IS_ERR_OR_NULL(kaddr)) { + err = (kaddr == NULL ? 
-ENOENT : PTR_ERR(kaddr)); + SSDFS_ERR("fail to get logical block: " + "blk %u, err %d\n", + blk, err); + goto finish_forget_snapshot; + } + + pos = SSDFS_OFF_POS(kaddr); + cno1 = pos->cno; + cno2 = sp->tbl_copy[blk].cno; + + err = ssdfs_dynamic_array_release(&table->lblk2off, + blk, pos); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "blk %u, err %d\n", + blk, err); + goto finish_forget_snapshot; + } + + if (cno1 < cno2) { + SSDFS_WARN("cno1 %llu < cno2 %llu\n", + cno1, cno2); + } else if (cno1 > cno2) + continue; + + /* + * Don't clear information about free blocks + * in the modification bitmap. Otherwise, + * this information will be lost during + * the PEBs migration. + */ + if (array[i].state != SSDFS_LOGICAL_BLK_FREE) { + err = + ssdfs_blk2off_table_bmap_clear(&table->lbmap, + SSDFS_LBMAP_MODIFICATION_INDEX, blk); + if (unlikely(err)) { + SSDFS_ERR("fail to clear bitmap: " + "blk %u, err %d\n", + blk, err); + goto finish_forget_snapshot; + } + } + + err = ssdfs_blk2off_table_bmap_set(&table->lbmap, + SSDFS_LBMAP_INIT_INDEX, blk); + if (unlikely(err)) { + SSDFS_ERR("fail to set bitmap: " + "blk %u, err %d\n", + blk, err); + goto finish_forget_snapshot; + } + } + } + +finish_forget_snapshot: + up_write(&table->translation_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_store_offsets_table_header() - store offsets table header + * @pebi: pointer on PEB object + * @hdr: table header + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store table header into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to find memory page. 
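+ *
+ * Both @cur_page and @write_offset are advanced by the call, so a
+ * caller can simply chain the store steps; a minimal sketch (the
+ * local variables here are hypothetical):
+ *
+ *	pgoff_t cur_page = 0;
+ *	u32 write_offset = 0;
+ *
+ *	err = ssdfs_peb_store_offsets_table_header(pebi, &hdr,
+ *						   &cur_page,
+ *						   &write_offset);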
+ */ +int ssdfs_peb_store_offsets_table_header(struct ssdfs_peb_info *pebi, + struct ssdfs_blk2off_table_header *hdr, + pgoff_t *cur_page, + u32 *write_offset) +{ + size_t hdr_sz = sizeof(struct ssdfs_blk2off_table_header); + struct page *page; + u32 page_off, cur_offset; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(!hdr || !cur_page || !write_offset); + + SSDFS_DBG("peb %llu, current_log.start_page %u, " + "hdr %p, cur_page %lu, write_offset %u\n", + pebi->peb_id, + pebi->current_log.start_page, + hdr, *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_off = *write_offset % PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON((PAGE_SIZE - page_off) < hdr_sz); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(&pebi->cache, *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + hdr, 0, hdr_sz, + hdr_sz); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_copy; + } + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu as dirty: err %d\n", + *cur_page, err); + } + +finish_copy: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) + return err; + + *write_offset += hdr_sz; + + cur_offset = (*cur_page << PAGE_SHIFT) + page_off + hdr_sz; + *cur_page = cur_offset >> PAGE_SHIFT; + + return 0; +} + +/* + * ssdfs_peb_store_offsets_table_extents() - store translation extents + * @pebi: pointer on PEB object + * @array: translation extents array + * @extent_count: count of extents in the array + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store translation extents into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to find memory page. 
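+ *
+ * Note that the first translation extent is stored inline in the
+ * blk2off table header, so only the rest of the array is passed here;
+ * a sketch based on ssdfs_peb_store_offsets_table() below:
+ *
+ *	if (extent_count > 1) {
+ *		err = ssdfs_peb_store_offsets_table_extents(pebi,
+ *							&extents[1],
+ *							extent_count - 1,
+ *							&cur_page,
+ *							&write_offset);
+ *	}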
+ */ +int +ssdfs_peb_store_offsets_table_extents(struct ssdfs_peb_info *pebi, + struct ssdfs_translation_extent *array, + u16 extent_count, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct page *page; + size_t extent_size = sizeof(struct ssdfs_translation_extent); + size_t array_size = extent_size * extent_count; + u32 rest_bytes, written_bytes = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(!array || !cur_page || !write_offset); + BUG_ON(extent_count == 0 || extent_count == U16_MAX); + + SSDFS_DBG("peb %llu, current_log.start_page %u, " + "array %p, extent_count %u, " + "cur_page %lu, write_offset %u\n", + pebi->peb_id, + pebi->current_log.start_page, + array, extent_count, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + rest_bytes = extent_count * extent_size; + + while (rest_bytes > 0) { + u32 bytes; + u32 cur_off = *write_offset % PAGE_SIZE; + u32 new_off; + + bytes = min_t(u32, rest_bytes, PAGE_SIZE - cur_off); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(bytes < extent_size); + BUG_ON(written_bytes > (extent_count * extent_size)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(&pebi->cache, + *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %u, written_bytes %u, bytes %u\n", + cur_off, written_bytes, bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_memcpy_to_page(page, cur_off, PAGE_SIZE, + array, written_bytes, array_size, + bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_copy; + } + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, + *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu as dirty: err %d\n", + *cur_page, err); + } + +finish_copy: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) + return err; + + *write_offset += bytes; + + new_off = (*cur_page << PAGE_SHIFT) + cur_off + bytes; + *cur_page = new_off >> PAGE_SHIFT; + + rest_bytes -= bytes; + written_bytes += bytes; + }; + + return 0; +} + +/* + * ssdfs_peb_store_offsets_table_fragment() - store fragment of offsets table + * @pebi: pointer on PEB object + * @table: pointer on translation table object + * @peb_index: PEB's index + * @sequence_id: sequence ID of fragment + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store table's fragment into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to find memory page. 
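+ *
+ * On the media every fragment occupies [write_offset, next_fragment_off),
+ * where the tail beyond byte_size is alignment padding (a simplified
+ * view):
+ *
+ *	+--------------------+--------------------------+---------+
+ *	| phys offset table  | phys offset descriptors  | padding |
+ *	| fragment header    | (id_count items)         |         |
+ *	+--------------------+--------------------------+---------+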
+ */ +int ssdfs_peb_store_offsets_table_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_blk2off_table *table, + u16 peb_index, u16 sequence_id, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_phys_offset_table_array *pot_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_phys_offset_table_fragment *fragment; + struct ssdfs_phys_offset_table_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_phys_offset_table_header); + struct page *page; + void *kaddr; + u32 fragment_size; + u16 flags; + u32 next_fragment_off; + u32 rest_bytes, written_bytes = 0; + u32 cur_off; + u32 new_off; + u32 diff; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(!table || !cur_page || !write_offset); + BUG_ON(peb_index >= table->pebs_count); + + SSDFS_DBG("peb %llu, current_log.start_page %u, " + "peb_index %u, sequence_id %u, " + "cur_page %lu, write_offset %u\n", + pebi->peb_id, + pebi->current_log.start_page, + peb_index, sequence_id, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&table->translation_lock); + + pot_table = &table->peb[peb_index]; + + sequence = pot_table->sequence; + kaddr = ssdfs_sequence_array_get_item(sequence, sequence_id); + if (IS_ERR_OR_NULL(kaddr)) { + err = (kaddr == NULL ? -ENOENT : PTR_ERR(kaddr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %u, err %d\n", + sequence_id, err); + goto finish_store_fragment; + } + fragment = (struct ssdfs_phys_offset_table_fragment *)kaddr; + + down_write(&fragment->lock); + + if (atomic_read(&fragment->state) != SSDFS_BLK2OFF_FRAG_UNDER_COMMIT) { + err = -ERANGE; + SSDFS_ERR("invalid fragment state %#x\n", + atomic_read(&fragment->state)); + goto finish_fragment_copy; + } + + hdr = fragment->hdr; + + if (!hdr) { + err = -ERANGE; + SSDFS_ERR("header pointer is NULL\n"); + goto finish_fragment_copy; + } + + fragment_size = le32_to_cpu(hdr->byte_size); + rest_bytes = fragment_size; + + if (fragment_size < hdr_size || fragment_size > fragment->buf_size) { + err = -ERANGE; + SSDFS_ERR("invalid fragment size %u\n", + fragment_size); + goto finish_fragment_copy; + } + + next_fragment_off = ssdfs_peb_correct_area_write_offset(*write_offset + + fragment_size, + hdr_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_page %lu, write_offset %u, fragment_size %u, " + "hdr_size %zu, next_fragment_off %u\n", + *cur_page, *write_offset, fragment_size, + hdr_size, next_fragment_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + flags = le16_to_cpu(hdr->flags); + if (flags & SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT) { + diff = next_fragment_off - *write_offset; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(diff >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr->next_fragment_off = cpu_to_le16((u16)diff); + hdr->checksum = ssdfs_crc32_le(hdr, + le32_to_cpu(hdr->byte_size)); + } + + while (rest_bytes > 0) { + u32 bytes; + cur_off = *write_offset % PAGE_SIZE; + + bytes = min_t(u32, rest_bytes, PAGE_SIZE - cur_off); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(written_bytes > fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(&pebi->cache, + *cur_page); + if (IS_ERR_OR_NULL(page)) { + err = -ENOMEM; + SSDFS_ERR("fail to get cache page: index %lu\n", + *cur_page); + goto finish_fragment_copy; + } + + err = ssdfs_memcpy_to_page(page, cur_off, PAGE_SIZE, + hdr, written_bytes, fragment_size, + bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_cur_copy; + } + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + +#ifdef 
CONFIG_SSDFS_DEBUG + SSDFS_DBG("ssdfs_page_array_set_page_dirty %lu\n", + *cur_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, + *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu as dirty: err %d\n", + *cur_page, err); + } + +finish_cur_copy: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) + goto finish_fragment_copy; + + *write_offset += bytes; + + new_off = (*cur_page << PAGE_SHIFT) + cur_off + bytes; + *cur_page = new_off >> PAGE_SHIFT; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_off %u, bytes %u, new_off %u, cur_page %lu\n", + cur_off, bytes, new_off, *cur_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + rest_bytes -= bytes; + written_bytes += bytes; + }; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(*write_offset > next_fragment_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + diff = next_fragment_off - *write_offset; + + if (diff > 0) { + cur_off = *write_offset % PAGE_SIZE; + *write_offset = next_fragment_off; + + new_off = (*cur_page << PAGE_SHIFT) + cur_off + diff; + *cur_page = new_off >> PAGE_SHIFT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_page %lu, write_offset %u, " + "next_fragment_off %u\n", + *cur_page, *write_offset, + next_fragment_off); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_copy: + up_write(&fragment->lock); + +finish_store_fragment: + up_read(&table->translation_lock); + + return err; +} + +static inline +u16 ssdfs_next_sequence_id(u16 sequence_id) +{ + u16 next_sequence_id = U16_MAX; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("sequence_id %u\n", sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("invalid sequence_id %u\n", + sequence_id); + return U16_MAX; + } else if (sequence_id < SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + /* increment value */ + next_sequence_id = sequence_id + 1; + } else + next_sequence_id = 0; + + return next_sequence_id; +} + +/* + * ssdfs_peb_store_offsets_table() - store offsets table + * @pebi: pointer on PEB object + * @desc: offsets table descriptor [out] + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store the offsets table into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to find memory page. 
+ */ +int ssdfs_peb_store_offsets_table(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *desc, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_blk2off_table *table; + struct ssdfs_blk2off_table_snapshot snapshot = {0}; + struct ssdfs_blk2off_table_header hdr; + struct ssdfs_translation_extent *extents = NULL; + size_t tbl_hdr_size = sizeof(struct ssdfs_blk2off_table_header); + u16 extents_off = offsetof(struct ssdfs_blk2off_table_header, sequence); + u16 extent_count = 0; + u32 offset_table_off; + u16 peb_index; + u32 table_start_offset; + u16 sequence_id; + u32 fragments_count = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!pebi->pebc->parent_si->blk2off_table); + BUG_ON(!desc || !cur_page || !write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); +#else + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebi->pebc->parent_si->fsi; + peb_index = pebi->peb_index; + table = pebi->pebc->parent_si->blk2off_table; + + memset(desc, 0, sizeof(struct ssdfs_metadata_descriptor)); + memset(&hdr, 0, tbl_hdr_size); + + err = ssdfs_blk2off_table_snapshot(table, peb_index, &snapshot); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("table hasn't dirty fragments: peb_index %u\n", + peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get snapshot: peb_index %u, err %d\n", + peb_index, err); + return err; + } + + if (unlikely(peb_index != snapshot.peb_index)) { + err = -ERANGE; + SSDFS_ERR("peb_index %u != snapshot.peb_index %u\n", + peb_index, snapshot.peb_index); + goto fail_store_off_table; + } + + if (unlikely(!snapshot.bmap_copy || !snapshot.tbl_copy)) { + err = -ERANGE; + SSDFS_ERR("invalid snapshot: " + "peb_index %u, bmap_copy %p, tbl_copy %p\n", + peb_index, + snapshot.bmap_copy, + snapshot.tbl_copy); + goto fail_store_off_table; + } + + extents = ssdfs_blk2off_kcalloc(snapshot.capacity, + sizeof(struct ssdfs_translation_extent), + GFP_KERNEL); + if (unlikely(!extents)) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate extent array\n"); + goto fail_store_off_table; + } + + hdr.magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + hdr.magic.key = cpu_to_le16(SSDFS_BLK2OFF_TABLE_HDR_MAGIC); + hdr.magic.version.major = SSDFS_MAJOR_REVISION; + hdr.magic.version.minor = SSDFS_MINOR_REVISION; + + err = ssdfs_blk2off_table_extract_extents(&snapshot, extents, + snapshot.capacity, + &extent_count); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the extent array: " + "peb_index %u, err %d\n", + peb_index, err); + goto fail_store_off_table; + } else if (extent_count == 0) { + err = -ERANGE; + SSDFS_ERR("invalid extent count\n"); + goto fail_store_off_table; + } + + hdr.extents_off = cpu_to_le16(extents_off); + hdr.extents_count = cpu_to_le16(extent_count); + +#ifdef CONFIG_SSDFS_SAVE_WHOLE_BLK2OFF_TBL_IN_EVERY_LOG + fragments_count = snapshot.fragments_count; +#else + fragments_count = snapshot.dirty_fragments; +#endif /* 
CONFIG_SSDFS_SAVE_WHOLE_BLK2OFF_TBL_IN_EVERY_LOG */ + + offset_table_off = tbl_hdr_size + + ((extent_count - 1) * + sizeof(struct ssdfs_translation_extent)); + + hdr.offset_table_off = cpu_to_le16((u16)offset_table_off); + + sequence_id = snapshot.start_sequence_id; + for (i = 0; i < fragments_count; i++) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_index %d, offset_table_off %u\n", + i, offset_table_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_blk2off_table_prepare_for_commit(table, peb_index, + sequence_id, + &offset_table_off, + &snapshot); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare fragment for commit: " + "peb_index %u, sequence_id %u, err %d\n", + peb_index, sequence_id, err); + goto fail_store_off_table; + } + + sequence_id = ssdfs_next_sequence_id(sequence_id); + if (sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + err = -ERANGE; + SSDFS_ERR("invalid next sequence_id %u\n", + sequence_id); + goto fail_store_off_table; + } + } + + hdr.fragments_count = cpu_to_le16(snapshot.dirty_fragments); + + ssdfs_memcpy(hdr.sequence, 0, sizeof(struct ssdfs_translation_extent), + extents, 0, sizeof(struct ssdfs_translation_extent), + sizeof(struct ssdfs_translation_extent)); + + hdr.check.bytes = cpu_to_le16(tbl_hdr_size); + hdr.check.flags = cpu_to_le16(SSDFS_CRC32); + + err = ssdfs_calculate_csum(&hdr.check, &hdr, tbl_hdr_size); + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", err); + goto fail_store_off_table; + } + + *write_offset = ssdfs_peb_correct_area_write_offset(*write_offset, + tbl_hdr_size); + table_start_offset = *write_offset; + + desc->offset = cpu_to_le32(*write_offset + + (pebi->current_log.start_page * fsi->pagesize)); + + err = ssdfs_peb_store_offsets_table_header(pebi, &hdr, + cur_page, write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store offsets table's header: " + "cur_page %lu, write_offset %u, err %d\n", + *cur_page, *write_offset, err); + goto fail_store_off_table; + } + + if (extent_count > 1) { + err = ssdfs_peb_store_offsets_table_extents(pebi, &extents[1], + extent_count - 1, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store offsets table's extents: " + "cur_page %lu, write_offset %u, err %d\n", + *cur_page, *write_offset, err); + goto fail_store_off_table; + } + } + + sequence_id = snapshot.start_sequence_id; + for (i = 0; i < fragments_count; i++) { + err = ssdfs_peb_store_offsets_table_fragment(pebi, table, + peb_index, + sequence_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store offsets table's fragment: " + "sequence_id %u, cur_page %lu, " + "write_offset %u, err %d\n", + sequence_id, *cur_page, + *write_offset, err); + goto fail_store_off_table; + } + + sequence_id = ssdfs_next_sequence_id(sequence_id); + if (sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + err = -ERANGE; + SSDFS_ERR("invalid next sequence_id %u\n", + sequence_id); + goto fail_store_off_table; + } + } + + err = ssdfs_blk2off_table_forget_snapshot(table, &snapshot, + extents, extent_count); + if (unlikely(err)) { + SSDFS_ERR("fail to forget snapshot state: " + "peb_index %u, err %d\n", + peb_index, err); + goto fail_store_off_table; + } + + BUG_ON(*write_offset <= table_start_offset); + desc->size = cpu_to_le32(*write_offset - table_start_offset); + + pebi->current_log.seg_flags |= SSDFS_SEG_HDR_HAS_OFFSET_TABLE; + +fail_store_off_table: + ssdfs_blk2off_table_free_snapshot(&snapshot); + + ssdfs_blk2off_kfree(extents); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + 
SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} From patchwork Sat Feb 25 01:08:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151924 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB477C7EE30 for ; Sat, 25 Feb 2023 01:16:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229650AbjBYBQv (ORCPT ); Fri, 24 Feb 2023 20:16:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229697AbjBYBQ2 (ORCPT ); Fri, 24 Feb 2023 20:16:28 -0500 Received: from mail-oi1-x232.google.com (mail-oi1-x232.google.com [IPv6:2607:f8b0:4864:20::232]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7296D12BD6 for ; Fri, 24 Feb 2023 17:16:16 -0800 (PST) Received: by mail-oi1-x232.google.com with SMTP id bm20so798875oib.7 for ; Fri, 24 Feb 2023 17:16:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3S3GmeUOr0XGrkUiTYyCIRpgvmzKnmZ0HUQNcR+/F9A=; b=21H03ghy8s97v/JauWwYEc2psHz2jbQaLyM3C6+8rhIyQnQSoI6eq8kxf074Mr09n6 Uohgk+d99QYDctjgYuXK7GzEaIxG59BFvgoLyHO82QEBARFdoh8uU/PGW8WH4rMbef3r GG7JkJnvb8w8g2a4gk1R6hgXEEncYxjn11l7DwstnwIRNMibS0Z0Yzu9ycGbgmqJ33PM z61SOR8GEAPf7d9kG64WLN2ZSWKHXZtQhTZQOtQN5I/rj6WmdD4luBilXSi8MR3aFHm7 qK/oySaA/D0WNUEXTmrGcuWP+uSrDR0uahcQEVrNY+H5HRlB6Z/13yswrlUtWa2ev4cI /AKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3S3GmeUOr0XGrkUiTYyCIRpgvmzKnmZ0HUQNcR+/F9A=; b=3wKaCok0iiGavJU6biKSzxeHv0O9W0yDoWGY+xJqP++VoJXOE0n0wKLm0nrSWNRZQf UQH/TASWTk9fPBjW6TxL1ACwIeWpmNWuEqpn26gccjMRWUcKLcXbVVIByeHnWP0PJHi8 F5O4G4xegVrC7QI0oTjWmBksbMa9RH3rARCNpSgNZj1OdwxTrazRZGjx+/ZSyltbUkeA sshpb46uLFl0XyKdbkNUOpr7+rEcab2gC4naQG12Z0vExfsVBBOmETXfxCQqdowHeHtG I90npawtzTPSzCGDFy3bHXxkutDbwPYJtYoDzvn07tyRF/wIQePj4bS4M8lCw7VWSXKv XaAQ== X-Gm-Message-State: AO0yUKVfFZLMRrppN2XOkp9N8JCKTdiXR5eULX84fg/j4wBRwPYKY6CD XkknGFGRxWeoXkW1w8S7N1N5U4eYmzEngVHA X-Google-Smtp-Source: AK7set9wMyG/nBz8TgyewGhCOz819RtVWgjVXTgXN1CdHxmisooVfVSC5fRomXR9+fKARJyXifzd+w== X-Received: by 2002:aca:90b:0:b0:383:f036:cefa with SMTP id 11-20020aca090b000000b00383f036cefamr1288325oij.43.1677287774979; Fri, 24 Feb 2023 17:16:14 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. 
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/offset_translation_table.c | 2996 +++++++++++++++++++++++++++
 1 file changed, 2996 insertions(+)

diff --git a/fs/ssdfs/offset_translation_table.c b/fs/ssdfs/offset_translation_table.c
index ccdabc3b7f72..bde595f69d9f 100644
--- a/fs/ssdfs/offset_translation_table.c
+++ b/fs/ssdfs/offset_translation_table.c
@@ -5162,3 +5162,2999 @@ int ssdfs_peb_store_offsets_table(struct ssdfs_peb_info *pebi,
 	return err;
 }
+
+/*
+ * ssdfs_blk2off_table_get_used_logical_blks() - get used logical blocks count
+ * @tbl: pointer on table object
+ * @used_blks: pointer on used logical blocks count [out]
+ *
+ * This method tries to get the count of used logical blocks.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EAGAIN - table isn't initialized yet.
+ */
+int ssdfs_blk2off_table_get_used_logical_blks(struct ssdfs_blk2off_table *tbl,
+					      u16 *used_blks)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !used_blks);
+
+	SSDFS_DBG("table %p, used_blks %p\n",
+		  tbl, used_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*used_blks = U16_MAX;
+
+	if (atomic_read(&tbl->state) < SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) {
+		SSDFS_DBG("table is not initialized yet\n");
+		return -EAGAIN;
+	}
+
+	down_read(&tbl->translation_lock);
+	*used_blks = tbl->used_logical_blks;
+	up_read(&tbl->translation_lock);
+
+	return 0;
+}
+
+/*
+ * ssdfs_blk2off_table_blk_desc_init() - init block descriptor for offset
+ * @table: pointer on table object
+ * @logical_blk: logical block number
+ * @pos: pointer on offset's position [in]
+ *
+ * This method tries to initialize the block descriptor for an offset.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal logic error.
+ * %-ENODATA - table doesn't contain logical block or corresponding ID.
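+ *
+ * A possible call order (the surrounding caller code is hypothetical;
+ * both functions below exist in this file): extract the current
+ * position, fill pos.blk_desc.buf from the log's metadata, then
+ * publish the descriptor:
+ *
+ *	err = ssdfs_blk2off_table_get_offset_position(table, blk, &pos);
+ *	if (!err) {
+ *		... fill pos.blk_desc.buf from the log's metadata ...
+ *		err = ssdfs_blk2off_table_blk_desc_init(table, blk, &pos);
+ *	}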
+ */ +int ssdfs_blk2off_table_blk_desc_init(struct ssdfs_blk2off_table *table, + u16 logical_blk, + struct ssdfs_offset_position *pos) +{ + struct ssdfs_offset_position *old_pos = NULL; + struct ssdfs_blk_state_offset *state_off; + size_t desc_size = sizeof(struct ssdfs_block_descriptor_state); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !pos); + + SSDFS_DBG("table %p, logical_blk %u, pos %p\n", + table, logical_blk, pos); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (logical_blk >= table->lblk2off_capacity) { + SSDFS_ERR("logical_blk %u >= lblk2off_capacity %u\n", + logical_blk, table->lblk2off_capacity); + return -ERANGE; + } + + down_write(&table->translation_lock); + + if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap, + SSDFS_LBMAP_STATE_INDEX, + table->lblk2off_capacity, + logical_blk)) { + err = -ENODATA; + SSDFS_ERR("requested block %u hasn't been allocated\n", + logical_blk); + goto finish_init; + } + + old_pos = SSDFS_OFF_POS(ssdfs_dynamic_array_get_locked(&table->lblk2off, + logical_blk)); + if (IS_ERR_OR_NULL(old_pos)) { + err = (old_pos == NULL ? -ENOENT : PTR_ERR(old_pos)); + SSDFS_ERR("fail to get logical block: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto finish_init; + } + + switch (old_pos->blk_desc.status) { + case SSDFS_BLK_DESC_BUF_UNKNOWN_STATE: + case SSDFS_BLK_DESC_BUF_ALLOCATED: + /* continue logic */ + break; + + case SSDFS_BLK_DESC_BUF_INITIALIZED: + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical block %u has been initialized\n", + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_init; + + default: + err = -ERANGE; + SSDFS_ERR("invalid state %#x of blk desc buffer\n", + old_pos->blk_desc.status); + goto finish_init; + } + + state_off = &pos->blk_desc.buf.state[0]; + + if (IS_SSDFS_BLK_STATE_OFFSET_INVALID(state_off)) { + err = -ERANGE; + SSDFS_ERR("block state offset invalid\n"); + SSDFS_ERR("log_start_page %u, log_area %u, " + "peb_migration_id %u, byte_offset %u\n", + le16_to_cpu(state_off->log_start_page), + state_off->log_area, + state_off->peb_migration_id, + le32_to_cpu(state_off->byte_offset)); + goto finish_init; + } + + ssdfs_memcpy(&old_pos->blk_desc, 0, desc_size, + &pos->blk_desc.buf, 0, desc_size, + desc_size); + +finish_init: + ssdfs_dynamic_array_release(&table->lblk2off, logical_blk, old_pos); + up_write(&table->translation_lock); + + return err; +} + +/* + * ssdfs_blk2off_table_get_checked_position() - get checked offset's position + * @table: pointer on table object + * @logical_blk: logical block number + * @pos: pointer of offset's position [out] + * + * This method tries to get and to check offset's position for + * requested logical block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal logic error. + * %-ENODATA - table doesn't contain logical block or corresponding ID. + * %-ENOENT - table's fragment for requested logical block not initialized. + * %-EBUSY - logical block hasn't ID yet. 
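+ *
+ * %-EBUSY means the block exists but its ID isn't assigned yet; a
+ * caller may wait and retry, as ssdfs_blk2off_table_convert() does:
+ *
+ *	up_read(&table->translation_lock);
+ *	wait_event_interruptible_timeout(table->wait_queue,
+ *		has_logical_block_id_assigned(table, logical_blk),
+ *		SSDFS_DEFAULT_TIMEOUT);
+ *	down_read(&table->translation_lock);
+ *	err = ssdfs_blk2off_table_get_checked_position(table, logical_blk,
+ *						       pos);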
+ */ +static +int ssdfs_blk2off_table_get_checked_position(struct ssdfs_blk2off_table *table, + u16 logical_blk, + struct ssdfs_offset_position *pos) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_phys_offset_table_fragment *fragment; + void *ptr; + size_t off_pos_size = sizeof(struct ssdfs_offset_position); + int state; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !pos); + BUG_ON(!rwsem_is_locked(&table->translation_lock)); + + SSDFS_DBG("table %p, logical_blk %u, pos %p\n", + table, logical_blk, pos); + + ssdfs_debug_blk2off_table_object(table); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (logical_blk >= table->lblk2off_capacity) { + SSDFS_ERR("logical_blk %u >= lblk2off_capacity %u\n", + logical_blk, table->lblk2off_capacity); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("init_bmap %lx, state_bmap %lx, modification_bmap %lx\n", + *table->lbmap.array[SSDFS_LBMAP_INIT_INDEX], + *table->lbmap.array[SSDFS_LBMAP_STATE_INDEX], + *table->lbmap.array[SSDFS_LBMAP_MODIFICATION_INDEX]); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap, + SSDFS_LBMAP_STATE_INDEX, + table->lblk2off_capacity, + logical_blk)) { + SSDFS_ERR("requested block %u hasn't been allocated\n", + logical_blk); + return -ENODATA; + } + + ptr = ssdfs_dynamic_array_get_locked(&table->lblk2off, logical_blk); + if (IS_ERR_OR_NULL(ptr)) { + err = (ptr == NULL ? -ENOENT : PTR_ERR(ptr)); + SSDFS_ERR("fail to get logical block: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + ssdfs_memcpy(pos, 0, off_pos_size, + ptr, 0, off_pos_size, + off_pos_size); + + err = ssdfs_dynamic_array_release(&table->lblk2off, logical_blk, ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to release: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + if (pos->id == SSDFS_INVALID_OFFSET_ID) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical block %u hasn't ID yet\n", + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EBUSY; + } + + if (pos->peb_index >= table->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + pos->peb_index, table->pebs_count); + return -ERANGE; + } + + if (pos->sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("sequence_id %u is out of order\n", + pos->sequence_id); + return -ERANGE; + } + + phys_off_table = &table->peb[pos->peb_index]; + + sequence = phys_off_table->sequence; + ptr = ssdfs_sequence_array_get_item(sequence, pos->sequence_id); + if (IS_ERR_OR_NULL(ptr)) { + err = (ptr == NULL ? -ENOENT : PTR_ERR(ptr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %u, err %d\n", + pos->sequence_id, err); + return err; + } + fragment = (struct ssdfs_phys_offset_table_fragment *)ptr; + + state = atomic_read(&fragment->state); + if (state < SSDFS_BLK2OFF_FRAG_INITIALIZED) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u is not initialized yet\n", + pos->sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } else if (state >= SSDFS_BLK2OFF_FRAG_STATE_MAX) { + SSDFS_ERR("unknown fragment's state\n"); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_blk2off_table_check_fragment_desc() - check fragment's description + * @table: pointer on table object + * @frag: pointer on fragment + * @pos: pointer of offset's position + * + * This method tries to check fragment's description. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal logic error. 
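+ *
+ * For example, a fragment with start_id 4096 and id_count 512 accepts
+ * only IDs in the range [4096, 4608) and offset_index values in
+ * [0, 512); any position outside these ranges is treated as a logic
+ * error.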
+ */ +static +int ssdfs_blk2off_table_check_fragment_desc(struct ssdfs_blk2off_table *table, + struct ssdfs_phys_offset_table_fragment *frag, + struct ssdfs_offset_position *pos) +{ + u16 start_id; + int id_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !frag || !pos); + BUG_ON(!rwsem_is_locked(&table->translation_lock)); + + SSDFS_DBG("table %p, id %u, peb_index %u, " + "sequence_id %u, offset_index %u\n", + table, pos->id, pos->peb_index, + pos->sequence_id, pos->offset_index); + + BUG_ON(!rwsem_is_locked(&frag->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_id = frag->start_id; + id_count = atomic_read(&frag->id_count); + + if (pos->id < start_id || pos->id >= (start_id + id_count)) { + SSDFS_ERR("id %u out of range (start %u, len %u)\n", + pos->id, start_id, id_count); + return -ERANGE; + } + + if (pos->offset_index >= id_count) { + SSDFS_ERR("offset_index %u >= id_count %u\n", + pos->offset_index, id_count); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (!frag->phys_offs) { + SSDFS_ERR("offsets table pointer is NULL\n"); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +bool has_logical_block_id_assigned(struct ssdfs_blk2off_table *table, + u16 logical_blk) +{ + u16 capacity; + bool has_assigned = false; + + down_read(&table->translation_lock); + capacity = table->lblk2off_capacity; + has_assigned = !ssdfs_blk2off_table_bmap_vacant(&table->lbmap, + SSDFS_LBMAP_MODIFICATION_INDEX, + capacity, + logical_blk); + up_read(&table->translation_lock); + + return has_assigned; +} + +/* + * ssdfs_blk2off_table_convert() - convert logical block into offset + * @table: pointer on table object + * @logical_blk: logical block number + * @peb_index: pointer on PEB index value [out] + * @migration_state: migration state of the block [out] + * @pos: offset position [out] + * + * This method tries to convert logical block number into offset. + * + * RETURN: + * [success] - pointer on found offset. + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - internal logic error. + * %-EAGAIN - table doesn't prepared for conversion yet. + * %-ENODATA - table doesn't contain logical block. 
+ * %-ENOENT - table's fragment for requested logical block not initialized + */ +struct ssdfs_phys_offset_descriptor * +ssdfs_blk2off_table_convert(struct ssdfs_blk2off_table *table, + u16 logical_blk, + u16 *peb_index, + int *migration_state, + struct ssdfs_offset_position *pos) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_phys_offset_table_fragment *fragment; + struct ssdfs_phys_offset_descriptor *ptr = NULL; + struct ssdfs_migrating_block *blk = NULL; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !peb_index || !pos); + + SSDFS_DBG("table %p, logical_blk %u\n", + table, logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + *peb_index = U16_MAX; + + down_read(&table->translation_lock); + + if (logical_blk >= table->lblk2off_capacity) { + err = -EINVAL; + SSDFS_ERR("fail to convert logical block: " + "block %u >= capacity %u\n", + logical_blk, + table->lblk2off_capacity); + goto finish_translation; + } + + if (atomic_read(&table->state) <= SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) { + u16 capacity = table->lblk2off_capacity; + + if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap, + SSDFS_LBMAP_INIT_INDEX, + capacity, + logical_blk)) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("table is not initialized yet: " + "logical_blk %u\n", + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_translation; + } + } + + if (migration_state) { + blk = ssdfs_get_migrating_block(table, logical_blk, false); + if (IS_ERR_OR_NULL(blk)) + *migration_state = SSDFS_LBLOCK_UNKNOWN_STATE; + else + *migration_state = blk->state; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, migration_state %#x\n", + logical_blk, *migration_state); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_blk2off_table_get_checked_position(table, logical_blk, + pos); + if (err == -EBUSY) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get checked position: logical_blk %u\n", + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + up_read(&table->translation_lock); + wait_event_interruptible_timeout(table->wait_queue, + has_logical_block_id_assigned(table, + logical_blk), + SSDFS_DEFAULT_TIMEOUT); + down_read(&table->translation_lock); + + err = ssdfs_blk2off_table_get_checked_position(table, + logical_blk, + pos); + if (unlikely(err)) { + SSDFS_ERR("fail to get checked offset's position: " + "logical_block %u, err %d\n", + logical_blk, err); + goto finish_translation; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to get checked offset's position: " + "logical_block %u, err %d\n", + logical_blk, err); + goto finish_translation; + } + + *peb_index = pos->peb_index; + phys_off_table = &table->peb[pos->peb_index]; + + sequence = phys_off_table->sequence; + kaddr = ssdfs_sequence_array_get_item(sequence, pos->sequence_id); + if (IS_ERR_OR_NULL(kaddr)) { + err = (kaddr == NULL ? 
-ENOENT : PTR_ERR(kaddr));
+		SSDFS_ERR("fail to get fragment: "
+			  "sequence_id %u, err %d\n",
+			  pos->sequence_id, err);
+		goto finish_translation;
+	}
+	fragment = (struct ssdfs_phys_offset_table_fragment *)kaddr;
+
+	down_read(&fragment->lock);
+
+	err = ssdfs_blk2off_table_check_fragment_desc(table, fragment, pos);
+	if (unlikely(err)) {
+		SSDFS_ERR("invalid fragment description: err %d\n", err);
+		goto finish_fragment_lookup;
+	}
+
+	ptr = &fragment->phys_offs[pos->offset_index];
+
+finish_fragment_lookup:
+	up_read(&fragment->lock);
+
+finish_translation:
+	up_read(&table->translation_lock);
+
+	if (err)
+		return ERR_PTR(err);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, "
+		  "logical_offset %u, peb_index %u, peb_page %u, "
+		  "log_start_page %u, log_area %u, "
+		  "peb_migration_id %u, byte_offset %u\n",
+		  logical_blk,
+		  le32_to_cpu(ptr->page_desc.logical_offset),
+		  pos->peb_index,
+		  le16_to_cpu(ptr->page_desc.peb_page),
+		  le16_to_cpu(ptr->blk_state.log_start_page),
+		  ptr->blk_state.log_area,
+		  ptr->blk_state.peb_migration_id,
+		  le32_to_cpu(ptr->blk_state.byte_offset));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return ptr;
+}
+
+/*
+ * ssdfs_blk2off_table_get_offset_position() - get offset position
+ * @table: pointer on table object
+ * @logical_blk: logical block number
+ * @pos: offset position [out]
+ *
+ * This method tries to get the offset position of a logical block.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for conversion yet.
+ * %-ENODATA - table doesn't contain logical block.
+ */
+int ssdfs_blk2off_table_get_offset_position(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    struct ssdfs_offset_position *pos)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !pos);
+
+	SSDFS_DBG("table %p, logical_blk %u\n",
+		  table, logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&table->translation_lock);
+
+	if (logical_blk >= table->lblk2off_capacity) {
+		err = -EINVAL;
+		SSDFS_ERR("fail to convert logical block: "
+			  "block %u >= capacity %u\n",
+			  logical_blk,
+			  table->lblk2off_capacity);
+		goto finish_extract_position;
+	}
+
+	if (atomic_read(&table->state) <= SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) {
+		u16 capacity = table->lblk2off_capacity;
+
+		if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap,
+						    SSDFS_LBMAP_INIT_INDEX,
+						    capacity,
+						    logical_blk)) {
+			err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("table is not initialized yet: "
+				  "logical_blk %u\n",
+				  logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_extract_position;
+		}
+	}
+
+	err = ssdfs_blk2off_table_get_checked_position(table, logical_blk,
+							pos);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get checked offset's position: "
+			  "logical_block %u, err %d\n",
+			  logical_blk, err);
+		goto finish_extract_position;
+	}
+
+finish_extract_position:
+	up_read(&table->translation_lock);
+
+	if (err)
+		return err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, "
+		  "pos->cno %llu, pos->id %u, pos->peb_index %u, "
+		  "pos->sequence_id %u, pos->offset_index %u\n",
+		  logical_blk, pos->cno, pos->id,
+		  pos->peb_index, pos->sequence_id,
+		  pos->offset_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * calculate_rest_range_id_count() - get the remaining count of range's IDs
+ * @ptr: pointer on fragment object
+ *
+ * This method calculates the remaining count of vacant IDs.
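+ *
+ * The capacity of a fragment is derived from its buffer size:
+ * id_capacity = (buf_size - metadata_size) / off_size. For instance
+ * (the numbers are illustrative only), with buf_size 4096,
+ * metadata_size 64 and off_size 8, id_capacity is 504; if id_count
+ * is 500, then 4 IDs remain available for assignment.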
+ */ +static inline +int calculate_rest_range_id_count(struct ssdfs_phys_offset_table_fragment *ptr) +{ + int id_count = atomic_read(&ptr->id_count); + size_t blk2off_tbl_hdr_size = sizeof(struct ssdfs_blk2off_table_header); + size_t hdr_size = sizeof(struct ssdfs_phys_offset_table_header); + size_t off_size = sizeof(struct ssdfs_phys_offset_descriptor); + size_t metadata_size = blk2off_tbl_hdr_size + hdr_size; + int id_capacity; + int start_id = ptr->start_id; + int rest_range_ids; + + if ((start_id + id_count) > SSDFS_INVALID_OFFSET_ID) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_id %d, id_count %d\n", + start_id, id_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + id_capacity = (ptr->buf_size - metadata_size) / off_size; + + if (id_count >= id_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id_count %d, id_capacity %d\n", + id_count, id_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + rest_range_ids = id_capacity - id_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id_count %d, id_capacity %d, rest_range_ids %d\n", + id_count, id_capacity, rest_range_ids); +#endif /* CONFIG_SSDFS_DEBUG */ + + return rest_range_ids; +} + +/* + * is_id_valid_for_assignment() - check ID validity + * @table: pointer on table object + * @ptr: pointer on fragment object + * @id: ID value + */ +static +bool is_id_valid_for_assignment(struct ssdfs_blk2off_table *table, + struct ssdfs_phys_offset_table_fragment *ptr, + int id) +{ + int id_count = atomic_read(&ptr->id_count); + int rest_range_ids; + + if (id < ptr->start_id) { + SSDFS_WARN("id %d < start_id %u\n", + id, ptr->start_id); + return false; + } + + if (id > (ptr->start_id + id_count)) { + SSDFS_WARN("id %d > (ptr->start_id %u + id_count %d)", + id, ptr->start_id, id_count); + return false; + } + + rest_range_ids = calculate_rest_range_id_count(ptr); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %d, rest_range_ids %d\n", + id, rest_range_ids); +#endif /* CONFIG_SSDFS_DEBUG */ + + return rest_range_ids > 0; +} + +/* + * ssdfs_blk2off_table_assign_id() - assign ID for logical block + * @table: pointer on table object + * @logical_blk: logical block number + * @peb_index: PEB's index + * @blk_desc: block descriptor + * @last_sequence_id: pointer on last fragment index [out] + * + * This method tries to define physical offset's ID value for + * requested logical block number in last actual PEB's fragment. + * If the last actual fragment hasn't vacant ID then the method + * returns error and found last fragment index in + * @last_sequence_id. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - internal logic error. 
+ * %-ENOENT - table's fragment for requested logical block not initialized + * %-ENOSPC - fragment hasn't vacant IDs and it needs to initialize next one + */ +static +int ssdfs_blk2off_table_assign_id(struct ssdfs_blk2off_table *table, + u16 logical_blk, u16 peb_index, + struct ssdfs_block_descriptor *blk_desc, + u16 *last_sequence_id) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_phys_offset_table_fragment *fragment; + struct ssdfs_offset_position *pos; + int state; + int id = -1; + u16 offset_index = U16_MAX; + u16 capacity; + void *kaddr; + unsigned long last_id; +#ifdef CONFIG_SSDFS_DEBUG + int i; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table || !last_sequence_id); + BUG_ON(!rwsem_is_locked(&table->translation_lock)); + + SSDFS_DBG("table %p, logical_blk %u, peb_index %u\n", + table, logical_blk, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= table->pebs_count) { + SSDFS_ERR("fail to change offset value: " + "peb_index %u >= pebs_count %u\n", + peb_index, table->pebs_count); + return -EINVAL; + } + + capacity = table->lblk2off_capacity; + phys_off_table = &table->peb[peb_index]; + + state = atomic_read(&phys_off_table->state); + if (state < SSDFS_BLK2OFF_TABLE_PARTIAL_INIT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("table is not initialized for peb %u\n", + peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } else if (state >= SSDFS_BLK2OFF_TABLE_STATE_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unknown table state %#x\n", + state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + sequence = phys_off_table->sequence; + + if (is_ssdfs_sequence_array_last_id_invalid(sequence)) { + /* first creation */ + return -ENOSPC; + } + + last_id = ssdfs_sequence_array_last_id(sequence); + if (last_id >= U16_MAX) { + SSDFS_ERR("invalid last_id %lu\n", last_id); + return -ERANGE; + } else + *last_sequence_id = (u16)last_id; + + if (*last_sequence_id > SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD) { + SSDFS_ERR("invalid last_sequence_id %d\n", + *last_sequence_id); + return -ERANGE; + } + + kaddr = ssdfs_sequence_array_get_item(sequence, *last_sequence_id); + if (IS_ERR_OR_NULL(kaddr)) { + err = (kaddr == NULL ? -ENOENT : PTR_ERR(kaddr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %u, err %d\n", + *last_sequence_id, err); + return err; + } + fragment = (struct ssdfs_phys_offset_table_fragment *)kaddr; + + state = atomic_read(&fragment->state); + if (state < SSDFS_BLK2OFF_FRAG_CREATED) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u isn't created\n", + *last_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } else if (state == SSDFS_BLK2OFF_FRAG_UNDER_COMMIT || + state == SSDFS_BLK2OFF_FRAG_COMMITED) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %d is under commit\n", + *last_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } else if (state >= SSDFS_BLK2OFF_FRAG_STATE_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unknown fragment state %#x\n", + state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + pos = SSDFS_OFF_POS(ssdfs_dynamic_array_get_locked(&table->lblk2off, + logical_blk)); + if (IS_ERR_OR_NULL(pos)) { + err = (pos == NULL ? 
-ENOENT : PTR_ERR(pos)); + SSDFS_ERR("fail to get logical block: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("POS BEFORE: cno %llu, id %u, peb_index %u, " + "sequence_id %u, offset_index %u\n", + pos->cno, pos->id, pos->peb_index, + pos->sequence_id, pos->offset_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ssdfs_blk2off_table_bmap_vacant(&table->lbmap, + SSDFS_LBMAP_MODIFICATION_INDEX, + capacity, + logical_blk)) { + if (pos->sequence_id == *last_sequence_id) { + pos->cno = ssdfs_current_cno(table->fsi->sb); + pos->peb_index = peb_index; + id = pos->id; + offset_index = pos->offset_index; + } else if (pos->sequence_id < *last_sequence_id) { + offset_index = + atomic_inc_return(&fragment->id_count) - 1; + id = fragment->start_id + offset_index; + + if (!is_id_valid_for_assignment(table, fragment, id)) { + err = -ENOSPC; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %d cannot be assign " + "for fragment %d\n", + id, *last_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + atomic_dec(&fragment->id_count); + goto finish_assign_id; + } + + pos->cno = ssdfs_current_cno(table->fsi->sb); + pos->id = (u16)id; + pos->peb_index = peb_index; + pos->sequence_id = *last_sequence_id; + pos->offset_index = offset_index; + } else if (pos->sequence_id >= SSDFS_INVALID_FRAG_ID) { + offset_index = + atomic_inc_return(&fragment->id_count) - 1; + id = fragment->start_id + offset_index; + + if (!is_id_valid_for_assignment(table, fragment, id)) { + err = -ENOSPC; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %d cannot be assign " + "for fragment %d\n", + id, *last_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + atomic_dec(&fragment->id_count); + goto finish_assign_id; + } + + pos->cno = ssdfs_current_cno(table->fsi->sb); + pos->id = (u16)id; + pos->peb_index = peb_index; + pos->sequence_id = *last_sequence_id; + pos->offset_index = offset_index; + } else if (pos->sequence_id > *last_sequence_id) { + err = -ERANGE; + SSDFS_WARN("sequence_id %u > last_sequence_id %d\n", + pos->sequence_id, + *last_sequence_id); + goto finish_assign_id; + } + } else { + offset_index = atomic_inc_return(&fragment->id_count) - 1; + id = fragment->start_id + offset_index; + + if (!is_id_valid_for_assignment(table, fragment, id)) { + err = -ENOSPC; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("id %d cannot be assign for fragment %d\n", + id, *last_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + atomic_dec(&fragment->id_count); + goto finish_assign_id; + } + + pos->cno = ssdfs_current_cno(table->fsi->sb); + pos->id = (u16)id; + pos->peb_index = peb_index; + pos->sequence_id = *last_sequence_id; + pos->offset_index = offset_index; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("POS AFTER: cno %llu, id %u, peb_index %u, " + "sequence_id %u, offset_index %u\n", + pos->cno, pos->id, pos->peb_index, + pos->sequence_id, pos->offset_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk_desc) { + ssdfs_memcpy(&pos->blk_desc.buf, + 0, sizeof(struct ssdfs_block_descriptor), + blk_desc, + 0, sizeof(struct ssdfs_block_descriptor), + sizeof(struct ssdfs_block_descriptor)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, id %d, " + "peb_index %u, sequence_id %u, offset_index %u\n", + logical_blk, id, peb_index, + *last_sequence_id, offset_index); + + for (i = 0; i < SSDFS_BLK_STATE_OFF_MAX; i++) { + struct ssdfs_blk_state_offset *offset = NULL; + + offset = &blk_desc->state[i]; + + SSDFS_DBG("BLK STATE OFFSET %d: " + "log_start_page %u, log_area %#x, " + "byte_offset %u, " + 
"peb_migration_id %u\n", + i, + le16_to_cpu(offset->log_start_page), + offset->log_area, + le32_to_cpu(offset->byte_offset), + offset->peb_migration_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("DONE: logical_blk %u, id %d, " + "peb_index %u, sequence_id %u, offset_index %u\n", + logical_blk, id, peb_index, + *last_sequence_id, offset_index); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_assign_id: + ssdfs_dynamic_array_release(&table->lblk2off, logical_blk, pos); + return err; +} + +/* + * ssdfs_blk2off_table_add_fragment() - add fragment into PEB's table + * @table: pointer on table object + * @peb_index: PEB's index + * @old_sequence_id: old last sequence id + * + * This method tries to initialize additional fragment into + * PEB's table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input value. + * %-ERANGE - internal logic error. + * %-EAGAIN - PEB's fragment count isn't equal to @old_fragment_count + * %-ENOSPC - table hasn't space for new fragments + */ +static +int ssdfs_blk2off_table_add_fragment(struct ssdfs_blk2off_table *table, + u16 peb_index, + u16 old_sequence_id) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + struct ssdfs_sequence_array *sequence; + struct ssdfs_phys_offset_table_fragment *fragment, *prev_fragment; + unsigned long last_sequence_id = ULONG_MAX; + u16 start_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + BUG_ON(!rwsem_is_locked(&table->translation_lock)); + + SSDFS_DBG("table %p, peb_index %u, old_sequence_id %d\n", + table, peb_index, old_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= table->pebs_count) { + SSDFS_ERR("fail to change offset value: " + "peb_index %u >= pebs_count %u\n", + peb_index, table->pebs_count); + return -EINVAL; + } + + phys_off_table = &table->peb[peb_index]; + sequence = phys_off_table->sequence; + + if (is_ssdfs_sequence_array_last_id_invalid(sequence)) { + /* + * first creation + */ + } else { + last_sequence_id = ssdfs_sequence_array_last_id(sequence); + if (last_sequence_id != old_sequence_id) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_id %lu != old_id %u\n", + last_sequence_id, old_sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EAGAIN; + } + } + + fragment = ssdfs_blk2off_frag_alloc(); + if (IS_ERR_OR_NULL(fragment)) { + err = (fragment == NULL ? -ENOMEM : PTR_ERR(fragment)); + SSDFS_ERR("fail to allocate fragment: " + "err %d\n", err); + return err; + } + + err = ssdfs_sequence_array_add_item(sequence, fragment, + &last_sequence_id); + if (unlikely(err)) { + ssdfs_blk2off_frag_free(fragment); + SSDFS_ERR("fail to add fragment: " + "err %d\n", err); + return err; + } + + if (last_sequence_id == 0) { + start_id = 0; + } else { + int prev_id_count; + void *kaddr; + + kaddr = ssdfs_sequence_array_get_item(sequence, + last_sequence_id - 1); + if (IS_ERR_OR_NULL(kaddr)) { + err = (kaddr == NULL ? 
-ENOENT : PTR_ERR(kaddr)); + SSDFS_ERR("fail to get fragment: " + "sequence_id %lu, err %d\n", + last_sequence_id - 1, err); + return err; + } + prev_fragment = + (struct ssdfs_phys_offset_table_fragment *)kaddr; + + start_id = prev_fragment->start_id; + prev_id_count = atomic_read(&prev_fragment->id_count); + + if ((start_id + prev_id_count + 1) >= SSDFS_INVALID_OFFSET_ID) + start_id = 0; + else + start_id += prev_id_count; + } + + err = ssdfs_blk2off_table_init_fragment(fragment, last_sequence_id, + start_id, table->pages_per_peb, + SSDFS_BLK2OFF_FRAG_INITIALIZED, + NULL); + if (err) { + SSDFS_ERR("fail to init fragment %lu: err %d\n", + last_sequence_id, err); + return err; + } + + atomic_inc(&phys_off_table->fragment_count); + + return 0; +} + +/* + * ssdfs_table_fragment_set_dirty() - set fragment dirty + * @table: pointer on table object + * @peb_index: PEB's index value + * @sequence_id: fragment's sequence_id + */ +static inline +int ssdfs_table_fragment_set_dirty(struct ssdfs_blk2off_table *table, + u16 peb_index, u16 sequence_id) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + int new_state = SSDFS_BLK2OFF_TABLE_UNDEFINED; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + BUG_ON(!rwsem_is_locked(&table->translation_lock)); + + SSDFS_DBG("table %p, peb_index %u, sequence_id %u\n", + table, peb_index, sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + phys_off_table = &table->peb[peb_index]; + + err = ssdfs_sequence_array_change_state(phys_off_table->sequence, + sequence_id, + SSDFS_SEQUENCE_ITEM_NO_TAG, + SSDFS_SEQUENCE_ITEM_DIRTY_TAG, + ssdfs_change_fragment_state, + SSDFS_BLK2OFF_FRAG_INITIALIZED, + SSDFS_BLK2OFF_FRAG_DIRTY); + if (unlikely(err)) { + SSDFS_ERR("fail to set fragment dirty: " + "sequence_id %u, err %d\n", + sequence_id, err); + return err; + } + + switch (atomic_read(&phys_off_table->state)) { + case SSDFS_BLK2OFF_TABLE_COMPLETE_INIT: + new_state = SSDFS_BLK2OFF_TABLE_DIRTY; + break; + + case SSDFS_BLK2OFF_TABLE_PARTIAL_INIT: + new_state = SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT; + break; + + case SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT: + SSDFS_DBG("blk2off table is dirty already\n"); + new_state = SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT; + break; + + case SSDFS_BLK2OFF_TABLE_DIRTY: + SSDFS_DBG("blk2off table is dirty already\n"); + new_state = SSDFS_BLK2OFF_TABLE_DIRTY; + break; + + default: + SSDFS_WARN("unexpected blk2off state %#x\n", + atomic_read(&phys_off_table->state)); + new_state = SSDFS_BLK2OFF_TABLE_DIRTY; + break; + } + + atomic_set(&phys_off_table->state, + new_state); + + return 0; +} + +/* + * ssdfs_blk2off_table_fragment_set_clean() - set fragment clean + * @table: pointer on table object + * @peb_index: PEB's index value + * @sequence_id: fragment's sequence_id + */ +#ifdef CONFIG_SSDFS_TESTING +int ssdfs_blk2off_table_fragment_set_clean(struct ssdfs_blk2off_table *table, + u16 peb_index, u16 sequence_id) +{ + struct ssdfs_phys_offset_table_array *phys_off_table; + int new_state = SSDFS_BLK2OFF_TABLE_COMPLETE_INIT; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + BUG_ON(!rwsem_is_locked(&table->translation_lock)); + + SSDFS_DBG("table %p, peb_index %u, sequence_id %u\n", + table, peb_index, sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + phys_off_table = &table->peb[peb_index]; + + err = ssdfs_sequence_array_change_state(phys_off_table->sequence, + sequence_id, + SSDFS_SEQUENCE_ITEM_DIRTY_TAG, + SSDFS_SEQUENCE_ITEM_NO_TAG, + ssdfs_change_fragment_state, + SSDFS_BLK2OFF_FRAG_DIRTY, + 
SSDFS_BLK2OFF_FRAG_INITIALIZED);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set fragment clean: "
+			  "sequence_id %u, err %d\n",
+			  sequence_id, err);
+		return err;
+	}
+
+	atomic_set(&phys_off_table->state, new_state);
+
+	return 0;
+}
+#endif /* CONFIG_SSDFS_TESTING */
+
+/*
+ * ssdfs_blk2off_table_change_offset() - update logical block's offset
+ * @table: pointer on table object
+ * @logical_blk: logical block number
+ * @peb_index: PEB's index value
+ * @blk_desc: block descriptor
+ * @off: new value of offset [in]
+ *
+ * This method tries to update the offset value for a logical block.
+ * When a logical block is allocated, only the logical blocks' state
+ * bitmap is set; the table->lblk2off array still contains U16_MAX for
+ * this logical block number. It means that the logical block has been
+ * allocated but doesn't correspond to any physical offset ID yet.
+ * Every call of ssdfs_blk2off_table_change_offset() has to provide
+ * the peb_index value; the method then sets the correspondence between
+ * the logical block and a physical offset ID.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for this change yet.
+ * %-ENODATA - table doesn't contain the logical block.
+ * %-ENOENT - table's fragment for the requested logical block is not initialized
+ */
+int ssdfs_blk2off_table_change_offset(struct ssdfs_blk2off_table *table,
+				      u16 logical_blk,
+				      u16 peb_index,
+				      struct ssdfs_block_descriptor *blk_desc,
+				      struct ssdfs_phys_offset_descriptor *off)
+{
+	struct ssdfs_phys_offset_table_array *phys_off_table;
+	struct ssdfs_sequence_array *sequence;
+	struct ssdfs_phys_offset_table_fragment *fragment;
+	struct ssdfs_offset_position pos = {0};
+	u16 last_sequence_id = SSDFS_INVALID_FRAG_ID;
+	void *kaddr;
+	u16 capacity;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !off);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("table %p, logical_blk %u, peb_index %u, "
+		  "off->page_desc.logical_offset %u, "
+		  "off->page_desc.logical_blk %u, "
+		  "off->page_desc.peb_page %u, "
+		  "off->blk_state.log_start_page %u, "
+		  "off->blk_state.log_area %u, "
+		  "off->blk_state.peb_migration_id %u, "
+		  "off->blk_state.byte_offset %u\n",
+		  table, logical_blk, peb_index,
+		  le32_to_cpu(off->page_desc.logical_offset),
+		  le16_to_cpu(off->page_desc.logical_blk),
+		  le16_to_cpu(off->page_desc.peb_page),
+		  le16_to_cpu(off->blk_state.log_start_page),
+		  off->blk_state.log_area,
+		  off->blk_state.peb_migration_id,
+		  le32_to_cpu(off->blk_state.byte_offset));
+#else
+	SSDFS_DBG("table %p, logical_blk %u, peb_index %u, "
+		  "off->page_desc.logical_offset %u, "
+		  "off->page_desc.logical_blk %u, "
+		  "off->page_desc.peb_page %u, "
+		  "off->blk_state.log_start_page %u, "
+		  "off->blk_state.log_area %u, "
+		  "off->blk_state.peb_migration_id %u, "
+		  "off->blk_state.byte_offset %u\n",
+		  table, logical_blk, peb_index,
+		  le32_to_cpu(off->page_desc.logical_offset),
+		  le16_to_cpu(off->page_desc.logical_blk),
+		  le16_to_cpu(off->page_desc.peb_page),
+		  le16_to_cpu(off->blk_state.log_start_page),
+		  off->blk_state.log_area,
+		  off->blk_state.peb_migration_id,
+		  le32_to_cpu(off->blk_state.byte_offset));
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (peb_index >= table->pebs_count) {
+		SSDFS_ERR("fail to change offset value: "
+			  "peb_index %u >= pebs_count %u\n",
+			  peb_index, table->pebs_count);
+		return -EINVAL;
+	}
+
+	down_write(&table->translation_lock);
+
+	if (logical_blk >=
table->lblk2off_capacity) {
+		err = -EINVAL;
+		SSDFS_ERR("fail to convert logical block: "
+			  "block %u >= capacity %u\n",
+			  logical_blk,
+			  table->lblk2off_capacity);
+		goto finish_table_modification;
+	}
+
+	capacity = table->lblk2off_capacity;
+
+	if (atomic_read(&table->state) <= SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) {
+		if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap,
+						    SSDFS_LBMAP_INIT_INDEX,
+						    capacity,
+						    logical_blk)) {
+			err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("table is not initialized yet: "
+				  "logical_blk %u\n",
+				  logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_table_modification;
+		}
+	}
+
+	if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap,
+					    SSDFS_LBMAP_STATE_INDEX,
+					    capacity,
+					    logical_blk)) {
+		err = -ENODATA;
+		SSDFS_ERR("logical block is not allocated yet: "
+			  "logical_blk %u\n",
+			  logical_blk);
+		goto finish_table_modification;
+	}
+
+	err = ssdfs_blk2off_table_assign_id(table, logical_blk,
+					    peb_index, blk_desc,
+					    &last_sequence_id);
+	if (err == -ENOSPC) {
+		err = ssdfs_blk2off_table_add_fragment(table, peb_index,
+						       last_sequence_id);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add fragment: "
+				  "peb_index %u, err %d\n",
+				  peb_index, err);
+			goto finish_table_modification;
+		}
+
+		err = ssdfs_blk2off_table_assign_id(table, logical_blk,
+						    peb_index, blk_desc,
+						    &last_sequence_id);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to assign id: "
+				  "peb_index %u, logical_blk %u, err %d\n",
+				  peb_index, logical_blk, err);
+			goto finish_table_modification;
+		}
+	} else if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("met uninitialized fragment: "
+			  "peb_index %u, logical_blk %u\n",
+			  peb_index, logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_table_modification;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to assign id: "
+			  "peb_index %u, logical_blk %u, err %d\n",
+			  peb_index, logical_blk, err);
+		goto finish_table_modification;
+	}
+
+	err = ssdfs_blk2off_table_get_checked_position(table, logical_blk,
+						       &pos);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get checked offset's position: "
+			  "logical_block %u, err %d\n",
+			  logical_blk, err);
+		goto finish_table_modification;
+	}
+
+	phys_off_table = &table->peb[peb_index];
+
+	sequence = phys_off_table->sequence;
+	kaddr = ssdfs_sequence_array_get_item(sequence, pos.sequence_id);
+	if (IS_ERR_OR_NULL(kaddr)) {
+		err = (kaddr == NULL ?
-ENOENT : PTR_ERR(kaddr));
+		SSDFS_ERR("fail to get fragment: "
+			  "sequence_id %u, err %d\n",
+			  pos.sequence_id, err);
+		goto finish_table_modification;
+	}
+	fragment = (struct ssdfs_phys_offset_table_fragment *)kaddr;
+
+	down_write(&fragment->lock);
+
+	err = ssdfs_blk2off_table_check_fragment_desc(table, fragment, &pos);
+	if (unlikely(err)) {
+		SSDFS_ERR("invalid fragment description: err %d\n", err);
+		goto finish_fragment_modification;
+	}
+
+	err = ssdfs_blk2off_table_bmap_set(&table->lbmap,
+					   SSDFS_LBMAP_MODIFICATION_INDEX,
+					   logical_blk);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set bitmap: "
+			  "logical_blk %u, err %d\n",
+			  logical_blk, err);
+		goto finish_fragment_modification;
+	}
+
+	downgrade_write(&table->translation_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, POS: cno %llu, id %u, "
+		  "peb_index %u, sequence_id %u, offset_index %u\n",
+		  logical_blk, pos.cno, pos.id, pos.peb_index,
+		  pos.sequence_id, pos.offset_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(&fragment->phys_offs[pos.offset_index],
+		     0, sizeof(struct ssdfs_phys_offset_descriptor),
+		     off, 0, sizeof(struct ssdfs_phys_offset_descriptor),
+		     sizeof(struct ssdfs_phys_offset_descriptor));
+
+	ssdfs_table_fragment_set_dirty(table, peb_index, pos.sequence_id);
+
+	up_write(&fragment->lock);
+	up_read(&table->translation_lock);
+
+	wake_up_all(&table->wait_queue);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+finish_fragment_modification:
+	up_write(&fragment->lock);
+
+finish_table_modification:
+	up_write(&table->translation_lock);
+
+	wake_up_all(&table->wait_queue);
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_bmap_allocate() - find and set vacant logical blocks
+ * @lbmap: bitmap array pointer
+ * @bitmap_index: index of bitmap in array
+ * @start_blk: start block for search
+ * @len: requested length
+ * @max_blks: upper bound for search
+ * @extent: pointer on found extent of logical blocks [out]
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EAGAIN - allocated extent doesn't have the requested length.
+ * %-ENODATA - unable to allocate.
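+ *
+ * A hypothetical usage sketch (illustration only, not part of this
+ * patch); %-EAGAIN reports that only a run shorter than the requested
+ * @len was vacant and set:
+ *
+ *	struct ssdfs_blk2off_range range = {0};
+ *	int err;
+ *
+ *	err = ssdfs_blk2off_table_bmap_allocate(&table->lbmap,
+ *						SSDFS_LBMAP_STATE_INDEX,
+ *						0, 8,
+ *						table->lblk2off_capacity,
+ *						&range);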
+ */
+static inline
+int ssdfs_blk2off_table_bmap_allocate(struct ssdfs_bitmap_array *lbmap,
+				      int bitmap_index,
+				      u16 start_blk, u16 len,
+				      u16 max_blks,
+				      struct ssdfs_blk2off_range *extent)
+{
+	unsigned long found, end;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!lbmap || !extent);
+
+	SSDFS_DBG("lbmap %p, bitmap_index %d, "
+		  "start_blk %u, len %u, "
+		  "max_blks %u, extent %p\n",
+		  lbmap, bitmap_index,
+		  start_blk, len, max_blks, extent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (bitmap_index >= SSDFS_LBMAP_ARRAY_MAX) {
+		SSDFS_ERR("invalid bitmap index %d\n",
+			  bitmap_index);
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!lbmap->array[bitmap_index]);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = min_t(u16, len, max_blks);
+
+	found = find_next_zero_bit(lbmap->array[bitmap_index],
+				   lbmap->bits_count, start_blk);
+	if (found >= lbmap->bits_count) {
+		if (lbmap->bits_count >= max_blks) {
+			SSDFS_DBG("unable to allocate\n");
+			return -ENODATA;
+		}
+
+		err = ssdfs_blk2off_table_resize_bitmap_array(lbmap,
+							lbmap->bits_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to realloc bitmap array: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		found = find_next_zero_bit(lbmap->array[bitmap_index],
+					   lbmap->bits_count, start_blk);
+		if (found >= lbmap->bits_count) {
+			SSDFS_ERR("unable to allocate\n");
+			return -ENODATA;
+		}
+	}
+	BUG_ON(found >= U16_MAX);
+
+	if (found >= max_blks) {
+		SSDFS_DBG("unable to allocate\n");
+		return -ENODATA;
+	}
+
+	end = min_t(unsigned long, found + len, (unsigned long)max_blks);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found %lu, len %u, max_blks %u, end %lu\n",
+		  found, len, max_blks, end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	end = find_next_bit(lbmap->array[bitmap_index],
+			    end, found);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found %lu, end %lu\n",
+		  found, end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	extent->start_lblk = (u16)found;
+	extent->len = (u16)(end - found);
+
+	if (extent->len < len && lbmap->bits_count < max_blks) {
+		err = ssdfs_blk2off_table_resize_bitmap_array(lbmap, end);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to realloc bitmap array: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		end = find_next_bit(lbmap->array[bitmap_index],
+				    end, found);
+	}
+
+	extent->start_lblk = (u16)found;
+	extent->len = (u16)(end - found);
+
+	bitmap_set(lbmap->array[bitmap_index], extent->start_lblk, extent->len);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found extent (start %u, len %u)\n",
+		  extent->start_lblk, extent->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (extent->len < len)
+		return -EAGAIN;
+
+	return 0;
+}
+
+/*
+ * ssdfs_blk2off_table_allocate_extent() - allocate vacant extent
+ * @table: pointer on table object
+ * @len: requested length
+ * @extent: pointer on found extent [out]
+ *
+ * This method tries to allocate a vacant extent.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for this change yet.
+ * %-ENODATA - bitmap has no vacant logical blocks.
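+ *
+ * A hypothetical usage sketch (illustration only): a partially
+ * allocated extent is reported as success, so the caller has to
+ * check the resulting length:
+ *
+ *	struct ssdfs_blk2off_range extent = {0};
+ *	int err;
+ *
+ *	err = ssdfs_blk2off_table_allocate_extent(table, 4, &extent);
+ *	if (!err && extent.len < 4) {
+ *		... only extent.len blocks were vacant ...
+ *	}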
+ */
+int ssdfs_blk2off_table_allocate_extent(struct ssdfs_blk2off_table *table,
+					u16 len,
+					struct ssdfs_blk2off_range *extent)
+{
+	void *kaddr;
+	size_t off_pos_size = sizeof(struct ssdfs_offset_position);
+	u16 start_blk = 0;
+	u16 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !extent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("table %p, len %u, extent %p, "
+		  "used_logical_blks %u, free_logical_blks %u, "
+		  "last_allocated_blk %u\n",
+		  table, len, extent,
+		  table->used_logical_blks,
+		  table->free_logical_blks,
+		  table->last_allocated_blk);
+#else
+	SSDFS_DBG("table %p, len %u, extent %p, "
+		  "used_logical_blks %u, free_logical_blks %u, "
+		  "last_allocated_blk %u\n",
+		  table, len, extent,
+		  table->used_logical_blks,
+		  table->free_logical_blks,
+		  table->last_allocated_blk);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (atomic_read(&table->state) <= SSDFS_BLK2OFF_OBJECT_CREATED) {
+		SSDFS_DBG("unable to allocate before initialization\n");
+		return -EAGAIN;
+	}
+
+	down_write(&table->translation_lock);
+
+	if (table->free_logical_blks == 0) {
+		if (table->used_logical_blks != table->lblk2off_capacity) {
+			err = -ERANGE;
+			SSDFS_ERR("used_logical_blks %u != capacity %u\n",
+				  table->used_logical_blks,
+				  table->lblk2off_capacity);
+		} else {
+			err = -ENODATA;
+			SSDFS_DBG("bitmap has no vacant logical blocks\n");
+		}
+		goto finish_allocation;
+	}
+
+	if (atomic_read(&table->state) == SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) {
+		u16 capacity = table->lblk2off_capacity;
+		bool is_vacant;
+
+		start_blk = table->last_allocated_blk;
+		is_vacant = ssdfs_blk2off_table_bmap_vacant(&table->lbmap,
+							SSDFS_LBMAP_INIT_INDEX,
+							capacity,
+							start_blk);
+
+		if (is_vacant) {
+			start_blk = table->used_logical_blks;
+			if (start_blk > 0)
+				start_blk--;
+
+			is_vacant =
+			    ssdfs_blk2off_table_bmap_vacant(&table->lbmap,
+							SSDFS_LBMAP_INIT_INDEX,
+							capacity,
+							start_blk);
+		}
+
+		if (is_vacant) {
+			err = -EAGAIN;
+			SSDFS_DBG("table is not initialized yet\n");
+			goto finish_allocation;
+		}
+	}
+
+	err = ssdfs_blk2off_table_bmap_allocate(&table->lbmap,
+						SSDFS_LBMAP_STATE_INDEX,
+						start_blk, len,
+						table->lblk2off_capacity,
+						extent);
+	if (err == -EAGAIN) {
+		err = 0;
+		SSDFS_DBG("requested extent wasn't allocated fully\n");
+		goto finish_allocation;
+	} else if (err == -ENODATA) {
+		goto try_next_range;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find vacant extent: err %d\n",
+			  err);
+		goto finish_allocation;
+	} else {
+		goto save_found_extent;
+	}
+
+try_next_range:
+	if (atomic_read(&table->state) < SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT) {
+		err = -EAGAIN;
+		SSDFS_DBG("table is not initialized yet\n");
+		goto finish_allocation;
+	}
+
+	err = ssdfs_blk2off_table_bmap_allocate(&table->lbmap,
+						SSDFS_LBMAP_STATE_INDEX,
+						0, len, start_blk,
+						extent);
+	if (err == -EAGAIN) {
+		err = 0;
+		SSDFS_DBG("requested extent wasn't allocated fully\n");
+		goto finish_allocation;
+	} else if (err == -ENODATA) {
+		SSDFS_DBG("bitmap has no vacant logical blocks\n");
+		goto finish_allocation;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find vacant extent: err %d\n",
+			  err);
+		goto finish_allocation;
+	}
+
+save_found_extent:
+	for (i = 0; i < extent->len; i++) {
+		u16 blk = extent->start_lblk + i;
+
+		kaddr = ssdfs_dynamic_array_get_locked(&table->lblk2off, blk);
+		if (IS_ERR_OR_NULL(kaddr)) {
+			err = (kaddr == NULL ?
-ENOENT : PTR_ERR(kaddr));
+			SSDFS_ERR("fail to get logical block: "
+				  "blk %u, extent (start %u, len %u), "
+				  "err %d\n",
+				  blk, extent->start_lblk,
+				  extent->len, err);
+			goto finish_allocation;
+		}
+
+		memset(kaddr, 0xFF, off_pos_size);
+
+		err = ssdfs_dynamic_array_release(&table->lblk2off,
+						  blk, kaddr);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to release: "
+				  "blk %u, extent (start %u, len %u), "
+				  "err %d\n",
+				  blk, extent->start_lblk,
+				  extent->len, err);
+			goto finish_allocation;
+		}
+	}
+
+	BUG_ON(table->used_logical_blks > (U16_MAX - extent->len));
+	BUG_ON((table->used_logical_blks + extent->len) >
+		table->lblk2off_capacity);
+	table->used_logical_blks += extent->len;
+
+	BUG_ON(extent->len > table->free_logical_blks);
+	table->free_logical_blks -= extent->len;
+
+	BUG_ON(extent->len == 0);
+	table->last_allocated_blk = extent->start_lblk + extent->len - 1;
+
+finish_allocation:
+	up_write(&table->translation_lock);
+
+	if (!err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("extent (start %u, len %u) has been allocated\n",
+			  extent->start_lblk, extent->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	ssdfs_debug_blk2off_table_object(table);
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_allocate_block() - allocate vacant logical block
+ * @table: pointer on table object
+ * @logical_blk: pointer on found logical block value [out]
+ *
+ * This method tries to allocate a vacant logical block.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for this change yet.
+ * %-ENODATA - bitmap has no vacant logical blocks.
+ */
+int ssdfs_blk2off_table_allocate_block(struct ssdfs_blk2off_table *table,
+					u16 *logical_blk)
+{
+	struct ssdfs_blk2off_range extent = {0};
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !logical_blk);
+
+	SSDFS_DBG("table %p, logical_blk %p, "
+		  "used_logical_blks %u, free_logical_blks %u, "
+		  "last_allocated_blk %u\n",
+		  table, logical_blk,
+		  table->used_logical_blks,
+		  table->free_logical_blks,
+		  table->last_allocated_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_blk2off_table_allocate_extent(table, 1, &extent);
+	if (err) {
+		SSDFS_ERR("fail to allocate logical block: err %d\n",
+			  err);
+		SSDFS_ERR("used_logical_blks %u, free_logical_blks %u, "
+			  "last_allocated_blk %u\n",
+			  table->used_logical_blks,
+			  table->free_logical_blks,
+			  table->last_allocated_blk);
+		return err;
+	} else if (extent.start_lblk >= table->lblk2off_capacity ||
+		   extent.len != 1) {
+		SSDFS_ERR("invalid extent (start %u, len %u)\n",
+			  extent.start_lblk, extent.len);
+		return -ERANGE;
+	}
+
+	*logical_blk = extent.start_lblk;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical block %u has been allocated\n",
+		  *logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_free_extent() - free extent
+ * @table: pointer on table object
+ * @peb_index: PEB's index
+ * @extent: pointer on extent
+ *
+ * This method tries to free an extent of logical blocks.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for this change yet.
+ * %-ENOENT - logical block isn't allocated yet.
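+ *
+ * A hypothetical usage sketch (illustration only): freeing writes an
+ * invalidated physical offset for every block of the extent, so the
+ * table stays consistent:
+ *
+ *	struct ssdfs_blk2off_range range = {
+ *		.start_lblk = start,
+ *		.len = count,
+ *	};
+ *	int err;
+ *
+ *	err = ssdfs_blk2off_table_free_extent(table, peb_index, &range);
+ *	if (err == -EAGAIN) {
+ *		... table is still initializing, retry later ...
+ *	}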
+ */
+int ssdfs_blk2off_table_free_extent(struct ssdfs_blk2off_table *table,
+				    u16 peb_index,
+				    struct ssdfs_blk2off_range *extent)
+{
+	struct ssdfs_phys_offset_table_array *phys_off_table;
+	struct ssdfs_sequence_array *sequence;
+	struct ssdfs_phys_offset_table_fragment *fragment;
+	struct ssdfs_phys_offset_descriptor off;
+	u16 last_sequence_id = SSDFS_INVALID_FRAG_ID;
+	struct ssdfs_offset_position pos = {0};
+	void *old_pos;
+	size_t desc_size = sizeof(struct ssdfs_offset_position);
+	struct ssdfs_block_descriptor blk_desc = {0};
+	bool is_vacant;
+	u16 end_lblk;
+	int state;
+	void *kaddr;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !extent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("table %p, extent (start %u, len %u)\n",
+		  table, extent->start_lblk, extent->len);
+#else
+	SSDFS_DBG("table %p, extent (start %u, len %u)\n",
+		  table, extent->start_lblk, extent->len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (atomic_read(&table->state) <= SSDFS_BLK2OFF_OBJECT_CREATED) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to free before initialization: "
+			  "extent (start %u, len %u)\n",
+			  extent->start_lblk, extent->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EAGAIN;
+	}
+
+	memset(&blk_desc, 0xFF, sizeof(struct ssdfs_block_descriptor));
+
+	down_write(&table->translation_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("used_logical_blks %u, free_logical_blks %u, "
+		  "last_allocated_blk %u, lblk2off_capacity %u\n",
+		  table->used_logical_blks,
+		  table->free_logical_blks,
+		  table->last_allocated_blk,
+		  table->lblk2off_capacity);
+
+	BUG_ON(extent->len > table->used_logical_blks);
+	BUG_ON(table->used_logical_blks > table->lblk2off_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((extent->start_lblk + extent->len) > table->lblk2off_capacity) {
+		err = -EINVAL;
+		SSDFS_ERR("fail to free extent (start %u, len %u)\n",
+			  extent->start_lblk, extent->len);
+		goto finish_freeing;
+	}
+
+	state = atomic_read(&table->state);
+	if (state == SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) {
+		is_vacant = ssdfs_blk2off_table_extent_vacant(&table->lbmap,
+						SSDFS_LBMAP_INIT_INDEX,
+						table->lblk2off_capacity,
+						extent);
+
+		if (is_vacant) {
+			err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to free before initialization: "
+				  "extent (start %u, len %u)\n",
+				  extent->start_lblk, extent->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_freeing;
+		}
+	}
+
+	is_vacant = ssdfs_blk2off_table_extent_vacant(&table->lbmap,
+						SSDFS_LBMAP_STATE_INDEX,
+						table->lblk2off_capacity,
+						extent);
+	if (is_vacant) {
+		err = -ENOENT;
+		SSDFS_WARN("extent (start %u, len %u) "
+			   "isn't allocated yet\n",
+			   extent->start_lblk, extent->len);
+		goto finish_freeing;
+	}
+
+	end_lblk = extent->start_lblk + extent->len;
+	for (i = extent->start_lblk; i < end_lblk; i++) {
+		old_pos = ssdfs_dynamic_array_get_locked(&table->lblk2off, i);
+		if (IS_ERR_OR_NULL(old_pos)) {
+			err = (old_pos == NULL ?
-ENOENT : PTR_ERR(old_pos));
+			SSDFS_ERR("fail to get logical block: "
+				  "blk %u, err %d\n",
+				  i, err);
+			goto finish_freeing;
+		}
+
+		if (SSDFS_OFF_POS(old_pos)->id == U16_MAX) {
+			SSDFS_WARN("logical block %d has no associated ID\n",
+				   i);
+		}
+
+		err = ssdfs_dynamic_array_release(&table->lblk2off,
+						  i, old_pos);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to release: "
+				  "blk %u, err %d\n",
+				  i, err);
+			goto finish_freeing;
+		}
+
+		err = ssdfs_blk2off_table_assign_id(table, i, peb_index,
+						    &blk_desc,
+						    &last_sequence_id);
+		if (err == -ENOSPC) {
+			err = ssdfs_blk2off_table_add_fragment(table, peb_index,
+							last_sequence_id);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to add fragment: "
+					  "peb_index %u, err %d\n",
+					  peb_index, err);
+				goto finish_freeing;
+			}
+
+			err = ssdfs_blk2off_table_assign_id(table, i,
+							    peb_index,
+							    &blk_desc,
+							    &last_sequence_id);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to assign id: "
+					  "peb_index %u, logical_blk %u, "
+					  "err %d\n",
+					  peb_index, i, err);
+				goto finish_freeing;
+			}
+		} else if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("met uninitialized fragment: "
+				  "peb_index %u, logical_blk %u\n",
+				  peb_index, i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_freeing;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to assign id: "
+				  "peb_index %u, logical_blk %u, err %d\n",
+				  peb_index, i, err);
+			goto finish_freeing;
+		}
+
+		err = ssdfs_blk2off_table_get_checked_position(table, (u16)i,
+							       &pos);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get checked offset's position: "
+				  "logical_block %d, err %d\n",
+				  i, err);
+			goto finish_freeing;
+		}
+
+		phys_off_table = &table->peb[peb_index];
+
+		sequence = phys_off_table->sequence;
+		kaddr = ssdfs_sequence_array_get_item(sequence,
+						      pos.sequence_id);
+		if (IS_ERR_OR_NULL(kaddr)) {
+			err = (kaddr == NULL ?
-ENOENT : PTR_ERR(kaddr));
+			SSDFS_ERR("fail to get fragment: "
+				  "sequence_id %u, err %d\n",
+				  pos.sequence_id, err);
+			goto finish_freeing;
+		}
+		fragment = (struct ssdfs_phys_offset_table_fragment *)kaddr;
+
+		down_write(&fragment->lock);
+
+		err = ssdfs_blk2off_table_check_fragment_desc(table, fragment,
+							      &pos);
+		if (unlikely(err)) {
+			SSDFS_ERR("invalid fragment description: err %d\n",
+				  err);
+			goto finish_fragment_modification;
+		}
+
+		ssdfs_blk2off_table_bmap_clear(&table->lbmap,
+						SSDFS_LBMAP_STATE_INDEX,
+						(u16)i);
+		ssdfs_blk2off_table_bmap_set(&table->lbmap,
+						SSDFS_LBMAP_MODIFICATION_INDEX,
+						(u16)i);
+
+		off.page_desc.logical_offset = cpu_to_le32(U32_MAX);
+		off.page_desc.logical_blk = cpu_to_le16((u16)i);
+		off.page_desc.peb_page = cpu_to_le16(U16_MAX);
+		off.blk_state.log_start_page = cpu_to_le16(U16_MAX);
+		off.blk_state.log_area = U8_MAX;
+		off.blk_state.peb_migration_id = U8_MAX;
+		off.blk_state.byte_offset = cpu_to_le32(U32_MAX);
+
+		ssdfs_memcpy(&fragment->phys_offs[pos.offset_index],
+			     0, sizeof(struct ssdfs_phys_offset_descriptor),
+			     &off,
+			     0, sizeof(struct ssdfs_phys_offset_descriptor),
+			     sizeof(struct ssdfs_phys_offset_descriptor));
+
+		ssdfs_table_fragment_set_dirty(table, peb_index,
+						pos.sequence_id);
+
+finish_fragment_modification:
+		up_write(&fragment->lock);
+
+		if (unlikely(err))
+			goto finish_freeing;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("BEFORE: logical_blk %d, pos (cno %llx, id %u, "
+			  "sequence_id %u, offset_index %u)\n",
+			  i, pos.cno, pos.id, pos.sequence_id,
+			  pos.offset_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		pos.cno = ssdfs_current_cno(table->fsi->sb);
+		pos.id = SSDFS_BLK2OFF_TABLE_INVALID_ID;
+		pos.offset_index = U16_MAX;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("AFTER: logical_blk %d, pos (cno %llx, id %u, "
+			  "sequence_id %u, offset_index %u)\n",
+			  i, pos.cno, pos.id, pos.sequence_id,
+			  pos.offset_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		old_pos = ssdfs_dynamic_array_get_locked(&table->lblk2off, i);
+		if (IS_ERR_OR_NULL(old_pos)) {
+			err = (old_pos == NULL ? -ENOENT : PTR_ERR(old_pos));
+			SSDFS_ERR("fail to get logical block: "
+				  "blk %u, err %d\n",
+				  i, err);
+			goto finish_freeing;
+		}
+
+		err = ssdfs_memcpy(old_pos, 0, desc_size,
+				   &pos, 0, desc_size,
+				   desc_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n",
+				  err);
+			goto finish_freeing;
+		}
+
+		err = ssdfs_dynamic_array_release(&table->lblk2off,
+						  i, old_pos);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to release: "
+				  "blk %u, err %d\n",
+				  i, err);
+			goto finish_freeing;
+		}
+
+		BUG_ON(table->used_logical_blks == 0);
+		table->used_logical_blks--;
+		BUG_ON(table->free_logical_blks == U16_MAX);
+		table->free_logical_blks++;
+	}
+
+finish_freeing:
+	up_write(&table->translation_lock);
+
+	if (!err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("extent (start %u, len %u) has been freed\n",
+			  extent->start_lblk, extent->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	wake_up_all(&table->wait_queue);
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_free_block() - free logical block
+ * @table: pointer on table object
+ * @peb_index: PEB's index
+ * @logical_blk: logical block number
+ *
+ * This method tries to free the logical block.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for this change yet.
+ * %-ENOENT - logical block isn't allocated yet.
+ */
+int ssdfs_blk2off_table_free_block(struct ssdfs_blk2off_table *table,
+				   u16 peb_index,
+				   u16 logical_blk)
+{
+	struct ssdfs_blk2off_range extent = {
+		.start_lblk = logical_blk,
+		.len = 1,
+	};
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table);
+
+	SSDFS_DBG("table %p, logical_blk %u\n",
+		  table, logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_blk2off_table_free_extent(table, peb_index, &extent);
+	if (err) {
+		SSDFS_ERR("fail to free logical block %u: err %d\n",
+			  logical_blk, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical block %u has been freed\n",
+		  logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_blk2off_table_set_block_migration() - set block migration
+ * @table: pointer on table object
+ * @logical_blk: logical block number
+ * @peb_index: PEB index in the segment
+ * @req: request's result with block's content
+ *
+ * This method tries to set the migration state for a logical block.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - table isn't prepared for this change yet.
+ */
+int ssdfs_blk2off_table_set_block_migration(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    u16 peb_index,
+					    struct ssdfs_segment_request *req)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_migrating_block *blk = NULL;
+	u32 pages_per_lblk;
+	u32 start_page;
+	u32 count;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !req);
+
+	SSDFS_DBG("table %p, logical_blk %u, peb_index %u, req %p\n",
+		  table, logical_blk, peb_index, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = table->fsi;
+	pages_per_lblk = fsi->pagesize >> PAGE_SHIFT;
+
+	if (peb_index >= table->pebs_count) {
+		SSDFS_ERR("fail to set block migration: "
+			  "peb_index %u >= pebs_count %u\n",
+			  peb_index, table->pebs_count);
+		return -EINVAL;
+	}
+
+	if (logical_blk < req->place.start.blk_index ||
+	    logical_blk >= (req->place.start.blk_index + req->place.len)) {
+		SSDFS_ERR("inconsistent request: "
+			  "logical_blk %u, "
+			  "request (start_blk %u, len %u)\n",
+			  logical_blk,
+			  req->place.start.blk_index,
+			  req->place.len);
+		return -EINVAL;
+	}
+
+	count = pagevec_count(&req->result.pvec);
+	if (count % pages_per_lblk) {
+		SSDFS_ERR("inconsistent request: "
+			  "pagevec count %u, "
+			  "pages_per_lblk %u, req->place.len %u\n",
+			  count, pages_per_lblk, req->place.len);
+		return -EINVAL;
+	}
+
+	down_write(&table->translation_lock);
+
+	if (logical_blk > table->last_allocated_blk) {
+		err = -EINVAL;
+		SSDFS_ERR("fail to set block migrating: "
+			  "block %u > last_allocated_block %u\n",
+			  logical_blk,
+			  table->last_allocated_blk);
+		goto finish_set_block_migration;
+	}
+
+	if (atomic_read(&table->state) <= SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT) {
+		u16 capacity = table->lblk2off_capacity;
+
+		if (ssdfs_blk2off_table_bmap_vacant(&table->lbmap,
+						    SSDFS_LBMAP_INIT_INDEX,
+						    capacity,
+						    logical_blk)) {
+			err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("table is not initialized yet: "
+				  "logical_blk %u\n",
+				  logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_set_block_migration;
+		}
+	}
+
+	blk = ssdfs_get_migrating_block(table, logical_blk, true);
+	if (IS_ERR_OR_NULL(blk)) {
+		err = (blk == NULL ?
-ENOENT : PTR_ERR(blk)); + SSDFS_ERR("fail to get migrating block: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto finish_set_block_migration; + } + + switch (blk->state) { + case SSDFS_LBLOCK_UNKNOWN_STATE: + /* expected state */ + break; + + case SSDFS_LBLOCK_UNDER_MIGRATION: + case SSDFS_LBLOCK_UNDER_COMMIT: + err = -ERANGE; + SSDFS_WARN("logical_blk %u is under migration already\n", + logical_blk); + goto finish_set_block_migration; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected state %#x\n", + blk->state); + goto finish_set_block_migration; + } + + pagevec_init(&blk->pvec); + + start_page = logical_blk - req->place.start.blk_index; + for (i = start_page; i < (start_page + pages_per_lblk); i++) { + struct page *page; +#ifdef CONFIG_SSDFS_DEBUG + void *kaddr; + + SSDFS_DBG("start_page %u, logical_blk %u, " + "blk_index %u, i %d, " + "pagevec_count %u\n", + start_page, logical_blk, + req->place.start.blk_index, + i, + pagevec_count(&req->result.pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_blk2off_alloc_page(GFP_KERNEL); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate #%d memory page\n", i); + ssdfs_blk2off_pagevec_release(&blk->pvec); + goto finish_set_block_migration; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); + + BUG_ON(i >= pagevec_count(&req->result.pvec)); + BUG_ON(!req->result.pvec.pages[i]); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy_page(page, 0, PAGE_SIZE, + req->result.pvec.pages[i], 0, PAGE_SIZE, + PAGE_SIZE); + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(req->result.pvec.pages[i]); + SSDFS_DBG("BLOCK STATE DUMP: page_index %d\n", i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + pagevec_add(&blk->pvec, page); + } + + blk->state = SSDFS_LBLOCK_UNDER_MIGRATION; + blk->peb_index = peb_index; + +finish_set_block_migration: + up_write(&table->translation_lock); + + if (!err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u is under migration: " + "(peb_index %u, state %#x)\n", + logical_blk, peb_index, blk->state); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +/* + * ssdfs_blk2off_table_get_block_migration() - get block's migration state + * @table: pointer on table object + * @logical_blk: logical block number + * @peb_index: PEB index + * + * This method tries to get the migration state of logical block. 
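+ *
+ * The caller is expected to hold table->translation_lock (the debug
+ * build asserts it below). A hypothetical usage sketch (illustration
+ * only, not part of this patch):
+ *
+ *	down_read(&table->translation_lock);
+ *	state = ssdfs_blk2off_table_get_block_migration(table, logical_blk,
+ *							peb_index);
+ *	up_read(&table->translation_lock);
+ *
+ *	if (state == SSDFS_LBLOCK_UNDER_MIGRATION) {
+ *		... the block's actual content lives in the migrating
+ *		    block's pagevec until commit ...
+ *	}
+ *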
+ */
+int ssdfs_blk2off_table_get_block_migration(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    u16 peb_index)
+{
+	struct ssdfs_migrating_block *blk = NULL;
+	int migration_state = SSDFS_LBLOCK_UNKNOWN_STATE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table);
+	BUG_ON(!rwsem_is_locked(&table->translation_lock));
+
+	SSDFS_DBG("table %p, logical_blk %u, peb_index %u\n",
+		  table, logical_blk, peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	blk = ssdfs_get_migrating_block(table, logical_blk, false);
+	if (IS_ERR_OR_NULL(blk))
+		migration_state = SSDFS_LBLOCK_UNKNOWN_STATE;
+	else
+		migration_state = blk->state;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, migration_state %#x\n",
+		  logical_blk, migration_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return migration_state;
+}
+
+/*
+ * ssdfs_blk2off_table_get_block_state() - get state of migrating block
+ * @table: pointer on table object
+ * @req: segment request [in|out]
+ *
+ * This method tries to get the state of a logical block under migration.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal logic error.
+ * %-EAGAIN - logical block is not migrating.
+ * %-ENOMEM - fail to allocate memory.
+ */
+int ssdfs_blk2off_table_get_block_state(struct ssdfs_blk2off_table *table,
+					struct ssdfs_segment_request *req)
+{
+	struct ssdfs_fs_info *fsi;
+	u16 logical_blk;
+	struct ssdfs_migrating_block *blk = NULL;
+	u32 read_bytes;
+	int start_page;
+	u32 data_bytes = 0;
+	int processed_blks;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !req);
+
+	SSDFS_DBG("table %p, req %p\n",
+		  table, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = table->fsi;
+	read_bytes = req->result.processed_blks * fsi->pagesize;
+	start_page = (int)(read_bytes >> PAGE_SHIFT);
+	BUG_ON(start_page >= U16_MAX);
+
+	if (pagevec_count(&req->result.pvec) <= start_page) {
+		SSDFS_ERR("page_index %d >= pagevec_count %u\n",
+			  start_page,
+			  pagevec_count(&req->result.pvec));
+		return -ERANGE;
+	}
+
+	logical_blk = req->place.start.blk_index + req->result.processed_blks;
+
+	down_read(&table->translation_lock);
+
+	if (logical_blk > table->last_allocated_blk) {
+		err = -EINVAL;
+		SSDFS_ERR("fail to get migrating block: "
+			  "block %u > last_allocated_block %u\n",
+			  logical_blk,
+			  table->last_allocated_blk);
+		goto finish_get_block_state;
+	}
+
+	blk = ssdfs_get_migrating_block(table, logical_blk, false);
+	if (IS_ERR_OR_NULL(blk)) {
+		err = -EAGAIN;
+		goto finish_get_block_state;
+	}
+
+	switch (blk->state) {
+	case SSDFS_LBLOCK_UNDER_MIGRATION:
+	case SSDFS_LBLOCK_UNDER_COMMIT:
+		/* expected state */
+		break;
+
+	case SSDFS_LBLOCK_UNKNOWN_STATE:
+		err = -EAGAIN;
+		goto finish_get_block_state;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("unexpected state %#x\n",
+			  blk->state);
+		goto finish_get_block_state;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, state %#x\n",
+		  logical_blk, blk->state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(&blk->pvec) == (fsi->pagesize >> PAGE_SHIFT)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("logical_blk %u, blk pagevec count %u\n",
+			  logical_blk, pagevec_count(&blk->pvec));
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+		SSDFS_WARN("logical_blk %u, blk pagevec count %u\n",
+			   logical_blk, pagevec_count(&blk->pvec));
+	}
+
+	for (i = 0; i < pagevec_count(&blk->pvec); i++) {
+		int page_index = start_page + i;
+		struct page *page;
+#ifdef CONFIG_SSDFS_DEBUG
+		void *kaddr;
+
+		SSDFS_DBG("index %d, read_bytes %u, "
+			  "start_page %u, page_index %d\n",
+			  i, read_bytes, start_page, page_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (page_index >= pagevec_count(&req->result.pvec)) {
+			err = -ERANGE;
+			SSDFS_ERR("page_index %d >= count %d\n",
+				  page_index,
+				  pagevec_count(&req->result.pvec));
+			goto finish_get_block_state;
+		}
+
+		page = req->result.pvec.pages[page_index];
+		ssdfs_lock_page(blk->pvec.pages[i]);
+
+		ssdfs_memcpy_page(page, 0, PAGE_SIZE,
+				  blk->pvec.pages[i], 0, PAGE_SIZE,
+				  PAGE_SIZE);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		kaddr = kmap_local_page(blk->pvec.pages[i]);
+		SSDFS_DBG("BLOCK STATE DUMP: page_index %d\n", i);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     kaddr, PAGE_SIZE);
+		SSDFS_DBG("\n");
+		kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_unlock_page(blk->pvec.pages[i]);
+		SetPageUptodate(page);
+
+		data_bytes += PAGE_SIZE;
+	}
+
+finish_get_block_state:
+	up_read(&table->translation_lock);
+
+	if (!err) {
+		processed_blks =
+			(data_bytes + fsi->pagesize - 1) >> fsi->log_pagesize;
+		req->result.processed_blks += processed_blks;
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_blk2off_table_update_block_state() - update state of migrating block
+ * @table: pointer on table object
+ * @req: segment request [in|out]
+ *
+ * This method tries to update the state of a logical block under migration.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal logic error.
+ * %-ENOENT - logical block is not migrating.
+ * %-ENOMEM - fail to allocate memory.
+ */
+int ssdfs_blk2off_table_update_block_state(struct ssdfs_blk2off_table *table,
+					   struct ssdfs_segment_request *req)
+{
+	struct ssdfs_fs_info *fsi;
+	u16 logical_blk;
+	struct ssdfs_migrating_block *blk = NULL;
+	u32 read_bytes;
+	int start_page;
+	u32 data_bytes = 0;
+	int processed_blks;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!table || !req);
+	BUG_ON(!rwsem_is_locked(&table->translation_lock));
+
+	SSDFS_DBG("table %p, req %p\n",
+		  table, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = table->fsi;
+	read_bytes = req->result.processed_blks * fsi->pagesize;
+	start_page = (int)(read_bytes >> PAGE_SHIFT);
+	BUG_ON(start_page >= U16_MAX);
+
+	if (pagevec_count(&req->result.pvec) <= start_page) {
+		SSDFS_ERR("page_index %d >= pagevec_count %u\n",
+			  start_page,
+			  pagevec_count(&req->result.pvec));
+		return -ERANGE;
+	}
+
+	logical_blk = req->place.start.blk_index + req->result.processed_blks;
+
+	if (logical_blk > table->last_allocated_blk) {
+		err = -EINVAL;
+		SSDFS_ERR("fail to get migrating block: "
+			  "block %u > last_allocated_block %u\n",
+			  logical_blk,
+			  table->last_allocated_blk);
+		goto finish_update_block_state;
+	}
+
+	blk = ssdfs_get_migrating_block(table, logical_blk, false);
+	if (IS_ERR_OR_NULL(blk)) {
+		err = -ENOENT;
+		goto finish_update_block_state;
+	}
+
+	switch (blk->state) {
+	case SSDFS_LBLOCK_UNDER_MIGRATION:
+		/* expected state */
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("unexpected state %#x\n",
+			  blk->state);
+		goto finish_update_block_state;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, state %#x\n",
+		  logical_blk, blk->state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(&blk->pvec) == (fsi->pagesize >> PAGE_SHIFT)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("logical_blk %u, blk pagevec count %u\n",
+			  logical_blk, pagevec_count(&blk->pvec));
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+		SSDFS_WARN("logical_blk %u, blk pagevec count %u\n",
+			   logical_blk, pagevec_count(&blk->pvec));
+	}
+
+	for (i = 0; i <
pagevec_count(&blk->pvec); i++) { + int page_index = start_page + i; + struct page *page; +#ifdef CONFIG_SSDFS_DEBUG + void *kaddr; + + SSDFS_DBG("index %d, read_bytes %u, " + "start_page %u, page_index %d\n", + i, read_bytes, start_page, page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&req->result.pvec)) { + err = -ERANGE; + SSDFS_ERR("page_index %d >= count %d\n", + page_index, + pagevec_count(&req->result.pvec)); + goto finish_update_block_state; + } + + page = req->result.pvec.pages[page_index]; + ssdfs_lock_page(blk->pvec.pages[i]); + + ssdfs_memcpy_page(blk->pvec.pages[i], 0, PAGE_SIZE, + page, 0, PAGE_SIZE, + PAGE_SIZE); + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(blk->pvec.pages[i]); + SSDFS_DBG("BLOCK STATE DUMP: page_index %d\n", i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_unlock_page(blk->pvec.pages[i]); + + data_bytes += PAGE_SIZE; + } + +finish_update_block_state: + if (!err) { + processed_blks = + (data_bytes + fsi->pagesize - 1) >> fsi->log_pagesize; + req->result.processed_blks += processed_blks; + } + + return err; +} + +/* + * ssdfs_blk2off_table_set_block_commit() - set block commit + * @table: pointer on table object + * @logical_blk: logical block number + * @peb_index: PEB index in the segment + * + * This method tries to set commit state for logical block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input + * %-ERANGE - internal logic error + */ +int ssdfs_blk2off_table_set_block_commit(struct ssdfs_blk2off_table *table, + u16 logical_blk, + u16 peb_index) +{ + struct ssdfs_migrating_block *blk = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!table); + + SSDFS_DBG("table %p, logical_blk %u, peb_index %u\n", + table, logical_blk, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= table->pebs_count) { + SSDFS_ERR("fail to set block commit: " + "peb_index %u >= pebs_count %u\n", + peb_index, table->pebs_count); + return -EINVAL; + } + + down_write(&table->translation_lock); + + if (logical_blk > table->last_allocated_blk) { + err = -EINVAL; + SSDFS_ERR("fail to set block commit: " + "block %u > last_allocated_block %u\n", + logical_blk, + table->last_allocated_blk); + goto finish_set_block_commit; + } + + blk = ssdfs_get_migrating_block(table, logical_blk, false); + if (IS_ERR_OR_NULL(blk)) { + err = (blk == NULL ? 
-ENOENT : PTR_ERR(blk)); + SSDFS_ERR("fail to get migrating block: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto finish_set_block_commit; + } + + switch (blk->state) { + case SSDFS_LBLOCK_UNDER_MIGRATION: + /* expected state */ + break; + + case SSDFS_LBLOCK_UNDER_COMMIT: + err = -ERANGE; + SSDFS_ERR("logical_blk %u is under commit already\n", + logical_blk); + goto finish_set_block_commit; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected state %#x\n", + blk->state); + goto finish_set_block_commit; + } + + if (blk->peb_index != peb_index) { + err = -ERANGE; + SSDFS_ERR("blk->peb_index %u != peb_index %u\n", + blk->peb_index, peb_index); + goto finish_set_block_commit; + } + + blk->state = SSDFS_LBLOCK_UNDER_COMMIT; + +finish_set_block_commit: + up_write(&table->translation_lock); + + if (!err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u is under commit: " + "(peb_index %u, state %#x)\n", + logical_blk, peb_index, blk->state); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +/* + * ssdfs_blk2off_table_revert_migration_state() - revert migration state + * @table: pointer on table object + * @peb_index: PEB index in the segment + * + * This method tries to revert migration state for logical block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input + */ +int ssdfs_blk2off_table_revert_migration_state(struct ssdfs_blk2off_table *tbl, + u16 peb_index) +{ + struct ssdfs_migrating_block *blk = NULL; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + + SSDFS_DBG("table %p, peb_index %u\n", + tbl, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= tbl->pebs_count) { + SSDFS_ERR("fail to revert migration state: " + "peb_index %u >= pebs_count %u\n", + peb_index, tbl->pebs_count); + return -EINVAL; + } + + down_write(&tbl->translation_lock); + + for (i = 0; i <= tbl->last_allocated_blk; i++) { + blk = ssdfs_get_migrating_block(tbl, i, false); + if (IS_ERR_OR_NULL(blk)) + continue; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk->peb_index %u, peb_index %u\n", + blk->peb_index, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk->peb_index != peb_index) + continue; + + if (blk->state == SSDFS_LBLOCK_UNDER_COMMIT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reverting migration state: blk %d\n", + i); +#endif /* CONFIG_SSDFS_DEBUG */ + + blk->state = SSDFS_LBLOCK_UNKNOWN_STATE; + ssdfs_blk2off_pagevec_release(&blk->pvec); + + ssdfs_blk2off_kfree(blk); + blk = NULL; + + err = ssdfs_dynamic_array_set(&tbl->migrating_blks, + i, &blk); + if (unlikely(err)) { + SSDFS_ERR("fail to zero pointer: " + "logical_blk %d, err %d\n", + i, err); + goto finish_revert_migration_state; + } + } + } + +finish_revert_migration_state: + up_write(&tbl->translation_lock); + + if (!err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("migration state was reverted for peb_index %u\n", + peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +static inline +int ssdfs_show_fragment_details(void *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_phys_offset_table_fragment *fragment; + + fragment = (struct ssdfs_phys_offset_table_fragment *)ptr; + if (!fragment) { + SSDFS_ERR("empty pointer on fragment\n"); + return -ERANGE; + } + + SSDFS_DBG("fragment: " + "start_id %u, sequence_id %u, " + "id_count %d, state %#x, " + "hdr %p, phys_offs %p, " + "buf_size %zu\n", + fragment->start_id, + fragment->sequence_id, + atomic_read(&fragment->id_count), + atomic_read(&fragment->state), + fragment->hdr, + 
fragment->phys_offs,
+		  fragment->buf_size);
+
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     fragment->buf,
+			     fragment->buf_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+static
+void ssdfs_debug_blk2off_table_object(struct ssdfs_blk2off_table *tbl)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 items_count;
+	int i;
+
+	BUG_ON(!tbl);
+
+	SSDFS_DBG("flags %#x, state %#x, pages_per_peb %u, "
+		  "pages_per_seg %u, type %#x\n",
+		  atomic_read(&tbl->flags),
+		  atomic_read(&tbl->state),
+		  tbl->pages_per_peb,
+		  tbl->pages_per_seg,
+		  tbl->type);
+
+	SSDFS_DBG("init_cno %llu, used_logical_blks %u, "
+		  "free_logical_blks %u, last_allocated_blk %u\n",
+		  tbl->init_cno, tbl->used_logical_blks,
+		  tbl->free_logical_blks, tbl->last_allocated_blk);
+
+	for (i = 0; i < SSDFS_LBMAP_ARRAY_MAX; i++) {
+		unsigned long *bmap = tbl->lbmap.array[i];
+
+		SSDFS_DBG("lbmap: index %d, bmap %p\n", i, bmap);
+		if (bmap) {
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     bmap,
+					     tbl->lbmap.bytes_count);
+		}
+	}
+
+	SSDFS_DBG("lblk2off_capacity %u, capacity %u\n",
+		  tbl->lblk2off_capacity,
+		  ssdfs_dynamic_array_items_count(&tbl->lblk2off));
+
+	items_count = tbl->last_allocated_blk + 1;
+
+	for (i = 0; i < items_count; i++) {
+		void *kaddr;
+
+		kaddr = ssdfs_dynamic_array_get_locked(&tbl->lblk2off, i);
+		if (IS_ERR_OR_NULL(kaddr))
+			continue;
+
+		SSDFS_DBG("lbk2off: index %d, "
+			  "cno %llu, id %u, peb_index %u, "
+			  "sequence_id %u, offset_index %u\n",
+			  i,
+			  SSDFS_OFF_POS(kaddr)->cno,
+			  SSDFS_OFF_POS(kaddr)->id,
+			  SSDFS_OFF_POS(kaddr)->peb_index,
+			  SSDFS_OFF_POS(kaddr)->sequence_id,
+			  SSDFS_OFF_POS(kaddr)->offset_index);
+
+		ssdfs_dynamic_array_release(&tbl->lblk2off, i, kaddr);
+	}
+
+	SSDFS_DBG("pebs_count %u\n", tbl->pebs_count);
+
+	for (i = 0; i < tbl->pebs_count; i++) {
+		struct ssdfs_phys_offset_table_array *peb = &tbl->peb[i];
+		int fragments_count = atomic_read(&peb->fragment_count);
+
+		SSDFS_DBG("peb: index %d, state %#x, "
+			  "fragment_count %d, last_sequence_id %lu\n",
+			  i, atomic_read(&peb->state),
+			  fragments_count,
+			  ssdfs_sequence_array_last_id(peb->sequence));
+
+		ssdfs_sequence_array_apply_for_all(peb->sequence,
+						ssdfs_show_fragment_details);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+}

From patchwork Sat Feb 25 01:08:31 2023
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 20/76] ssdfs: introduce PEB object
Date: Fri, 24 Feb 2023 17:08:31 -0800
Message-Id: <20230225010927.813929-21-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

SSDFS splits a partition/volume into a sequence of fixed-sized
segments. Every segment can include one or several Logical Erase
Blocks (LEBs). A LEB can be mapped into a "Physical" Erase Block
(PEB). A PEB represents the concept of an erase block or zone that
can be allocated, filled by logs, and erased. The PEB object keeps
knowledge about the PEB ID, the index in the segment, and the
current log details.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/page_array.c | 1746 +++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/page_array.h |  119 +++
 fs/ssdfs/peb.c        |  813 +++++++++++++++++++
 fs/ssdfs/peb.h        |  970 +++++++++++++++++++++++
 4 files changed, 3648 insertions(+)
 create mode 100644 fs/ssdfs/page_array.c
 create mode 100644 fs/ssdfs/page_array.h
 create mode 100644 fs/ssdfs/peb.c
 create mode 100644 fs/ssdfs/peb.h

diff --git a/fs/ssdfs/page_array.c b/fs/ssdfs/page_array.c
new file mode 100644
index 000000000000..38e9859efa45
--- /dev/null
+++ b/fs/ssdfs/page_array.c
@@ -0,0 +1,1746 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/page_array.c - page array object's functionality.
+ * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "page_array.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_parray_page_leaks; +atomic64_t ssdfs_parray_memory_leaks; +atomic64_t ssdfs_parray_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_parray_cache_leaks_increment(void *kaddr) + * void ssdfs_parray_cache_leaks_decrement(void *kaddr) + * void *ssdfs_parray_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_parray_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_parray_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_parray_kfree(void *kaddr) + * struct page *ssdfs_parray_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_parray_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_parray_free_page(struct page *page) + * void ssdfs_parray_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(parray) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(parray) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_parray_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_parray_page_leaks, 0); + atomic64_set(&ssdfs_parray_memory_leaks, 0); + atomic64_set(&ssdfs_parray_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_parray_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_parray_page_leaks) != 0) { + SSDFS_ERR("PAGE ARRAY: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_parray_page_leaks)); + } + + if (atomic64_read(&ssdfs_parray_memory_leaks) != 0) { + SSDFS_ERR("PAGE ARRAY: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_parray_memory_leaks)); + } + + if (atomic64_read(&ssdfs_parray_cache_leaks) != 0) { + SSDFS_ERR("PAGE ARRAY: " + "caches suffer from %lld leaks\n", + atomic64_read(&ssdfs_parray_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_create_page_array() - create page array + * @capacity: maximum number of pages in the array + * @array: pointer to the memory area for the array creation [out] + * + * This method tries to create a page array that is able to + * contain at most @capacity pages. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory.
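+ *
+ * A minimal usage sketch (illustrative only; the capacity value and
+ * the caller's error handling are assumptions, not part of this patch):
+ *
+ *	struct ssdfs_page_array parray = {0};
+ *	int err;
+ *
+ *	err = ssdfs_create_page_array(32, &parray);
+ *	if (unlikely(err))
+ *		return err;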
+ */ +int ssdfs_create_page_array(int capacity, struct ssdfs_page_array *array) +{ + void *addr[SSDFS_PAGE_ARRAY_BMAP_COUNT]; + size_t bmap_bytes; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + BUG_ON(atomic_read(&array->state) != SSDFS_PAGE_ARRAY_UNKNOWN_STATE); + + SSDFS_DBG("capacity %d, array %p\n", + capacity, array); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (capacity == 0) { + SSDFS_ERR("invalid capacity %d\n", + capacity); + return -EINVAL; + } + + init_rwsem(&array->lock); + atomic_set(&array->pages_capacity, capacity); + array->pages_count = 0; + array->last_page = SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pages_count %lu, last_page %lu\n", + array->pages_count, array->last_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->pages = ssdfs_parray_kcalloc(capacity, sizeof(struct page *), + GFP_KERNEL); + if (!array->pages) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate memory: capacity %d\n", + capacity); + goto finish_create_page_array; + } + + bmap_bytes = capacity + BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + array->bmap_bytes = bmap_bytes; + + for (i = 0; i < SSDFS_PAGE_ARRAY_BMAP_COUNT; i++) { + spin_lock_init(&array->bmap[i].lock); + array->bmap[i].ptr = NULL; + } + + for (i = 0; i < SSDFS_PAGE_ARRAY_BMAP_COUNT; i++) { + addr[i] = ssdfs_parray_kmalloc(bmap_bytes, GFP_KERNEL); + + if (!addr[i]) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate bmap: index %d\n", + i); + for (; i >= 0; i--) + ssdfs_parray_kfree(addr[i]); + goto free_page_array; + } + + memset(addr[i], 0xFF, bmap_bytes); + } + + down_write(&array->lock); + for (i = 0; i < SSDFS_PAGE_ARRAY_BMAP_COUNT; i++) { + spin_lock(&array->bmap[i].lock); + array->bmap[i].ptr = addr[i]; + addr[i] = NULL; + spin_unlock(&array->bmap[i].lock); + } + up_write(&array->lock); + + atomic_set(&array->state, SSDFS_PAGE_ARRAY_CREATED); + + return 0; + +free_page_array: + ssdfs_parray_kfree(array->pages); + array->pages = NULL; + +finish_create_page_array: + return err; +} + +/* + * ssdfs_destroy_page_array() - destroy page array + * @array: page array object + * + * This method tries to destroy the page array. 
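+ *
+ * Note: the method releases all remaining pages itself (see the
+ * ssdfs_page_array_release_all_pages() call below), so a teardown
+ * usually reduces to the single call:
+ *
+ *	ssdfs_destroy_page_array(&parray);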
+ */ +void ssdfs_destroy_page_array(struct ssdfs_page_array *array) +{ + int state; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + BUG_ON(rwsem_is_locked(&array->lock)); + + SSDFS_DBG("array %p, state %#x\n", + array, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_page_array_release_all_pages(array); + + state = atomic_xchg(&array->state, SSDFS_PAGE_ARRAY_UNKNOWN_STATE); + + switch (state) { + case SSDFS_PAGE_ARRAY_CREATED: + /* expected state */ + break; + + case SSDFS_PAGE_ARRAY_DIRTY: + SSDFS_WARN("page array is dirty on destruction\n"); + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + state); + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pages_count %lu, last_page %lu\n", + array->pages_count, array->last_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_set(&array->pages_capacity, 0); + array->pages_count = 0; + array->last_page = SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE; + + if (array->pages) + ssdfs_parray_kfree(array->pages); + + array->pages = NULL; + + array->bmap_bytes = 0; + + for (i = 0; i < SSDFS_PAGE_ARRAY_BMAP_COUNT; i++) { + spin_lock(&array->bmap[i].lock); + if (array->bmap[i].ptr) + ssdfs_parray_kfree(array->bmap[i].ptr); + array->bmap[i].ptr = NULL; + spin_unlock(&array->bmap[i].lock); + } +} + +/* + * ssdfs_reinit_page_array() - change the capacity of the page array + * @capacity: new value of the capacity + * @array: pointer of memory area for the array creation + * + * This method tries to change the capacity of the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +int ssdfs_reinit_page_array(int capacity, struct ssdfs_page_array *array) +{ + struct page **pages; + void *addr[SSDFS_PAGE_ARRAY_BMAP_COUNT]; + int old_capacity; + size_t bmap_bytes; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, capacity %d, state %#x\n", + array, capacity, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return -ERANGE; + } + + down_write(&array->lock); + + old_capacity = atomic_read(&array->pages_capacity); + + if (capacity < old_capacity) { + err = -EINVAL; + SSDFS_ERR("unable to shrink: " + "capacity %d, pages_capacity %d\n", + capacity, + old_capacity); + goto finish_reinit; + } + + if (capacity == old_capacity) { + err = 0; + SSDFS_WARN("capacity %d == pages_capacity %d\n", + capacity, + old_capacity); + goto finish_reinit; + } + + atomic_set(&array->pages_capacity, capacity); + + pages = ssdfs_parray_kcalloc(capacity, sizeof(struct page *), + GFP_KERNEL); + if (!pages) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate memory: capacity %d\n", + capacity); + goto finish_reinit; + } + + bmap_bytes = capacity + BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + for (i = 0; i < SSDFS_PAGE_ARRAY_BMAP_COUNT; i++) { + addr[i] = ssdfs_parray_kmalloc(bmap_bytes, GFP_KERNEL); + + if (!addr[i]) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate bmap: index %d\n", + i); + for (; i >= 0; i--) + ssdfs_parray_kfree(addr[i]); + ssdfs_parray_kfree(pages); + goto finish_reinit; + } + + memset(addr[i], 0xFF, bmap_bytes); + } + + err = ssdfs_memcpy(pages, + 0, sizeof(struct page *) * capacity, + array->pages, + 
0, sizeof(struct page *) * old_capacity, + sizeof(struct page *) * old_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_reinit; + } + + ssdfs_parray_kfree(array->pages); + array->pages = pages; + + for (i = 0; i < SSDFS_PAGE_ARRAY_BMAP_COUNT; i++) { + void *tmp_addr = NULL; + + spin_lock(&array->bmap[i].lock); + ssdfs_memcpy(addr[i], 0, bmap_bytes, + array->bmap[i].ptr, 0, array->bmap_bytes, + array->bmap_bytes); + tmp_addr = array->bmap[i].ptr; + array->bmap[i].ptr = addr[i]; + addr[i] = NULL; + spin_unlock(&array->bmap[i].lock); + + ssdfs_parray_kfree(tmp_addr); + } + + array->bmap_bytes = bmap_bytes; + +finish_reinit: + if (unlikely(err)) + atomic_set(&array->pages_capacity, old_capacity); + + up_write(&array->lock); + + return err; +} + +/* + * is_ssdfs_page_array_empty() - is page array empty? + * @array: page array object + * + * This method tries to check that page array is empty. + */ +bool is_ssdfs_page_array_empty(struct ssdfs_page_array *array) +{ + bool is_empty = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&array->lock); + is_empty = array->pages_count == 0; + up_read(&array->lock); + + return is_empty; +} + +/* + * ssdfs_page_array_get_last_page_index() - get latest page index + * @array: page array object + * + * This method tries to get latest page index. + */ +unsigned long +ssdfs_page_array_get_last_page_index(struct ssdfs_page_array *array) +{ + unsigned long index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&array->lock); + index = array->last_page; + up_read(&array->lock); + + return index; +} + +/* + * ssdfs_page_array_add_page() - add memory page into the page array + * @array: page array object + * @page: memory page + * @page_index: index of the page in the page array + * + * This method tries to add a page into the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - page array contains the page for the index. 
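+ *
+ * Illustrative sketch (the page's origin here is an assumption of
+ * the example):
+ *
+ *	struct page *page = alloc_page(GFP_KERNEL);
+ *
+ *	if (!page)
+ *		return -ENOMEM;
+ *	err = ssdfs_page_array_add_page(&parray, page, page_index);
+ *	if (err == -EEXIST)
+ *		... the index is occupied already ...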
+ */ +int ssdfs_page_array_add_page(struct ssdfs_page_array *array, + struct page *page, + unsigned long page_index) +{ + struct ssdfs_page_array_bitmap *bmap; + int capacity; + unsigned long found; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !page); + + SSDFS_DBG("array %p, page %p, page_index %lu, state %#x\n", + array, page, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return -ERANGE; + } + + capacity = atomic_read(&array->pages_capacity); + + if (page_index >= capacity) { + SSDFS_ERR("page_index %lu >= pages_capacity %d\n", + page_index, + capacity); + return -EINVAL; + } + + down_write(&array->lock); + + capacity = atomic_read(&array->pages_capacity); + + if (array->pages_count > capacity) { + err = -ERANGE; + SSDFS_ERR("corrupted page array: " + "pages_count %lu, pages_capacity %d\n", + array->pages_count, + capacity); + goto finish_add_page; + } + + if (array->pages_count == capacity) { + err = -EEXIST; + SSDFS_ERR("page %lu is allocated already\n", + page_index); + goto finish_add_page; + } + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("bitmap is empty\n"); + goto finish_add_page; + } + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + page_index, 1, 0); + if (found == page_index) { + /* page is allocated already */ + err = -EEXIST; + } else + bitmap_clear(bmap->ptr, page_index, 1); + spin_unlock(&bmap->lock); + + if (err) { + SSDFS_ERR("page %lu is allocated already\n", + page_index); + goto finish_add_page; + } + + if (array->pages[page_index]) { + err = -ERANGE; + SSDFS_WARN("position %lu contains page pointer\n", + page_index); + goto finish_add_page; + } else { + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + array->pages[page_index] = page; + page->index = page_index; + } + + ssdfs_parray_account_page(page); + array->pages_count++; + + if (array->last_page >= SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE) + array->last_page = page_index; + else if (array->last_page < page_index) + array->last_page = page_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pages_count %lu, last_page %lu\n", + array->pages_count, array->last_page); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_add_page: + up_write(&array->lock); + + return err; +} + +/* + * ssdfs_page_array_allocate_page_locked() - allocate and add page + * @array: page array object + * @page_index: index of the page in the page array + * + * This method tries to allocate, to add into the page array and + * to lock page. + * + * RETURN: + * [success] - pointer on allocated and locked page. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - unable to allocate memory page. + * %-EEXIST - page array contains the page for the index. 
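+ *
+ * The page is returned in locked state, so a typical caller looks
+ * like this (sketch, error handling simplified):
+ *
+ *	page = ssdfs_page_array_allocate_page_locked(&parray, page_index);
+ *	if (IS_ERR_OR_NULL(page))
+ *		return page == NULL ? -ENOMEM : PTR_ERR(page);
+ *	... initialize the page's content ...
+ *	ssdfs_unlock_page(page);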
+ */ +struct page * +ssdfs_page_array_allocate_page_locked(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return ERR_PTR(-ERANGE); + } + + page = ssdfs_parray_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + return ERR_PTR(err); + } + + /* + * The ssdfs_page_array_add_page() calls + * ssdfs_parray_account_page(). It needs to exclude + * the improper leaks accounting. + */ + ssdfs_parray_forget_page(page); + + err = ssdfs_page_array_add_page(array, page, page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to add page: " + "page_index %lu, err %d\n", + page_index, err); + ssdfs_parray_free_page(page); + return ERR_PTR(err); + } + + ssdfs_lock_page(page); + return page; +} + +/* + * ssdfs_page_array_get_page() - get page unlocked + * @array: page array object + * @page_index: index of the page in the page array + * + * This method tries to find a page into the page array. + * + * RETURN: + * [success] - pointer on page. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - no allocated page for the requested index. + */ +struct page *ssdfs_page_array_get_page(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page; + struct ssdfs_page_array_bitmap *bmap; + int capacity; + unsigned long found; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return ERR_PTR(-ERANGE); + } + + capacity = atomic_read(&array->pages_capacity); + + if (page_index >= capacity) { + SSDFS_ERR("page_index %lu >= pages_capacity %d\n", + page_index, + capacity); + return ERR_PTR(-EINVAL); + } + + down_read(&array->lock); + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("bitmap is empty\n"); + goto finish_get_page; + } + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + page_index, 1, 0); + if (found != page_index) { + /* page is not allocated yet */ + err = -ENOENT; + } + spin_unlock(&bmap->lock); + + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %lu is not allocated yet\n", + page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_get_page; + } + + page = array->pages[page_index]; + + if (!page) { + err = -ERANGE; + SSDFS_ERR("page pointer is NULL\n"); + goto finish_get_page; + } + +finish_get_page: + up_read(&array->lock); + + if (unlikely(err)) + return ERR_PTR(err); + + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return page; +} + +/* + * 
ssdfs_page_array_get_page_locked() - get page locked + * @array: page array object + * @page_index: index of the page in the page array + * + * This method tries to find and to lock a page into the + * page array. + * + * RETURN: + * [success] - pointer on locked page. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - no allocated page for the requested index. + */ +struct page *ssdfs_page_array_get_page_locked(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_get_page(array, page_index); + if (PTR_ERR(page) == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %lu is not allocated yet\n", + page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get the page: " + "page_index %lu, err %d\n", + page_index, (int)PTR_ERR(page)); + } else + ssdfs_lock_page(page); + + return page; +} + +/* + * ssdfs_page_array_grab_page() - get or add page locked + * @array: page array object + * @page_index: index of the page in the page array + * + * This method tries to find and to lock a page into the + * page array. If no such page then to add and to lock + * the page. + * + * RETURN: + * [success] - pointer on locked page. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to add the page. + */ +struct page *ssdfs_page_array_grab_page(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page = ERR_PTR(-ENOMEM); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_get_page_locked(array, page_index); + if (PTR_ERR(page) == -ENOENT) { + page = ssdfs_page_array_allocate_page_locked(array, + page_index); + if (IS_ERR_OR_NULL(page)) { + if (!page) + page = ERR_PTR(-ENOMEM); + + SSDFS_ERR("fail to allocate the page: " + "page_index %lu, err %d\n", + page_index, (int)PTR_ERR(page)); + } else { + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } else if (IS_ERR_OR_NULL(page)) { + if (!page) + page = ERR_PTR(-ENOMEM); + + SSDFS_ERR("fail to get page: " + "page_index %lu, err %d\n", + page_index, (int)PTR_ERR(page)); + } + + return page; +} + +/* + * ssdfs_page_array_set_page_dirty() - set page dirty + * @array: page array object + * @page_index: index of the page in the page array + * + * This method tries to set page as dirty in the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - no allocated page for the requested index. 
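+ *
+ * Sketch of the intended flow (illustrative):
+ *
+ *	page = ssdfs_page_array_grab_page(&parray, page_index);
+ *	if (IS_ERR_OR_NULL(page))
+ *		return ...;
+ *	... modify the page's content ...
+ *	err = ssdfs_page_array_set_page_dirty(&parray, page_index);
+ *	ssdfs_unlock_page(page);
+ *	ssdfs_put_page(page);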
+ */ +int ssdfs_page_array_set_page_dirty(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page; + struct ssdfs_page_array_bitmap *bmap; + int capacity; + unsigned long found; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return -ERANGE; + } + + capacity = atomic_read(&array->pages_capacity); + + if (page_index >= capacity) { + SSDFS_ERR("page_index %lu >= pages_capacity %d\n", + page_index, + capacity); + return -EINVAL; + } + + down_read(&array->lock); + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("allocation bitmap is empty\n"); + goto finish_set_page_dirty; + } + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + page_index, 1, 0); + if (found != page_index) { + /* page is not allocated yet */ + err = -ENOENT; + } + spin_unlock(&bmap->lock); + + if (err) { + SSDFS_ERR("page %lu is not allocated yet\n", + page_index); + goto finish_set_page_dirty; + } + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_DIRTY_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_set_page_dirty; + } + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + page_index, 1, 0); + if (found == page_index) { + /* page is dirty already */ + err = -EEXIST; + } + bitmap_clear(bmap->ptr, page_index, 1); + spin_unlock(&bmap->lock); + + if (err) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %lu is dirty already\n", + page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + page = array->pages[page_index]; + + if (!page) { + err = -ERANGE; + SSDFS_ERR("page pointer is NULL\n"); + goto finish_set_page_dirty; + } + + SetPageDirty(page); + + atomic_set(&array->state, SSDFS_PAGE_ARRAY_DIRTY); + +finish_set_page_dirty: + up_read(&array->lock); + + return err; +} + +/* + * ssdfs_page_array_clear_dirty_page() - set page as clean + * @array: page array object + * @page_index: index of the page in the page array + * + * This method tries to set page as clean in the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - no allocated page for the requested index. 
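+ *
+ * This is the counterpart of ssdfs_page_array_set_page_dirty() and is
+ * expected to be called after the page's content has been flushed.
+ * When the last dirty page is cleared, the array's state returns to
+ * SSDFS_PAGE_ARRAY_CREATED.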
+ */ +int ssdfs_page_array_clear_dirty_page(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page; + struct ssdfs_page_array_bitmap *bmap; + int capacity; + unsigned long found; + bool is_clean = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return -ERANGE; + } + + capacity = atomic_read(&array->pages_capacity); + + if (page_index >= capacity) { + SSDFS_ERR("page_index %lu >= pages_capacity %d\n", + page_index, + capacity); + return -EINVAL; + } + + down_read(&array->lock); + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("allocation bitmap is empty\n"); + goto finish_clear_page_dirty; + } + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + page_index, 1, 0); + if (found != page_index) { + /* page is not allocated yet */ + err = -ENOENT; + } + spin_unlock(&bmap->lock); + + if (err) { + SSDFS_ERR("page %lu is not allocated yet\n", + page_index); + goto finish_clear_page_dirty; + } + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_DIRTY_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_clear_page_dirty; + } + + spin_lock(&bmap->lock); + bitmap_set(bmap->ptr, page_index, 1); + is_clean = bitmap_full(bmap->ptr, capacity); + spin_unlock(&bmap->lock); + + page = array->pages[page_index]; + + if (!page) { + err = -ERANGE; + SSDFS_ERR("page pointer is NULL\n"); + goto finish_clear_page_dirty; + } + + ClearPageDirty(page); + + if (is_clean) + atomic_set(&array->state, SSDFS_PAGE_ARRAY_CREATED); + +finish_clear_page_dirty: + up_read(&array->lock); + + return err; +} + +/* + * ssdfs_page_array_clear_dirty_range() - clear dirty pages in the range + * @array: page array object + * @start: starting index + * @end: ending index (inclusive) + * + * This method tries to set the range's dirty pages as clean + * in the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
+ */ +int ssdfs_page_array_clear_dirty_range(struct ssdfs_page_array *array, + unsigned long start, + unsigned long end) +{ + struct page *page; + struct ssdfs_page_array_bitmap *bmap; + int capacity; + bool is_clean = false; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, start %lu, end %lu, state %#x\n", + array, start, end, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + SSDFS_DBG("no dirty pages in page array\n"); + return 0; + + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return -ERANGE; + } + + if (start > end) { + SSDFS_ERR("start %lu > end %lu\n", + start, end); + return -EINVAL; + } + + down_write(&array->lock); + + capacity = atomic_read(&array->pages_capacity); + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_DIRTY_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_clear_dirty_pages; + } + + end = min_t(int, capacity, end + 1); + + for (i = start; i < end; i++) { + page = array->pages[i]; + + if (page) + ClearPageDirty(page); + } + + spin_lock(&bmap->lock); + bitmap_set(bmap->ptr, start, end - start); + is_clean = bitmap_full(bmap->ptr, capacity); + spin_unlock(&bmap->lock); + + if (is_clean) + atomic_set(&array->state, SSDFS_PAGE_ARRAY_CREATED); + +finish_clear_dirty_pages: + up_write(&array->lock); + + return err; +} + +/* + * ssdfs_page_array_clear_all_dirty_pages() - clear all dirty pages + * @array: page array object + * + * This method tries to set all dirty pages as clean in the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_page_array_clear_all_dirty_pages(struct ssdfs_page_array *array) +{ + int capacity; + unsigned long start = 0, end = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, state %#x\n", + array, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + capacity = atomic_read(&array->pages_capacity); + + if (capacity > 0) + end = capacity - 1; + + return ssdfs_page_array_clear_dirty_range(array, start, end); +} + +/* + * ssdfs_page_array_lookup_range() - find pages for a requested tag + * @array: page array object + * @start: pointer on start index value [in|out] + * @end: ending index (inclusive) + * @tag: tag value for the search + * @max_pages: maximum number of pages in the pagevec + * @pvec: pagevec for storing found pages [out] + * + * This method tries to find pages in the page array for + * the requested tag. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - nothing was found for the requested tag. 
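+ *
+ * Sketch of a dirty pages scan (illustrative; the flush step is an
+ * assumption of the example):
+ *
+ *	struct pagevec pvec;
+ *	unsigned long start = 0;
+ *
+ *	err = ssdfs_page_array_lookup_range(&parray, &start, end,
+ *					    SSDFS_DIRTY_PAGE_TAG,
+ *					    PAGEVEC_SIZE, &pvec);
+ *	if (!err) {
+ *		... flush the found pages; the lookup takes a reference
+ *		on every page placed into @pvec, so the caller has to
+ *		drop these references afterwards ...
+ *	}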
+ */ +int ssdfs_page_array_lookup_range(struct ssdfs_page_array *array, + unsigned long *start, + unsigned long end, + int tag, int max_pages, + struct pagevec *pvec) +{ + int state; + struct page *page; + struct ssdfs_page_array_bitmap *bmap; + int capacity; + unsigned long found; + int count = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !start || !pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&array->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("array %p, start %lu, end %lu, " + "tag %#x, max_pages %d, state %#x\n", + array, *start, end, tag, max_pages, state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (state) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + state); + return -ERANGE; + } + + pagevec_reinit(pvec); + + if (*start > end) { + SSDFS_ERR("start %lu > end %lu\n", + *start, end); + return -EINVAL; + } + + switch (tag) { + case SSDFS_DIRTY_PAGE_TAG: + if (state != SSDFS_PAGE_ARRAY_DIRTY) { + SSDFS_DBG("page array is clean\n"); + return -ENOENT; + } + break; + + default: + SSDFS_ERR("unknown tag %#x\n", + tag); + return -EINVAL; + } + + max_pages = min_t(int, max_pages, (int)PAGEVEC_SIZE); + + down_read(&array->lock); + + capacity = atomic_read(&array->pages_capacity); + if (capacity <= 0) { + err = -ERANGE; + SSDFS_ERR("invalid capacity %d\n", capacity); + goto finish_search; + } + + bmap = &array->bmap[SSDFS_PAGE_ARRAY_DIRTY_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_search; + } + + end = min_t(int, capacity - 1, end); + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + *start, 1, 0); + spin_unlock(&bmap->lock); + + *start = (int)found; + + while (found <= end) { + page = array->pages[found]; + + if (page) { + if (!PageDirty(page)) { + SSDFS_ERR("page %lu is not dirty\n", + page_index(page)); + } + ssdfs_get_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + pagevec_add(pvec, page); + count++; + } + + if (count >= max_pages) + goto finish_search; + + found++; + + if (found >= capacity) + break; + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, capacity, + found, 1, 0); + spin_unlock(&bmap->lock); + }; + +finish_search: + up_read(&array->lock); + + return err; +} + +/* + * ssdfs_page_array_define_last_page() - define last page index + * @array: page array object + * @capacity: pages capacity in array + * + * This method tries to define last page index. 
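+ *
+ * The allocation bitmap is scanned backward from the current
+ * @array->last_page value and the scan stops on the first index that
+ * is still marked as allocated; an empty array degenerates to
+ * SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE.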
+ */ +static inline +void ssdfs_page_array_define_last_page(struct ssdfs_page_array *array, + int capacity) +{ + struct ssdfs_page_array_bitmap *alloc_bmap; + unsigned long *ptr; + unsigned long found; + unsigned long i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + BUG_ON(!rwsem_is_locked(&array->lock)); + + SSDFS_DBG("array %p, state %#x\n", + array, atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + alloc_bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!alloc_bmap->ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (array->pages_count == 0) { + /* empty array */ + array->last_page = SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE; + } else if (array->last_page >= SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE) { + /* do nothing */ + } else if (array->last_page > 0) { + for (i = array->last_page; i > array->pages_count; i--) { + spin_lock(&alloc_bmap->lock); + ptr = alloc_bmap->ptr; + found = bitmap_find_next_zero_area(ptr, + capacity, + i, 1, 0); + spin_unlock(&alloc_bmap->lock); + + if (found == i) + break; + } + + array->last_page = i; + } else + array->last_page = SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE; +} + +/* + * ssdfs_page_array_delete_page() - delete page from the page array + * @array: page array object + * @page_index: index of the page + * + * This method tries to delete a page from the page array. + * + * RETURN: + * [success] - pointer on deleted page. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - page array hasn't a page for the index. + */ +struct page *ssdfs_page_array_delete_page(struct ssdfs_page_array *array, + unsigned long page_index) +{ + struct page *page; + struct ssdfs_page_array_bitmap *alloc_bmap, *dirty_bmap; + int capacity; + unsigned long found; + bool is_clean = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, page_index %lu, state %#x\n", + array, page_index, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return ERR_PTR(-ERANGE); + } + + capacity = atomic_read(&array->pages_capacity); + + if (page_index >= capacity) { + SSDFS_ERR("page_index %lu >= pages_capacity %d\n", + page_index, + capacity); + return ERR_PTR(-EINVAL); + } + + down_write(&array->lock); + + alloc_bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + if (!alloc_bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("alloc bitmap is empty\n"); + goto finish_delete_page; + } + + dirty_bmap = &array->bmap[SSDFS_PAGE_ARRAY_DIRTY_BMAP]; + if (!dirty_bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_delete_page; + } + + spin_lock(&alloc_bmap->lock); + found = bitmap_find_next_zero_area(alloc_bmap->ptr, capacity, + page_index, 1, 0); + if (found != page_index) { + /* page is not allocated yet */ + err = -ENOENT; + } + spin_unlock(&alloc_bmap->lock); + + if (err) { + SSDFS_ERR("page %lu is not allocated yet\n", + page_index); + goto finish_delete_page; + } + + page = array->pages[page_index]; + + if (!page) { + err = -ERANGE; + SSDFS_ERR("page pointer is NULL\n"); + goto finish_delete_page; + } + + spin_lock(&alloc_bmap->lock); + bitmap_set(alloc_bmap->ptr, page_index, 1); + spin_unlock(&alloc_bmap->lock); + + spin_lock(&dirty_bmap->lock); + bitmap_set(dirty_bmap->ptr, page_index, 1); + is_clean 
= bitmap_full(dirty_bmap->ptr, capacity); + spin_unlock(&dirty_bmap->lock); + + array->pages_count--; + array->pages[page_index] = NULL; + + if (array->last_page == page_index) + ssdfs_page_array_define_last_page(array, capacity); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pages_count %lu, last_page %lu\n", + array->pages_count, array->last_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_clean) + atomic_set(&array->state, SSDFS_PAGE_ARRAY_CREATED); + +finish_delete_page: + up_write(&array->lock); + + if (unlikely(err)) + return ERR_PTR(err); + + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_parray_forget_page(page); + + return page; +} + +/* + * ssdfs_page_array_release_pages() - release pages in the range + * @array: page array object + * @start: pointer on start index value [in|out] + * @end: ending index (inclusive) + * + * This method tries to release pages for the requested range. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_page_array_release_pages(struct ssdfs_page_array *array, + unsigned long *start, + unsigned long end) +{ + struct page *page; + struct ssdfs_page_array_bitmap *alloc_bmap, *dirty_bmap; + int capacity; + unsigned long found, found_dirty; +#ifdef CONFIG_SSDFS_DEBUG + unsigned long released = 0; + unsigned long allocated_pages = 0; + unsigned long dirty_pages = 0; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !start); + + SSDFS_DBG("array %p, start %lu, end %lu, state %#x\n", + array, *start, end, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&array->state)) { + case SSDFS_PAGE_ARRAY_CREATED: + case SSDFS_PAGE_ARRAY_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected state %#x of page array\n", + atomic_read(&array->state)); + return -ERANGE; + } + + if (*start > end) { + SSDFS_ERR("start %lu > end %lu\n", + *start, end); + return -EINVAL; + } + + down_write(&array->lock); + + capacity = atomic_read(&array->pages_capacity); + if (capacity <= 0) { + err = -ERANGE; + SSDFS_ERR("invalid capacity %d\n", capacity); + goto finish_release_pages_range; + } + + if (array->pages_count == 0) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pages_count %lu\n", + array->pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_release_pages_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + released = array->pages_count; +#endif /* CONFIG_SSDFS_DEBUG */ + + alloc_bmap = &array->bmap[SSDFS_PAGE_ARRAY_ALLOC_BMAP]; + if (!alloc_bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("allocation bitmap is empty\n"); + goto finish_release_pages_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + spin_lock(&alloc_bmap->lock); + allocated_pages = bitmap_weight(alloc_bmap->ptr, capacity); + spin_unlock(&alloc_bmap->lock); + allocated_pages = capacity - allocated_pages; +#endif /* CONFIG_SSDFS_DEBUG */ + + dirty_bmap = &array->bmap[SSDFS_PAGE_ARRAY_DIRTY_BMAP]; + if (!dirty_bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_release_pages_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + spin_lock(&dirty_bmap->lock); + dirty_pages = bitmap_weight(dirty_bmap->ptr, capacity); + spin_unlock(&dirty_bmap->lock); + dirty_pages = capacity - dirty_pages; +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&alloc_bmap->lock); + found = bitmap_find_next_zero_area(alloc_bmap->ptr, 
capacity, + *start, 1, 0); + spin_unlock(&alloc_bmap->lock); + + end = min_t(int, capacity - 1, end); + + *start = found; + + while (found <= end) { + spin_lock(&dirty_bmap->lock); + found_dirty = bitmap_find_next_zero_area(dirty_bmap->ptr, + capacity, + found, 1, 0); + spin_unlock(&dirty_bmap->lock); + + if (found == found_dirty) { + err = -ERANGE; + SSDFS_ERR("page %lu is dirty\n", + found); + goto finish_release_pages_range; + } + + page = array->pages[found]; + + if (page) { + ssdfs_lock_page(page); + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + ssdfs_unlock_page(page); + + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_parray_free_page(page); + array->pages[found] = NULL; + } + + spin_lock(&alloc_bmap->lock); + bitmap_set(alloc_bmap->ptr, found, 1); + spin_unlock(&alloc_bmap->lock); + + array->pages_count--; + + found++; + + if (found >= capacity) + break; + + spin_lock(&alloc_bmap->lock); + found = bitmap_find_next_zero_area(alloc_bmap->ptr, + capacity, + found, 1, 0); + spin_unlock(&alloc_bmap->lock); + }; + + ssdfs_page_array_define_last_page(array, capacity); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pages_count %lu, last_page %lu\n", + array->pages_count, array->last_page); + + released -= array->pages_count; + + SSDFS_DBG("released %lu, pages_count %lu, " + "allocated_pages %lu, dirty_pages %lu\n", + released, array->pages_count, + allocated_pages, dirty_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_release_pages_range: + up_write(&array->lock); + + return err; +} + +/* + * ssdfs_page_array_release_all_pages() - release all pages + * @array: page array object + * + * This method tries to release all pages in the page array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_page_array_release_all_pages(struct ssdfs_page_array *array) +{ + int capacity; + unsigned long start = 0, end = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("array %p, state %#x\n", + array, + atomic_read(&array->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + capacity = atomic_read(&array->pages_capacity); + + if (capacity > 0) + end = capacity - 1; + + return ssdfs_page_array_release_pages(array, &start, end); +} diff --git a/fs/ssdfs/page_array.h b/fs/ssdfs/page_array.h new file mode 100644 index 000000000000..020190bceaf9 --- /dev/null +++ b/fs/ssdfs/page_array.h @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/page_array.h - page array object declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_PAGE_ARRAY_H +#define _SSDFS_PAGE_ARRAY_H + +/* + * struct ssdfs_page_array_bitmap - bitmap of states + * @lock: bitmap lock + * @ptr: bitmap + */ +struct ssdfs_page_array_bitmap { + spinlock_t lock; + unsigned long *ptr; +}; + +/* + * struct ssdfs_page_array - array of memory pages + * @state: page array's state + * @pages_capacity: maximum possible number of pages in array + * @lock: page array's lock + * @pages: array of memory pages' pointers + * @pages_count: current number of allocated pages + * @last_page: latest page index + * @bmap_bytes: number of bytes in every bitmap + * bmap: array of bitmaps + */ +struct ssdfs_page_array { + atomic_t state; + atomic_t pages_capacity; + + struct rw_semaphore lock; + struct page **pages; + unsigned long pages_count; +#define SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE (ULONG_MAX) + unsigned long last_page; + size_t bmap_bytes; + +#define SSDFS_PAGE_ARRAY_ALLOC_BMAP (0) +#define SSDFS_PAGE_ARRAY_DIRTY_BMAP (1) +#define SSDFS_PAGE_ARRAY_BMAP_COUNT (2) + struct ssdfs_page_array_bitmap bmap[SSDFS_PAGE_ARRAY_BMAP_COUNT]; +}; + +/* Page array states */ +enum { + SSDFS_PAGE_ARRAY_UNKNOWN_STATE, + SSDFS_PAGE_ARRAY_CREATED, + SSDFS_PAGE_ARRAY_DIRTY, + SSDFS_PAGE_ARRAY_STATE_MAX +}; + +/* Available tags */ +enum { + SSDFS_UNKNOWN_PAGE_TAG, + SSDFS_DIRTY_PAGE_TAG, + SSDFS_PAGE_TAG_MAX +}; + +/* + * Page array's API + */ +int ssdfs_create_page_array(int capacity, struct ssdfs_page_array *array); +void ssdfs_destroy_page_array(struct ssdfs_page_array *array); +int ssdfs_reinit_page_array(int capacity, struct ssdfs_page_array *array); +bool is_ssdfs_page_array_empty(struct ssdfs_page_array *array); +unsigned long +ssdfs_page_array_get_last_page_index(struct ssdfs_page_array *array); +int ssdfs_page_array_add_page(struct ssdfs_page_array *array, + struct page *page, + unsigned long page_index); +struct page * +ssdfs_page_array_allocate_page_locked(struct ssdfs_page_array *array, + unsigned long page_index); +struct page *ssdfs_page_array_get_page_locked(struct ssdfs_page_array *array, + unsigned long page_index); +struct page *ssdfs_page_array_get_page(struct ssdfs_page_array *array, + unsigned long page_index); +struct page *ssdfs_page_array_grab_page(struct ssdfs_page_array *array, + unsigned long page_index); +int ssdfs_page_array_set_page_dirty(struct ssdfs_page_array *array, + unsigned long page_index); +int ssdfs_page_array_clear_dirty_page(struct ssdfs_page_array *array, + unsigned long page_index); +int ssdfs_page_array_clear_dirty_range(struct ssdfs_page_array *array, + unsigned long start, + unsigned long end); +int ssdfs_page_array_clear_all_dirty_pages(struct ssdfs_page_array *array); +int ssdfs_page_array_lookup_range(struct ssdfs_page_array *array, + unsigned long *start, + unsigned long end, + int tag, int max_pages, + struct pagevec *pvec); +struct page *ssdfs_page_array_delete_page(struct ssdfs_page_array *array, + unsigned long page_index); +int ssdfs_page_array_release_pages(struct ssdfs_page_array *array, + unsigned long *start, + unsigned long end); +int ssdfs_page_array_release_all_pages(struct ssdfs_page_array *array); + +#endif /* _SSDFS_PAGE_ARRAY_H */ diff --git a/fs/ssdfs/peb.c b/fs/ssdfs/peb.c new file mode 100644 index 000000000000..9f95ef176744 --- /dev/null +++ b/fs/ssdfs/peb.c @@ -0,0 +1,813 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* 
+ * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb.c - Physical Erase Block (PEB) object's functionality. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "compression.h" +#include "page_vector.h" +#include "block_bitmap.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "peb.h" +#include "peb_container.h" +#include "segment.h" +#include "peb_mapping_table.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_peb_page_leaks; +atomic64_t ssdfs_peb_memory_leaks; +atomic64_t ssdfs_peb_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_peb_cache_leaks_increment(void *kaddr) + * void ssdfs_peb_cache_leaks_decrement(void *kaddr) + * void *ssdfs_peb_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_peb_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_peb_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_peb_kfree(void *kaddr) + * struct page *ssdfs_peb_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_peb_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_peb_free_page(struct page *page) + * void ssdfs_peb_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(peb) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(peb) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_peb_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_peb_page_leaks, 0); + atomic64_set(&ssdfs_peb_memory_leaks, 0); + atomic64_set(&ssdfs_peb_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_peb_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_peb_page_leaks) != 0) { + SSDFS_ERR("PEB: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_peb_page_leaks)); + } + + if (atomic64_read(&ssdfs_peb_memory_leaks) != 0) { + SSDFS_ERR("PEB: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_peb_memory_leaks)); + } + + if (atomic64_read(&ssdfs_peb_cache_leaks) != 0) { + SSDFS_ERR("PEB: " + "caches suffer from %lld leaks\n", + atomic64_read(&ssdfs_peb_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_create_clean_peb_object() - create "clean" PEB object + * @pebi: pointer on uninitialized PEB object + * + * This function tries to initialize PEB object for "clean" + * state of the segment. + * + * RETURN: + * [success] - PEB object has been constructed successfully. + * [failure] - error code: + * + * %-EINVAL - invalid input.
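+ *
+ * A "clean" PEB contains no logs yet: the current log is initialized
+ * to start from page 0 with all @pebi->log_pages still free (compare
+ * with the "used"/"dirty" cases below, where no free data pages
+ * remain).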
+ */ +static +int ssdfs_create_clean_peb_object(struct ssdfs_peb_info *pebi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(pebi->peb_id == U64_MAX); + + SSDFS_DBG("pebi %p, peb_id %llu\n", + pebi, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_current_log_init(pebi, pebi->log_pages, 0, 0, U32_MAX); + + return 0; +} + +/* + * ssdfs_create_using_peb_object() - create "using" PEB object + * @pebi: pointer on uninitialized PEB object + * + * This function tries to initialize PEB object for "using" + * state of the segment. + * + * RETURN: + * [success] - PEB object has been constructed successfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_create_using_peb_object(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(pebi->peb_id == U64_MAX); + + SSDFS_DBG("pebi %p, peb_id %llu\n", + pebi, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + if (fsi->is_zns_device) { + loff_t offset = pebi->peb_id * fsi->erasesize; + + err = fsi->devops->reopen_zone(fsi->sb, offset); + if (unlikely(err)) { + SSDFS_ERR("fail to reopen zone: " + "offset %llu, err %d\n", + offset, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_create_used_peb_object() - create "used" PEB object + * @pebi: pointer on uninitialized PEB object + * + * This function tries to initialize PEB object for "used" + * state of the segment. + * + * RETURN: + * [success] - PEB object has been constructed successfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_create_used_peb_object(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(pebi->peb_id == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebi %p, peb_id %llu\n", + pebi, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_current_log_init(pebi, 0, fsi->pages_per_peb, 0, U32_MAX); + + return 0; +} + +/* + * ssdfs_create_dirty_peb_object() - create "dirty" PEB object + * @pebi: pointer on uninitialized PEB object + * + * This function tries to initialize PEB object for "dirty" + * state of the PEB. + * + * RETURN: + * [success] - PEB object has been constructed successfully. + * [failure] - error code: + * + * %-EINVAL - invalid input.
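+ *
+ * Like the "used" case, a dirty PEB is initialized without free data
+ * pages: its content only waits for erasure.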
+ */ +static +int ssdfs_create_dirty_peb_object(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(pebi->peb_id == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebi %p, peb_id %llu\n", + pebi, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_current_log_init(pebi, 0, fsi->pages_per_peb, 0, U32_MAX); + + return 0; +} + +static inline +size_t ssdfs_peb_temp_buffer_default_size(u32 pagesize) +{ + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + size_t size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pagesize > SSDFS_128KB); +#endif /* CONFIG_SSDFS_DEBUG */ + + size = (SSDFS_128KB / pagesize) * blk_desc_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_size %u, default_size %zu\n", + pagesize, size); +#endif /* CONFIG_SSDFS_DEBUG */ + + return size; +} + +/* + * ssdfs_peb_realloc_read_buffer() - realloc temporary read buffer + * @buf: pointer on read buffer + * @new_size: requested new size of the buffer + */ +int ssdfs_peb_realloc_read_buffer(struct ssdfs_peb_read_buffer *buf, + size_t new_size) +{ + void *new_ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (buf->size >= PAGE_SIZE) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to realloc buffer: " + "old_size %zu\n", + buf->size); +#endif /* CONFIG_SSDFS_DEBUG */ + return -E2BIG; + } + + if (buf->size == new_size) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do nothing: old_size %zu, new_size %zu\n", + buf->size, new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + if (buf->size > new_size) { + SSDFS_ERR("shrink not supported\n"); + return -EOPNOTSUPP; + } + + /* keep the old buffer on allocation failure to prevent a leak */ + new_ptr = krealloc(buf->ptr, new_size, GFP_KERNEL); + if (!new_ptr) { + SSDFS_ERR("fail to allocate buffer\n"); + return -ENOMEM; + } + + buf->ptr = new_ptr; + buf->size = new_size; + + return 0; +} + +/* + * ssdfs_peb_realloc_write_buffer() - realloc temporary write buffer + * @buf: pointer on write buffer + */ +int ssdfs_peb_realloc_write_buffer(struct ssdfs_peb_temp_buffer *buf) +{ + size_t new_size; + void *new_ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (buf->size >= PAGE_SIZE) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to realloc buffer: " + "old_size %zu\n", + buf->size); +#endif /* CONFIG_SSDFS_DEBUG */ + return -E2BIG; + } + + new_size = min_t(size_t, buf->size * 2, (size_t)PAGE_SIZE); + + /* keep the old buffer on allocation failure to prevent a leak */ + new_ptr = krealloc(buf->ptr, new_size, GFP_KERNEL); + if (!new_ptr) { + SSDFS_ERR("fail to allocate buffer\n"); + return -ENOMEM; + } + + buf->ptr = new_ptr; + buf->size = new_size; + + return 0; +} + +/* + * ssdfs_peb_current_log_prepare() - prepare current log object + * @pebi: pointer on PEB object + */ +static inline +int ssdfs_peb_current_log_prepare(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_area *area; + struct ssdfs_peb_temp_buffer *write_buf; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + size_t buf_size; + u16 flags; + size_t bmap_bytes; + size_t bmap_pages; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + flags = fsi->metadata_options.blk2off_tbl.flags; + buf_size = ssdfs_peb_temp_buffer_default_size(fsi->pagesize); + + mutex_init(&pebi->current_log.lock); + atomic_set(&pebi->current_log.sequence_id, 0); +
pebi->current_log.start_page = U32_MAX; + pebi->current_log.reserved_pages = 0; + pebi->current_log.free_data_pages = pebi->log_pages; + pebi->current_log.seg_flags = 0; + pebi->current_log.prev_log_bmap_bytes = U32_MAX; + pebi->current_log.last_log_time = U64_MAX; + pebi->current_log.last_log_cno = U64_MAX; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_data_pages %u\n", + pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + bmap_bytes = BLK_BMAP_BYTES(fsi->pages_per_peb); + bmap_pages = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE; + + err = ssdfs_page_vector_create(&pebi->current_log.bmap_snapshot, + bmap_pages); + if (unlikely(err)) { + SSDFS_ERR("fail to create page vector: " + "bmap_pages %zu, err %d\n", + bmap_pages, err); + return err; + } + + for (i = 0; i < SSDFS_LOG_AREA_MAX; i++) { + struct ssdfs_peb_area_metadata *metadata; + size_t metadata_size = sizeof(struct ssdfs_peb_area_metadata); + size_t blk_table_size = sizeof(struct ssdfs_area_block_table); + + area = &pebi->current_log.area[i]; + metadata = &area->metadata; + memset(&area->metadata, 0, metadata_size); + + switch (i) { + case SSDFS_LOG_BLK_DESC_AREA: + write_buf = &area->metadata.area.blk_desc.flush_buf; + + area->has_metadata = true; + area->write_offset = blk_table_size; + area->compressed_offset = blk_table_size; + area->metadata.reserved_offset = blk_table_size; + + if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) { + write_buf->ptr = ssdfs_peb_kzalloc(buf_size, + GFP_KERNEL); + if (!write_buf->ptr) { + err = -ENOMEM; + SSDFS_ERR("unable to allocate\n"); + goto fail_init_current_log; + } + + write_buf->write_offset = 0; + write_buf->granularity = blk_desc_size; + write_buf->size = buf_size; + } else { + write_buf->ptr = NULL; + write_buf->write_offset = 0; + write_buf->granularity = 0; + write_buf->size = 0; + } + break; + + case SSDFS_LOG_MAIN_AREA: + case SSDFS_LOG_DIFFS_AREA: + case SSDFS_LOG_JOURNAL_AREA: + area->has_metadata = false; + area->write_offset = 0; + area->compressed_offset = 0; + area->metadata.reserved_offset = 0; + break; + + default: + BUG(); + }; + + err = ssdfs_create_page_array(fsi->pages_per_peb, + &area->array); + if (unlikely(err)) { + SSDFS_ERR("fail to create page array: " + "capacity %u, err %d\n", + fsi->pages_per_peb, err); + goto fail_init_current_log; + } + } + + atomic_set(&pebi->current_log.state, SSDFS_LOG_PREPARED); + return 0; + +fail_init_current_log: + for (--i; i >= 0; i--) { + area = &pebi->current_log.area[i]; + + if (i == SSDFS_LOG_BLK_DESC_AREA) { + write_buf = &area->metadata.area.blk_desc.flush_buf; + + area->metadata.area.blk_desc.capacity = 0; + area->metadata.area.blk_desc.items_count = 0; + + if (write_buf->ptr) { + ssdfs_peb_kfree(write_buf->ptr); + write_buf->ptr = NULL; + } + } + + ssdfs_destroy_page_array(&area->array); + } + + ssdfs_page_vector_destroy(&pebi->current_log.bmap_snapshot); + + return err; +} + +/* + * ssdfs_peb_current_log_destroy() - destroy current log object + * @pebi: pointer on PEB object + */ +static inline +int ssdfs_peb_current_log_destroy(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_temp_buffer *write_buf; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(mutex_is_locked(&pebi->current_log.lock)); + + SSDFS_DBG("pebi %p\n", pebi); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_current_log_lock(pebi); + + for (i = 0; i < SSDFS_LOG_AREA_MAX; i++) { + struct ssdfs_page_array *area_pages; + + area_pages = &pebi->current_log.area[i].array; + + if (atomic_read(&area_pages->state) == 
SSDFS_PAGE_ARRAY_DIRTY) {
+			ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"PEB %llu is dirty on destruction\n",
+					pebi->peb_id);
+			err = -EIO;
+		}
+
+		if (i == SSDFS_LOG_BLK_DESC_AREA) {
+			struct ssdfs_peb_area *area;
+
+			area = &pebi->current_log.area[i];
+			area->metadata.area.blk_desc.capacity = 0;
+			area->metadata.area.blk_desc.items_count = 0;
+
+			write_buf = &area->metadata.area.blk_desc.flush_buf;
+
+			if (write_buf->ptr) {
+				ssdfs_peb_kfree(write_buf->ptr);
+				write_buf->ptr = NULL;
+				write_buf->write_offset = 0;
+				write_buf->size = 0;
+			}
+		}
+
+		ssdfs_destroy_page_array(area_pages);
+	}
+
+	ssdfs_page_vector_release(&pebi->current_log.bmap_snapshot);
+	ssdfs_page_vector_destroy(&pebi->current_log.bmap_snapshot);
+
+	atomic_set(&pebi->current_log.state, SSDFS_LOG_UNKNOWN);
+	ssdfs_peb_current_log_unlock(pebi);
+
+	return err;
+}
+
+/*
+ * ssdfs_peb_object_create() - create PEB object in array
+ * @pebi: pointer on PEB object
+ * @pebc: pointer on PEB container
+ * @peb_id: PEB identification number
+ * @peb_state: PEB's state
+ * @peb_migration_id: PEB's migration ID
+ *
+ * This function tries to create the PEB object for
+ * @peb_id in array.
+ *
+ * RETURN:
+ * [success] - PEB object has been constructed successfully.
+ * [failure] - error code:
+ *
+ * %-EINVAL  - invalid input.
+ */
+int ssdfs_peb_object_create(struct ssdfs_peb_info *pebi,
+			    struct ssdfs_peb_container *pebc,
+			    u64 peb_id, int peb_state,
+			    u8 peb_migration_id)
+{
+	struct ssdfs_fs_info *fsi;
+	int peb_type;
+	size_t buf_size;
+	u16 flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebc || !pebc->parent_si);
+
+	/* compare peb_id against the total PEB count
+	 * (nsegs * pebs_per_seg), mirroring the check in
+	 * ssdfs_peb_object_destroy() */
+	if (peb_id >= (pebc->parent_si->fsi->nsegs *
+			pebc->parent_si->fsi->pebs_per_seg)) {
+		SSDFS_ERR("requested peb_id %llu >= pebs count %llu\n",
+			  peb_id,
+			  pebc->parent_si->fsi->nsegs *
+			  pebc->parent_si->fsi->pebs_per_seg);
+		return -EINVAL;
+	}
+
+	if (pebc->peb_index >= pebc->parent_si->pebs_count) {
+		SSDFS_ERR("requested peb_index %u >= pebs_count %u\n",
+			  pebc->peb_index,
+			  pebc->parent_si->pebs_count);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("pebi %p, seg %llu, peb_id %llu, "
+		  "peb_index %u, pebc %p, "
+		  "peb_state %#x, peb_migration_id %u\n",
+		  pebi, pebc->parent_si->seg_id,
+		  peb_id, pebc->peb_index, pebc,
+		  peb_state, peb_migration_id);
+#else
+	SSDFS_DBG("pebi %p, seg %llu, peb_id %llu, "
+		  "peb_index %u, pebc %p, "
+		  "peb_state %#x, peb_migration_id %u\n",
+		  pebi, pebc->parent_si->seg_id,
+		  peb_id, pebc->peb_index, pebc,
+		  peb_state, peb_migration_id);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	fsi = pebc->parent_si->fsi;
+	flags = fsi->metadata_options.blk2off_tbl.flags;
+	buf_size = ssdfs_peb_temp_buffer_default_size(fsi->pagesize);
+
+	atomic_set(&pebi->state, SSDFS_PEB_OBJECT_UNKNOWN_STATE);
+
+	peb_type = SEG2PEB_TYPE(pebc->parent_si->seg_type);
+	if (peb_type >= SSDFS_MAPTBL_PEB_TYPE_MAX) {
+		err = -EINVAL;
+		SSDFS_ERR("invalid seg_type %#x\n",
+			  pebc->parent_si->seg_type);
+		goto fail_construct_peb_obj;
+	}
+
+	pebi->peb_id = peb_id;
+	pebi->peb_index = pebc->peb_index;
+	pebi->log_pages = pebc->log_pages;
+	pebi->peb_create_time = ssdfs_current_timestamp();
+	ssdfs_set_peb_migration_id(pebi, peb_migration_id);
+	init_completion(&pebi->init_end);
+	atomic_set(&pebi->reserved_bytes.blk_bmap, 0);
+	atomic_set(&pebi->reserved_bytes.blk2off_tbl, 0);
+	atomic_set(&pebi->reserved_bytes.blk_desc_tbl, 0);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb_id %llu, "
+		  "peb_create_time %llx\n",
+		  pebc->parent_si->seg_id,
+		  pebi->peb_id,
+		  pebi->peb_create_time);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	init_rwsem(&pebi->read_buffer.lock);
+	if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) {
+		pebi->read_buffer.blk_desc.ptr = ssdfs_peb_kzalloc(buf_size,
+								   GFP_KERNEL);
+		if (!pebi->read_buffer.blk_desc.ptr) {
+			err = -ENOMEM;
+			SSDFS_ERR("unable to allocate\n");
+			goto fail_construct_peb_obj;
+		}
+
+		pebi->read_buffer.blk_desc.offset = U32_MAX;
+		pebi->read_buffer.blk_desc.size = buf_size;
+	} else {
+		pebi->read_buffer.blk_desc.ptr = NULL;
+		pebi->read_buffer.blk_desc.offset = U32_MAX;
+		pebi->read_buffer.blk_desc.size = 0;
+	}
+
+	pebi->pebc = pebc;
+
+	err = ssdfs_create_page_array(fsi->pages_per_peb,
+				      &pebi->cache);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create page array: "
+			  "capacity %u, err %d\n",
+			  fsi->pages_per_peb, err);
+		goto fail_construct_peb_obj;
+	}
+
+	err = ssdfs_peb_current_log_prepare(pebi);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare current log: err %d\n",
+			  err);
+		goto fail_construct_peb_obj;
+	}
+
+	switch (peb_state) {
+	case SSDFS_MAPTBL_CLEAN_PEB_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE:
+		err = ssdfs_create_clean_peb_object(pebi);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create clean PEB object: err %d\n",
+				  err);
+			goto fail_construct_peb_obj;
+		}
+		break;
+
+	case SSDFS_MAPTBL_USING_PEB_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE:
+		err = ssdfs_create_using_peb_object(pebi);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create using PEB object: err %d\n",
+				  err);
+			goto fail_construct_peb_obj;
+		}
+		break;
+
+	case SSDFS_MAPTBL_USED_PEB_STATE:
+	case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE:
+	case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE:
+	case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE:
+		err = ssdfs_create_used_peb_object(pebi);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create used PEB object: err %d\n",
+				  err);
+			goto fail_construct_peb_obj;
+		}
+		break;
+
+	case SSDFS_MAPTBL_DIRTY_PEB_STATE:
+	case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE:
+		err = ssdfs_create_dirty_peb_object(pebi);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create dirty PEB object: err %d\n",
+				  err);
+			goto fail_construct_peb_obj;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid PEB state %#x\n", peb_state);
+		err = -EINVAL;
+		goto fail_construct_peb_obj;
+	}
+
+	atomic_set(&pebi->state, SSDFS_PEB_OBJECT_CREATED);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+fail_construct_peb_obj:
+	ssdfs_peb_object_destroy(pebi);
+	pebi->peb_id = U64_MAX;
+	pebi->pebc = pebc;
+	return err;
+}
+
+/*
+ * ssdfs_peb_object_destroy() - destroy PEB object in array
+ * @pebi: pointer on PEB object
+ *
+ * This function tries to destroy PEB object.
+ *
+ * RETURN:
+ * [success] - PEB object has been destroyed successfully.
+ * [failure] - error code:
+ *
+ * %-EIO  - I/O errors were detected.
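+ *
+ * Note: ssdfs_peb_object_create() also calls this function on its
+ * failure path, so the logic has to cope with a partially
+ * constructed PEB object.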
+ */ +int ssdfs_peb_object_destroy(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("peb_id %llu\n", pebi->peb_id); +#else + SSDFS_DBG("peb_id %llu\n", pebi->peb_id); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebi->pebc->parent_si->fsi; + + if (pebi->peb_id >= (fsi->nsegs * fsi->pebs_per_seg)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("invalid PEB id %llu\n", pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EINVAL; + } + + err = ssdfs_peb_current_log_destroy(pebi); + + down_write(&pebi->read_buffer.lock); + if (pebi->read_buffer.blk_desc.ptr) { + ssdfs_peb_kfree(pebi->read_buffer.blk_desc.ptr); + pebi->read_buffer.blk_desc.ptr = NULL; + pebi->read_buffer.blk_desc.offset = U32_MAX; + pebi->read_buffer.blk_desc.size = 0; + } + up_write(&pebi->read_buffer.lock); + + state = atomic_read(&pebi->cache.state); + if (state == SSDFS_PAGE_ARRAY_DIRTY) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "PEB %llu is dirty on destruction\n", + pebi->peb_id); + err = -EIO; + } + + ssdfs_destroy_page_array(&pebi->cache); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} diff --git a/fs/ssdfs/peb.h b/fs/ssdfs/peb.h new file mode 100644 index 000000000000..bf20770d3b95 --- /dev/null +++ b/fs/ssdfs/peb.h @@ -0,0 +1,970 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb.h - Physical Erase Block (PEB) object declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_H
+#define _SSDFS_PEB_H
+
+#include "request_queue.h"
+
+#define SSDFS_BLKBMAP_FRAG_HDR_CAPACITY \
+	(sizeof(struct ssdfs_block_bitmap_fragment) + \
+	 (sizeof(struct ssdfs_fragment_desc) * \
+	  SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX))
+
+#define SSDFS_BLKBMAP_HDR_CAPACITY \
+	(sizeof(struct ssdfs_block_bitmap_header) + \
+	 SSDFS_BLKBMAP_FRAG_HDR_CAPACITY)
+
+/*
+ * struct ssdfs_blk_bmap_init_env - block bitmap init environment
+ * @bmap_hdr: pointer on block bitmap header
+ * @frag_hdr: pointer on block bitmap fragment header
+ * @bmap_hdr_buf: buffer for block bitmap header and fragment header
+ * @fragment_index: index of bmap fragment
+ * @array: page vector that stores block bitmap content
+ * @read_bytes: counter of all read bytes
+ */
+struct ssdfs_blk_bmap_init_env {
+	struct ssdfs_block_bitmap_header *bmap_hdr;
+	struct ssdfs_block_bitmap_fragment *frag_hdr;
+	u8 bmap_hdr_buf[SSDFS_BLKBMAP_HDR_CAPACITY];
+	int fragment_index;
+	struct ssdfs_page_vector array;
+	u32 read_bytes;
+};
+
+/*
+ * struct ssdfs_blk2off_table_init_env - blk2off table init environment
+ * @tbl_hdr: blk2off table header
+ * @pvec: pagevec with blk2off table fragment
+ * @blk2off_tbl_hdr_off: blk2off table header offset
+ * @read_off: current read offset
+ * @write_off: current write offset
+ */
+struct ssdfs_blk2off_table_init_env {
+	struct ssdfs_blk2off_table_header tbl_hdr;
+	struct pagevec pvec;
+	u32 blk2off_tbl_hdr_off;
+	u32 read_off;
+	u32 write_off;
+};
+
+/*
+ * struct ssdfs_blk_desc_table_init_env - blk desc table init environment
+ * @pvec: pagevec with block descriptor table fragment
+ * @compressed_buf: buffer for compressed block descriptor table fragment
+ * @buf_size: size of compressed buffer
+ * @read_off: current read offset
+ * @write_off: current write offset
+ */
+struct ssdfs_blk_desc_table_init_env {
+	struct pagevec pvec;
+	void *compressed_buf;
+	u32 buf_size;
+	u32 read_off;
+	u32 write_off;
+};
+
+/*
+ * struct ssdfs_read_init_env - read operation init environment
+ * @log_hdr: log header
+ * @has_seg_hdr: does log have segment header?
+ * @footer: log footer
+ * @has_footer: does log have footer?
+ * @cur_migration_id: current PEB's migration ID + * @prev_migration_id: previous PEB's migration ID + * @log_offset: offset in pages of the requested log + * @log_pages: pages count in every log of segment + * @log_bytes: number of bytes in the requested log + * @b_init: block bitmap init environment + * @t_init: blk2off table init environment + * @bdt_init: blk desc table init environment + */ +struct ssdfs_read_init_env { + void *log_hdr; + bool has_seg_hdr; + struct ssdfs_log_footer *footer; + bool has_footer; + int cur_migration_id; + int prev_migration_id; + u32 log_offset; + u32 log_pages; + u32 log_bytes; + + struct ssdfs_blk_bmap_init_env b_init; + struct ssdfs_blk2off_table_init_env t_init; + struct ssdfs_blk_desc_table_init_env bdt_init; +}; + +/* + * struct ssdfs_protection_window - protection window length + * @cno_lock: lock of checkpoints set + * @create_cno: creation checkpoint + * @last_request_cno: last request checkpoint + * @reqs_count: current number of active requests + * @protected_range: last measured protected range length + * @future_request_cno: expectation to receive a next request in the future + */ +struct ssdfs_protection_window { + spinlock_t cno_lock; + u64 create_cno; + u64 last_request_cno; + u32 reqs_count; + u64 protected_range; + u64 future_request_cno; +}; + +/* + * struct ssdfs_peb_diffs_area_metadata - diffs area's metadata + * @hdr: diffs area's table header + */ +struct ssdfs_peb_diffs_area_metadata { + struct ssdfs_block_state_descriptor hdr; +}; + +/* + * struct ssdfs_peb_journal_area_metadata - journal area's metadata + * @hdr: journal area's table header + */ +struct ssdfs_peb_journal_area_metadata { + struct ssdfs_block_state_descriptor hdr; +}; + +/* + * struct ssdfs_peb_read_buffer - read buffer + * @ptr: pointer on buffer + * @offset: logical offset in metadata structure + * @size: buffer size in bytes + */ +struct ssdfs_peb_read_buffer { + void *ptr; + u32 offset; + size_t size; +}; + +/* + * struct ssdfs_peb_temp_read_buffers - read temporary buffers + * @lock: temporary buffers lock + * @blk_desc: block descriptor table's temp read buffer + */ +struct ssdfs_peb_temp_read_buffers { + struct rw_semaphore lock; + struct ssdfs_peb_read_buffer blk_desc; +}; + +/* + * struct ssdfs_peb_temp_buffer - temporary (write) buffer + * @ptr: pointer on buffer + * @write_offset: current write offset into buffer + * @granularity: size of one item in bytes + * @size: buffer size in bytes + */ +struct ssdfs_peb_temp_buffer { + void *ptr; + u32 write_offset; + size_t granularity; + size_t size; +}; + +/* + * struct ssdfs_peb_area_metadata - descriptor of area's items chain + * @area.blk_desc.table: block descriptors area table + * @area.blk_desc.flush_buf: write block descriptors buffer (compression case) + * @area.blk_desc.capacity: max number of block descriptors in reserved space + * @area.blk_desc.items_count: number of items in the whole table + * @area.diffs.table: diffs area's table + * @area.journal.table: journal area's table + * @area.main.desc: main area's descriptor + * @reserved_offset: reserved write offset of table + * @sequence_id: fragment's sequence number + */ +struct ssdfs_peb_area_metadata { + union { + struct { + struct ssdfs_area_block_table table; + struct ssdfs_peb_temp_buffer flush_buf; + int capacity; + int items_count; + } blk_desc; + + struct { + struct ssdfs_peb_diffs_area_metadata table; + } diffs; + + struct { + struct ssdfs_peb_journal_area_metadata table; + } journal; + + struct { + struct ssdfs_block_state_descriptor 
desc;
+		} main;
+	} area;
+
+	u32 reserved_offset;
+	u8 sequence_id;
+};
+
+/*
+ * struct ssdfs_peb_area - log's area descriptor
+ * @has_metadata: does area contain metadata?
+ * @metadata: descriptor of area's items chain
+ * @write_offset: current write offset
+ * @compressed_offset: current write offset for compressed data
+ * @array: area's memory pages
+ */
+struct ssdfs_peb_area {
+	bool has_metadata;
+	struct ssdfs_peb_area_metadata metadata;
+
+	u32 write_offset;
+	u32 compressed_offset;
+	struct ssdfs_page_array array;
+};
+
+/* Log possible states */
+enum {
+	SSDFS_LOG_UNKNOWN,
+	SSDFS_LOG_PREPARED,
+	SSDFS_LOG_INITIALIZED,
+	SSDFS_LOG_CREATED,
+	SSDFS_LOG_COMMITTED,
+	SSDFS_LOG_STATE_MAX,
+};
+
+/*
+ * struct ssdfs_peb_log - current log
+ * @lock: exclusive lock of current log
+ * @state: current log's state
+ * @sequence_id: index of partial log in the sequence
+ * @start_page: current log's start page index
+ * @reserved_pages: count of metadata pages in the log
+ * @free_data_pages: free data pages capacity
+ * @seg_flags: segment header's flags for the log
+ * @prev_log_bmap_bytes: bytes count in block bitmap of previous log
+ * @last_log_time: creation timestamp of last log
+ * @last_log_cno: last log checkpoint
+ * @bmap_snapshot: snapshot of block bitmap
+ * @area: log's areas (main, diff updates, journal)
+ */
+struct ssdfs_peb_log {
+	struct mutex lock;
+	atomic_t state;
+	atomic_t sequence_id;
+	u32 start_page;
+	u32 reserved_pages;	/* metadata pages in the log */
+	u32 free_data_pages;	/* free data pages capacity */
+	u32 seg_flags;
+	u32 prev_log_bmap_bytes;
+	u64 last_log_time;
+	u64 last_log_cno;
+	struct ssdfs_page_vector bmap_snapshot;
+	struct ssdfs_peb_area area[SSDFS_LOG_AREA_MAX];
+};
+
+/*
+ * struct ssdfs_peb_info - Physical Erase Block (PEB) description
+ * @peb_id: PEB number
+ * @peb_index: PEB index
+ * @log_pages: count of pages in full partial log
+ * @peb_create_time: PEB creation timestamp
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @state: PEB object state
+ * @init_end: wait of full init ending
+ * @reserved_bytes.blk_bmap: reserved bytes for block bitmap
+ * @reserved_bytes.blk2off_tbl: reserved bytes for blk2off table
+ * @reserved_bytes.blk_desc_tbl: reserved bytes for block descriptor table
+ * @current_log: PEB's current log
+ * @read_buffer: temporary read buffers (compression case)
+ * @env: init environment
+ * @cache: PEB's memory pages
+ * @pebc: pointer on parent container
+ */
+struct ssdfs_peb_info {
+	/* Static data */
+	u64 peb_id;
+	u16 peb_index;
+	u32 log_pages;
+
+	u64 peb_create_time;
+
+	/*
+	 * The peb_migration_id is stored in two places:
+	 * (1) struct ssdfs_segment_header;
+	 * (2) struct ssdfs_blk_state_offset.
+	 *
+	 * The goal of peb_migration_id is to distinguish PEB
+	 * objects during PEB object's migration. Every
+	 * destination PEB receives a migration_id that is the
+	 * incremented migration_id value of the source PEB
+	 * object. When peb_migration_id reaches
+	 * SSDFS_PEB_MIGRATION_ID_MAX, it starts from
+	 * SSDFS_PEB_MIGRATION_ID_START again.
+	 *
+	 * A PEB object receives its peb_migration_id value
+	 * during the PEB object creation operation. A "clean"
+	 * PEB object receives the SSDFS_PEB_MIGRATION_ID_START
+	 * value. A destination PEB object receives the
+	 * incremented peb_migration_id value of the source PEB
+	 * object during the creation operation.
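+	 * For example (illustrative values): if a source PEB has
+	 * migration_id N, its destination receives N + 1; once the
+	 * value would reach SSDFS_PEB_MIGRATION_ID_MAX, it wraps
+	 * back to SSDFS_PEB_MIGRATION_ID_START.
+	 *
+	 *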
Otherwise, the real peb_migration_id + * value is set during PEB's initialization + * by means of extracting the actual value from segment + * header. + */ + atomic_t peb_migration_id; + + atomic_t state; + struct completion init_end; + + /* Reserved bytes */ + struct { + atomic_t blk_bmap; + atomic_t blk2off_tbl; + atomic_t blk_desc_tbl; + } reserved_bytes; + + /* Current log */ + struct ssdfs_peb_log current_log; + + /* Read buffer */ + struct ssdfs_peb_temp_read_buffers read_buffer; + + /* Init environment */ + struct ssdfs_read_init_env env; + + /* PEB's memory pages */ + struct ssdfs_page_array cache; + + /* Parent container */ + struct ssdfs_peb_container *pebc; +}; + +/* PEB object states */ +enum { + SSDFS_PEB_OBJECT_UNKNOWN_STATE, + SSDFS_PEB_OBJECT_CREATED, + SSDFS_PEB_OBJECT_INITIALIZED, + SSDFS_PEB_OBJECT_STATE_MAX +}; + +#define SSDFS_AREA_TYPE2INDEX(type)({ \ + int index; \ + switch (type) { \ + case SSDFS_LOG_BLK_DESC_AREA: \ + index = SSDFS_BLK_DESC_AREA_INDEX; \ + break; \ + case SSDFS_LOG_MAIN_AREA: \ + index = SSDFS_COLD_PAYLOAD_AREA_INDEX; \ + break; \ + case SSDFS_LOG_DIFFS_AREA: \ + index = SSDFS_WARM_PAYLOAD_AREA_INDEX; \ + break; \ + case SSDFS_LOG_JOURNAL_AREA: \ + index = SSDFS_HOT_PAYLOAD_AREA_INDEX; \ + break; \ + default: \ + BUG(); \ + }; \ + index; \ +}) + +#define SSDFS_AREA_TYPE2FLAG(type)({ \ + int flag; \ + switch (type) { \ + case SSDFS_LOG_BLK_DESC_AREA: \ + flag = SSDFS_LOG_HAS_BLK_DESC_CHAIN; \ + break; \ + case SSDFS_LOG_MAIN_AREA: \ + flag = SSDFS_LOG_HAS_COLD_PAYLOAD; \ + break; \ + case SSDFS_LOG_DIFFS_AREA: \ + flag = SSDFS_LOG_HAS_WARM_PAYLOAD; \ + break; \ + case SSDFS_LOG_JOURNAL_AREA: \ + flag = SSDFS_LOG_HAS_HOT_PAYLOAD; \ + break; \ + default: \ + BUG(); \ + }; \ + flag; \ +}) + +/* + * Inline functions + */ + +/* + * ssdfs_peb_correct_area_write_offset() - correct write offset + * @write_offset: current write offset + * @data_size: requested size of data + * + * This function checks that we can place whole data into current + * memory page. + * + * RETURN: corrected value of write offset. 
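+ *
+ * Example (assuming PAGE_SIZE == 4096): write_offset == 4000 with
+ * data_size == 200 would cross the page boundary, so the corrected
+ * offset is 4096; write_offset == 4000 with data_size == 96 ends
+ * exactly on the boundary and write_offset is returned unchanged.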
+ */ +static inline +u32 ssdfs_peb_correct_area_write_offset(u32 write_offset, u32 data_size) +{ + u32 page_index1, page_index2; + u32 new_write_offset = write_offset + data_size; + + page_index1 = write_offset / PAGE_SIZE; + page_index2 = new_write_offset / PAGE_SIZE; + + if (page_index1 != page_index2) { + u32 calculated_write_offset = page_index2 * PAGE_SIZE; + + if (new_write_offset == calculated_write_offset) + return write_offset; + else + return calculated_write_offset; + } + + return write_offset; +} + +/* + * ssdfs_peb_current_log_lock() - lock current log object + * @pebi: pointer on PEB object + */ +static inline +void ssdfs_peb_current_log_lock(struct ssdfs_peb_info *pebi) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = mutex_lock_killable(&pebi->current_log.lock); + WARN_ON(err); +} + +/* + * ssdfs_peb_current_log_unlock() - unlock current log object + * @pebi: pointer on PEB object + */ +static inline +void ssdfs_peb_current_log_unlock(struct ssdfs_peb_info *pebi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + WARN_ON(!mutex_is_locked(&pebi->current_log.lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + mutex_unlock(&pebi->current_log.lock); +} + +static inline +bool is_ssdfs_peb_current_log_locked(struct ssdfs_peb_info *pebi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); +#endif /* CONFIG_SSDFS_DEBUG */ + + return mutex_is_locked(&pebi->current_log.lock); +} + +/* + * ssdfs_peb_current_log_state() - check current log's state + * @pebi: pointer on PEB object + * @state: checked state + */ +static inline +bool ssdfs_peb_current_log_state(struct ssdfs_peb_info *pebi, + int state) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(state < SSDFS_LOG_UNKNOWN || state >= SSDFS_LOG_STATE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return atomic_read(&pebi->current_log.state) >= state; +} + +/* + * ssdfs_peb_set_current_log_state() - set current log's state + * @pebi: pointer on PEB object + * @state: new log's state + */ +static inline +void ssdfs_peb_set_current_log_state(struct ssdfs_peb_info *pebi, + int state) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(state < SSDFS_LOG_UNKNOWN || state >= SSDFS_LOG_STATE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return atomic_set(&pebi->current_log.state, state); +} + +/* + * ssdfs_peb_current_log_init() - initialize current log object + * @pebi: pointer on PEB object + * @free_pages: free pages in the current log + * @start_page: start page of the current log + * @sequence_id: index of partial log in the sequence + * @prev_log_bmap_bytes: bytes count in block bitmap of previous log + */ +static inline +void ssdfs_peb_current_log_init(struct ssdfs_peb_info *pebi, + u32 free_pages, + u32 start_page, + int sequence_id, + u32 prev_log_bmap_bytes) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + + SSDFS_DBG("peb_id %llu, " + "pebi->current_log.start_page %u, " + "free_pages %u, sequence_id %d, " + "prev_log_bmap_bytes %u\n", + pebi->peb_id, start_page, free_pages, + sequence_id, prev_log_bmap_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_current_log_lock(pebi); + pebi->current_log.start_page = start_page; + pebi->current_log.free_data_pages = free_pages; + pebi->current_log.prev_log_bmap_bytes = prev_log_bmap_bytes; + atomic_set(&pebi->current_log.sequence_id, sequence_id); + atomic_set(&pebi->current_log.state, SSDFS_LOG_INITIALIZED); + ssdfs_peb_current_log_unlock(pebi); +} + +/* + * ssdfs_get_leb_id_for_peb_index() - convert PEB's index into LEB's ID + * @fsi: pointer 
on shared file system object
+ * @seg: segment number
+ * @peb_index: index of PEB object in array
+ *
+ * This function converts PEB's index into LEB's identification
+ * number.
+ *
+ * RETURN:
+ * [success] - LEB's identification number.
+ * [failure] - U64_MAX.
+ */
+static inline
+u64 ssdfs_get_leb_id_for_peb_index(struct ssdfs_fs_info *fsi,
+				   u64 seg, u32 peb_index)
+{
+	u64 leb_id = U64_MAX;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	if (peb_index >= fsi->pebs_per_seg) {
+		SSDFS_ERR("requested peb_index %u >= pebs_per_seg %u\n",
+			  peb_index, fsi->pebs_per_seg);
+		return U64_MAX;
+	}
+
+	SSDFS_DBG("fsi %p, seg %llu, peb_index %u\n",
+		  fsi, seg, peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fsi->lebs_per_peb_index == SSDFS_LEBS_PER_PEB_INDEX_DEFAULT)
+		leb_id = (seg * fsi->pebs_per_seg) + peb_index;
+	else
+		leb_id = seg + (peb_index * fsi->lebs_per_peb_index);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb_index %u, leb_id %llu\n",
+		  seg, peb_index, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return leb_id;
+}
+
+/*
+ * ssdfs_get_seg_id_for_leb_id() - convert LEB's ID into segment's ID
+ * @fsi: pointer on shared file system object
+ * @leb_id: LEB ID
+ *
+ * This function converts LEB's ID into segment's identification
+ * number.
+ *
+ * RETURN:
+ * [success] - segment's identification number.
+ * [failure] - U64_MAX.
+ */
+static inline
+u64 ssdfs_get_seg_id_for_leb_id(struct ssdfs_fs_info *fsi,
+				u64 leb_id)
+{
+	u64 seg_id = U64_MAX;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, leb_id %llu\n",
+		  fsi, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fsi->lebs_per_peb_index == SSDFS_LEBS_PER_PEB_INDEX_DEFAULT)
+		seg_id = div_u64(leb_id, fsi->pebs_per_seg);
+	else
+		seg_id = div_u64(leb_id, fsi->lebs_per_peb_index);
+
+	return seg_id;
+}
+
+/*
+ * ssdfs_get_peb_migration_id() - get PEB's migration ID
+ * @pebi: pointer on PEB object
+ */
+static inline
+int ssdfs_get_peb_migration_id(struct ssdfs_peb_info *pebi)
+{
+	int id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	id = atomic_read(&pebi->peb_migration_id);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(id >= U8_MAX);
+	BUG_ON(id < 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return id;
+}
+
+/*
+ * is_peb_migration_id_valid() - check PEB's migration_id
+ * @peb_migration_id: PEB's migration ID value
+ */
+static inline
+bool is_peb_migration_id_valid(int peb_migration_id)
+{
+	if (peb_migration_id < 0 ||
+	    peb_migration_id > SSDFS_PEB_MIGRATION_ID_MAX) {
+		/* preliminary check */
+		return false;
+	}
+
+	switch (peb_migration_id) {
+	case SSDFS_PEB_MIGRATION_ID_MAX:
+	case SSDFS_PEB_UNKNOWN_MIGRATION_ID:
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_get_peb_migration_id_checked() - get checked PEB's migration ID
+ * @pebi: pointer on PEB object
+ */
+static inline
+int ssdfs_get_peb_migration_id_checked(struct ssdfs_peb_info *pebi)
+{
+	int res, err;
+
+	switch (atomic_read(&pebi->state)) {
+	case SSDFS_PEB_OBJECT_CREATED:
+		err = SSDFS_WAIT_COMPLETION(&pebi->init_end);
+		if (unlikely(err)) {
+			SSDFS_ERR("PEB init failed: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		if (atomic_read(&pebi->state) != SSDFS_PEB_OBJECT_INITIALIZED) {
+			SSDFS_ERR("PEB %llu is not initialized\n",
+				  pebi->peb_id);
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_PEB_OBJECT_INITIALIZED:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid PEB state %#x\n",
+			  atomic_read(&pebi->state));
+		return -ERANGE;
+	}
+
+	res = ssdfs_get_peb_migration_id(pebi);
+
+	if (!is_peb_migration_id_valid(res)) {
+		/* report the bogus value before replacing it
+		 * with an error code */
+		SSDFS_WARN("invalid peb_migration_id: "
+			   "peb %llu, peb_index %u, id %d\n",
+			   pebi->peb_id, pebi->peb_index, res);
+		res = -ERANGE;
+	}
+
+	return res;
+}
+
+/*
+ * ssdfs_set_peb_migration_id() - set PEB's migration ID
+ * @pebi: pointer on PEB object
+ * @id: new PEB's migration_id
+ */
+static inline
+void ssdfs_set_peb_migration_id(struct ssdfs_peb_info *pebi,
+				int id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+
+	SSDFS_DBG("peb_id %llu, peb_migration_id %d\n",
+		  pebi->peb_id, id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	atomic_set(&pebi->peb_migration_id, id);
+}
+
+static inline
+int __ssdfs_define_next_peb_migration_id(int prev_id)
+{
+	int id = prev_id;
+
+	if (id < 0)
+		return SSDFS_PEB_MIGRATION_ID_START;
+
+	id += 1;
+
+	if (id >= SSDFS_PEB_MIGRATION_ID_MAX)
+		id = SSDFS_PEB_MIGRATION_ID_START;
+
+	return id;
+}
+
+/*
+ * ssdfs_define_next_peb_migration_id() - define next PEB's migration_id
+ * @src_peb: pointer on source PEB object
+ */
+static inline
+int ssdfs_define_next_peb_migration_id(struct ssdfs_peb_info *src_peb)
+{
+	int id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!src_peb);
+
+	SSDFS_DBG("peb %llu, peb_index %u\n",
+		  src_peb->peb_id, src_peb->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	id = ssdfs_get_peb_migration_id_checked(src_peb);
+	if (id < 0) {
+		SSDFS_ERR("fail to get peb_migration_id: "
+			  "peb %llu, peb_index %u, err %d\n",
+			  src_peb->peb_id, src_peb->peb_index,
+			  id);
+		return SSDFS_PEB_MIGRATION_ID_MAX;
+	}
+
+	return __ssdfs_define_next_peb_migration_id(id);
+}
+
+/*
+ * ssdfs_define_prev_peb_migration_id() - define prev PEB's migration_id
+ * @pebi: pointer on source PEB object
+ */
+static inline
+int ssdfs_define_prev_peb_migration_id(struct ssdfs_peb_info *pebi)
+{
+	int id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+
+	SSDFS_DBG("peb %llu, peb_index %u\n",
+		  pebi->peb_id, pebi->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	id = ssdfs_get_peb_migration_id_checked(pebi);
+	if (id < 0) {
+		SSDFS_ERR("fail to get peb_migration_id: "
+			  "peb %llu, peb_index %u, err %d\n",
+			  pebi->peb_id, pebi->peb_index,
+			  id);
+		return SSDFS_PEB_MIGRATION_ID_MAX;
+	}
+
+	id--;
+
+	if (id == SSDFS_PEB_UNKNOWN_MIGRATION_ID)
+		id = SSDFS_PEB_MIGRATION_ID_MAX - 1;
+
+	return id;
+}
+
+/*
+ * IS_SSDFS_BLK_STATE_OFFSET_INVALID() - check that block state offset is invalid
+ * @desc: block state offset
+ */
+static inline
+bool IS_SSDFS_BLK_STATE_OFFSET_INVALID(struct ssdfs_blk_state_offset *desc)
+{
+	if (!desc)
+		return true;
+
+	if (le16_to_cpu(desc->log_start_page) == U16_MAX &&
+	    desc->log_area == U8_MAX &&
+	    desc->peb_migration_id == U8_MAX &&
+	    le32_to_cpu(desc->byte_offset) == U32_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("log_start_page %u, log_area %u, "
+			  "peb_migration_id %u, byte_offset %u\n",
+			  le16_to_cpu(desc->log_start_page),
+			  desc->log_area,
+			  desc->peb_migration_id,
+			  le32_to_cpu(desc->byte_offset));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return true;
+	}
+
+	if (desc->peb_migration_id == SSDFS_PEB_UNKNOWN_MIGRATION_ID) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("log_start_page %u, log_area %u, "
+			  "peb_migration_id %u, byte_offset %u\n",
+			  le16_to_cpu(desc->log_start_page),
+			  desc->log_area,
+			  desc->peb_migration_id,
+			  le32_to_cpu(desc->byte_offset));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * SSDFS_BLK_DESC_INIT() - init block descriptor
+ * @blk_desc: block descriptor
+ */
+static inline
+void SSDFS_BLK_DESC_INIT(struct ssdfs_block_descriptor *blk_desc)
+{
+	if (!blk_desc) {
SSDFS_WARN("block descriptor pointer is NULL\n"); + return; + } + + memset(blk_desc, 0xFF, sizeof(struct ssdfs_block_descriptor)); +} + +/* + * IS_SSDFS_BLK_DESC_EXHAUSTED() - check that block descriptor is exhausted + * @blk_desc: block descriptor + */ +static inline +bool IS_SSDFS_BLK_DESC_EXHAUSTED(struct ssdfs_block_descriptor *blk_desc) +{ + struct ssdfs_blk_state_offset *offset = NULL; + + if (!blk_desc) + return true; + + offset = &blk_desc->state[SSDFS_BLK_STATE_OFF_MAX - 1]; + + if (!IS_SSDFS_BLK_STATE_OFFSET_INVALID(offset)) + return true; + + return false; +} + +static inline +bool IS_SSDFS_BLK_DESC_READY_FOR_DIFF(struct ssdfs_block_descriptor *blk_desc) +{ + return !IS_SSDFS_BLK_STATE_OFFSET_INVALID(&blk_desc->state[0]); +} + +static inline +u8 SSDFS_GET_BLK_DESC_MIGRATION_ID(struct ssdfs_block_descriptor *blk_desc) +{ + if (IS_SSDFS_BLK_STATE_OFFSET_INVALID(&blk_desc->state[0])) + return U8_MAX; + + return blk_desc->state[0].peb_migration_id; +} + +static inline +void DEBUG_BLOCK_DESCRIPTOR(u64 seg_id, u64 peb_id, + struct ssdfs_block_descriptor *blk_desc) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i; + + SSDFS_DBG("seg_id %llu, peb_id %llu, ino %llu, " + "logical_offset %u, peb_index %u, peb_page %u\n", + seg_id, peb_id, + le64_to_cpu(blk_desc->ino), + le32_to_cpu(blk_desc->logical_offset), + le16_to_cpu(blk_desc->peb_index), + le16_to_cpu(blk_desc->peb_page)); + + for (i = 0; i < SSDFS_BLK_STATE_OFF_MAX; i++) { + SSDFS_DBG("BLK STATE OFFSET %d: " + "log_start_page %u, log_area %#x, " + "byte_offset %u, peb_migration_id %u\n", + i, + le16_to_cpu(blk_desc->state[i].log_start_page), + blk_desc->state[i].log_area, + le32_to_cpu(blk_desc->state[i].byte_offset), + blk_desc->state[i].peb_migration_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * PEB object's API + */ +int ssdfs_peb_object_create(struct ssdfs_peb_info *pebi, + struct ssdfs_peb_container *pebc, + u64 peb_id, int peb_state, + u8 peb_migration_id); +int ssdfs_peb_object_destroy(struct ssdfs_peb_info *pebi); + +/* + * PEB internal functions declaration + */ +int ssdfs_unaligned_read_cache(struct ssdfs_peb_info *pebi, + u32 area_offset, u32 area_size, + void *buf); +int ssdfs_peb_read_log_hdr_desc_array(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u16 log_start_page, + struct ssdfs_metadata_descriptor *array, + size_t array_size); +u16 ssdfs_peb_estimate_min_partial_log_pages(struct ssdfs_peb_info *pebi); +bool is_ssdfs_peb_exhausted(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi); +bool is_ssdfs_peb_ready_to_exhaust(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi); +int ssdfs_peb_realloc_read_buffer(struct ssdfs_peb_read_buffer *buf, + size_t new_size); +int ssdfs_peb_realloc_write_buffer(struct ssdfs_peb_temp_buffer *buf); + +#endif /* _SSDFS_PEB_H */ From patchwork Sat Feb 25 01:08:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151926 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09A3DC64ED8 for ; Sat, 25 Feb 2023 01:16:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229604AbjBYBQy (ORCPT ); Fri, 24 Feb 2023 20:16:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48678 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by 
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 21/76] ssdfs: introduce PEB container
Date: Fri, 24 Feb 2023 17:08:32 -0800
Message-Id: <20230225010927.813929-22-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

SSDFS implements a migration scheme. The migration scheme is a
fundamental technique of GC overhead management. The key responsibility
of the migration scheme is to guarantee the presence of data in the
same segment for any update operations. Generally speaking, the
migration scheme's model is implemented on the basis of associating an
exhausted "Physical" Erase Block (PEB) with a clean one. The goal of
such an association of two PEBs is to implement the gradual migration
of data by means of the update operations in the initial (exhausted)
PEB.
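The idea can be sketched roughly as follows (an illustrative sketch
with hypothetical types and helpers, not the declarations introduced
by this patch):

	/* A migration pair ties an exhausted source PEB to a clean
	 * destination PEB (names are illustrative). */
	struct peb_migration_pair_sketch {
		struct ssdfs_peb_info *src;	/* exhausted PEB */
		struct ssdfs_peb_info *dst;	/* clean PEB */
	};

	/* Every update of a logical block that resides in the source
	 * PEB is written into the destination PEB, so valid data
	 * gradually drains out of the source as a side effect of
	 * ordinary updates (write_block()/invalidate_block() are
	 * hypothetical helpers). */
	static int sketch_update_block(struct peb_migration_pair_sketch *pair,
				       u32 logical_blk, void *new_state)
	{
		int err = write_block(pair->dst, logical_blk, new_state);

		if (!err)
			invalidate_block(pair->src, logical_blk);
		return err;
	}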
As a result, the old, exhausted PEB becomes invalidated after complete
data migration, and it becomes possible to apply the erase operation to
convert it into the clean state. To implement the migration scheme
concept, SSDFS introduces a PEB container that includes source and
destination erase blocks. The PEB container object keeps the pointers
on source and destination PEB objects during migration logic execution.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/compr_lzo.c            |  256 ++++++
 fs/ssdfs/compr_zlib.c           |  359 +++++++++
 fs/ssdfs/compression.c          |  548 +++++++++++++
 fs/ssdfs/compression.h          |  104 +++
 fs/ssdfs/peb_container.h        |  291 +++++++
 fs/ssdfs/peb_migration_scheme.c | 1302 +++++++++++++++++++++++++++++++
 6 files changed, 2860 insertions(+)
 create mode 100644 fs/ssdfs/compr_lzo.c
 create mode 100644 fs/ssdfs/compr_zlib.c
 create mode 100644 fs/ssdfs/compression.c
 create mode 100644 fs/ssdfs/compression.h
 create mode 100644 fs/ssdfs/peb_container.h
 create mode 100644 fs/ssdfs/peb_migration_scheme.c

diff --git a/fs/ssdfs/compr_lzo.c b/fs/ssdfs/compr_lzo.c
new file mode 100644
index 000000000000..c3b71b1f9842
--- /dev/null
+++ b/fs/ssdfs/compr_lzo.c
@@ -0,0 +1,256 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/compr_lzo.c - LZO compression support.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "compression.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_lzo_page_leaks;
+atomic64_t ssdfs_lzo_memory_leaks;
+atomic64_t ssdfs_lzo_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_lzo_cache_leaks_increment(void *kaddr)
+ * void ssdfs_lzo_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_lzo_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_lzo_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_lzo_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_lzo_kfree(void *kaddr)
+ * struct page *ssdfs_lzo_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_lzo_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_lzo_free_page(struct page *page)
+ * void ssdfs_lzo_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(lzo)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(lzo)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_lzo_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_lzo_page_leaks, 0);
+	atomic64_set(&ssdfs_lzo_memory_leaks, 0);
+	atomic64_set(&ssdfs_lzo_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_lzo_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_lzo_page_leaks) != 0) {
+		SSDFS_ERR("LZO: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_lzo_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_lzo_memory_leaks) != 0) {
+		SSDFS_ERR("LZO: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_lzo_memory_leaks));
+	}
atomic64_read(&ssdfs_lzo_memory_leaks)); + } + + if (atomic64_read(&ssdfs_lzo_cache_leaks) != 0) { + SSDFS_ERR("LZO: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_lzo_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static int ssdfs_lzo_compress(struct list_head *ws_ptr, + unsigned char *data_in, + unsigned char *cdata_out, + size_t *srclen, size_t *destlen); + +static int ssdfs_lzo_decompress(struct list_head *ws_ptr, + unsigned char *cdata_in, + unsigned char *data_out, + size_t srclen, size_t destlen); + +static struct list_head *ssdfs_lzo_alloc_workspace(void); +static void ssdfs_lzo_free_workspace(struct list_head *ptr); + +static const struct ssdfs_compress_ops ssdfs_lzo_compress_ops = { + .alloc_workspace = ssdfs_lzo_alloc_workspace, + .free_workspace = ssdfs_lzo_free_workspace, + .compress = ssdfs_lzo_compress, + .decompress = ssdfs_lzo_decompress, +}; + +static struct ssdfs_compressor lzo_compr = { + .type = SSDFS_COMPR_LZO, + .compr_ops = &ssdfs_lzo_compress_ops, + .name = "lzo", +}; + +struct ssdfs_lzo_workspace { + void *mem; + void *cbuf; /* where compressed data goes */ + struct list_head list; +}; + +static void ssdfs_lzo_free_workspace(struct list_head *ptr) +{ + struct ssdfs_lzo_workspace *workspace; + + workspace = list_entry(ptr, struct ssdfs_lzo_workspace, list); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("workspace %p\n", workspace); +#endif /* CONFIG_SSDFS_DEBUG */ + + vfree(workspace->cbuf); + vfree(workspace->mem); + ssdfs_lzo_kfree(workspace); +} + +static struct list_head *ssdfs_lzo_alloc_workspace(void) +{ + struct ssdfs_lzo_workspace *workspace; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to allocate workspace\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + workspace = ssdfs_lzo_kzalloc(sizeof(*workspace), GFP_KERNEL); + if (unlikely(!workspace)) + goto failed_alloc_workspaces; + + workspace->mem = vmalloc(LZO1X_MEM_COMPRESS); + workspace->cbuf = vmalloc(lzo1x_worst_compress(PAGE_SIZE)); + if (!workspace->mem || !workspace->cbuf) + goto failed_alloc_workspaces; + + INIT_LIST_HEAD(&workspace->list); + + return &workspace->list; + +failed_alloc_workspaces: + SSDFS_ERR("unable to allocate memory for workspace\n"); + ssdfs_lzo_free_workspace(&workspace->list); + return ERR_PTR(-ENOMEM); +} + +int ssdfs_lzo_init(void) +{ + return ssdfs_register_compressor(&lzo_compr); +} + +void ssdfs_lzo_exit(void) +{ + ssdfs_unregister_compressor(&lzo_compr); +} + +static int ssdfs_lzo_compress(struct list_head *ws, + unsigned char *data_in, + unsigned char *cdata_out, + size_t *srclen, size_t *destlen) +{ + struct ssdfs_lzo_workspace *workspace; + size_t compress_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ws_ptr %p, data_in %p, cdata_out %p, " + "srclen ptr %p, destlen ptr %p\n", + ws, data_in, cdata_out, srclen, destlen); + + BUG_ON(!ws || !data_in || !cdata_out || !srclen || !destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + + workspace = list_entry(ws, struct ssdfs_lzo_workspace, list); + + err = lzo1x_1_compress(data_in, *srclen, workspace->cbuf, + &compress_size, workspace->mem); + if (err != LZO_E_OK) { + SSDFS_ERR("LZO compression failed: internal err %d, " + "srclen %zu, destlen %zu\n", + err, *srclen, *destlen); + err = -EINVAL; + goto failed_compress; + } + + if (compress_size > *destlen) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to compress: compress_size %zu, " + "destlen %zu\n", + compress_size, *destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -E2BIG; + goto failed_compress; + } + + ssdfs_memcpy(cdata_out, 
0, *destlen, + workspace->cbuf, 0, lzo1x_worst_compress(PAGE_SIZE), + compress_size); + *destlen = compress_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("compress has succeded: srclen %zu, destlen %zu\n", + *srclen, *destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + +failed_compress: + return err; +} + +static int ssdfs_lzo_decompress(struct list_head *ws, + unsigned char *cdata_in, + unsigned char *data_out, + size_t srclen, size_t destlen) +{ + size_t dl = destlen; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ws_ptr %p, cdata_in %p, data_out %p, " + "srclen %zu, destlen %zu\n", + ws, cdata_in, data_out, srclen, destlen); + + BUG_ON(!ws || !cdata_in || !data_out); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = lzo1x_decompress_safe(cdata_in, srclen, data_out, &dl); + + if (err != LZO_E_OK || dl != destlen) { + SSDFS_ERR("decompression failed: LZO compressor err %d, " + "srclen %zu, destlen %zu\n", + err, srclen, destlen); + return -EINVAL; + } + + return 0; +} diff --git a/fs/ssdfs/compr_zlib.c b/fs/ssdfs/compr_zlib.c new file mode 100644 index 000000000000..a410907dc531 --- /dev/null +++ b/fs/ssdfs/compr_zlib.c @@ -0,0 +1,359 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/compr_zlib.c - ZLIB compression support. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "compression.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_zlib_page_leaks; +atomic64_t ssdfs_zlib_memory_leaks; +atomic64_t ssdfs_zlib_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_zlib_cache_leaks_increment(void *kaddr) + * void ssdfs_zlib_cache_leaks_decrement(void *kaddr) + * void *ssdfs_zlib_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_zlib_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_zlib_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_zlib_kfree(void *kaddr) + * struct page *ssdfs_zlib_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_zlib_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_zlib_free_page(struct page *page) + * void ssdfs_zlib_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(zlib) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(zlib) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_zlib_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_zlib_page_leaks, 0); + atomic64_set(&ssdfs_zlib_memory_leaks, 0); + atomic64_set(&ssdfs_zlib_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_zlib_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_zlib_page_leaks) != 0) { + SSDFS_ERR("ZLIB: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_zlib_page_leaks)); + } + + if (atomic64_read(&ssdfs_zlib_memory_leaks) != 0) { + SSDFS_ERR("ZLIB: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_zlib_memory_leaks)); + } + + if 
(atomic64_read(&ssdfs_zlib_cache_leaks) != 0) { + SSDFS_ERR("ZLIB: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_zlib_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +#define COMPR_LEVEL CONFIG_SSDFS_ZLIB_COMR_LEVEL + +static int ssdfs_zlib_compress(struct list_head *ws_ptr, + unsigned char *data_in, + unsigned char *cdata_out, + size_t *srclen, size_t *destlen); + +static int ssdfs_zlib_decompress(struct list_head *ws_ptr, + unsigned char *cdata_in, + unsigned char *data_out, + size_t srclen, size_t destlen); + +static struct list_head *ssdfs_zlib_alloc_workspace(void); +static void ssdfs_zlib_free_workspace(struct list_head *ptr); + +static const struct ssdfs_compress_ops ssdfs_zlib_compress_ops = { + .alloc_workspace = ssdfs_zlib_alloc_workspace, + .free_workspace = ssdfs_zlib_free_workspace, + .compress = ssdfs_zlib_compress, + .decompress = ssdfs_zlib_decompress, +}; + +static struct ssdfs_compressor zlib_compr = { + .type = SSDFS_COMPR_ZLIB, + .compr_ops = &ssdfs_zlib_compress_ops, + .name = "zlib", +}; + +struct ssdfs_zlib_workspace { + z_stream inflate_stream; + z_stream deflate_stream; + struct list_head list; +}; + +static void ssdfs_zlib_free_workspace(struct list_head *ptr) +{ + struct ssdfs_zlib_workspace *workspace; + + workspace = list_entry(ptr, struct ssdfs_zlib_workspace, list); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("workspace %p\n", workspace); +#endif /* CONFIG_SSDFS_DEBUG */ + + vfree(workspace->deflate_stream.workspace); + vfree(workspace->inflate_stream.workspace); + ssdfs_zlib_kfree(workspace); +} + +static struct list_head *ssdfs_zlib_alloc_workspace(void) +{ + struct ssdfs_zlib_workspace *workspace; + int deflate_size, inflate_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to allocate workspace\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + workspace = ssdfs_zlib_kzalloc(sizeof(*workspace), GFP_KERNEL); + if (unlikely(!workspace)) { + SSDFS_ERR("unable to allocate memory for workspace\n"); + return ERR_PTR(-ENOMEM); + } + + deflate_size = zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL); + workspace->deflate_stream.workspace = vmalloc(deflate_size); + if (unlikely(!workspace->deflate_stream.workspace)) { + SSDFS_ERR("unable to allocate memory for deflate stream\n"); + goto failed_alloc_workspaces; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("deflate stream size %d\n", deflate_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + inflate_size = zlib_inflate_workspacesize(); + workspace->inflate_stream.workspace = vmalloc(inflate_size); + if (unlikely(!workspace->inflate_stream.workspace)) { + SSDFS_ERR("unable to allocate memory for inflate stream\n"); + goto failed_alloc_workspaces; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inflate stream size %d\n", inflate_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + INIT_LIST_HEAD(&workspace->list); + + return &workspace->list; + +failed_alloc_workspaces: + ssdfs_zlib_free_workspace(&workspace->list); + return ERR_PTR(-ENOMEM); +} + +int ssdfs_zlib_init(void) +{ + return ssdfs_register_compressor(&zlib_compr); +} + +void ssdfs_zlib_exit(void) +{ + ssdfs_unregister_compressor(&zlib_compr); +} + +static int ssdfs_zlib_compress(struct list_head *ws, + unsigned char *data_in, + unsigned char *cdata_out, + size_t *srclen, size_t *destlen) +{ + struct ssdfs_zlib_workspace *workspace; + z_stream *stream; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ws || !data_in || !cdata_out || !srclen || !destlen); + + SSDFS_DBG("ws_ptr %p, data_in %p, cdata_out %p, " + "srclen %zu, destlen 
%zu\n", + ws, data_in, cdata_out, *srclen, *destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + + workspace = list_entry(ws, struct ssdfs_zlib_workspace, list); + stream = &workspace->deflate_stream; + + if (Z_OK != zlib_deflateInit(stream, COMPR_LEVEL)) { + SSDFS_ERR("zlib_deflateInit() failed\n"); + err = -EINVAL; + goto failed_compress; + } + + stream->next_in = data_in; + stream->avail_in = *srclen; + stream->total_in = 0; + + stream->next_out = cdata_out; + stream->avail_out = *destlen; + stream->total_out = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("calling deflate with: " + "stream->avail_in %lu, stream->total_in %lu, " + "stream->avail_out %lu, stream->total_out %lu\n", + (unsigned long)stream->avail_in, + (unsigned long)stream->total_in, + (unsigned long)stream->avail_out, + (unsigned long)stream->total_out); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = zlib_deflate(stream, Z_FINISH); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("deflate returned with: " + "stream->avail_in %lu, stream->total_in %lu, " + "stream->avail_out %lu, stream->total_out %lu\n", + (unsigned long)stream->avail_in, + (unsigned long)stream->total_in, + (unsigned long)stream->avail_out, + (unsigned long)stream->total_out); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err != Z_STREAM_END) { + if (err == Z_OK) { + err = -E2BIG; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to compress: " + "total_in %zu, total_out %zu\n", + stream->total_in, stream->total_out); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("ZLIB compression failed: " + "internal err %d\n", + err); + } + goto failed_compress; + } + + err = zlib_deflateEnd(stream); + if (err != Z_OK) { + SSDFS_ERR("ZLIB compression failed with internal err %d\n", + err); + goto failed_compress; + } + + if (stream->total_out >= stream->total_in) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to compress: total_in %zu, total_out %zu\n", + stream->total_in, stream->total_out); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -E2BIG; + goto failed_compress; + } + + *destlen = stream->total_out; + *srclen = stream->total_in; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("compress has succeded: srclen %zu, destlen %zu\n", + *srclen, *destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + +failed_compress: + return err; +} + +static int ssdfs_zlib_decompress(struct list_head *ws, + unsigned char *cdata_in, + unsigned char *data_out, + size_t srclen, size_t destlen) +{ + struct ssdfs_zlib_workspace *workspace; + int wbits = MAX_WBITS; + int ret = Z_OK; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ws || !cdata_in || !data_out); + + SSDFS_DBG("ws_ptr %p, cdata_in %p, data_out %p, " + "srclen %zu, destlen %zu\n", + ws, cdata_in, data_out, srclen, destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + + workspace = list_entry(ws, struct ssdfs_zlib_workspace, list); + + workspace->inflate_stream.next_in = cdata_in; + workspace->inflate_stream.avail_in = srclen; + workspace->inflate_stream.total_in = 0; + + workspace->inflate_stream.next_out = data_out; + workspace->inflate_stream.avail_out = destlen; + workspace->inflate_stream.total_out = 0; + + /* + * If it's deflate, and it's got no preset dictionary, then + * we can tell zlib to skip the adler32 check. 
+ */ + if (srclen > 2 && !(cdata_in[1] & PRESET_DICT) && + ((cdata_in[0] & 0x0f) == Z_DEFLATED) && + !(((cdata_in[0] << 8) + cdata_in[1]) % 31)) { + + wbits = -((cdata_in[0] >> 4) + 8); + workspace->inflate_stream.next_in += 2; + workspace->inflate_stream.avail_in -= 2; + } + + if (Z_OK != zlib_inflateInit2(&workspace->inflate_stream, wbits)) { + SSDFS_ERR("zlib_inflateInit2() failed\n"); + return -EINVAL; + } + + do { + ret = zlib_inflate(&workspace->inflate_stream, Z_FINISH); + } while (ret == Z_OK); + + zlib_inflateEnd(&workspace->inflate_stream); + + if (ret != Z_STREAM_END) { + SSDFS_ERR("inflate returned %d\n", ret); + return -EFAULT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("decompression has succeded: " + "total_in %zu, total_out %zu\n", + workspace->inflate_stream.total_in, + workspace->inflate_stream.total_out); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} diff --git a/fs/ssdfs/compression.c b/fs/ssdfs/compression.c new file mode 100644 index 000000000000..78b67d342180 --- /dev/null +++ b/fs/ssdfs/compression.c @@ -0,0 +1,548 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/compression.c - compression logic implementation. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. + * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "compression.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_compr_page_leaks; +atomic64_t ssdfs_compr_memory_leaks; +atomic64_t ssdfs_compr_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_compr_cache_leaks_increment(void *kaddr) + * void ssdfs_compr_cache_leaks_decrement(void *kaddr) + * void *ssdfs_compr_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_compr_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_compr_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_compr_kfree(void *kaddr) + * struct page *ssdfs_compr_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_compr_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_compr_free_page(struct page *page) + * void ssdfs_compr_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(compr) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(compr) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_compr_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_compr_page_leaks, 0); + atomic64_set(&ssdfs_compr_memory_leaks, 0); + atomic64_set(&ssdfs_compr_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_compr_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_compr_page_leaks) != 0) { + SSDFS_ERR("COMPRESSION: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_compr_page_leaks)); + } + + if (atomic64_read(&ssdfs_compr_memory_leaks) != 0) { + SSDFS_ERR("COMPRESSION: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_compr_memory_leaks)); + } + + if (atomic64_read(&ssdfs_compr_cache_leaks) != 0) { + SSDFS_ERR("COMPRESSION: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_compr_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +struct ssdfs_compressor *ssdfs_compressors[SSDFS_COMPR_TYPES_CNT]; + +static 
struct list_head compr_idle_workspace[SSDFS_COMPR_TYPES_CNT]; +static spinlock_t compr_workspace_lock[SSDFS_COMPR_TYPES_CNT]; +static int compr_num_workspace[SSDFS_COMPR_TYPES_CNT]; +static atomic_t compr_alloc_workspace[SSDFS_COMPR_TYPES_CNT]; +static wait_queue_head_t compr_workspace_wait[SSDFS_COMPR_TYPES_CNT]; + +static inline bool unable_compress(int type) +{ + if (!ssdfs_compressors[type]) + return true; + else if (!ssdfs_compressors[type]->compr_ops) + return true; + else if (!ssdfs_compressors[type]->compr_ops->compress) + return true; + return false; +} + +static inline bool unable_decompress(int type) +{ + if (!ssdfs_compressors[type]) + return true; + else if (!ssdfs_compressors[type]->compr_ops) + return true; + else if (!ssdfs_compressors[type]->compr_ops->decompress) + return true; + return false; +} + +static int ssdfs_none_compress(struct list_head *ws_ptr, + unsigned char *data_in, + unsigned char *cdata_out, + size_t *srclen, size_t *destlen) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("data_in %p, cdata_out %p, srclen %p, destlen %p\n", + data_in, cdata_out, srclen, destlen); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*srclen > *destlen) { + SSDFS_ERR("src_len %zu > dest_len %zu\n", + *srclen, *destlen); + return -E2BIG; + } + + err = ssdfs_memcpy(cdata_out, 0, PAGE_SIZE, + data_in, 0, PAGE_SIZE, + *srclen); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + *destlen = *srclen; + return 0; +} + +static int ssdfs_none_decompress(struct list_head *ws_ptr, + unsigned char *cdata_in, + unsigned char *data_out, + size_t srclen, size_t destlen) +{ + /* TODO: implement ssdfs_none_decompress() */ + SSDFS_WARN("TODO: implement %s\n", __func__); + return -EOPNOTSUPP; +} + +static const struct ssdfs_compress_ops ssdfs_compr_none_ops = { + .compress = ssdfs_none_compress, + .decompress = ssdfs_none_decompress, +}; + +static struct ssdfs_compressor ssdfs_none_compr = { + .type = SSDFS_COMPR_NONE, + .compr_ops = &ssdfs_compr_none_ops, + .name = "none", +}; + +static inline bool unknown_compression(int type) +{ + return type < SSDFS_COMPR_NONE || type >= SSDFS_COMPR_TYPES_CNT; +} + +int ssdfs_register_compressor(struct ssdfs_compressor *compr) +{ + SSDFS_INFO("register %s compressor\n", compr->name); + ssdfs_compressors[compr->type] = compr; + return 0; +} + +int ssdfs_unregister_compressor(struct ssdfs_compressor *compr) +{ + SSDFS_INFO("unregister %s compressor\n", compr->name); + ssdfs_compressors[compr->type] = NULL; + return 0; +} + +int ssdfs_compressors_init(void) +{ + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("init compressors subsystem\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_COMPR_TYPES_CNT; i++) { + INIT_LIST_HEAD(&compr_idle_workspace[i]); + spin_lock_init(&compr_workspace_lock[i]); + atomic_set(&compr_alloc_workspace[i], 0); + init_waitqueue_head(&compr_workspace_wait[i]); + } + + err = ssdfs_zlib_init(); + if (err) + goto out; + + err = ssdfs_lzo_init(); + if (err) + goto zlib_exit; + + err = ssdfs_register_compressor(&ssdfs_none_compr); + if (err) + goto lzo_exit; + + return 0; + +lzo_exit: + ssdfs_lzo_exit(); + +zlib_exit: + ssdfs_zlib_exit(); + +out: + return err; +} + +void ssdfs_free_workspaces(void) +{ + struct list_head *workspace; + const struct ssdfs_compress_ops *ops; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("destruct auxiliary workspaces\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_COMPR_TYPES_CNT; i++) { + if (!ssdfs_compressors[i]) + continue; + 
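+ /*
+ * Drain this compressor type's idle list below:
+ * detach every cached workspace and let the backend
+ * release it through its free_workspace() method.
+ */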
+ ops = ssdfs_compressors[i]->compr_ops; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ops); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (!list_empty(&compr_idle_workspace[i])) { + workspace = compr_idle_workspace[i].next; + list_del(workspace); + if (ops->free_workspace) + ops->free_workspace(workspace); + atomic_dec(&compr_alloc_workspace[i]); + } + } +} + +void ssdfs_compressors_exit(void) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("deinitialize compressors subsystem\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_free_workspaces(); + ssdfs_unregister_compressor(&ssdfs_none_compr); + ssdfs_zlib_exit(); + ssdfs_lzo_exit(); +} + +/* + * Find an available workspace or allocate a new one. + * ERR_PTR is returned in the case of error. + */ +static struct list_head *ssdfs_find_workspace(int type) +{ + struct list_head *workspace; + int cpus; + struct list_head *idle_workspace; + spinlock_t *workspace_lock; + atomic_t *alloc_workspace; + wait_queue_head_t *workspace_wait; + int *num_workspace; + const struct ssdfs_compress_ops *ops; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("type %d\n", type); + + if (unknown_compression(type)) { + SSDFS_ERR("unknown compression type %d\n", type); + BUG(); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + ops = ssdfs_compressors[type]->compr_ops; + + if (!ops->alloc_workspace) + return ERR_PTR(-EOPNOTSUPP); + + cpus = num_online_cpus(); + idle_workspace = &compr_idle_workspace[type]; + workspace_lock = &compr_workspace_lock[type]; + alloc_workspace = &compr_alloc_workspace[type]; + workspace_wait = &compr_workspace_wait[type]; + num_workspace = &compr_num_workspace[type]; + +again: + spin_lock(workspace_lock); + + if (!list_empty(idle_workspace)) { + workspace = idle_workspace->next; + list_del(workspace); + (*num_workspace)--; + spin_unlock(workspace_lock); + return workspace; + } + + if (atomic_read(alloc_workspace) > cpus) { + DEFINE_WAIT(wait); + + spin_unlock(workspace_lock); + prepare_to_wait(workspace_wait, &wait, TASK_UNINTERRUPTIBLE); + if (atomic_read(alloc_workspace) > cpus && !*num_workspace) + schedule(); + finish_wait(workspace_wait, &wait); + goto again; + } + atomic_inc(alloc_workspace); + spin_unlock(workspace_lock); + + workspace = ops->alloc_workspace(); + if (IS_ERR(workspace)) { + atomic_dec(alloc_workspace); + wake_up(workspace_wait); + } + + return workspace; +} + +static void ssdfs_free_workspace(int type, struct list_head *workspace) +{ + struct list_head *idle_workspace; + spinlock_t *workspace_lock; + atomic_t *alloc_workspace; + wait_queue_head_t *workspace_wait; + int *num_workspace; + const struct ssdfs_compress_ops *ops; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("type %d, workspace %p\n", type, workspace); + + if (unknown_compression(type)) { + SSDFS_ERR("unknown compression type %d\n", type); + BUG(); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + ops = ssdfs_compressors[type]->compr_ops; + + if (!ops->free_workspace) + return; + + idle_workspace = &compr_idle_workspace[type]; + workspace_lock = &compr_workspace_lock[type]; + alloc_workspace = &compr_alloc_workspace[type]; + workspace_wait = &compr_workspace_wait[type]; + num_workspace = &compr_num_workspace[type]; + + spin_lock(workspace_lock); + if (*num_workspace < num_online_cpus()) { + list_add_tail(workspace, idle_workspace); + (*num_workspace)++; + spin_unlock(workspace_lock); + goto wake; + } + spin_unlock(workspace_lock); + + ops->free_workspace(workspace); + atomic_dec(alloc_workspace); +wake: + smp_mb(); + if (waitqueue_active(workspace_wait)) + wake_up(workspace_wait); +} + +#define 
SSDFS_DICT_SIZE 256
+#define SSDFS_MIN_MAX_DIFF_THRESHOLD 150
+
+bool ssdfs_can_compress_data(struct page *page,
+ unsigned data_size)
+{
+ unsigned *counts;
+ unsigned found_symbols = 0;
+ unsigned min, max;
+ u8 *kaddr;
+ int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(data_size == 0 || data_size > PAGE_SIZE);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_ZLIB
+ if (CONFIG_SSDFS_ZLIB_COMR_LEVEL == Z_NO_COMPRESSION)
+ return false;
+#endif /* CONFIG_SSDFS_ZLIB */
+
+ counts = ssdfs_compr_kzalloc(sizeof(unsigned) * SSDFS_DICT_SIZE,
+ GFP_KERNEL);
+ if (!counts) {
+ SSDFS_WARN("fail to alloc array\n");
+ return true;
+ }
+
+ min = SSDFS_DICT_SIZE;
+ max = 0;
+
+ kaddr = (u8 *)kmap_local_page(page);
+ for (i = 0; i < data_size; i++) {
+ u8 *value = kaddr + i;
+ counts[*value]++;
+ if (counts[*value] == 1)
+ found_symbols++;
+ if (counts[*value] < min)
+ min = counts[*value];
+ if (counts[*value] > max)
+ max = counts[*value];
+ }
+ kunmap_local(kaddr);
+
+ ssdfs_compr_kfree(counts);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("data_size %u, found_symbols %u, min %u, max %u\n",
+ data_size, found_symbols, min, max);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ return (max - min) >= SSDFS_MIN_MAX_DIFF_THRESHOLD;
+}
+
+int ssdfs_compress(int type, unsigned char *data_in, unsigned char *cdata_out,
+ size_t *srclen, size_t *destlen)
+{
+ const struct ssdfs_compress_ops *ops;
+ struct list_head *workspace = NULL;
+ int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("type %d, data_in %p, cdata_out %p, "
+ "srclen %zu, destlen %zu\n",
+ type, data_in, cdata_out, *srclen, *destlen);
+
+ if (unknown_compression(type)) {
+ SSDFS_ERR("unknown compression type %d\n", type);
+ BUG();
+ }
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (unable_compress(type)) {
+ SSDFS_ERR("unsupported compression type %d\n", type);
+ err = -EOPNOTSUPP;
+ goto failed_compress;
+ }
+
+ workspace = ssdfs_find_workspace(type);
+ if (PTR_ERR(workspace) == -EOPNOTSUPP &&
+ ssdfs_compressors[type]->type == SSDFS_COMPR_NONE) {
+ /*
+ * None compressor case.
+ * Simply call compress() operation.
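+ * No workspace is needed here: ssdfs_find_workspace()
+ * returned ERR_PTR(-EOPNOTSUPP) only because the none
+ * compressor declares no alloc_workspace() method; its
+ * compress() ignores the workspace pointer, and
+ * ssdfs_free_workspace() bails out early without a
+ * free_workspace() method.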
+ */
+ } else if (IS_ERR(workspace)) {
+ err = -ENOMEM;
+ goto failed_compress;
+ }
+
+ ops = ssdfs_compressors[type]->compr_ops;
+ err = ops->compress(workspace, data_in, cdata_out, srclen, destlen);
+
+ ssdfs_free_workspace(type, workspace);
+ if (err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("%s compressor is unable to compress data %p "
+ "of size %zu\n",
+ ssdfs_compressors[type]->name,
+ data_in, *srclen);
+#endif /* CONFIG_SSDFS_DEBUG */
+ goto failed_compress;
+ } else if (unlikely(err)) {
+ SSDFS_ERR("%s compressor fails to compress data %p "
+ "of size %zu because of err %d\n",
+ ssdfs_compressors[type]->name,
+ data_in, *srclen, err);
+ goto failed_compress;
+ }
+
+ return 0;
+
+failed_compress:
+ return err;
+}
+
+int ssdfs_decompress(int type, unsigned char *cdata_in, unsigned char *data_out,
+ size_t srclen, size_t destlen)
+{
+ const struct ssdfs_compress_ops *ops;
+ struct list_head *workspace;
+ int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("type %d, cdata_in %p, data_out %p, "
+ "srclen %zu, destlen %zu\n",
+ type, cdata_in, data_out, srclen, destlen);
+
+ if (unknown_compression(type)) {
+ SSDFS_ERR("unknown compression type %d\n", type);
+ BUG();
+ }
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (unable_decompress(type)) {
+ SSDFS_ERR("unsupported compression type %d\n", type);
+ err = -EOPNOTSUPP;
+ goto failed_decompress;
+ }
+
+ workspace = ssdfs_find_workspace(type);
+ if (PTR_ERR(workspace) == -EOPNOTSUPP &&
+ ssdfs_compressors[type]->type == SSDFS_COMPR_NONE) {
+ /*
+ * None compressor case.
+ * Simply call decompress() operation.
+ */
+ } else if (IS_ERR(workspace)) {
+ err = -ENOMEM;
+ goto failed_decompress;
+ }
+
+ ops = ssdfs_compressors[type]->compr_ops;
+ err = ops->decompress(workspace, cdata_in, data_out, srclen, destlen);
+
+ ssdfs_free_workspace(type, workspace);
+ if (unlikely(err)) {
+ SSDFS_ERR("%s compressor fails to decompress data %p "
+ "of size %zu because of err %d\n",
+ ssdfs_compressors[type]->name,
+ cdata_in, srclen, err);
+ goto failed_decompress;
+ }
+
+ return 0;
+
+failed_decompress:
+ return err;
+}
diff --git a/fs/ssdfs/compression.h b/fs/ssdfs/compression.h
new file mode 100644
index 000000000000..77243b09babd
--- /dev/null
+++ b/fs/ssdfs/compression.h
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/compression.h - compression/decompression support declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ * http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ * http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ * Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_COMPRESSION_H
+#define _SSDFS_COMPRESSION_H
+
+/*
+ * SSDFS compression algorithms.
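+ *
+ * The value selects the backend passed to ssdfs_compress() and
+ * ssdfs_decompress(); an illustrative call (not part of this patch):
+ *
+ *   err = ssdfs_compress(SSDFS_COMPR_ZLIB, data_in, cdata_out,
+ *                        &srclen, &destlen);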
+ * + * SSDFS_COMPR_NONE: no compression + * SSDFS_COMPR_ZLIB: ZLIB compression + * SSDFS_COMPR_LZO: LZO compression + * SSDFS_COMPR_TYPES_CNT: count of supported compression types + */ +enum { + SSDFS_COMPR_NONE, + SSDFS_COMPR_ZLIB, + SSDFS_COMPR_LZO, + SSDFS_COMPR_TYPES_CNT, +}; + +/* + * struct ssdfs_compress_ops - compressor operations + * @alloc_workspace - prepare workspace for (de)compression + * @free_workspace - free workspace after (de)compression + * @compress - compression method + * @decompress - decompression method + */ +struct ssdfs_compress_ops { + struct list_head * (*alloc_workspace)(void); + void (*free_workspace)(struct list_head *workspace); + int (*compress)(struct list_head *ws_ptr, + unsigned char *data_in, + unsigned char *cdata_out, + size_t *srclen, + size_t *destlen); + int (*decompress)(struct list_head *ws_ptr, + unsigned char *cdata_in, + unsigned char *data_out, + size_t srclen, + size_t destlen); +}; + +/* + * struct ssdfs_compressor - compressor type. + * @type: compressor type + * @name: compressor name + * @compr_ops: compressor operations + */ +struct ssdfs_compressor { + int type; + const char *name; + const struct ssdfs_compress_ops *compr_ops; +}; + +/* Available SSDFS compressors */ +extern struct ssdfs_compressor *ssdfs_compressors[SSDFS_COMPR_TYPES_CNT]; + +/* compression.c */ +int ssdfs_register_compressor(struct ssdfs_compressor *); +int ssdfs_unregister_compressor(struct ssdfs_compressor *); +bool ssdfs_can_compress_data(struct page *page, unsigned data_size); +int ssdfs_compress(int type, unsigned char *data_in, unsigned char *cdata_out, + size_t *srclen, size_t *destlen); +int ssdfs_decompress(int type, unsigned char *cdata_in, unsigned char *data_out, + size_t srclen, size_t destlen); + +#ifdef CONFIG_SSDFS_ZLIB +/* compr_zlib.c */ +int ssdfs_zlib_init(void); +void ssdfs_zlib_exit(void); +#else +static inline int ssdfs_zlib_init(void) { return 0; } +static inline void ssdfs_zlib_exit(void) { return; } +#endif /* CONFIG_SSDFS_ZLIB */ + +#ifdef CONFIG_SSDFS_LZO +/* compr_lzo.c */ +int ssdfs_lzo_init(void); +void ssdfs_lzo_exit(void); +#else +static inline int ssdfs_lzo_init(void) { return 0; } +static inline void ssdfs_lzo_exit(void) { return; } +#endif /* CONFIG_SSDFS_LZO */ + +#endif /* _SSDFS_COMPRESSION_H */ diff --git a/fs/ssdfs/peb_container.h b/fs/ssdfs/peb_container.h new file mode 100644 index 000000000000..4a3e18ada1f5 --- /dev/null +++ b/fs/ssdfs/peb_container.h @@ -0,0 +1,291 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb_container.h - PEB container declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ * Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_CONTAINER_H
+#define _SSDFS_PEB_CONTAINER_H
+
+#include "block_bitmap.h"
+#include "peb.h"
+
+/* PEB container's array indexes */
+enum {
+ SSDFS_SEG_PEB1,
+ SSDFS_SEG_PEB2,
+ SSDFS_SEG_PEB_ITEMS_MAX
+};
+
+/* PEB container possible states */
+enum {
+ SSDFS_PEB_CONTAINER_EMPTY,
+ SSDFS_PEB1_SRC_CONTAINER,
+ SSDFS_PEB1_DST_CONTAINER,
+ SSDFS_PEB1_SRC_PEB2_DST_CONTAINER,
+ SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER,
+ SSDFS_PEB2_SRC_CONTAINER,
+ SSDFS_PEB2_DST_CONTAINER,
+ SSDFS_PEB2_SRC_PEB1_DST_CONTAINER,
+ SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER,
+ SSDFS_PEB_CONTAINER_STATE_MAX
+};
+
+/*
+ * PEB migration state
+ */
+enum {
+ SSDFS_PEB_UNKNOWN_MIGRATION_STATE,
+ SSDFS_PEB_NOT_MIGRATING,
+ SSDFS_PEB_MIGRATION_PREPARATION,
+ SSDFS_PEB_RELATION_PREPARATION,
+ SSDFS_PEB_UNDER_MIGRATION,
+ SSDFS_PEB_FINISHING_MIGRATION,
+ SSDFS_PEB_MIGRATION_STATE_MAX
+};
+
+/*
+ * PEB migration phase
+ */
+enum {
+ SSDFS_PEB_MIGRATION_STATUS_UNKNOWN,
+ SSDFS_SRC_PEB_NOT_EXHAUSTED,
+ SSDFS_DST_PEB_RECEIVES_DATA,
+ SSDFS_SHARED_ZONE_RECEIVES_DATA,
+ SSDFS_PEB_MIGRATION_PHASE_MAX
+};
+
+/*
+ * struct ssdfs_peb_container - PEB container
+ * @peb_type: type of PEB
+ * @peb_index: index of PEB in the array
+ * @log_pages: count of pages in full log
+ * @thread: PEB container's threads array
+ * @read_rq: read requests queue
+ * @update_rq: update requests queue
+ * @pending_lock: lock of pending updated pages counter
+ * @pending_updated_user_data_pages: counter of pending updated user data pages
+ * @crq_ptr_lock: lock of pointer on create requests queue
+ * @create_rq: pointer on shared new page requests queue
+ * @parent_si: pointer on parent segment object
+ * @migration_lock: migration lock
+ * @migration_state: PEB migration state
+ * @migration_phase: PEB migration phase
+ * @items_state: items array state
+ * @shared_free_dst_blks: count of blocks that destination is able to share
+ * @migration_wq: wait queue for migration operations
+ * @cache_protection: PEB cache protection window
+ * @lock: container's internals lock
+ * @src_peb: pointer on source PEB
+ * @dst_peb: pointer on destination PEB
+ * @dst_peb_refs: reference counter of destination PEB (sharing counter)
+ * @items: buffers for PEB objects
+ * @peb_kobj: /sys/fs/ssdfs/// kernel object
+ * @peb_kobj_unregister: completion state for kernel object
+ */
+struct ssdfs_peb_container {
+ /* Static data */
+ u8 peb_type;
+ u16 peb_index;
+ u32 log_pages;
+
+ /* PEB container's threads */
+ struct ssdfs_thread_info thread[SSDFS_PEB_THREAD_TYPE_MAX];
+
+ /* Read requests queue */
+ struct ssdfs_requests_queue read_rq;
+
+ /* Update requests queue */
+ struct ssdfs_requests_queue update_rq;
+
+ spinlock_t pending_lock;
+ u32 pending_updated_user_data_pages;
+
+ /* Shared new page requests queue */
+ spinlock_t crq_ptr_lock;
+ struct ssdfs_requests_queue *create_rq;
+
+ /* Parent segment */
+ struct ssdfs_segment_info *parent_si;
+
+ /* Migration info */
+ struct mutex migration_lock;
+ atomic_t migration_state;
+ atomic_t migration_phase;
+ atomic_t items_state;
+ atomic_t shared_free_dst_blks;
+ wait_queue_head_t migration_wq;
+
+ /* PEB cache protection window */
+ struct ssdfs_protection_window cache_protection;
+
+ /* PEB objects */
+ struct rw_semaphore lock;
+ struct ssdfs_peb_info *src_peb;
+ struct ssdfs_peb_info *dst_peb;
+ atomic_t dst_peb_refs;
+ struct ssdfs_peb_info items[SSDFS_SEG_PEB_ITEMS_MAX];
+
+ /* /sys/fs/ssdfs/// */
+ struct kobject peb_kobj;
+ struct completion peb_kobj_unregister;
+};
+
+#define PEBI_PTR(pebi) \
+ ((struct ssdfs_peb_info *)(pebi))
+#define PEBC_PTR(pebc) \
+ ((struct ssdfs_peb_container *)(pebc))
+#define READ_RQ_PTR(pebc) \
+ (&PEBC_PTR(pebc)->read_rq)
+
+#define SSDFS_GC_FINISH_MIGRATION (4)
+
+/*
+ * Inline functions
+ */
+static inline
+bool is_peb_container_empty(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ return atomic_read(&pebc->items_state) == SSDFS_PEB_CONTAINER_EMPTY;
+}
+
+/*
+ * is_create_requests_queue_empty() - check whether the create queue is empty
+ * @pebc: pointer on PEB container
+ */
+static inline
+bool is_create_requests_queue_empty(struct ssdfs_peb_container *pebc)
+{
+ bool is_create_rq_empty = true;
+
+ spin_lock(&pebc->crq_ptr_lock);
+ if (pebc->create_rq) {
+ is_create_rq_empty =
+ is_ssdfs_requests_queue_empty(pebc->create_rq);
+ }
+ spin_unlock(&pebc->crq_ptr_lock);
+
+ return is_create_rq_empty;
+}
+
+/*
+ * have_flush_requests() - check whether create or update queues have requests
+ * @pebc: pointer on PEB container
+ */
+static inline
+bool have_flush_requests(struct ssdfs_peb_container *pebc)
+{
+ bool is_create_rq_empty = true;
+ bool is_update_rq_empty = true;
+
+ is_create_rq_empty = is_create_requests_queue_empty(pebc);
+ is_update_rq_empty = is_ssdfs_requests_queue_empty(&pebc->update_rq);
+
+ return !is_create_rq_empty || !is_update_rq_empty;
+}
+
+static inline
+bool is_ssdfs_peb_containing_user_data(struct ssdfs_peb_container *pebc)
+{
+ return pebc->peb_type == SSDFS_MAPTBL_DATA_PEB_TYPE;
+}
+
+/*
+ * PEB container's API
+ */
+int ssdfs_peb_container_create(struct ssdfs_fs_info *fsi,
+ u64 seg, u32 peb_index,
+ u8 peb_type,
+ u32 log_pages,
+ struct ssdfs_segment_info *si);
+void ssdfs_peb_container_destroy(struct ssdfs_peb_container *pebc);
+
+int ssdfs_peb_container_invalidate_block(struct ssdfs_peb_container *pebc,
+ struct ssdfs_phys_offset_descriptor *desc);
+int ssdfs_peb_get_free_pages(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_get_used_data_pages(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_get_invalid_pages(struct ssdfs_peb_container *pebc);
+
+int ssdfs_peb_join_create_requests_queue(struct ssdfs_peb_container *pebc,
+ struct ssdfs_requests_queue *create_rq);
+void ssdfs_peb_forget_create_requests_queue(struct ssdfs_peb_container *pebc);
+bool is_peb_joined_into_create_requests_queue(struct ssdfs_peb_container *pebc);
+
+struct ssdfs_peb_info *
+ssdfs_get_current_peb_locked(struct ssdfs_peb_container *pebc);
+void ssdfs_unlock_current_peb(struct ssdfs_peb_container *pebc);
+struct ssdfs_peb_info *
+ssdfs_get_peb_for_migration_id(struct ssdfs_peb_container *pebc,
+ u8 migration_id);
+
+int ssdfs_peb_container_create_destination(struct ssdfs_peb_container *ptr);
+int ssdfs_peb_container_forget_source(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_container_forget_relation(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_container_change_state(struct ssdfs_peb_container *pebc);
+
+/*
+ * PEB container's private API
+ */
+int ssdfs_peb_gc_thread_func(void *data);
+int ssdfs_peb_read_thread_func(void *data);
+int ssdfs_peb_flush_thread_func(void *data);
+
+u16 ssdfs_peb_estimate_reserved_metapages(u32 page_size, u32 pages_per_peb,
+ u16 log_pages, u32 pebs_per_seg,
+ bool is_migrating);
+int ssdfs_peb_read_page(struct ssdfs_peb_container *pebc,
+ struct ssdfs_segment_request *req,
+ struct completion **end);
+int ssdfs_peb_readahead_pages(struct ssdfs_peb_container *pebc,
+ struct
ssdfs_segment_request *req, + struct completion **end); +void ssdfs_peb_mark_request_block_uptodate(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + int blk_index); +int ssdfs_peb_copy_page(struct ssdfs_peb_container *pebc, + u32 logical_blk, + struct ssdfs_segment_request *req); +int ssdfs_peb_copy_pages_range(struct ssdfs_peb_container *pebc, + struct ssdfs_block_bmap_range *range, + struct ssdfs_segment_request *req); +int ssdfs_peb_copy_pre_alloc_page(struct ssdfs_peb_container *pebc, + u32 logical_blk, + struct ssdfs_segment_request *req); +int __ssdfs_peb_get_block_state_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *area_desc, + struct ssdfs_block_state_descriptor *desc, + u64 *cno, u64 *parent_snapshot); +int ssdfs_blk_desc_buffer_init(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + struct ssdfs_metadata_descriptor *array, + size_t array_size); +int ssdfs_peb_read_block_state(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + struct ssdfs_metadata_descriptor *array, + size_t array_size); +bool ssdfs_peb_has_dirty_pages(struct ssdfs_peb_info *pebi); +int ssdfs_collect_dirty_segments_now(struct ssdfs_fs_info *fsi); + +#endif /* _SSDFS_PEB_CONTAINER_H */ diff --git a/fs/ssdfs/peb_migration_scheme.c b/fs/ssdfs/peb_migration_scheme.c new file mode 100644 index 000000000000..d036b04fc646 --- /dev/null +++ b/fs/ssdfs/peb_migration_scheme.c @@ -0,0 +1,1302 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb_migration_scheme.c - Implementation of PEBs' migration scheme. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates. + * https://www.bytedance.com/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ * Zvonimir Bandic
+ * Cong Wang
+ */
+
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_vector.h"
+#include "block_bitmap.h"
+#include "peb_block_bitmap.h"
+#include "segment_block_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "peb.h"
+#include "peb_container.h"
+#include "peb_mapping_table.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_migration_page_leaks;
+atomic64_t ssdfs_migration_memory_leaks;
+atomic64_t ssdfs_migration_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_migration_cache_leaks_increment(void *kaddr)
+ * void ssdfs_migration_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_migration_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_migration_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_migration_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_migration_kfree(void *kaddr)
+ * struct page *ssdfs_migration_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_migration_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_migration_free_page(struct page *page)
+ * void ssdfs_migration_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+ SSDFS_MEMORY_LEAKS_CHECKER_FNS(migration)
+#else
+ SSDFS_MEMORY_ALLOCATOR_FNS(migration)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_migration_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+ atomic64_set(&ssdfs_migration_page_leaks, 0);
+ atomic64_set(&ssdfs_migration_memory_leaks, 0);
+ atomic64_set(&ssdfs_migration_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_migration_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+ if (atomic64_read(&ssdfs_migration_page_leaks) != 0) {
+ SSDFS_ERR("MIGRATION SCHEME: "
+ "memory leaks include %lld pages\n",
+ atomic64_read(&ssdfs_migration_page_leaks));
+ }
+
+ if (atomic64_read(&ssdfs_migration_memory_leaks) != 0) {
+ SSDFS_ERR("MIGRATION SCHEME: "
+ "memory allocator suffers from %lld leaks\n",
+ atomic64_read(&ssdfs_migration_memory_leaks));
+ }
+
+ if (atomic64_read(&ssdfs_migration_cache_leaks) != 0) {
+ SSDFS_ERR("MIGRATION SCHEME: "
+ "caches suffer from %lld leaks\n",
+ atomic64_read(&ssdfs_migration_cache_leaks));
+ }
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_peb_start_migration() - prepare and start PEB's migration
+ * @pebc: pointer on PEB container
+ */
+int ssdfs_peb_start_migration(struct ssdfs_peb_container *pebc)
+{
+ struct ssdfs_fs_info *fsi;
+ struct ssdfs_segment_info *si;
+ int i;
+ int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+ SSDFS_ERR("seg_id %llu, peb_index %u, peb_type %#x, "
+ "migration_state %#x, items_state %#x\n",
+ pebc->parent_si->seg_id,
+ pebc->peb_index, pebc->peb_type,
+ atomic_read(&pebc->migration_state),
+ atomic_read(&pebc->items_state));
+#else
+ SSDFS_DBG("seg_id %llu, peb_index %u, peb_type %#x, "
+ "migration_state %#x, items_state %#x\n",
+ pebc->parent_si->seg_id,
+ pebc->peb_index, pebc->peb_type,
+ atomic_read(&pebc->migration_state),
+ atomic_read(&pebc->items_state));
+#endif /*
CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + si = pebc->parent_si; + + mutex_lock(&pebc->migration_lock); + +check_migration_state: + switch (atomic_read(&pebc->migration_state)) { + case SSDFS_PEB_NOT_MIGRATING: + /* valid state */ + break; + + case SSDFS_PEB_UNDER_MIGRATION: + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB is under migration already: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto start_migration_done; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: { + DEFINE_WAIT(wait); + + mutex_unlock(&pebc->migration_lock); + prepare_to_wait(&pebc->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&pebc->migration_wq, &wait); + mutex_lock(&pebc->migration_lock); + goto check_migration_state; + } + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid migration_state %#x\n", + atomic_read(&pebc->migration_state)); + goto start_migration_done; + } + + err = ssdfs_peb_container_create_destination(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to start PEB migration: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + goto start_migration_done; + } + + for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) { + atomic_inc(&fsi->gc_should_act[i]); + wake_up_all(&fsi->gc_wait_queue[i]); + } + +start_migration_done: + mutex_unlock(&pebc->migration_lock); + + wake_up_all(&pebc->migration_wq); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * is_peb_under_migration() - check that PEB is under migration + * @pebc: pointer on PEB container + */ +bool is_peb_under_migration(struct ssdfs_peb_container *pebc) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&pebc->migration_state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("migration state %#x\n", state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (state) { + case SSDFS_PEB_NOT_MIGRATING: + return false; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_UNDER_MIGRATION: + case SSDFS_PEB_FINISHING_MIGRATION: + return true; + + default: + SSDFS_WARN("invalid migration_state %#x\n", + atomic_read(&pebc->migration_state)); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return false; +} + +/* + * is_pebs_relation_alive() - check PEBs' relation validity + * @pebc: pointer on PEB container + */ +bool is_pebs_relation_alive(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *dst_pebc; + int shared_free_dst_blks = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!mutex_is_locked(&pebc->migration_lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_state %#x\n", + atomic_read(&pebc->items_state)); +#endif /* CONFIG_SSDFS_DEBUG */ + +try_define_items_state: + switch (atomic_read(&pebc->items_state)) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + return false; + + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + if (atomic_read(&pebc->shared_free_dst_blks) <= 0) + return false; + else + return true; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case 
SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + return true; + + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + switch (atomic_read(&pebc->migration_state)) { + case SSDFS_PEB_UNDER_MIGRATION: + /* valid state */ + break; + + case SSDFS_PEB_RELATION_PREPARATION: { + DEFINE_WAIT(wait); + + mutex_unlock(&pebc->migration_lock); + prepare_to_wait(&pebc->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&pebc->migration_wq, &wait); + mutex_lock(&pebc->migration_lock); + goto try_define_items_state; + } + break; + + default: + SSDFS_WARN("invalid migration_state %#x\n", + atomic_read(&pebc->migration_state)); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + down_read(&pebc->lock); + + if (!pebc->dst_peb) { + err = -ERANGE; + SSDFS_WARN("dst_peb is NULL\n"); + goto finish_relation_check; + } + + dst_pebc = pebc->dst_peb->pebc; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!dst_pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + + shared_free_dst_blks = + atomic_read(&dst_pebc->shared_free_dst_blks); + +finish_relation_check: + up_read(&pebc->lock); + + if (unlikely(err)) + return false; + + if (shared_free_dst_blks > 0) + return true; + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + atomic_read(&pebc->items_state)); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + + return false; +} + +/* + * has_peb_migration_done() - check that PEB's migration has been done + * @pebc: pointer on PEB container + */ +bool has_peb_migration_done(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_info *si; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_block_bmap *blk_bmap; + u16 valid_blks = U16_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + + SSDFS_DBG("migration_state %#x\n", + atomic_read(&pebc->migration_state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&pebc->migration_state)) { + case SSDFS_PEB_NOT_MIGRATING: + case SSDFS_PEB_FINISHING_MIGRATION: + return true; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + return false; + + case SSDFS_PEB_UNDER_MIGRATION: + /* valid state */ + break; + + default: + SSDFS_WARN("invalid migration_state %#x\n", + atomic_read(&pebc->migration_state)); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + return true; + } + + si = pebc->parent_si; + seg_blkbmap = &si->blk_bmap; + + if (pebc->peb_index >= seg_blkbmap->pebs_count) { + SSDFS_WARN("peb_index %u >= pebs_count %u\n", + pebc->peb_index, + seg_blkbmap->pebs_count); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!seg_blkbmap->peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index]; + + down_read(&peb_blkbmap->lock); + + switch (atomic_read(&peb_blkbmap->buffers_state)) { + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid buffers_state %#x\n", + atomic_read(&peb_blkbmap->buffers_state)); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_define_bmap_state; + break; + } + + blk_bmap = peb_blkbmap->src; + + if (!blk_bmap) { + err = -ERANGE; + SSDFS_WARN("source block bitmap is NULL\n"); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + goto 
finish_define_bmap_state; + } + + err = ssdfs_block_bmap_lock(blk_bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap\n"); + goto finish_define_bmap_state; + } + + err = ssdfs_block_bmap_get_used_pages(blk_bmap); + + ssdfs_block_bmap_unlock(blk_bmap); + + if (unlikely(err < 0)) { + SSDFS_ERR("fail to define valid blocks count: " + "err %d\n", err); + goto finish_define_bmap_state; + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(err >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + valid_blks = (u16)err; + err = 0; + } + +finish_define_bmap_state: + up_read(&peb_blkbmap->lock); + + if (unlikely(err)) + return false; + + return valid_blks == 0 ? true : false; +} + +/* + * should_migration_be_finished() - check that migration should be finished + * @pebc: pointer on PEB container + */ +bool should_migration_be_finished(struct ssdfs_peb_container *pebc) +{ + return !is_pebs_relation_alive(pebc) || has_peb_migration_done(pebc); +} + +/* + * ssdfs_peb_migrate_valid_blocks_range() - migrate valid blocks + * @si: segment object + * @pebc: pointer on PEB container + * @peb_blkbmap: PEB container's block bitmap + * @range: range of valid blocks + */ +int ssdfs_peb_migrate_valid_blocks_range(struct ssdfs_segment_info *si, + struct ssdfs_peb_container *pebc, + struct ssdfs_peb_blk_bmap *peb_blkbmap, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_segment_request *req; + struct ssdfs_block_bmap_range copy_range; + struct ssdfs_block_bmap_range sub_range; + bool need_repeat = false; + int processed_blks; + struct page *page; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !pebc || !peb_blkbmap || !range); + + SSDFS_DBG("seg_id %llu, peb_index %u, peb_type %#x, " + "range (start %u, len %u)\n", + si->seg_id, pebc->peb_index, pebc->peb_type, + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range->len == 0) { + SSDFS_ERR("empty range\n"); + return -EINVAL; + } + + ssdfs_memcpy(©_range, + 0, sizeof(struct ssdfs_block_bmap_range), + range, + 0, sizeof(struct ssdfs_block_bmap_range), + sizeof(struct ssdfs_block_bmap_range)); + +repeat_valid_blocks_processing: + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate request: err %d\n", + err); + return err; + } + + need_repeat = false; + ssdfs_request_init(req); + ssdfs_get_request(req); + + ssdfs_request_prepare_internal_data(SSDFS_PEB_COLLECT_GARBAGE_REQ, + SSDFS_COPY_PAGE, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + err = ssdfs_peb_copy_pages_range(pebc, ©_range, req); + if (err == -EAGAIN) { + err = 0; + need_repeat = true; + } else if (unlikely(err)) { + SSDFS_ERR("fail to copy range: err %d\n", + err); + goto fail_process_valid_blocks; + } + + processed_blks = req->result.processed_blks; + + if (copy_range.len < processed_blks) { + err = -ERANGE; + SSDFS_ERR("range1 %u <= range2 %d\n", + copy_range.len, processed_blks); + goto fail_process_valid_blocks; + } + + for (i = 0; i < processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebc, req, i); + + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + page = req->result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_page_writeback(page); + } + + req->result.err = 0; + req->result.processed_blks = 0; + atomic_set(&req->result.state, SSDFS_UNKNOWN_REQ_RESULT); + + err = ssdfs_segment_migrate_range_async(si, req); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate range: err %d\n", + err); + goto fail_process_valid_blocks; + } + + sub_range.start = copy_range.start; + sub_range.len = processed_blks; + + err = ssdfs_peb_blk_bmap_invalidate(peb_blkbmap, + SSDFS_PEB_BLK_BMAP_SOURCE, + &sub_range); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate range: " + "(start %u, len %u), err %d\n", + sub_range.start, sub_range.len, err); + goto finish_valid_blocks_processing; + } + + if (need_repeat) { + copy_range.start += processed_blks; + copy_range.len -= processed_blks; + goto repeat_valid_blocks_processing; + } + + return 0; + +fail_process_valid_blocks: + ssdfs_migration_pagevec_release(&req->result.pvec); + ssdfs_put_request(req); + ssdfs_request_free(req); + +finish_valid_blocks_processing: + return err; +} + +/* + * ssdfs_peb_migrate_pre_alloc_blocks_range() - migrate pre-allocated blocks + * @si: segment object + * @pebc: pointer on PEB container + * @peb_blkbmap: PEB container's block bitmap + * @range: range of pre-allocated blocks + */ +static +int ssdfs_peb_migrate_pre_alloc_blocks_range(struct ssdfs_segment_info *si, + struct ssdfs_peb_container *pebc, + struct ssdfs_peb_blk_bmap *peb_blkbmap, + struct ssdfs_block_bmap_range *range) +{ + struct ssdfs_segment_request *req; + struct ssdfs_block_bmap_range sub_range; + int processed_blks = 0; + u32 logical_blk; + bool has_data = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !pebc || !peb_blkbmap || !range); + + SSDFS_DBG("seg_id %llu, peb_index %u, peb_type %#x, " + "range (start %u, len %u)\n", + si->seg_id, pebc->peb_index, pebc->peb_type, + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range->len == 0) { + SSDFS_ERR("empty range\n"); + return -EINVAL; + } + + while (processed_blks < range->len) { + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req));
+ SSDFS_ERR("fail to allocate request: err %d\n",
+ err);
+ return err;
+ }
+
+ ssdfs_request_init(req);
+ ssdfs_get_request(req);
+
+ ssdfs_request_prepare_internal_data(SSDFS_PEB_COLLECT_GARBAGE_REQ,
+ SSDFS_COPY_PRE_ALLOC_PAGE,
+ SSDFS_REQ_SYNC,
+ req);
+ ssdfs_request_define_segment(si->seg_id, req);
+
+ logical_blk = range->start + processed_blks;
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(logical_blk >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ req->place.start.blk_index = (u16)logical_blk;
+ req->place.len = 1;
+
+ req->extent.ino = U64_MAX;
+ req->extent.logical_offset = U64_MAX;
+ req->extent.data_bytes = 0;
+
+ req->result.processed_blks = 0;
+
+ err = ssdfs_peb_copy_pre_alloc_page(pebc, logical_blk, req);
+ if (err == -ENODATA) {
+ /* pre-allocated page has no content */
+ err = 0;
+ has_data = false;
+ } else if (unlikely(err)) {
+ SSDFS_ERR("fail to copy pre-alloc page: "
+ "logical_blk %u, err %d\n",
+ logical_blk, err);
+ goto fail_process_pre_alloc_blocks;
+ } else {
+ int i;
+ u32 pages_count = pagevec_count(&req->result.pvec);
+ struct page *page;
+
+ ssdfs_peb_mark_request_block_uptodate(pebc, req, 0);
+
+ for (i = 0; i < pages_count; i++) {
+ page = req->result.pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ set_page_writeback(page);
+ }
+
+ has_data = true;
+ }
+
+ req->result.err = 0;
+ req->result.processed_blks = 0;
+ atomic_set(&req->result.state, SSDFS_UNKNOWN_REQ_RESULT);
+
+ if (has_data) {
+ err = ssdfs_segment_migrate_fragment_async(si, req);
+ } else {
+ err = ssdfs_segment_migrate_pre_alloc_page_async(si,
+ req);
+ }
+
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to migrate pre-alloc page: "
+ "logical_blk %u, err %d\n",
+ logical_blk, err);
+ goto fail_process_pre_alloc_blocks;
+ }
+
+ sub_range.start = logical_blk;
+ sub_range.len = 1;
+
+ err = ssdfs_peb_blk_bmap_invalidate(peb_blkbmap,
+ SSDFS_PEB_BLK_BMAP_SOURCE,
+ &sub_range);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to invalidate range: "
+ "(start %u, len %u), err %d\n",
+ sub_range.start, sub_range.len, err);
+ goto finish_pre_alloc_blocks_processing;
+ }
+
+ processed_blks++;
+ }
+
+ return 0;
+
+fail_process_pre_alloc_blocks:
+ ssdfs_migration_pagevec_release(&req->result.pvec);
+ ssdfs_put_request(req);
+ ssdfs_request_free(req);
+
+finish_pre_alloc_blocks_processing:
+ return err;
+}
+
+/*
+ * has_ssdfs_source_peb_valid_blocks() - check that source PEB has valid blocks
+ * @pebc: pointer on PEB container
+ */
+bool has_ssdfs_source_peb_valid_blocks(struct ssdfs_peb_container *pebc)
+{
+ struct ssdfs_segment_info *si;
+ struct ssdfs_segment_blk_bmap *seg_blkbmap;
+ struct ssdfs_peb_blk_bmap *peb_blkbmap;
+ int used_pages;
+ int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebc);
+
+ SSDFS_DBG("seg_id %llu, peb_index %u, peb_type %#x, "
+ "migration_state %#x, items_state %#x\n",
+ pebc->parent_si->seg_id,
+ pebc->peb_index, pebc->peb_type,
+ atomic_read(&pebc->migration_state),
+ atomic_read(&pebc->items_state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ si = pebc->parent_si;
+ seg_blkbmap = &si->blk_bmap;
+ peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index];
+
+ used_pages = ssdfs_src_blk_bmap_get_used_pages(peb_blkbmap);
+ if (used_pages < 0) {
+ err = used_pages;
+ SSDFS_ERR("fail to get used pages: err %d\n",
+ err);
+ return false;
+ }
+
+ if (used_pages > 0)
+ return true;
+
+ return false;
+}
+
+/*
+ * ssdfs_peb_prepare_range_migration() - prepare blocks' range migration
+ * @pebc: pointer on PEB container
+ * @range_len: required range length
+ * @blk_type: type of migrating block
+ *
+ * This method tries to prepare a range of blocks for migration.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - no blocks for migration have been found.
+ */
+int ssdfs_peb_prepare_range_migration(struct ssdfs_peb_container *pebc,
+ u32 range_len, int blk_type)
+{
+ struct ssdfs_segment_info *si;
+ struct ssdfs_segment_blk_bmap *seg_blkbmap;
+ struct ssdfs_peb_blk_bmap *peb_blkbmap;
+ struct ssdfs_block_bmap_range range = {0, 0};
+ u32 pages_per_peb;
+ int used_pages;
+ int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebc);
+ BUG_ON(!mutex_is_locked(&pebc->migration_lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+ SSDFS_ERR("seg_id %llu, peb_index %u, peb_type %#x, "
+ "migration_state %#x, migration_phase %#x, "
+ "items_state %#x, range_len %u, blk_type %#x\n",
+ pebc->parent_si->seg_id,
+ pebc->peb_index, pebc->peb_type,
+ atomic_read(&pebc->migration_state),
+ atomic_read(&pebc->migration_phase),
+ atomic_read(&pebc->items_state),
+ range_len, blk_type);
+#else
+ SSDFS_DBG("seg_id %llu, peb_index %u, peb_type %#x, "
+ "migration_state %#x, migration_phase %#x, "
+ "items_state %#x, range_len %u, blk_type %#x\n",
+ pebc->parent_si->seg_id,
+ pebc->peb_index, pebc->peb_type,
+ atomic_read(&pebc->migration_state),
+ atomic_read(&pebc->migration_phase),
+ atomic_read(&pebc->items_state),
+ range_len, blk_type);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+ if (range_len == 0) {
+ SSDFS_ERR("invalid range_len %u\n", range_len);
+ return -EINVAL;
+ }
+
+ switch (blk_type) {
+ case SSDFS_BLK_VALID:
+ case SSDFS_BLK_PRE_ALLOCATED:
+ /* expected state */
+ break;
+
+ default:
+ SSDFS_ERR("unexpected blk_type %#x\n",
+ blk_type);
+ return -EINVAL;
+ }
+
+ si = pebc->parent_si;
+ seg_blkbmap = &si->blk_bmap;
+ peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index];
+ pages_per_peb = si->fsi->pages_per_peb;
+
+ switch (atomic_read(&pebc->migration_state)) {
+ case SSDFS_PEB_UNDER_MIGRATION:
+ /* valid state */
+ break;
+
+ default:
+ SSDFS_WARN("invalid migration_state %#x\n",
+ atomic_read(&pebc->migration_state));
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+ return -ERANGE;
+ }
+
+ switch (atomic_read(&pebc->migration_phase)) {
+ case SSDFS_SRC_PEB_NOT_EXHAUSTED:
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+ SSDFS_ERR("SRC PEB is not exhausted\n");
+#else
+ SSDFS_DBG("SRC PEB is not exhausted\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+ return -ENODATA;
+
+ case SSDFS_DST_PEB_RECEIVES_DATA:
+ case SSDFS_SHARED_ZONE_RECEIVES_DATA:
+ /* continue logic */
+ break;
+
+ default:
+ SSDFS_ERR("unexpected migration phase %#x\n",
+ atomic_read(&pebc->migration_phase));
+ return -ERANGE;
+ }
+
+ used_pages = ssdfs_src_blk_bmap_get_used_pages(peb_blkbmap);
+ if (used_pages < 0) {
+ err = used_pages;
+ SSDFS_ERR("fail to get used pages: err %d\n",
+ err);
+ return err;
+ }
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("used_pages %d\n", used_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (used_pages > 0) {
+ err = ssdfs_peb_blk_bmap_collect_garbage(peb_blkbmap,
+ 0, pages_per_peb,
+ blk_type,
+ &range);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("found range: (start %u, len %u), err %d\n",
+ range.start, range.len, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (err == -ENODATA) {
+ /* no valid blocks */
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+ SSDFS_ERR("no valid blocks\n");
+#else
+ SSDFS_DBG("no valid blocks\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to collect garbage: " + "seg_id %llu, err %d\n", + si->seg_id, err); + return err; + } else if (range.len == 0) { + SSDFS_ERR("invalid found range " + "(start %u, len %u)\n", + range.start, range.len); + return -ERANGE; + } + + range.len = min_t(u32, range_len, (u32)range.len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("final range: (start %u, len %u)\n", + range.start, range.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_peb_containing_user_data(pebc)) { + ssdfs_account_updated_user_data_pages(si->fsi, + range.len); + } + + switch (blk_type) { + case SSDFS_BLK_VALID: + err = ssdfs_peb_migrate_valid_blocks_range(si, pebc, + peb_blkbmap, + &range); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate valid blocks: " + "range (start %u, len %u), err %d\n", + range.start, range.len, err); + return err; + } + break; + + case SSDFS_BLK_PRE_ALLOCATED: + err = ssdfs_peb_migrate_pre_alloc_blocks_range(si, + pebc, + peb_blkbmap, + &range); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate pre-alloc blocks: " + "range (start %u, len %u), err %d\n", + range.start, range.len, err); + return err; + } + break; + + default: + BUG(); + } + } else { +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("unable to find blocks for migration\n"); +#else + SSDFS_DBG("unable to find blocks for migration\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + return -ENODATA; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_peb_finish_migration() - finish PEB migration + * @pebc: pointer on PEB container + */ +int ssdfs_peb_finish_migration(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_info *pebi; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int used_pages; + u32 pages_per_peb; + int old_migration_state; + bool is_peb_exhausted = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u, peb_type %#x, " + "migration_state %#x, items_state %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + atomic_read(&pebc->migration_state), + atomic_read(&pebc->items_state)); +#else + SSDFS_DBG("seg_id %llu, peb_index %u, peb_type %#x, " + "migration_state %#x, items_state %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + atomic_read(&pebc->migration_state), + atomic_read(&pebc->items_state)); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + si = pebc->parent_si; + fsi = pebc->parent_si->fsi; + pages_per_peb = fsi->pages_per_peb; + seg_blkbmap = &si->blk_bmap; + peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index]; + + mutex_lock(&pebc->migration_lock); + +check_migration_state: + old_migration_state = atomic_read(&pebc->migration_state); + + used_pages = ssdfs_src_blk_bmap_get_used_pages(peb_blkbmap); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used pages: err %d\n", + err); + goto finish_migration_done; + } + + switch (old_migration_state) { + case SSDFS_PEB_NOT_MIGRATING: + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB is not migrating: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto 
finish_migration_done;
+
+	case SSDFS_PEB_UNDER_MIGRATION:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("PEB is under migration: "
+			  "seg_id %llu, peb_index %u\n",
+			  pebc->parent_si->seg_id,
+			  pebc->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (!fsi->is_zns_device) {
+			pebi = pebc->dst_peb;
+			if (!pebi) {
+				err = -ERANGE;
+				SSDFS_ERR("PEB pointer is NULL: "
+					  "seg_id %llu, peb_index %u\n",
+					  pebc->parent_si->seg_id,
+					  pebc->peb_index);
+				goto finish_migration_done;
+			}
+		}
+
+		ssdfs_peb_current_log_lock(pebi);
+		is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi);
+		ssdfs_peb_current_log_unlock(pebi);
+
+		if (is_peb_exhausted)
+			goto try_finish_migration_now;
+		else if (has_peb_migration_done(pebc))
+			goto try_finish_migration_now;
+		else if (used_pages <= SSDFS_GC_FINISH_MIGRATION)
+			goto try_finish_migration_now;
+		else {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("Don't finish migration: "
+				  "seg_id %llu, peb_index %u\n",
+				  pebc->parent_si->seg_id,
+				  pebc->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_migration_done;
+		}
+		break;
+
+	case SSDFS_PEB_FINISHING_MIGRATION: {
+		DEFINE_WAIT(wait);
+
+		mutex_unlock(&pebc->migration_lock);
+		prepare_to_wait(&pebc->migration_wq, &wait,
+				TASK_UNINTERRUPTIBLE);
+		schedule();
+		finish_wait(&pebc->migration_wq, &wait);
+		mutex_lock(&pebc->migration_lock);
+		goto check_migration_state;
+	}
+	break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_WARN("invalid migration_state %#x\n",
+			   atomic_read(&pebc->migration_state));
+		goto finish_migration_done;
+	}
+
+try_finish_migration_now:
+	atomic_set(&pebc->migration_state, SSDFS_PEB_FINISHING_MIGRATION);
+
+	while (used_pages > 0) {
+		struct ssdfs_block_bmap_range range1 = {0, 0};
+		struct ssdfs_block_bmap_range range2 = {0, 0};
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("used_pages %d\n", used_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_peb_blk_bmap_collect_garbage(peb_blkbmap,
+							 0, pages_per_peb,
+							 SSDFS_BLK_VALID,
+							 &range1);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("range1.start %u, range1.len %u, err %d\n",
+			  range1.start, range1.len, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (err == -ENODATA) {
+			/* no valid blocks */
+			err = 0;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to collect garbage: "
+				  "seg_id %llu, err %d\n",
+				  si->seg_id, err);
+			goto finish_migration_done;
+		}
+
+		err = ssdfs_peb_blk_bmap_collect_garbage(peb_blkbmap,
+							 0, pages_per_peb,
+							 SSDFS_BLK_PRE_ALLOCATED,
+							 &range2);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("range2.start %u, range2.len %u, err %d\n",
+			  range2.start, range2.len, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (err == -ENODATA) {
+			/* no pre-allocated blocks */
+			err = 0;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to collect garbage: "
+				  "seg_id %llu, err %d\n",
+				  si->seg_id, err);
+			goto finish_migration_done;
+		}
+
+		if (range1.len == 0 && range2.len == 0) {
+			err = -ERANGE;
+			SSDFS_ERR("no valid blocks were found\n");
+			goto finish_migration_done;
+		}
+
+		if (range1.len > 0) {
+			if (is_ssdfs_peb_containing_user_data(pebc)) {
+				ssdfs_account_updated_user_data_pages(si->fsi,
+								range1.len);
+			}
+
+			err = ssdfs_peb_migrate_valid_blocks_range(si, pebc,
+								   peb_blkbmap,
+								   &range1);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to migrate valid blocks: "
+					  "range (start %u, len %u), err %d\n",
+					  range1.start, range1.len, err);
+				goto finish_migration_done;
+			}
+		}
+
+		if (range2.len > 0) {
+			if (is_ssdfs_peb_containing_user_data(pebc)) {
+				ssdfs_account_updated_user_data_pages(si->fsi,
+								range2.len);
+			}
+
+			err = ssdfs_peb_migrate_pre_alloc_blocks_range(si,
+								pebc,
+								peb_blkbmap,
+								&range2);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to migrate pre-alloc blocks: "
+					  "range (start %u, len %u), err %d\n",
+					  range2.start, range2.len, err);
+				goto finish_migration_done;
+			}
+		}
+
+		used_pages -= range1.len;
+		used_pages -= range2.len;
+
+		if (used_pages < 0) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid used_pages %d\n",
+				  used_pages);
+			goto finish_migration_done;
+		}
+	}
+
+	used_pages = ssdfs_src_blk_bmap_get_used_pages(peb_blkbmap);
+	if (used_pages != 0) {
+		err = -ERANGE;
+		SSDFS_ERR("source PEB has valid blocks: "
+			  "used_pages %d\n",
+			  used_pages);
+		goto finish_migration_done;
+	}
+
+	switch (atomic_read(&pebc->items_state)) {
+	case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER:
+	case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER:
+		err = ssdfs_peb_container_forget_relation(pebc);
+		break;
+
+	case SSDFS_PEB1_DST_CONTAINER:
+	case SSDFS_PEB2_DST_CONTAINER:
+	case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER:
+	case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER:
+		err = ssdfs_peb_container_forget_source(pebc);
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_WARN("invalid items_state %#x\n",
+			   atomic_read(&pebc->items_state));
+		goto finish_migration_done;
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to break relation: "
+			  "seg %llu, peb_index %u, err %d\n",
+			  pebc->parent_si->seg_id,
+			  pebc->peb_index, err);
+		goto finish_migration_done;
+	}
+
+finish_migration_done:
+	if (err) {
+		switch (atomic_read(&pebc->migration_state)) {
+		case SSDFS_PEB_FINISHING_MIGRATION:
+			atomic_set(&pebc->migration_state,
+				   old_migration_state);
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+	}
+
+	mutex_unlock(&pebc->migration_lock);
+
+	wake_up_all(&pebc->migration_wq);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}

From patchwork Sat Feb 25 01:08:33 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151927
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 22/76] ssdfs: create/destroy PEB container
Date: Fri, 24 Feb 2023 17:08:33 -0800
Message-Id: <20230225010927.813929-23-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

A "physical" erase block (PEB) object can be in one of several possible
states (clean, using, used, pre-dirty, dirty). It means that the PEB
container creation logic needs to determine the state of a particular
erase block and to detect whether or not it is under migration. As a
result, the creation logic prepares a proper sequence of initialization
requests, adds these requests into the request queue, and starts the
threads that execute the PEB container initialization.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_container.c | 2669 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 2669 insertions(+)
 create mode 100644 fs/ssdfs/peb_container.c

diff --git a/fs/ssdfs/peb_container.c b/fs/ssdfs/peb_container.c
new file mode 100644
index 000000000000..668ded673719
--- /dev/null
+++ b/fs/ssdfs/peb_container.c
@@ -0,0 +1,2669 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_container.c - PEB container implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ *                  Cong Wang
+ */
+
+#include
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "current_segment.h"
+#include "peb_mapping_table.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "invalidated_extents_tree.h"
+
+enum {
+	SSDFS_SRC_PEB,
+	SSDFS_DST_PEB,
+	SSDFS_SRC_AND_DST_PEB
+};
+
+static
+struct ssdfs_thread_descriptor thread_desc[SSDFS_PEB_THREAD_TYPE_MAX] = {
+	{.threadfn = ssdfs_peb_read_thread_func,
+	 .fmt = "ssdfs-r%llu-%u",},
+	{.threadfn = ssdfs_peb_flush_thread_func,
+	 .fmt = "ssdfs-f%llu-%u",},
+	{.threadfn = ssdfs_peb_gc_thread_func,
+	 .fmt = "ssdfs-gc%llu-%u",},
+};
+
+/*
+ * ssdfs_peb_mark_request_block_uptodate() - mark block uptodate
+ * @pebc: pointer on PEB container
+ * @req: request
+ * @blk_index: index of block in request's sequence
+ *
+ * This function marks the memory pages of the request as uptodate
+ * and not dirty. The pages should be locked.
+ */
+void ssdfs_peb_mark_request_block_uptodate(struct ssdfs_peb_container *pebc,
+					   struct ssdfs_segment_request *req,
+					   int blk_index)
+{
+	u32 pagesize;
+	u32 mem_pages;
+	pgoff_t page_index;
+	u32 page_off;
+	u32 i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!req);
+
+	SSDFS_DBG("blk_index %d, processed_blocks %d\n",
+		  blk_index, req->result.processed_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pagevec_count(&req->result.pvec) == 0) {
+		SSDFS_DBG("pagevec is empty\n");
+		return;
+	}
+
+	BUG_ON(blk_index >= req->result.processed_blks);
+
+	pagesize = pebc->parent_si->fsi->pagesize;
+	mem_pages = (pagesize + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	page_index = ssdfs_phys_page_to_mem_page(pebc->parent_si->fsi,
+						 blk_index);
+	page_off = (page_index * pagesize) % PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(mem_pages > 1 && page_off != 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < mem_pages; i++) {
+		if ((page_off + pagesize) != PAGE_SIZE)
+			return;
+		else {
+			struct page *page;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(i >= pagevec_count(&req->result.pvec));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			page = req->result.pvec.pages[i];
+
+			if (!PageLocked(page)) {
+				SSDFS_WARN("failed to mark block uptodate: "
+					   "page %d is not locked\n",
+					   i);
+			} else {
+				if (!PageError(page)) {
+					ClearPageDirty(page);
+					SetPageUptodate(page);
+				}
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+			SSDFS_DBG("page_index %ld, flags %#lx\n",
+				  page->index, page->flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+	}
+}
+
+/*
+ * ssdfs_peb_start_thread() - start PEB's thread
+ * @pebc: pointer on PEB container
+ * @type: thread type
+ *
+ * This function tries to start PEB's thread of @type.
+ *
+ * RETURN:
+ * [success] - PEB's thread has been started.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
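+ *
+ * A minimal sketch (illustrative only) of how the thread_desc[]
+ * table above is consumed by this function:
+ *
+ *	threadfn = thread_desc[SSDFS_PEB_READ_THREAD].threadfn;
+ *	fmt = thread_desc[SSDFS_PEB_READ_THREAD].fmt;
+ *	task = kthread_create(threadfn, pebc, fmt,
+ *			      si->seg_id, pebc->peb_index);
+ *
+ * so a read thread for segment 18, PEB 0 would be named "ssdfs-r18-0".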
+ */ +static +int ssdfs_peb_start_thread(struct ssdfs_peb_container *pebc, int type) +{ + struct ssdfs_segment_info *si; + ssdfs_threadfn threadfn; + const char *fmt; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + + if (type >= SSDFS_PEB_THREAD_TYPE_MAX) { + SSDFS_ERR("invalid thread type %d\n", type); + return -EINVAL; + } + + SSDFS_DBG("seg_id %llu, peb_index %u, thread_type %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + type); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + threadfn = thread_desc[type].threadfn; + fmt = thread_desc[type].fmt; + + pebc->thread[type].task = kthread_create(threadfn, pebc, fmt, + pebc->parent_si->seg_id, + pebc->peb_index); + if (IS_ERR_OR_NULL(pebc->thread[type].task)) { + err = PTR_ERR(pebc->thread[type].task); + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + if (err == 0) + err = -ERANGE; + SSDFS_ERR("fail to start thread: " + "seg_id %llu, peb_index %u, thread_type %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, type); + } + + return err; + } + + init_waitqueue_entry(&pebc->thread[type].wait, + pebc->thread[type].task); + add_wait_queue(&si->wait_queue[type], + &pebc->thread[type].wait); + init_completion(&pebc->thread[type].full_stop); + + wake_up_process(pebc->thread[type].task); + + return 0; +} + +/* + * ssdfs_peb_stop_thread() - stop PEB's thread + * @pebc: pointer on PEB container + * @type: thread type + * + * This function tries to stop PEB's thread of @type. + * + * RETURN: + * [success] - PEB's thread has been stopped. + * [failure] - error code: + * + * %-EINVAL - invalid input. + */ +static +int ssdfs_peb_stop_thread(struct ssdfs_peb_container *pebc, int type) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + + if (type >= SSDFS_PEB_THREAD_TYPE_MAX) { + SSDFS_ERR("invalid thread type %d\n", type); + return -EINVAL; + } + + SSDFS_DBG("type %#x, task %p\n", + type, pebc->thread[type].task); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!pebc->thread[type].task) + return 0; + + err = kthread_stop(pebc->thread[type].task); + if (err == -EINTR) { + /* + * Ignore this error. + * The wake_up_process() was never called. + */ + return 0; + } else if (unlikely(err)) { + SSDFS_WARN("thread function had some issue: err %d\n", + err); + return err; + } + + finish_wait(&pebc->parent_si->wait_queue[type], + &pebc->thread[type].wait); + + pebc->thread[type].task = NULL; + + err = SSDFS_WAIT_COMPLETION(&pebc->thread[type].full_stop); + if (unlikely(err)) { + SSDFS_ERR("stop thread fails: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_map_leb2peb() - map LEB ID into PEB ID + * @fsi: pointer on shared file system object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @pebr: pointer on PEBs association container [out] + * + * This method tries to map LEB ID into PEB ID. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - can't map LEB to PEB. + * %-ERANGE - internal error. 
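+ *
+ * The mapping table may still be initializing; in that case the
+ * mapping call returns -EAGAIN together with a completion object,
+ * and the function waits and retries once. A sketch of the pattern
+ * used in the body below:
+ *
+ *	err = ssdfs_maptbl_map_leb2peb(fsi, leb_id, peb_type,
+ *					pebr, &end);
+ *	if (err == -EAGAIN) {
+ *		err = SSDFS_WAIT_COMPLETION(end);
+ *		if (!err)
+ *			err = ssdfs_maptbl_map_leb2peb(fsi, leb_id,
+ *							peb_type, pebr, &end);
+ *	}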
+ */ +static +int ssdfs_peb_map_leb2peb(struct ssdfs_fs_info *fsi, + u64 leb_id, int peb_type, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct completion *end; +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_maptbl_peb_descriptor *ptr; +#endif /* CONFIG_SSDFS_DEBUG */ + u64 peb_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->maptbl || !pebr); + BUG_ON(leb_id == U64_MAX); + + SSDFS_DBG("leb_id %llu, peb_type %#x\n", + leb_id, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_map_leb2peb(fsi, leb_id, peb_type, + pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_map_leb2peb(fsi, leb_id, peb_type, + pebr, &end); + } + + if (err == -EACCES || err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("can't map LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } else if (err == -EEXIST) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is mapped already: " + "leb_id %llu, peb_type %#x\n", + leb_id, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, + peb_type, + pebr, &end); + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to map LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + + peb_id = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + + if (peb_id == U64_MAX) { + SSDFS_ERR("invalid peb_id\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu, PEB %llu\n", leb_id, peb_id); + + ptr = &pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX]; + SSDFS_DBG("MAIN: peb_id %llu, shared_peb_index %u, " + "erase_cycles %u, type %#x, state %#x, " + "flags %#x\n", + ptr->peb_id, ptr->shared_peb_index, + ptr->erase_cycles, ptr->type, + ptr->state, ptr->flags); + ptr = &pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX]; + SSDFS_DBG("RELATION: peb_id %llu, shared_peb_index %u, " + "erase_cycles %u, type %#x, state %#x, " + "flags %#x\n", + ptr->peb_id, ptr->shared_peb_index, + ptr->erase_cycles, ptr->type, + ptr->state, ptr->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_peb_convert_leb2peb() - convert LEB ID into PEB ID + * @fsi: pointer on shared file system object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @pebr: pointer on PEBs association container [out] + * + * This method tries to convert LEB ID into PEB ID. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - can't convert LEB to PEB. + * %-ERANGE - internal error. 
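+ *
+ * On success the relation describes up to two PEBs (the main one
+ * and, during migration, a destination). The main PEB is extracted
+ * as in the body below (sketch):
+ *
+ *	peb_id = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id;
+ *	if (peb_id == U64_MAX)
+ *		return -ERANGE;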
+ */ +static +int ssdfs_peb_convert_leb2peb(struct ssdfs_fs_info *fsi, + u64 leb_id, int peb_type, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct completion *end; +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_maptbl_peb_descriptor *ptr; +#endif /* CONFIG_SSDFS_DEBUG */ + u64 peb_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->maptbl || !pebr); + BUG_ON(leb_id == U64_MAX); + + SSDFS_DBG("leb_id %llu, peb_type %#x\n", + leb_id, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, + peb_type, + pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type, + pebr, &end); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB doesn't mapped: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + + peb_id = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + if (peb_id == U64_MAX) { + SSDFS_ERR("invalid peb_id\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu, PEB %llu\n", leb_id, peb_id); + + ptr = &pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX]; + SSDFS_DBG("MAIN: peb_id %llu, shared_peb_index %u, " + "erase_cycles %u, type %#x, state %#x, " + "flags %#x\n", + ptr->peb_id, ptr->shared_peb_index, + ptr->erase_cycles, ptr->type, + ptr->state, ptr->flags); + ptr = &pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX]; + SSDFS_DBG("RELATION: peb_id %llu, shared_peb_index %u, " + "erase_cycles %u, type %#x, state %#x, " + "flags %#x\n", + ptr->peb_id, ptr->shared_peb_index, + ptr->erase_cycles, ptr->type, + ptr->state, ptr->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_create_clean_peb_container() - create "clean" PEB container + * @pebc: pointer on PEB container + * @selected_peb: source or destination PEB? + * + * This function tries to initialize PEB container for "clean" + * state of the PEB. + * + * RETURN: + * [success] - PEB container has been constructed sucessfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. 
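+ *
+ * A clean PEB has no on-flash state to recover, so no read requests
+ * are queued here: the blk2off table fragment and the block bitmap
+ * are marked initialized immediately, the PEB object receives
+ * SSDFS_PEB_MIGRATION_ID_START, and only the read and flush threads
+ * are started (see the body below).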
+ */
+static
+int ssdfs_create_clean_peb_container(struct ssdfs_peb_container *pebc,
+				     int selected_peb)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_blk2off_table *blk2off_table;
+	struct ssdfs_peb_blk_bmap *peb_blkbmap;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si);
+	BUG_ON(!pebc->parent_si->blk_bmap.peb);
+
+	SSDFS_DBG("peb_index %u, peb_type %#x, "
+		  "selected_peb %d\n",
+		  pebc->peb_index, pebc->peb_type,
+		  selected_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	si = pebc->parent_si;
+	blk2off_table = si->blk2off_table;
+
+	atomic_set(&blk2off_table->peb[pebc->peb_index].state,
+		   SSDFS_BLK2OFF_TABLE_COMPLETE_INIT);
+
+	peb_blkbmap = &si->blk_bmap.peb[pebc->peb_index];
+	ssdfs_set_block_bmap_initialized(peb_blkbmap->src);
+	atomic_set(&peb_blkbmap->state, SSDFS_PEB_BLK_BMAP_INITIALIZED);
+
+	if (selected_peb == SSDFS_SRC_PEB) {
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!pebc->src_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		ssdfs_set_peb_migration_id(pebc->src_peb,
+					   SSDFS_PEB_MIGRATION_ID_START);
+		atomic_set(&pebc->src_peb->state,
+			   SSDFS_PEB_OBJECT_INITIALIZED);
+		complete_all(&pebc->src_peb->init_end);
+	} else if (selected_peb == SSDFS_DST_PEB) {
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!pebc->dst_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		ssdfs_set_peb_migration_id(pebc->dst_peb,
+					   SSDFS_PEB_MIGRATION_ID_START);
+		atomic_set(&pebc->dst_peb->state,
+			   SSDFS_PEB_OBJECT_INITIALIZED);
+		complete_all(&pebc->dst_peb->init_end);
+	} else
+		BUG();
+
+	err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_READ_THREAD);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		goto fail_create_clean_peb_obj;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start read thread: "
+			  "peb_index %u, err %d\n",
+			  pebc->peb_index, err);
+		goto fail_create_clean_peb_obj;
+	}
+
+	err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_FLUSH_THREAD);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		goto stop_read_thread;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start flush thread: "
+			  "peb_index %u, err %d\n",
+			  pebc->peb_index, err);
+		goto stop_read_thread;
+	}
+
+	return 0;
+
+stop_read_thread:
+	ssdfs_peb_stop_thread(pebc, SSDFS_PEB_READ_THREAD);
+
+fail_create_clean_peb_obj:
+	return err;
+}
+
+/*
+ * ssdfs_create_using_peb_container() - create "using" PEB container
+ * @pebc: pointer on PEB container
+ * @selected_peb: source or destination PEB?
+ *
+ * This function tries to initialize the PEB container for the "using"
+ * state of the PEB.
+ *
+ * RETURN:
+ * [success] - PEB object has been constructed successfully.
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ENOMEM - unable to allocate memory.
+ */
+static
+int ssdfs_create_using_peb_container(struct ssdfs_peb_container *pebc,
+				     int selected_peb)
+{
+	struct ssdfs_peb_blk_bmap *peb_blkbmap;
+	struct ssdfs_segment_request *req1, *req2, *req3, *req4, *req5;
+	int command;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si);
+	BUG_ON(!pebc->parent_si->blk_bmap.peb);
+	BUG_ON(selected_peb < SSDFS_SRC_PEB ||
+		selected_peb > SSDFS_SRC_AND_DST_PEB);
+
+	SSDFS_DBG("seg %llu, peb_index %u, peb_type %#x, "
+		  "selected_peb %u\n",
+		  pebc->parent_si->seg_id,
+		  pebc->peb_index, pebc->peb_type,
+		  selected_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (selected_peb == SSDFS_SRC_PEB)
+		command = SSDFS_READ_SRC_ALL_LOG_HEADERS;
+	else if (selected_peb == SSDFS_DST_PEB)
+		command = SSDFS_READ_DST_ALL_LOG_HEADERS;
+	else if (selected_peb == SSDFS_SRC_AND_DST_PEB)
+		command = SSDFS_READ_DST_ALL_LOG_HEADERS;
+	else
+		BUG();
+
+	req1 = ssdfs_request_alloc();
+	if (IS_ERR_OR_NULL(req1)) {
+		err = (req1 == NULL ? -ENOMEM : PTR_ERR(req1));
+		req1 = NULL;
+		SSDFS_ERR("fail to allocate segment request: err %d\n",
+			  err);
+		goto fail_create_using_peb_obj;
+	}
+
+	ssdfs_request_init(req1);
+	/* read thread puts request */
+	ssdfs_get_request(req1);
+	/* it needs to be sure that request will be not freed */
+	ssdfs_get_request(req1);
+	ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ,
+					    command,
+					    SSDFS_REQ_ASYNC,
+					    req1);
+	ssdfs_request_define_segment(pebc->parent_si->seg_id, req1);
+	ssdfs_peb_read_request_cno(pebc);
+	ssdfs_requests_queue_add_tail(&pebc->read_rq, req1);
+
+	if (selected_peb == SSDFS_SRC_PEB)
+		command = SSDFS_READ_BLK_BMAP_SRC_USING_PEB;
+	else if (selected_peb == SSDFS_DST_PEB)
+		command = SSDFS_READ_BLK_BMAP_DST_USING_PEB;
+	else if (selected_peb == SSDFS_SRC_AND_DST_PEB)
+		command = SSDFS_READ_BLK_BMAP_DST_USING_PEB;
+	else
+		BUG();
+
+	req2 = ssdfs_request_alloc();
+	if (IS_ERR_OR_NULL(req2)) {
+		err = (req2 == NULL ? -ENOMEM : PTR_ERR(req2));
+		req2 = NULL;
+		SSDFS_ERR("fail to allocate segment request: err %d\n",
+			  err);
+		ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE);
+		goto fail_create_using_peb_obj;
+	}
+
+	ssdfs_request_init(req2);
+	ssdfs_get_request(req2);
+	ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ,
+					    command,
+					    SSDFS_REQ_ASYNC,
+					    req2);
+	ssdfs_request_define_segment(pebc->parent_si->seg_id, req2);
+	ssdfs_peb_read_request_cno(pebc);
+	ssdfs_requests_queue_add_tail(&pebc->read_rq, req2);
+
+	if (selected_peb == SSDFS_SRC_AND_DST_PEB) {
+		command = SSDFS_READ_SRC_LAST_LOG_FOOTER;
+
+		req3 = ssdfs_request_alloc();
+		if (IS_ERR_OR_NULL(req3)) {
+			err = (req3 == NULL ? -ENOMEM : PTR_ERR(req3));
+			req3 = NULL;
+			SSDFS_ERR("fail to allocate segment request: err %d\n",
+				  err);
+			ssdfs_requests_queue_remove_all(&pebc->read_rq,
+							-ERANGE);
+			goto fail_create_using_peb_obj;
+		}
+
+		ssdfs_request_init(req3);
+		ssdfs_get_request(req3);
+		ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ,
+						    command,
+						    SSDFS_REQ_ASYNC,
+						    req3);
+		ssdfs_request_define_segment(pebc->parent_si->seg_id, req3);
+		ssdfs_peb_read_request_cno(pebc);
+		ssdfs_requests_queue_add_tail(&pebc->read_rq, req3);
+
+		command = SSDFS_READ_SRC_ALL_LOG_HEADERS;
+
+		req4 = ssdfs_request_alloc();
+		if (IS_ERR_OR_NULL(req4)) {
+			err = (req4 == NULL ?
-ENOMEM : PTR_ERR(req4)); + req4 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, + -ERANGE); + goto fail_create_using_peb_obj; + } + + ssdfs_request_init(req4); + ssdfs_get_request(req4); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req4); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req4); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req4); + } + + if (selected_peb == SSDFS_SRC_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_SRC_PEB; + else if (selected_peb == SSDFS_DST_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_DST_PEB; + else if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_DST_PEB; + else + BUG(); + + req5 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req5)) { + err = (req5 == NULL ? -ENOMEM : PTR_ERR(req5)); + req5 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_using_peb_obj; + } + + ssdfs_request_init(req5); + ssdfs_get_request(req5); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req5); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req5); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req5); + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_READ_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start read thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_using_peb_obj; + } + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start flush thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + goto stop_read_thread; + } + + peb_blkbmap = &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + err = SSDFS_WAIT_COMPLETION(&req1->result.wait); + if (unlikely(err)) { + SSDFS_ERR("read thread fails: err %d\n", + err); + goto stop_flush_thread; + } + + /* + * Block bitmap has been locked for initialization. + * Now it isn't initialized yet. It should check + * block bitmap initialization state during first + * request about free pages count. + */ + } + + ssdfs_put_request(req1); + + /* + * Current log start_page and data_free_pages count was defined + * in the read thread during searching last actual state of block + * bitmap. + */ + + /* + * Wake up read request if it waits zeroing + * of reference counter. + */ + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + + return 0; + +stop_flush_thread: + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + +stop_read_thread: + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_READ_THREAD); + +fail_create_using_peb_obj: + return err; +} + +/* + * ssdfs_create_used_peb_container() - create "used" PEB container + * @pebi: pointer on PEB container + * @selected_peb: source or destination PEB? + * + * This function tries to initialize PEB container for "used" + * state of the PEB. + * + * RETURN: + * [success] - PEB container has been constructed sucessfully. 
+ * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_create_used_peb_container(struct ssdfs_peb_container *pebc, + int selected_peb) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_segment_request *req1, *req2, *req3; + int command; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!pebc->parent_si->blk_bmap.peb); + BUG_ON(selected_peb < SSDFS_SRC_PEB || selected_peb > SSDFS_DST_PEB); + + SSDFS_DBG("seg %llu, peb_index %u, peb_type %#x, " + "selected_peb %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + selected_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (selected_peb == SSDFS_SRC_PEB) + command = SSDFS_READ_SRC_ALL_LOG_HEADERS; + else if (selected_peb == SSDFS_DST_PEB) + command = SSDFS_READ_DST_ALL_LOG_HEADERS; + else + BUG(); + + req1 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req1)) { + err = (req1 == NULL ? -ENOMEM : PTR_ERR(req1)); + req1 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_create_used_peb_obj; + } + + ssdfs_request_init(req1); + /* read thread puts request */ + ssdfs_get_request(req1); + /* it needs to be sure that request will be not freed */ + ssdfs_get_request(req1); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req1); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req1); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req1); + + if (selected_peb == SSDFS_SRC_PEB) + command = SSDFS_READ_BLK_BMAP_SRC_USED_PEB; + else if (selected_peb == SSDFS_DST_PEB) + command = SSDFS_READ_BLK_BMAP_DST_USED_PEB; + else + BUG(); + + req2 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req2)) { + err = (req2 == NULL ? -ENOMEM : PTR_ERR(req2)); + req2 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_used_peb_obj; + } + + ssdfs_request_init(req2); + ssdfs_get_request(req2); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req2); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req2); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req2); + + if (selected_peb == SSDFS_SRC_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_SRC_PEB; + else if (selected_peb == SSDFS_DST_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_DST_PEB; + else + BUG(); + + req3 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req3)) { + err = (req3 == NULL ? -ENOMEM : PTR_ERR(req3)); + req3 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_used_peb_obj; + } + + ssdfs_request_init(req3); + ssdfs_get_request(req3); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req3); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req3); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req3); + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_READ_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. 
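+			 * (kthread_create() returns -EINTR when the
+			 * creating task receives a fatal signal while
+			 * waiting for the kthread to appear, so it is
+			 * treated as a benign, caller-driven
+			 * cancellation rather than a failure.)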
+ */ + } else { + SSDFS_ERR("fail to start read thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_used_peb_obj; + } + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start flush thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + goto stop_read_thread; + } + + peb_blkbmap = &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + err = SSDFS_WAIT_COMPLETION(&req1->result.wait); + if (unlikely(err)) { + SSDFS_ERR("read thread fails: err %d\n", + err); + goto stop_flush_thread; + } + + /* + * Block bitmap has been locked for initialization. + * Now it isn't initialized yet. It should check + * block bitmap initialization state during first + * request about free pages count. + */ + } + + ssdfs_put_request(req1); + + /* + * Current log start_page and data_free_pages count was defined + * in the read thread during searching last actual state of block + * bitmap. + */ + + /* + * Wake up read request if it waits zeroing + * of reference counter. + */ + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + + return 0; + +stop_flush_thread: + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + +stop_read_thread: + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_READ_THREAD); + +fail_create_used_peb_obj: + return err; +} + +/* + * ssdfs_create_pre_dirty_peb_container() - create "pre-dirty" PEB container + * @pebi: pointer on PEB container + * @selected_peb: source or destination PEB? + * + * This function tries to initialize PEB container for "pre-dirty" + * state of the PEB. + * + * RETURN: + * [success] - PEB container has been constructed sucessfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_create_pre_dirty_peb_container(struct ssdfs_peb_container *pebc, + int selected_peb) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!pebc->parent_si->blk_bmap.peb); + BUG_ON(selected_peb < SSDFS_SRC_PEB || selected_peb > SSDFS_DST_PEB); + + SSDFS_DBG("seg %llu, peb_index %u, peb_type %#x, " + "selected_peb %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + selected_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_create_used_peb_container(pebc, selected_peb); +} + +/* + * ssdfs_create_dirty_peb_container() - create "dirty" PEB container + * @pebi: pointer on PEB container + * @selected_peb: source or destination PEB? + * + * This function tries to initialize PEB container for "dirty" + * state of the PEB. + * + * RETURN: + * [success] - PEB container has been constructed sucessfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. 
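+ *
+ * A "dirty" PEB needs only its last log footer to be read back:
+ * the body below queues a single SSDFS_READ_SRC_LAST_LOG_FOOTER
+ * request, waits for it synchronously, and then marks the blk2off
+ * table fragment completely initialized.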
+ */ +static +int ssdfs_create_dirty_peb_container(struct ssdfs_peb_container *pebc, + int selected_peb) +{ + struct ssdfs_segment_info *si; + struct ssdfs_blk2off_table *blk2off_table; + struct ssdfs_segment_request *req; + int command; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!pebc->parent_si->blk_bmap.peb); + BUG_ON(selected_peb < SSDFS_SRC_PEB || selected_peb > SSDFS_DST_PEB); + + SSDFS_DBG("seg %llu, peb_index %u, peb_type %#x, " + "selected_peb %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + selected_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + blk2off_table = si->blk2off_table; + + command = SSDFS_READ_SRC_LAST_LOG_FOOTER; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + req = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_create_dirty_peb_obj; + } + + ssdfs_request_init(req); + /* read thread puts request */ + ssdfs_get_request(req); + /* it needs to be sure that request will be not freed */ + ssdfs_get_request(req); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req); + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_READ_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start read thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_peb_obj; + } + + err = SSDFS_WAIT_COMPLETION(&req->result.wait); + if (unlikely(err)) { + SSDFS_ERR("read thread fails: err %d\n", + err); + goto stop_read_thread; + } + + ssdfs_put_request(req); + + /* + * Wake up read request if it waits zeroing + * of reference counter. + */ + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + + atomic_set(&blk2off_table->peb[pebc->peb_index].state, + SSDFS_BLK2OFF_TABLE_COMPLETE_INIT); + + return 0; + +stop_read_thread: + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_READ_THREAD); + +fail_create_dirty_peb_obj: + return err; +} + +/* + * ssdfs_create_dirty_using_container() - create "dirty" + "using" PEB container + * @pebc: pointer on PEB container + * @selected_peb: source or destination PEB? + * + * This function tries to initialize PEB conatiner for "dirty" + "using" + * state of the PEBs. + * + * RETURN: + * [success] - PEB object has been constructed sucessfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. 
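+ *
+ * Both PEBs of the migrating pair need initialization here, so the
+ * body below queues four reads (sketch of the order): the source
+ * PEB's last log footer, all destination log headers, the
+ * destination block bitmap ("using" flavor), and the destination
+ * blk2off table.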
+ */ +static +int ssdfs_create_dirty_using_container(struct ssdfs_peb_container *pebc, + int selected_peb) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_segment_request *req1, *req2, *req3, *req4; + int command; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!pebc->parent_si->blk_bmap.peb); + BUG_ON(selected_peb != SSDFS_SRC_AND_DST_PEB); + + SSDFS_DBG("seg %llu, peb_index %u, peb_type %#x, " + "selected_peb %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + selected_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_SRC_LAST_LOG_FOOTER; + else + BUG(); + + req1 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req1)) { + err = (req1 == NULL ? -ENOMEM : PTR_ERR(req1)); + req1 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_create_dirty_using_peb_obj; + } + + ssdfs_request_init(req1); + ssdfs_get_request(req1); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req1); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req1); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req1); + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_DST_ALL_LOG_HEADERS; + else + BUG(); + + req2 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req2)) { + err = (req2 == NULL ? -ENOMEM : PTR_ERR(req2)); + req2 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_create_dirty_using_peb_obj; + } + + ssdfs_request_init(req2); + /* read thread puts request */ + ssdfs_get_request(req2); + /* it needs to be sure that request will be not freed */ + ssdfs_get_request(req2); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req2); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req2); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req2); + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_BLK_BMAP_DST_USING_PEB; + else + BUG(); + + req3 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req3)) { + err = (req3 == NULL ? -ENOMEM : PTR_ERR(req3)); + req3 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_using_peb_obj; + } + + ssdfs_request_init(req3); + ssdfs_get_request(req3); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req3); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req3); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req3); + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_DST_PEB; + else + BUG(); + + req4 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req4)) { + err = (req4 == NULL ? 
-ENOMEM : PTR_ERR(req4)); + req4 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_using_peb_obj; + } + + ssdfs_request_init(req4); + ssdfs_get_request(req4); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req4); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req4); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req4); + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_READ_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start read thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_using_peb_obj; + } + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start flush thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + goto stop_read_thread; + } + + peb_blkbmap = &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + err = SSDFS_WAIT_COMPLETION(&req2->result.wait); + if (unlikely(err)) { + SSDFS_ERR("read thread fails: err %d\n", + err); + goto stop_flush_thread; + } + + /* + * Block bitmap has been locked for initialization. + * Now it isn't initialized yet. It should check + * block bitmap initialization state during first + * request about free pages count. + */ + } + + ssdfs_put_request(req2); + + /* + * Current log start_page and data_free_pages count was defined + * in the read thread during searching last actual state of block + * bitmap. + */ + + /* + * Wake up read request if it waits zeroing + * of reference counter. + */ + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + + return 0; + +stop_flush_thread: + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + +stop_read_thread: + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_READ_THREAD); + +fail_create_dirty_using_peb_obj: + return err; +} + +/* + * ssdfs_create_dirty_used_container() - create "dirty" + "used" PEB container + * @pebi: pointer on PEB container + * @selected_peb: source or destination PEB? + * + * This function tries to initialize PEB container for "dirty" + "used" + * state of the PEBs. + * + * RETURN: + * [success] - PEB container has been constructed sucessfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. 
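+ *
+ * Every read queued by these constructors follows the same
+ * boilerplate (sketch; @command varies per step):
+ *
+ *	req = ssdfs_request_alloc();
+ *	ssdfs_request_init(req);
+ *	ssdfs_get_request(req);
+ *	ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ,
+ *					    command, SSDFS_REQ_ASYNC,
+ *					    req);
+ *	ssdfs_request_define_segment(pebc->parent_si->seg_id, req);
+ *	ssdfs_requests_queue_add_tail(&pebc->read_rq, req);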
+ */ +static +int ssdfs_create_dirty_used_container(struct ssdfs_peb_container *pebc, + int selected_peb) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_segment_request *req1, *req2, *req3, *req4; + int command; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(!pebc->parent_si->blk_bmap.peb); + BUG_ON(selected_peb != SSDFS_SRC_AND_DST_PEB); + + SSDFS_DBG("seg %llu, peb_index %u, peb_type %#x, " + "selected_peb %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, pebc->peb_type, + selected_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_SRC_LAST_LOG_FOOTER; + else + BUG(); + + req1 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req1)) { + err = (req1 == NULL ? -ENOMEM : PTR_ERR(req1)); + req1 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_create_dirty_used_peb_obj; + } + + ssdfs_request_init(req1); + ssdfs_get_request(req1); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req1); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req1); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req1); + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_DST_ALL_LOG_HEADERS; + else + BUG(); + + req2 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req2)) { + err = (req2 == NULL ? -ENOMEM : PTR_ERR(req2)); + req2 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_create_dirty_used_peb_obj; + } + + ssdfs_request_init(req2); + /* read thread puts request */ + ssdfs_get_request(req2); + /* it needs to be sure that request will be not freed */ + ssdfs_get_request(req2); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req2); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req2); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req2); + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_BLK_BMAP_DST_USED_PEB; + else + BUG(); + + req3 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req3)) { + err = (req3 == NULL ? -ENOMEM : PTR_ERR(req3)); + req3 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_used_peb_obj; + } + + ssdfs_request_init(req3); + ssdfs_get_request(req3); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req3); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req3); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req3); + + if (selected_peb == SSDFS_SRC_AND_DST_PEB) + command = SSDFS_READ_BLK2OFF_TABLE_DST_PEB; + else + BUG(); + + req4 = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req4)) { + err = (req4 == NULL ? 
-ENOMEM : PTR_ERR(req4)); + req4 = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_used_peb_obj; + } + + ssdfs_request_init(req4); + ssdfs_get_request(req4); + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + command, + SSDFS_REQ_ASYNC, + req4); + ssdfs_request_define_segment(pebc->parent_si->seg_id, req4); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req4); + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_READ_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start read thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + goto fail_create_dirty_used_peb_obj; + } + + err = ssdfs_peb_start_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + if (unlikely(err)) { + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to start flush thread: " + "peb_index %u, err %d\n", + pebc->peb_index, err); + } + + goto stop_read_thread; + } + + peb_blkbmap = &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + err = SSDFS_WAIT_COMPLETION(&req2->result.wait); + if (unlikely(err)) { + SSDFS_ERR("read thread fails: err %d\n", + err); + goto stop_flush_thread; + } + + /* + * Block bitmap has been locked for initialization. + * Now it isn't initialized yet. It should check + * block bitmap initialization state during first + * request about free pages count. + */ + } + + ssdfs_put_request(req2); + + /* + * Current log start_page and data_free_pages count was defined + * in the read thread during searching last actual state of block + * bitmap. + */ + + /* + * Wake up read request if it waits zeroing + * of reference counter. + */ + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + + return 0; + +stop_flush_thread: + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_FLUSH_THREAD); + +stop_read_thread: + ssdfs_requests_queue_remove_all(&pebc->read_rq, -ERANGE); + wake_up_all(&pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD]); + ssdfs_peb_stop_thread(pebc, SSDFS_PEB_READ_THREAD); + +fail_create_dirty_used_peb_obj: + return err; +} + +/* + * ssdfs_peb_container_get_peb_relation() - get description of relation + * @fsi: file system info object + * @seg: segment identification number + * @peb_index: PEB's index + * @peb_type: PEB's type + * @seg_state: segment state + * @pebr: description of PEBs relation [out] + * + * This function tries to retrieve PEBs' relation description. + * + * RETURN: + * [success]. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENODATA - cannott map LEB to PEB. 
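+ *
+ * The segment state selects the resolution strategy in the switch
+ * below: clean and *_USING states map a fresh LEB to a PEB via
+ * ssdfs_peb_map_leb2peb(), while USED, PRE_DIRTY and DIRTY states
+ * convert an already mapped LEB via ssdfs_peb_convert_leb2peb().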
+ */ +static +int ssdfs_peb_container_get_peb_relation(struct ssdfs_fs_info *fsi, + u64 seg, u32 peb_index, + u8 peb_type, int seg_state, + struct ssdfs_maptbl_peb_relation *pebr) +{ + u64 leb_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebr); + + SSDFS_DBG("fsi %p, seg %llu, peb_index %u, " + "peb_type %#x, seg_state %#x\n", + fsi, seg, peb_index, peb_type, seg_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, seg, peb_index); + if (leb_id == U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + seg, peb_index); + return -EINVAL; + } + + switch (seg_state) { + case SSDFS_SEG_CLEAN: + err = ssdfs_peb_map_leb2peb(fsi, leb_id, peb_type, + pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("can't map LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to map LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + break; + + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + err = ssdfs_peb_map_leb2peb(fsi, leb_id, peb_type, + pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("can't map LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to map LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + break; + + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + err = ssdfs_peb_convert_leb2peb(fsi, leb_id, peb_type, + pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB doesn't mapped: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + break; + + case SSDFS_SEG_DIRTY: + err = ssdfs_peb_convert_leb2peb(fsi, leb_id, peb_type, + pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB doesn't mapped: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + break; + + default: + SSDFS_ERR("invalid segment state\n"); + return -EINVAL; + }; + + return 0; +} + +/* + * ssdfs_peb_container_start_threads() - start PEB container's threads + * @pebc: pointer on PEB container + * @src_peb_state: source PEB's state + * @dst_peb_state: destination PEB's state + * @src_peb_flags: source PEB's flags + * + * This function tries to start PEB's container threads. + * + * RETURN: + * [success]. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. 
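+ *
+ * The (source, destination) PEB state pair selects the constructor
+ * below. For example, a MIGRATION_SRC_USED source paired with a
+ * MIGRATION_DST_CLEAN destination marks the block bitmap
+ * SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST and builds a "used" container
+ * for the source, whereas a source carrying
+ * SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR is constructed alone instead
+ * of as a combined source-and-destination container.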
+ */ +static +int ssdfs_peb_container_start_threads(struct ssdfs_peb_container *pebc, + int src_peb_state, + int dst_peb_state, + u8 src_peb_flags) +{ + struct ssdfs_peb_blk_bmap *peb_blkbmap; + bool peb_has_ext_ptr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc); + + SSDFS_DBG("seg %llu, peb_index %u, src_peb_state %#x, " + "dst_peb_state %#x, src_peb_flags %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, src_peb_state, + dst_peb_state, src_peb_flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_has_ext_ptr = src_peb_flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR; + + switch (src_peb_state) { + case SSDFS_MAPTBL_UNKNOWN_PEB_STATE: + switch (dst_peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + err = ssdfs_create_clean_peb_container(pebc, + SSDFS_DST_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create clean PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + err = ssdfs_create_using_peb_container(pebc, + SSDFS_DST_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create using PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + err = ssdfs_create_used_peb_container(pebc, + SSDFS_DST_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create used PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + err = ssdfs_create_pre_dirty_peb_container(pebc, + SSDFS_DST_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create pre-dirty PEB " + "container: err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_DST_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create dirty PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + default: + SSDFS_ERR("invalid PEB state: " + "source %#x, destination %#x\n", + src_peb_state, dst_peb_state); + err = -ERANGE; + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + err = ssdfs_create_clean_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create clean PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_USING_PEB_STATE: + err = ssdfs_create_using_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create using PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_USED_PEB_STATE: + err = ssdfs_create_used_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. 
+ */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create used PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + err = ssdfs_create_pre_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create pre-dirty PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create dirty PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + switch (dst_peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + peb_blkbmap = + &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + atomic_set(&peb_blkbmap->state, + SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST); + + err = ssdfs_create_used_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create used PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + if (peb_has_ext_ptr) { + err = ssdfs_create_used_peb_container(pebc, + SSDFS_SRC_PEB); + } else { + err = ssdfs_create_using_peb_container(pebc, + SSDFS_SRC_AND_DST_PEB); + } + + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create using PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + default: + SSDFS_ERR("invalid PEB state: " + "source %#x, destination %#x\n", + src_peb_state, dst_peb_state); + err = -ERANGE; + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + switch (dst_peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + peb_blkbmap = + &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + atomic_set(&peb_blkbmap->state, + SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST); + + err = ssdfs_create_pre_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create pre-dirty PEB " + "container: err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + if (peb_has_ext_ptr) { + err = ssdfs_create_pre_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + } else { + err = ssdfs_create_using_peb_container(pebc, + SSDFS_SRC_AND_DST_PEB); + } + + if (err == -EINTR) { + /* + * Ignore this error. 
+ */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create using PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + default: + SSDFS_ERR("invalid PEB state: " + "source %#x, destination %#x\n", + src_peb_state, dst_peb_state); + err = -ERANGE; + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + switch (dst_peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + if (peb_has_ext_ptr) { + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + } else { + err = ssdfs_create_dirty_using_container(pebc, + SSDFS_SRC_AND_DST_PEB); + } + + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create using PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + if (peb_has_ext_ptr) { + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + } else { + err = ssdfs_create_dirty_used_container(pebc, + SSDFS_SRC_AND_DST_PEB); + } + + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create used PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + if (peb_has_ext_ptr) { + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + } else { + err = ssdfs_create_dirty_used_container(pebc, + SSDFS_SRC_AND_DST_PEB); + } + + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create pre-dirty PEB " + "container: err %d\n", err); + goto fail_start_threads; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + if (peb_has_ext_ptr) { + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_SRC_PEB); + } else { + err = ssdfs_create_dirty_peb_container(pebc, + SSDFS_DST_PEB); + } + + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_start_threads; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create dirty PEB container: " + "err %d\n", err); + goto fail_start_threads; + } + break; + + default: + SSDFS_ERR("invalid PEB state: " + "source %#x, destination %#x\n", + src_peb_state, dst_peb_state); + err = -ERANGE; + goto fail_start_threads; + } + break; + + default: + SSDFS_ERR("invalid PEB state: " + "source %#x, destination %#x\n", + src_peb_state, dst_peb_state); + err = -ERANGE; + goto fail_start_threads; + }; + +fail_start_threads: + return err; +} + +/* + * ssdfs_peb_container_create() - create PEB's container object + * @fsi: pointer on shared file system object + * @seg: segment number + * @peb_index: index of PEB object in array + * @log_pages: count of pages in log + * @si: pointer on parent segment object + * + * This function tries to create PEB object(s) for @seg + * identification number and for @peb_index in array. + * + * RETURN: + * [success] - PEB object(s) has been constructed sucessfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. 
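+ *
+ * Creation proceeds in three broad steps (see the body below):
+ * initialize the container fields, queues and cache protection
+ * state; resolve the LEB/PEB relation through the mapping table;
+ * then create the PEB object(s) and start the per-container
+ * threads according to the source/destination PEB states.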
+ */ +int ssdfs_peb_container_create(struct ssdfs_fs_info *fsi, + u64 seg, u32 peb_index, + u8 peb_type, + u32 log_pages, + struct ssdfs_segment_info *si) +{ + struct ssdfs_peb_container *pebc; + struct ssdfs_peb_info *pebi; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *mtblpd; + int src_peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE; + int dst_peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE; + u8 src_peb_flags = 0; + u8 dst_peb_flags = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !si || !si->peb_array); + + if (seg >= fsi->nsegs) { + SSDFS_ERR("requested seg %llu >= nsegs %llu\n", + seg, fsi->nsegs); + return -EINVAL; + } + + if (peb_index >= si->pebs_count) { + SSDFS_ERR("requested peb_index %u >= pebs_count %u\n", + peb_index, si->pebs_count); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, seg %llu, peb_index %u, " + "peb_type %#x, si %p\n", + fsi, seg, peb_index, peb_type, si); +#else + SSDFS_DBG("fsi %p, seg %llu, peb_index %u, " + "peb_type %#x, si %p\n", + fsi, seg, peb_index, peb_type, si); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + pebc = &si->peb_array[peb_index]; + + memset(pebc, 0, sizeof(struct ssdfs_peb_container)); + mutex_init(&pebc->migration_lock); + atomic_set(&pebc->migration_state, SSDFS_PEB_UNKNOWN_MIGRATION_STATE); + atomic_set(&pebc->migration_phase, SSDFS_PEB_MIGRATION_STATUS_UNKNOWN); + atomic_set(&pebc->items_state, SSDFS_PEB_CONTAINER_EMPTY); + atomic_set(&pebc->shared_free_dst_blks, 0); + init_waitqueue_head(&pebc->migration_wq); + init_rwsem(&pebc->lock); + atomic_set(&pebc->dst_peb_refs, 0); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("shared_free_dst_blks %d\n", + atomic_read(&pebc->shared_free_dst_blks)); + SSDFS_DBG("dst_peb_refs %d\n", + atomic_read(&pebc->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebc->peb_type = peb_type; + if (peb_type >= SSDFS_MAPTBL_PEB_TYPE_MAX) { + SSDFS_ERR("invalid peb_type %#x\n", peb_type); + return -EINVAL; + } + + pebc->peb_index = peb_index; + pebc->log_pages = log_pages; + pebc->parent_si = si; + + ssdfs_requests_queue_init(&pebc->read_rq); + ssdfs_requests_queue_init(&pebc->update_rq); + spin_lock_init(&pebc->pending_lock); + pebc->pending_updated_user_data_pages = 0; + spin_lock_init(&pebc->crq_ptr_lock); + pebc->create_rq = NULL; + + spin_lock_init(&pebc->cache_protection.cno_lock); + pebc->cache_protection.create_cno = ssdfs_current_cno(fsi->sb); + pebc->cache_protection.last_request_cno = + pebc->cache_protection.create_cno; + pebc->cache_protection.reqs_count = 0; + pebc->cache_protection.protected_range = 0; + pebc->cache_protection.future_request_cno = + pebc->cache_protection.create_cno; + + err = ssdfs_peb_container_get_peb_relation(fsi, seg, peb_index, + peb_type, + atomic_read(&si->seg_state), + &pebr); + if (err == -ENODATA) { + struct ssdfs_peb_blk_bmap *peb_blkbmap; + + err = 0; + + peb_blkbmap = &pebc->parent_si->blk_bmap.peb[pebc->peb_index]; + ssdfs_set_block_bmap_initialized(peb_blkbmap->src); + atomic_set(&peb_blkbmap->state, SSDFS_PEB_BLK_BMAP_INITIALIZED); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("can't map LEB to PEB: " + "seg %llu, peb_index %u, " + "peb_type %#x, err %d\n", + seg, peb_index, peb_type, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_init_peb_container; + } else if (unlikely(err)) { + SSDFS_ERR("fail to map LEB to PEB: " + "seg %llu, peb_index %u, " + "peb_type %#x, err %d\n", + seg, peb_index, peb_type, err); + goto fail_init_peb_container; + } + +#ifdef
CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&pebc->lock); + + mtblpd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + if (mtblpd->peb_id == U64_MAX) + goto try_process_relation; + + pebi = &pebc->items[SSDFS_SEG_PEB1]; + + err = ssdfs_peb_object_create(pebi, pebc, + mtblpd->peb_id, + mtblpd->state, + SSDFS_PEB_UNKNOWN_MIGRATION_ID); + if (unlikely(err)) { + SSDFS_ERR("fail to create PEB object: " + "seg %llu, peb_index %u, " + "peb_id %llu, peb_state %#x\n", + seg, peb_index, + mtblpd->peb_id, + mtblpd->state); + goto fail_create_peb_objects; + } + + pebc->src_peb = pebi; + src_peb_state = mtblpd->state; + src_peb_flags = mtblpd->flags; + + if (mtblpd->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB || + (mtblpd->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB && + mtblpd->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR)) { + SSDFS_ERR("invalid set of flags %#x\n", + mtblpd->flags); + err = -EIO; + goto fail_create_peb_objects; + } + + atomic_set(&pebc->migration_state, SSDFS_PEB_NOT_MIGRATING); + atomic_set(&pebc->items_state, SSDFS_PEB1_SRC_CONTAINER); + + switch (mtblpd->state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + /* PEB container has been created */ + goto start_container_threads; + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + /* + * Do nothing here. + * Follow to create second PEB object. + */ + break; + + default: + SSDFS_WARN("invalid PEB state: " + "seg %llu, peb_index %u, " + "peb_id %llu, peb_state %#x\n", + seg, peb_index, + mtblpd->peb_id, + mtblpd->state); + err = -ERANGE; + goto fail_create_peb_objects; + } + +try_process_relation: + mtblpd = &pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX]; + + if (mtblpd->peb_id == U64_MAX) { + SSDFS_ERR("invalid peb_id\n"); + err = -ERANGE; + goto fail_create_peb_objects; + } + + switch (mtblpd->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + /* + * Do nothing here. + * Follow to create second PEB object. 
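+ * The destination PEB is described by the relation index of the mapping table.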
+ */ + break; + + default: + SSDFS_WARN("invalid PEB state: " + "seg %llu, peb_index %u, " + "peb_id %llu, peb_state %#x\n", + seg, peb_index, + mtblpd->peb_id, + mtblpd->state); + err = -ERANGE; + goto fail_create_peb_objects; + } + + if (mtblpd->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB) { + u8 shared_peb_index = mtblpd->shared_peb_index; + + if (!pebc->src_peb) { + SSDFS_ERR("source PEB is absent\n"); + err = -ERANGE; + goto fail_create_peb_objects; + } + + if (shared_peb_index >= si->pebs_count) { + SSDFS_ERR("shared_peb_index %u >= si->pebs_count %u\n", + shared_peb_index, si->pebs_count); + err = -ERANGE; + goto fail_create_peb_objects; + } + + pebi = &si->peb_array[shared_peb_index].items[SSDFS_SEG_PEB2]; + pebc->dst_peb = pebi; + atomic_set(&pebc->items_state, + SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER); + atomic_inc(&si->peb_array[shared_peb_index].dst_peb_refs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, dst_peb_refs %d\n", + pebi->peb_id, + atomic_read(&si->peb_array[shared_peb_index].dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + pebi = &pebc->items[SSDFS_SEG_PEB2]; + + err = ssdfs_peb_object_create(pebi, pebc, + mtblpd->peb_id, + mtblpd->state, + SSDFS_PEB_UNKNOWN_MIGRATION_ID); + if (unlikely(err)) { + SSDFS_ERR("fail to create PEB object: " + "seg %llu, peb_index %u, " + "peb_id %llu, peb_state %#x\n", + seg, peb_index, + mtblpd->peb_id, + mtblpd->state); + goto fail_create_peb_objects; + } + + pebc->dst_peb = pebi; + + if (!pebc->src_peb) { + atomic_set(&pebc->items_state, + SSDFS_PEB2_DST_CONTAINER); + } else { + atomic_set(&pebc->items_state, + SSDFS_PEB1_SRC_PEB2_DST_CONTAINER); + atomic_inc(&pebc->dst_peb_refs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, dst_peb_refs %d\n", + mtblpd->peb_id, + atomic_read(&pebc->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + + dst_peb_state = mtblpd->state; + dst_peb_flags = mtblpd->flags; + + if (mtblpd->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR || + (mtblpd->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB && + mtblpd->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR)) { + SSDFS_ERR("invalid set of flags %#x\n", + mtblpd->flags); + err = -EIO; + goto fail_create_peb_objects; + } + + atomic_set(&pebc->migration_state, SSDFS_PEB_UNDER_MIGRATION); + atomic_inc(&si->migration.migrating_pebs); + +start_container_threads: + up_write(&pebc->lock); + + err = ssdfs_peb_container_start_threads(pebc, src_peb_state, + dst_peb_state, + src_peb_flags); + if (err == -EINTR) { + /* + * Ignore this error. 
+ */ + goto fail_init_peb_container; + } else if (unlikely(err)) { + SSDFS_ERR("fail to start PEB's threads: " + "err %d\n", err); + goto fail_init_peb_container; + } + +finish_init_peb_container: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB has been created: " + "seg %llu, peb_index %u\n", + seg, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +fail_create_peb_objects: + up_write(&pebc->lock); + +fail_init_peb_container: + ssdfs_peb_container_destroy(pebc); + return err; +} + +/* + * ssdfs_peb_container_destroy() - destroy PEB's container object + * @ptr: pointer on container placement + */ +void ssdfs_peb_container_destroy(struct ssdfs_peb_container *ptr) +{ + int migration_state; + int items_state; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + migration_state = atomic_read(&ptr->migration_state); + items_state = atomic_read(&ptr->items_state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("ptr %p, migration_state %#x, items_state %#x\n", + ptr, migration_state, items_state); +#else + SSDFS_DBG("ptr %p, migration_state %#x, items_state %#x\n", + ptr, migration_state, items_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!is_ssdfs_requests_queue_empty(&ptr->read_rq)) { + ssdfs_fs_error(ptr->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "read requests queue isn't empty\n"); + err = -EIO; + ssdfs_requests_queue_remove_all(&ptr->read_rq, err); + } + + if (!is_ssdfs_requests_queue_empty(&ptr->update_rq)) { + ssdfs_fs_error(ptr->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "flush requests queue isn't empty\n"); + err = -EIO; + ssdfs_requests_queue_remove_all(&ptr->update_rq, err); + } + + if (is_peb_container_empty(ptr)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB container is empty: " + "peb_type %#x, peb_index %u\n", + ptr->peb_type, ptr->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return; + } + + if (migration_state <= SSDFS_PEB_UNKNOWN_MIGRATION_STATE || + migration_state >= SSDFS_PEB_MIGRATION_STATE_MAX) { + SSDFS_WARN("invalid migration_state %#x\n", + migration_state); + } + + if (items_state < SSDFS_PEB_CONTAINER_EMPTY || + items_state >= SSDFS_PEB_CONTAINER_STATE_MAX) { + SSDFS_WARN("invalid items_state %#x\n", + items_state); + } + + for (i = 0; i < SSDFS_PEB_THREAD_TYPE_MAX; i++) { + int err2; + + err2 = ssdfs_peb_stop_thread(ptr, i); + if (err2 == -EIO) { + ssdfs_fs_error(ptr->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "thread I/O issue: " + "peb_index %u, thread type %#x\n", + ptr->peb_index, i); + } else if (unlikely(err2)) { + SSDFS_WARN("thread stopping issue: " + "peb_index %u, thread type %#x, err %d\n", + ptr->peb_index, i, err2); + } + } + + down_write(&ptr->lock); + + switch (atomic_read(&ptr->items_state)) { + case SSDFS_PEB_CONTAINER_EMPTY: +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(ptr->src_peb); + WARN_ON(ptr->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + ptr->src_peb = NULL; + ptr->dst_peb = NULL; + break; + + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr->src_peb); + WARN_ON(ptr->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_peb_object_destroy(ptr->src_peb); + if (unlikely(err)) { + SSDFS_WARN("fail to destroy PEB object: " + "err %d\n", + err); + } + ptr->src_peb = NULL; + ptr->dst_peb = NULL; + break; + + case SSDFS_PEB1_DST_CONTAINER: + case 
SSDFS_PEB2_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr->dst_peb); + WARN_ON(ptr->src_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_peb_object_destroy(ptr->dst_peb); + if (unlikely(err)) { + SSDFS_WARN("fail to destroy PEB object: " + "err %d\n", + err); + } + + ptr->src_peb = NULL; + ptr->dst_peb = NULL; + break; + + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr->src_peb); + BUG_ON(!ptr->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_peb_object_destroy(ptr->src_peb); + if (unlikely(err)) { + SSDFS_WARN("fail to destroy PEB object: " + "err %d\n", + err); + } + ptr->src_peb = NULL; + ptr->dst_peb = NULL; + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr->src_peb); + BUG_ON(!ptr->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + err = ssdfs_peb_object_destroy(ptr->src_peb); + if (unlikely(err)) { + SSDFS_WARN("fail to destroy PEB object: " + "err %d\n", + err); + } + err = ssdfs_peb_object_destroy(ptr->dst_peb); + if (unlikely(err)) { + SSDFS_WARN("fail to destroy PEB object: " + "err %d\n", + err); + } + ptr->src_peb = NULL; + ptr->dst_peb = NULL; + break; + + default: + BUG(); + } + + memset(ptr->items, 0, + sizeof(struct ssdfs_peb_info) * SSDFS_SEG_PEB_ITEMS_MAX); + + up_write(&ptr->lock); + + atomic_set(&ptr->migration_state, SSDFS_PEB_UNKNOWN_MIGRATION_STATE); + atomic_set(&ptr->items_state, SSDFS_PEB_CONTAINER_EMPTY); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +}
From patchwork Sat Feb 25 01:08:34 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151928 From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 23/76] ssdfs: PEB container API implementation Date: Fri, 24 Feb 2023 17:08:34 -0800 Message-Id: <20230225010927.813929-24-slava@dubeyko.com> In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com>
This patch implements PEB container's API logic. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_container.c | 2980 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 2980 insertions(+) diff --git a/fs/ssdfs/peb_container.c b/fs/ssdfs/peb_container.c index 668ded673719..92798bcbe8b7 100644 --- a/fs/ssdfs/peb_container.c +++ b/fs/ssdfs/peb_container.c @@ -2667,3 +2667,2983 @@ void ssdfs_peb_container_destroy(struct ssdfs_peb_container *ptr) SSDFS_DBG("finished\n"); #endif /* CONFIG_SSDFS_TRACK_API_CALL */ } + +/* + * ssdfs_peb_container_prepare_relation() - prepare relation with destination + * @ptr: pointer on PEB container + * + * This method tries to create the relation between source of @ptr + * and existing destination in another PEB container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.
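+ * + * The caller must hold @ptr->migration_lock.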
+ */ +static +int ssdfs_peb_container_prepare_relation(struct ssdfs_peb_container *ptr) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_mapping_table *maptbl; + struct ssdfs_migration_destination *destination; + struct ssdfs_peb_container *relation; + int shared_index; + int destination_state; + u16 peb_index, dst_peb_index; + u64 leb_id, dst_leb_id; + struct completion *end; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!mutex_is_locked(&ptr->migration_lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + maptbl = fsi->maptbl; + si = ptr->parent_si; + peb_index = ptr->peb_index; + +try_define_relation: + destination = &si->migration.array[SSDFS_LAST_DESTINATION]; + + spin_lock(&si->migration.lock); + destination_state = destination->state; + shared_index = destination->shared_peb_index; + spin_unlock(&si->migration.lock); + + switch (destination_state) { + case SSDFS_VALID_DESTINATION: + /* do nothing here */ + break; + + case SSDFS_DESTINATION_UNDER_CREATION: + /* FALLTHRU */ + fallthrough; + case SSDFS_OBSOLETE_DESTINATION: { + DEFINE_WAIT(wait); + + mutex_unlock(&ptr->migration_lock); + prepare_to_wait(&ptr->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&ptr->migration_wq, &wait); + mutex_lock(&ptr->migration_lock); + goto try_define_relation; + } + break; + + case SSDFS_EMPTY_DESTINATION: + SSDFS_ERR("destination is empty\n"); + return -ERANGE; + + default: + BUG(); + } + + if (shared_index < 0 || shared_index >= si->pebs_count) { + SSDFS_ERR("invalid shared_index %d\n", + shared_index); + return -ERANGE; + } + + relation = &si->peb_array[shared_index]; + + destination_state = atomic_read(&relation->migration_state); + switch (destination_state) { + case SSDFS_PEB_MIGRATION_PREPARATION: + SSDFS_ERR("destination PEB is under preparation: " + "shared_index %d\n", + shared_index); + return -ERANGE; + + case SSDFS_PEB_UNDER_MIGRATION: + switch (atomic_read(&relation->items_state)) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + /* do nothing */ + break; + + default: + SSDFS_WARN("invalid relation state: " + "shared_index %d\n", + shared_index); + return -ERANGE; + } + + down_read(&relation->lock); + + if (!relation->dst_peb) { + err = -ERANGE; + SSDFS_ERR("dst_peb is NULL\n"); + goto finish_define_relation; + } + + ptr->dst_peb = relation->dst_peb; + atomic_inc(&relation->dst_peb_refs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, dst_peb_refs %d\n", + relation->dst_peb->peb_id, + atomic_read(&relation->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_define_relation: + up_read(&relation->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to define relation: " + "shared_index %d\n", + shared_index); + return err; + } + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, peb_index); + return -ERANGE; + } + + dst_peb_index = ptr->dst_peb->peb_index; + + dst_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + dst_peb_index); + if (dst_leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + 
"seg %llu, peb_index %u\n", + si->seg_id, peb_index); + return -ERANGE; + } + + err = ssdfs_maptbl_set_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + dst_leb_id, + dst_peb_index, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + ptr->dst_peb = NULL; + atomic_dec(&relation->dst_peb_refs); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dst_peb_refs %d\n", + atomic_read(&relation->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + err = ssdfs_maptbl_set_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + dst_leb_id, + dst_peb_index, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to set relation LEB to PEB: " + "leb_id %llu, dst_peb_index %u" + "err %d\n", + leb_id, dst_peb_index, err); + ptr->dst_peb = NULL; + atomic_dec(&relation->dst_peb_refs); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dst_peb_refs %d\n", + atomic_read(&relation->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + switch (atomic_read(&ptr->items_state)) { + case SSDFS_PEB1_SRC_CONTAINER: + atomic_set(&ptr->items_state, + SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER); + break; + + case SSDFS_PEB2_SRC_CONTAINER: + atomic_set(&ptr->items_state, + SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER); + break; + + default: + BUG(); + } + break; + + case SSDFS_PEB_RELATION_PREPARATION: + SSDFS_WARN("peb not migrating: " + "shared_index %d\n", + shared_index); + return -ERANGE; + + case SSDFS_PEB_NOT_MIGRATING: + SSDFS_WARN("peb not migrating: " + "shared_index %d\n", + shared_index); + return -ERANGE; + + default: + BUG(); + } + + return 0; +} + +/* + * __ssdfs_peb_container_prepare_destination() - prepare destination + * @ptr: pointer on PEB container + * + * This method tries to create the destination PEB in requested + * container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - try to create a relation. 
+ */ +static +int __ssdfs_peb_container_prepare_destination(struct ssdfs_peb_container *ptr) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_migration_destination *destination; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_peb_info *pebi; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int shared_index; + int destination_state; + int items_state; + u16 peb_index; + u64 leb_id; + u64 peb_id; + u64 seg; + u32 log_pages; + u8 peb_migration_id; + struct completion *end; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + seg = si->seg_id; + peb_index = ptr->peb_index; + log_pages = ptr->log_pages; + + spin_lock(&si->migration.lock); + destination = &si->migration.array[SSDFS_CREATING_DESTINATION]; + destination_state = destination->state; + shared_index = destination->shared_peb_index; + spin_unlock(&si->migration.lock); + + if (destination_state != SSDFS_DESTINATION_UNDER_CREATION && + shared_index != ptr->peb_index) { + SSDFS_ERR("destination_state %#x, " + "shared_index %d, " + "peb_index %u\n", + destination_state, + shared_index, + ptr->peb_index); + return -ERANGE; + } + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, si->seg_id, peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, peb_index); + return -ERANGE; + } + + err = ssdfs_maptbl_add_migration_peb(fsi, leb_id, ptr->peb_type, + &pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto fail_prepare_destination; + } + + err = ssdfs_maptbl_add_migration_peb(fsi, leb_id, + ptr->peb_type, + &pebr, &end); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find PEB for migration: " + "leb_id %llu, peb_type %#x\n", + leb_id, ptr->peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_prepare_destination; + } else if (err == -EBUSY) { + DEFINE_WAIT(wait); + +wait_erase_operation_end: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("wait_erase_operation_end: " + "leb_id %llu, peb_type %#x\n", + leb_id, ptr->peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + wake_up_all(&fsi->maptbl->wait_queue); + + mutex_unlock(&ptr->migration_lock); + prepare_to_wait(&fsi->maptbl->erase_ops_end_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&fsi->maptbl->erase_ops_end_wq, &wait); + mutex_lock(&ptr->migration_lock); + + err = ssdfs_maptbl_add_migration_peb(fsi, leb_id, ptr->peb_type, + &pebr, &end); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find PEB for migration: " + "leb_id %llu, peb_type %#x\n", + leb_id, ptr->peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_prepare_destination; + } else if (err == -EBUSY) { + /* + * We still have pre-erased PEBs. + * Let's wait more. 
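+ * (waiters on erase_ops_end_wq are woken up after the erase operations have been finished)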
+ */ + goto wait_erase_operation_end; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add migration PEB: " + "leb_id %llu, peb_type %#x, " + "err %d\n", + leb_id, ptr->peb_type, err); + goto fail_prepare_destination; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to add migration PEB: " + "leb_id %llu, peb_type %#x, " + "err %d\n", + leb_id, ptr->peb_type, err); + goto fail_prepare_destination; + } + + down_write(&ptr->lock); + + items_state = atomic_read(&ptr->items_state); + + switch (items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + pebi = &ptr->items[SSDFS_SEG_PEB2]; + break; + + case SSDFS_PEB_CONTAINER_EMPTY: + case SSDFS_PEB2_SRC_CONTAINER: + pebi = &ptr->items[SSDFS_SEG_PEB1]; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid container state: %#x\n", + atomic_read(&ptr->items_state)); + goto finish_prepare_destination; + break; + }; + + peb_id = pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id; + peb_migration_id = ssdfs_define_next_peb_migration_id(ptr->src_peb); + if (!is_peb_migration_id_valid(peb_migration_id)) { + err = -ERANGE; + SSDFS_ERR("fail to define peb_migration_id\n"); + goto finish_prepare_destination; + } + + err = ssdfs_peb_object_create(pebi, ptr, peb_id, + SSDFS_MAPTBL_CLEAN_PEB_STATE, + peb_migration_id); + if (unlikely(err)) { + SSDFS_ERR("fail to create PEB object: " + "seg %llu, peb_index %u, " + "peb_id %llu\n", + seg, peb_index, + peb_id); + goto finish_prepare_destination; + } + + ptr->dst_peb = pebi; + atomic_inc(&ptr->dst_peb_refs); + + atomic_set(&pebi->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebi->init_end); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, dst_peb_refs %d\n", + pebi->peb_id, + atomic_read(&ptr->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (items_state) { + case SSDFS_PEB_CONTAINER_EMPTY: + atomic_set(&ptr->items_state, + SSDFS_PEB1_DST_CONTAINER); + break; + + case SSDFS_PEB1_SRC_CONTAINER: + atomic_set(&ptr->items_state, + SSDFS_PEB1_SRC_PEB2_DST_CONTAINER); + break; + + case SSDFS_PEB2_SRC_CONTAINER: + atomic_set(&ptr->items_state, + SSDFS_PEB2_SRC_PEB1_DST_CONTAINER); + break; + + default: + BUG(); + } + + if (atomic_read(&ptr->items_state) == SSDFS_PEB1_DST_CONTAINER) { + int free_blks; + + free_blks = ssdfs_peb_get_free_pages(ptr); + if (unlikely(free_blks < 0)) { + err = free_blks; + SSDFS_ERR("fail to get free_blks: " + "peb_index %u, err %d\n", + ptr->peb_index, err); + goto finish_prepare_destination; + } else if (free_blks == 0) { + err = -ERANGE; + SSDFS_ERR("PEB hasn't free blocks\n"); + goto finish_prepare_destination; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(free_blks >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_set(&ptr->shared_free_dst_blks, (u16)free_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("shared_free_dst_blks %d\n", + atomic_read(&ptr->shared_free_dst_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + if (ptr->peb_index >= si->blk_bmap.pebs_count) { + err = -ERANGE; + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + ptr->peb_index, + si->blk_bmap.pebs_count); + goto finish_prepare_destination; + } + + peb_blkbmap = &si->blk_bmap.peb[ptr->peb_index]; + err = ssdfs_peb_blk_bmap_start_migration(peb_blkbmap); + if (unlikely(err)) { + SSDFS_ERR("fail to start PEB's block bitmap migration: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, ptr->peb_index, err); + goto finish_prepare_destination; + } + } + +finish_prepare_destination: + up_write(&ptr->lock); + + if (unlikely(err)) + goto fail_prepare_destination; + +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("peb_container_prepare_destination: " + "seg_id %llu, leb_id %llu, peb_id %llu, " + "free_blks %d, used_blks %d, invalid_blks %d\n", + si->seg_id, leb_id, peb_id, + ssdfs_peb_get_free_pages(ptr), + ssdfs_peb_get_used_data_pages(ptr), + ssdfs_peb_get_invalid_pages(ptr)); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&si->migration.lock); + ssdfs_memcpy(&si->migration.array[SSDFS_LAST_DESTINATION], + 0, sizeof(struct ssdfs_migration_destination), + &si->migration.array[SSDFS_CREATING_DESTINATION], + 0, sizeof(struct ssdfs_migration_destination), + sizeof(struct ssdfs_migration_destination)); + destination = &si->migration.array[SSDFS_LAST_DESTINATION]; + destination->state = SSDFS_VALID_DESTINATION; + memset(&si->migration.array[SSDFS_CREATING_DESTINATION], + 0xFF, sizeof(struct ssdfs_migration_destination)); + destination = &si->migration.array[SSDFS_CREATING_DESTINATION]; + destination->state = SSDFS_EMPTY_DESTINATION; + spin_unlock(&si->migration.lock); + + return 0; + +fail_prepare_destination: + spin_lock(&si->migration.lock); + + destination = &si->migration.array[SSDFS_CREATING_DESTINATION]; + destination->state = SSDFS_EMPTY_DESTINATION; + destination->shared_peb_index = -1; + + destination = &si->migration.array[SSDFS_LAST_DESTINATION]; + switch (destination->state) { + case SSDFS_OBSOLETE_DESTINATION: + destination->state = SSDFS_VALID_DESTINATION; + break; + + case SSDFS_EMPTY_DESTINATION: + /* do nothing */ + break; + + case SSDFS_VALID_DESTINATION: + SSDFS_DBG("old destination is valid\n"); + break; + + default: + BUG(); + }; + + spin_unlock(&si->migration.lock); + + return err; +} + +/* + * ssdfs_peb_container_prepare_zns_destination() - prepare ZNS destination + * @ptr: pointer on PEB container + * + * This method tries to create relation with shared segment for + * user data updates. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - try to create a relation. + */ +static +int ssdfs_peb_container_prepare_zns_destination(struct ssdfs_peb_container *ptr) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_current_segment *cur_seg; + struct ssdfs_segment_info *si; + struct ssdfs_segment_info *dest_si = NULL; + struct ssdfs_peb_mapping_table *maptbl; + u64 start = U64_MAX; + int seg_type = SSDFS_USER_DATA_SEG_TYPE; + u16 peb_index, dst_peb_index; + u64 leb_id, dst_leb_id; + struct completion *end; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!mutex_is_locked(&ptr->migration_lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + maptbl = fsi->maptbl; + si = ptr->parent_si; + peb_index = ptr->peb_index; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, si->seg_id, peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, peb_index); + return -ERANGE; + } + + down_read(&fsi->cur_segs->lock); + + cur_seg = fsi->cur_segs->objects[SSDFS_CUR_DATA_UPDATE_SEG]; + + ssdfs_current_segment_lock(cur_seg); + + if (is_ssdfs_current_segment_empty(cur_seg)) { + start = cur_seg->seg_id; + dest_si = ssdfs_grab_segment(fsi, seg_type, U64_MAX, start); + if (IS_ERR_OR_NULL(dest_si)) { + err = (dest_si == NULL ? 
-ENOMEM : PTR_ERR(dest_si)); + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to create segment object: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("fail to create segment object: " + "err %d\n", err); + } + + goto finish_get_current_segment; + } + + err = ssdfs_current_segment_add(cur_seg, dest_si); + /* + * ssdfs_grab_segment() has got object already. + */ + ssdfs_segment_put_object(dest_si); + if (unlikely(err)) { + SSDFS_ERR("fail to add segment %llu as current: " + "err %d\n", + dest_si->seg_id, err); + goto finish_get_current_segment; + } + } + + dst_peb_index = 0; + dst_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, dest_si->seg_id, dst_peb_index); + if (dst_leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + dest_si->seg_id, dst_peb_index); + err = -ERANGE; + goto finish_get_current_segment; + } + +finish_get_current_segment: + ssdfs_current_segment_unlock(cur_seg); + up_read(&fsi->cur_segs->lock); + + if (unlikely(err)) + return err; + + err = ssdfs_maptbl_set_zns_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + ptr->dst_peb = NULL; + return err; + } + + err = ssdfs_maptbl_set_zns_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to set relation LEB to PEB: " + "leb_id %llu, dst leb_id %llu, " + "err %d\n", + leb_id, dst_leb_id, err); + ptr->dst_peb = NULL; + return err; + } + + switch (atomic_read(&ptr->items_state)) { + case SSDFS_PEB1_SRC_CONTAINER: + atomic_set(&ptr->items_state, + SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER); + break; + + case SSDFS_PEB2_SRC_CONTAINER: + atomic_set(&ptr->items_state, + SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER); + break; + + default: + BUG(); + } + + return 0; +} + +/* + * ssdfs_peb_container_prepare_destination() - prepare destination + * @ptr: pointer on PEB container + * + * This method tries to create the destination PEB in requested + * container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - try to create a relation. + */ +static +int ssdfs_peb_container_prepare_destination(struct ssdfs_peb_container *ptr) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + + if (fsi->is_zns_device && is_ssdfs_peb_containing_user_data(ptr)) + err = ssdfs_peb_container_prepare_zns_destination(ptr); + else + err = __ssdfs_peb_container_prepare_destination(ptr); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to prepare destination: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, ptr->peb_index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare destination: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, ptr->peb_index, err); + } + + return err; +} + +/* + * ssdfs_peb_container_create_destination() - create destination + * @ptr: pointer on PEB container + * + * This method tries to create the destination or relation + * with another PEB container.
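+ * A relation re-uses a destination PEB shared with other source PEBs of the segment; otherwise, a dedicated destination PEB is prepared.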
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_peb_container_create_destination(struct ssdfs_peb_container *ptr) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *relation; + struct ssdfs_migration_destination *destination; + bool need_create_relation = false; + u16 migration_threshold; + u16 pebs_per_destination; + u16 destination_index; + int migration_state; + int items_state; + int destination_pebs; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!mutex_is_locked(&ptr->migration_lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + + spin_lock(&fsi->volume_state_lock); + migration_threshold = fsi->migration_threshold; + spin_unlock(&fsi->volume_state_lock); + + migration_state = atomic_read(&ptr->migration_state); + + if (migration_state != SSDFS_PEB_NOT_MIGRATING) { + err = -ERANGE; + SSDFS_ERR("invalid migration_state %#x\n", + migration_state); + goto finish_create_destination; + } + + items_state = atomic_read(&ptr->items_state); + + if (items_state != SSDFS_PEB1_SRC_CONTAINER && + items_state != SSDFS_PEB2_SRC_CONTAINER) { + err = -ERANGE; + SSDFS_ERR("invalid items_state %#x\n", + items_state); + goto finish_create_destination; + } + + pebs_per_destination = fsi->pebs_per_seg / migration_threshold; + destination_index = + atomic_inc_return(&si->migration.migrating_pebs) - 1; + destination_index /= pebs_per_destination; + +try_start_preparation_again: + spin_lock(&si->migration.lock); + + destination = &si->migration.array[SSDFS_LAST_DESTINATION]; + + switch (destination->state) { + case SSDFS_EMPTY_DESTINATION: + need_create_relation = false; + destination = &si->migration.array[SSDFS_CREATING_DESTINATION]; + destination->state = SSDFS_DESTINATION_UNDER_CREATION; + destination->destination_pebs++; + destination->shared_peb_index = ptr->peb_index; + break; + + case SSDFS_VALID_DESTINATION: + destination_pebs = destination->destination_pebs; + need_create_relation = destination_index < destination_pebs; + + if (need_create_relation) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(destination_index >= si->pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + relation = &si->peb_array[destination_index]; + if (atomic_read(&relation->shared_free_dst_blks) <= 0) { + /* destination hasn't free room */ + need_create_relation = false; + } + } + + if (!need_create_relation) { + destination = + &si->migration.array[SSDFS_CREATING_DESTINATION]; + destination->state = SSDFS_DESTINATION_UNDER_CREATION; + destination->destination_pebs++; + destination->shared_peb_index = ptr->peb_index; + } + break; + + case SSDFS_OBSOLETE_DESTINATION: + destination = &si->migration.array[SSDFS_CREATING_DESTINATION]; + + if (destination->state != SSDFS_DESTINATION_UNDER_CREATION) { + err = -ERANGE; + SSDFS_WARN("invalid destination state %#x\n", + destination->state); + goto finish_check_destination; + } + + destination_pebs = destination->destination_pebs; + need_create_relation = destination_index < destination_pebs; + + if (!need_create_relation) + err = -EAGAIN; + break; + + default: + BUG(); + }; + +finish_check_destination: + spin_unlock(&si->migration.lock); + + if (err == -EAGAIN) { + DEFINE_WAIT(wait); + + mutex_unlock(&ptr->migration_lock); + 
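/* wait until the destination under creation is published, then retry */ +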
prepare_to_wait(&ptr->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&ptr->migration_wq, &wait); + mutex_lock(&ptr->migration_lock); + err = 0; + goto try_start_preparation_again; + } else if (unlikely(err)) + goto finish_create_destination; + + if (need_create_relation) { +create_relation: + atomic_set(&ptr->migration_state, + SSDFS_PEB_RELATION_PREPARATION); + + err = ssdfs_peb_container_prepare_relation(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare relation: " + "err %d\n", + err); + goto finish_create_destination; + } + + atomic_set(&ptr->migration_state, + SSDFS_PEB_UNDER_MIGRATION); + } else { + atomic_set(&ptr->migration_state, + SSDFS_PEB_MIGRATION_PREPARATION); + + err = ssdfs_peb_container_prepare_destination(ptr); + if (err == -ENODATA) { + err = 0; + goto create_relation; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare destination: " + "err %d\n", + err); + goto finish_create_destination; + } + + atomic_set(&ptr->migration_state, + SSDFS_PEB_UNDER_MIGRATION); + } + +finish_create_destination: + if (unlikely(err)) { + atomic_set(&ptr->migration_state, migration_state); + atomic_dec(&si->migration.migrating_pebs); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("migration_state %d\n", + atomic_read(&ptr->migration_state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_container_move_dest2source() - convert destination into source + * @ptr: pointer on PEB container + * @state: current state of items + * + * This method tries to transform destination PEB + * into source PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - destination PEB has references. + */ +static +int ssdfs_peb_container_move_dest2source(struct ssdfs_peb_container *ptr, + int state) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_segment_migration_info *mi; + struct ssdfs_migration_destination *mdest; + int new_state; + u64 leb_id; + u64 peb_create_time = U64_MAX; + u64 last_log_time = U64_MAX; + struct completion *end; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!rwsem_is_locked(&ptr->lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u, " + "state %#x\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages, + state); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + ptr->peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, ptr->peb_index); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, dst_peb_refs %d\n", + ptr->dst_peb->peb_id, + atomic_read(&ptr->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&ptr->dst_peb_refs) > 1) { + /* wait of absence of references */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, peb_index %u, " + "refs_count %u\n", + leb_id, ptr->peb_index, + atomic_read(&ptr->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + switch (state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + new_state = SSDFS_PEB1_SRC_CONTAINER; + break; + + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + new_state = SSDFS_PEB2_SRC_CONTAINER; + break; + + default: + SSDFS_WARN("invalid state: %#x\n", + state); + return 
-ERANGE; + } + + if (ptr->src_peb) { + peb_create_time = ptr->src_peb->peb_create_time; + + ssdfs_peb_current_log_lock(ptr->src_peb); + last_log_time = ptr->src_peb->current_log.last_log_time; + ssdfs_peb_current_log_unlock(ptr->src_peb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + si->seg_id, + ptr->src_peb->peb_id, + peb_create_time, + last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + peb_create_time = ptr->dst_peb->peb_create_time; + + ssdfs_peb_current_log_lock(ptr->dst_peb); + last_log_time = ptr->dst_peb->current_log.last_log_time; + ssdfs_peb_current_log_unlock(ptr->dst_peb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + si->seg_id, + ptr->dst_peb->peb_id, + peb_create_time, + last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_maptbl_exclude_migration_peb(fsi, leb_id, + ptr->peb_type, + peb_create_time, + last_log_time, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_exclude_migration_peb(fsi, leb_id, + ptr->peb_type, + peb_create_time, + last_log_time, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, ptr->peb_type, err); + return err; + } + + atomic_dec(&si->peb_array[ptr->dst_peb->peb_index].dst_peb_refs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, leb_id %llu, " + "peb_id %llu, dst_peb_refs %d\n", + si->seg_id, leb_id, + ptr->dst_peb->peb_id, + atomic_read(&si->peb_array[ptr->dst_peb->peb_index].dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ptr->src_peb) { + err = ssdfs_peb_object_destroy(ptr->src_peb); + WARN_ON(err); + err = 0; + memset(ptr->src_peb, 0, sizeof(struct ssdfs_peb_info)); + } + + ptr->src_peb = ptr->dst_peb; + ptr->dst_peb = NULL; + + atomic_set(&ptr->items_state, new_state); + atomic_set(&ptr->migration_state, SSDFS_PEB_NOT_MIGRATING); + + mi = &ptr->parent_si->migration; + spin_lock(&mi->lock); + atomic_dec(&mi->migrating_pebs); + mdest = &mi->array[SSDFS_LAST_DESTINATION]; + switch (mdest->state) { + case SSDFS_VALID_DESTINATION: + case SSDFS_OBSOLETE_DESTINATION: + mdest->destination_pebs--; + break; + }; + mdest = &mi->array[SSDFS_CREATING_DESTINATION]; + switch (mdest->state) { + case SSDFS_DESTINATION_UNDER_CREATION: + mdest->destination_pebs--; + break; + }; + spin_unlock(&mi->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_peb_container_break_relation() - break relation with PEB + * @ptr: pointer on PEB container + * @state: current state of items + * @new_state: new state of items + * + * This method tries to break relation with destination PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.
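+ * + * The indirect relation is removed from the mapping table before the reference on the destination PEB is dropped.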
+ */ +static +int ssdfs_peb_container_break_relation(struct ssdfs_peb_container *ptr, + int state, int new_state) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_mapping_table *maptbl; + u64 leb_id, dst_leb_id; + u16 dst_peb_index; + int dst_peb_refs; + struct completion *end; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb || !ptr->dst_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!rwsem_is_locked(&ptr->lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u, " + "state %#x, new_state %#x\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages, + state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + maptbl = fsi->maptbl; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + ptr->peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, ptr->peb_index); + return -ERANGE; + } + + dst_peb_index = ptr->dst_peb->peb_index; + + dst_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + dst_peb_index); + if (dst_leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, dst_peb_index); + return -ERANGE; + } + + dst_peb_refs = atomic_read(&si->peb_array[dst_peb_index].dst_peb_refs); + + err = ssdfs_maptbl_break_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + dst_leb_id, + dst_peb_refs, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_break_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + dst_leb_id, + dst_peb_refs, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to break relation: " + "leb_id %llu, peb_index %u, err %d\n", + leb_id, ptr->peb_index, err); + return err; + } + + atomic_dec(&si->peb_array[ptr->dst_peb->peb_index].dst_peb_refs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, dst_peb_refs %d\n", + ptr->dst_peb->peb_id, + atomic_read(&si->peb_array[ptr->dst_peb->peb_index].dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (new_state == SSDFS_PEB_CONTAINER_EMPTY) { + err = ssdfs_peb_object_destroy(ptr->src_peb); + WARN_ON(err); + err = 0; + + memset(ptr->src_peb, 0, sizeof(struct ssdfs_peb_info)); + } else + ptr->dst_peb = NULL; + + atomic_set(&ptr->items_state, new_state); + atomic_set(&ptr->migration_state, SSDFS_PEB_NOT_MIGRATING); + atomic_dec(&ptr->parent_si->migration.migrating_pebs); + + return 0; +} + +/* + * ssdfs_peb_container_break_zns_relation() - break relation with PEB + * @ptr: pointer on PEB container + * @state: current state of items + * @new_state: new state of items + * + * This method tries to break relation with shared zone. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
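+ * + * Besides breaking the relation in the mapping table, the invalidated extent of the whole segment is deleted from the invalidated extents b-tree.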
+ */ +static +int ssdfs_peb_container_break_zns_relation(struct ssdfs_peb_container *ptr, + int state, int new_state) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_mapping_table *maptbl; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_invextree_info *invextree; + struct ssdfs_btree_search *search; + struct ssdfs_raw_extent extent; + u64 leb_id; + int invalid_blks; + struct completion *end; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb || !ptr->dst_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!rwsem_is_locked(&ptr->lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u, " + "state %#x, new_state %#x\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages, + state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + maptbl = fsi->maptbl; + seg_blkbmap = &si->blk_bmap; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + ptr->peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, ptr->peb_index); + return -ERANGE; + } + + err = ssdfs_maptbl_break_zns_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_break_zns_indirect_relation(maptbl, + leb_id, + ptr->peb_type, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to break relation: " + "leb_id %llu, peb_index %u, err %d\n", + leb_id, ptr->peb_index, err); + return err; + } + + invextree = fsi->invextree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!invextree); +#endif /* CONFIG_SSDFS_DEBUG */ + + invalid_blks = ssdfs_segment_blk_bmap_get_invalid_pages(seg_blkbmap); + if (invalid_blks <= 0) { + SSDFS_ERR("invalid state: " + "leb_id %llu, invalid_blks %d\n", + leb_id, invalid_blks); + return -ERANGE; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + extent.seg_id = cpu_to_le64(si->seg_id); + extent.logical_blk = cpu_to_le32(0); + extent.len = cpu_to_le32(invalid_blks); + + ssdfs_btree_search_init(search); + err = ssdfs_invextree_delete(invextree, &extent, search); + ssdfs_btree_search_free(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete invalidated extent: " + "leb_id %llu, len %d, err %d\n", + leb_id, invalid_blks, err); + return err; + } + + if (new_state == SSDFS_PEB_CONTAINER_EMPTY) { + err = ssdfs_peb_object_destroy(ptr->src_peb); + WARN_ON(err); + err = 0; + + memset(ptr->src_peb, 0, sizeof(struct ssdfs_peb_info)); + } else + ptr->dst_peb = NULL; + + atomic_set(&ptr->items_state, new_state); + atomic_set(&ptr->migration_state, SSDFS_PEB_NOT_MIGRATING); + atomic_dec(&ptr->parent_si->migration.migrating_pebs); + + return 0; +} + +/* + * ssdfs_peb_container_forget_source() - forget about dirty source PEB + * @ptr: pointer on PEB container + * + * This method tries to forget about dirty source PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
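+ * + * The container has to be in SSDFS_PEB_FINISHING_MIGRATION state.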
+ */ +int ssdfs_peb_container_forget_source(struct ssdfs_peb_container *ptr) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_segment_migration_info *mi; + struct ssdfs_migration_destination *mdest; + struct ssdfs_peb_mapping_table *maptbl; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + int migration_state; + int items_state; + u64 leb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ptr->src_peb); + BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi); + BUG_ON(!mutex_is_locked(&ptr->migration_lock)); + + SSDFS_DBG("ptr %p, peb_index %u, " + "peb_type %#x, log_pages %u\n", + ptr, + ptr->peb_index, + ptr->peb_type, + ptr->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = ptr->parent_si->fsi; + si = ptr->parent_si; + maptbl = fsi->maptbl; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + si->seg_id, + ptr->peb_index); + if (leb_id >= U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, ptr->peb_index); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (rwsem_is_locked(&ptr->lock)) { + SSDFS_DBG("PEB is locked: " + "leb_id %llu\n", leb_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&ptr->lock); + + migration_state = atomic_read(&ptr->migration_state); + if (migration_state != SSDFS_PEB_FINISHING_MIGRATION) { + err = -ERANGE; + SSDFS_WARN("invalid migration_state %#x\n", + migration_state); + goto finish_forget_source; + } + + items_state = atomic_read(&ptr->items_state); + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + /* valid state */ + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid items_state %#x\n", + items_state); + goto finish_forget_source; + }; + +/* + * You cannot move destination into source PEB and + * try to create another one destination for existing + * relations. Otherwise, you will have two full PEBs + * for the same peb_index. So, in the case of full + * destination PEB and presence of relation with another + * source PEB it needs to wake up all threads and to wait + * decreasing the dst_peb_refs counter. 
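+ * The -ENODATA case below implements exactly this waiting logic.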
+ */ + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_state %#x\n", items_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(ptr->src_peb); + BUG_ON(!ptr->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_container_move_dest2source(ptr, + items_state); + if (err == -ENODATA) + goto finish_forget_source; + else if (unlikely(err)) { + SSDFS_ERR("fail to transform destination: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_forget_source; + } + + WARN_ON(atomic_read(&ptr->shared_free_dst_blks) > 0); + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr->src_peb); + BUG_ON(!ptr->dst_peb); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_container_move_dest2source(ptr, + items_state); + if (err == -ENODATA) + goto finish_forget_source; + else if (unlikely(err)) { + SSDFS_ERR("fail to transform destination: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_forget_source; + } + + if (ptr->peb_index >= si->blk_bmap.pebs_count) { + err = -ERANGE; + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + ptr->peb_index, + si->blk_bmap.pebs_count); + goto finish_forget_source; + } + + peb_blkbmap = &si->blk_bmap.peb[ptr->peb_index]; + err = ssdfs_peb_blk_bmap_finish_migration(peb_blkbmap); + if (unlikely(err)) { + SSDFS_ERR("fail to finish bmap migration: " + "seg %llu, peb_index %u, err %d\n", + ptr->parent_si->seg_id, + ptr->peb_index, err); + goto finish_forget_source; + } + break; + + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: { + int new_state = SSDFS_PEB_CONTAINER_STATE_MAX; + int used_blks; + bool has_valid_blks = true; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr->src_peb); + BUG_ON(!ptr->dst_peb); + BUG_ON(atomic_read(&ptr->dst_peb_refs) != 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + used_blks = ssdfs_peb_get_used_data_pages(ptr); + if (used_blks < 0) { + err = used_blks; + SSDFS_ERR("fail to get used_blks: " + "seg %llu, peb_index %u, err %d\n", + ptr->parent_si->seg_id, + ptr->peb_index, err); + goto finish_forget_source; + } + + has_valid_blks = used_blks > 0; + + switch (items_state) { + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + if (has_valid_blks) + new_state = SSDFS_PEB1_SRC_CONTAINER; + else + new_state = SSDFS_PEB_CONTAINER_EMPTY; + break; + + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + if (has_valid_blks) + new_state = SSDFS_PEB2_SRC_CONTAINER; + else + new_state = SSDFS_PEB_CONTAINER_EMPTY; + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid state: %#x\n", + new_state); + goto finish_forget_source; + } + + if (fsi->is_zns_device) { + err = ssdfs_peb_container_break_zns_relation(ptr, + items_state, + new_state); + } else { + err = ssdfs_peb_container_break_relation(ptr, + items_state, + new_state); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to break relation: " + "leb_id %llu, items_state %#x, " + "new_state %#x\n", + leb_id, items_state, new_state); + goto finish_forget_source; + } + + if (new_state != SSDFS_PEB_CONTAINER_EMPTY) { + /* try create new destination */ + err = -ENOENT; + goto finish_forget_source; + } + break; + } + + default: + BUG(); + }; + +finish_forget_source: + up_write(&ptr->lock); + + if (err == -ENOENT) { /* create new destination or relation */ + err = ssdfs_peb_container_create_destination(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to create destination: " + "leb_id %llu, err
%d\n", + leb_id, err); + return err; + } + } else if (err == -ENODATA) { + wake_up_all(&si->wait_queue[SSDFS_PEB_FLUSH_THREAD]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dst_peb_refs %d\n", + atomic_read(&ptr->dst_peb_refs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (atomic_read(&ptr->dst_peb_refs) > 1) { + DEFINE_WAIT(wait); + + mutex_unlock(&ptr->migration_lock); + prepare_to_wait(&ptr->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&ptr->migration_wq, &wait); + mutex_lock(&ptr->migration_lock); + }; + + down_write(&ptr->lock); + + ptr->src_peb = ptr->dst_peb; + ptr->dst_peb = NULL; + + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + atomic_set(&ptr->items_state, SSDFS_PEB1_SRC_CONTAINER); + break; + + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + atomic_set(&ptr->items_state, SSDFS_PEB2_SRC_CONTAINER); + break; + + default: + BUG(); + }; + + atomic_set(&ptr->migration_state, SSDFS_PEB_NOT_MIGRATING); + + up_write(&ptr->lock); + + mi = &ptr->parent_si->migration; + spin_lock(&mi->lock); + atomic_dec(&mi->migrating_pebs); + mdest = &mi->array[SSDFS_LAST_DESTINATION]; + switch (mdest->state) { + case SSDFS_VALID_DESTINATION: + case SSDFS_OBSOLETE_DESTINATION: + mdest->destination_pebs--; + break; + }; + mdest = &mi->array[SSDFS_CREATING_DESTINATION]; + switch (mdest->state) { + case SSDFS_DESTINATION_UNDER_CREATION: + mdest->destination_pebs--; + break; + }; + spin_unlock(&mi->lock); + } else if (unlikely(err)) + return err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_peb_container_forget_relation() - forget about relation + * @ptr: pointer on PEB container + * + * This method tries to forget about relation with + * destination PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+int ssdfs_peb_container_forget_relation(struct ssdfs_peb_container *ptr)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_segment_info *si;
+	struct ssdfs_peb_mapping_table *maptbl;
+	int migration_state;
+	int items_state;
+	u64 leb_id;
+	int new_state = SSDFS_PEB_CONTAINER_STATE_MAX;
+	int used_blks;
+	bool has_valid_blks = true;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr || !ptr->src_peb);
+	BUG_ON(!ptr->parent_si || !ptr->parent_si->fsi);
+	BUG_ON(!ptr->dst_peb);
+	BUG_ON(atomic_read(&ptr->dst_peb_refs) != 0);
+
+	SSDFS_DBG("ptr %p, peb_index %u, "
+		  "peb_type %#x, log_pages %u\n",
+		  ptr,
+		  ptr->peb_index,
+		  ptr->peb_type,
+		  ptr->log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = ptr->parent_si->fsi;
+	si = ptr->parent_si;
+	maptbl = fsi->maptbl;
+
+	leb_id = ssdfs_get_leb_id_for_peb_index(fsi,
+						si->seg_id,
+						ptr->peb_index);
+	if (leb_id >= U64_MAX) {
+		SSDFS_ERR("fail to convert PEB index into LEB ID: "
+			  "seg %llu, peb_index %u\n",
+			  si->seg_id, ptr->peb_index);
+		return -ERANGE;
+	}
+
+	down_write(&ptr->lock);
+
+	migration_state = atomic_read(&ptr->migration_state);
+	if (migration_state != SSDFS_PEB_FINISHING_MIGRATION) {
+		err = -ERANGE;
+		SSDFS_WARN("invalid migration_state %#x\n",
+			   migration_state);
+		goto finish_forget_relation;
+	}
+
+	used_blks = ssdfs_peb_get_used_data_pages(ptr);
+	if (used_blks < 0) {
+		err = used_blks;
+		SSDFS_ERR("fail to get used_blks: "
+			  "seg %llu, peb_index %u, err %d\n",
+			  ptr->parent_si->seg_id,
+			  ptr->peb_index, err);
+		goto finish_forget_relation;
+	}
+
+	has_valid_blks = used_blks > 0;
+
+	items_state = atomic_read(&ptr->items_state);
+	switch (items_state) {
+	case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER:
+		if (has_valid_blks)
+			new_state = SSDFS_PEB1_SRC_CONTAINER;
+		else
+			new_state = SSDFS_PEB_CONTAINER_EMPTY;
+		break;
+
+	case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER:
+		if (has_valid_blks)
+			new_state = SSDFS_PEB2_SRC_CONTAINER;
+		else
+			new_state = SSDFS_PEB_CONTAINER_EMPTY;
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_WARN("invalid items_state %#x\n",
+			   items_state);
+		goto finish_forget_relation;
+	}
+
+	err = ssdfs_peb_container_break_relation(ptr,
+						 items_state,
+						 new_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to break relation: "
+			  "leb_id %llu, items_state %#x, "
+			  "new_state %#x\n",
+			  leb_id, items_state, new_state);
+	}
+
+finish_forget_relation:
+	up_write(&ptr->lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_get_current_peb_locked() - lock PEB container and get PEB object
+ * @pebc: pointer on PEB container
+ */
+struct ssdfs_peb_info *
+ssdfs_get_current_peb_locked(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_peb_info *pebi = NULL;
+	bool is_peb_exhausted;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebc->parent_si->fsi;
+
+try_get_current_peb:
+	switch (atomic_read(&pebc->migration_state)) {
+	case SSDFS_PEB_NOT_MIGRATING:
+		down_read(&pebc->lock);
+		pebi = pebc->src_peb;
+		if (!pebi) {
+			err = -ERANGE;
+			SSDFS_WARN("source PEB is NULL\n");
+			goto fail_to_get_current_peb;
+		}
+
+		atomic_set(&pebc->migration_phase,
+			   SSDFS_PEB_MIGRATION_STATUS_UNKNOWN);
+		break;
+
+	case SSDFS_PEB_UNDER_MIGRATION:
+		down_read(&pebc->lock);
+
+		pebi = pebc->src_peb;
+		if (!pebi) {
+			err = -ERANGE;
+			SSDFS_WARN("source PEB is NULL\n");
+			goto fail_to_get_current_peb;
+		}
+
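+		/*
+		 * Summary of the checks below (descriptive only): while
+		 * the source PEB still has room in its current log, it
+		 * keeps receiving new data; once it is exhausted, the
+		 * data goes either to the shared zone (zoned device
+		 * holding user data) or to the destination PEB.
+		 */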
ssdfs_peb_current_log_lock(pebi); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); + ssdfs_peb_current_log_unlock(pebi); + + if (is_peb_exhausted) { + if (fsi->is_zns_device && + is_ssdfs_peb_containing_user_data(pebc)) { + atomic_set(&pebc->migration_phase, + SSDFS_SHARED_ZONE_RECEIVES_DATA); + } else { + pebi = pebc->dst_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("destination PEB is NULL\n"); + goto fail_to_get_current_peb; + } + + atomic_set(&pebc->migration_phase, + SSDFS_DST_PEB_RECEIVES_DATA); + } + } else { + atomic_set(&pebc->migration_phase, + SSDFS_SRC_PEB_NOT_EXHAUSTED); + } + break; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: { + DEFINE_WAIT(wait); + + prepare_to_wait(&pebc->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&pebc->migration_wq, &wait); + goto try_get_current_peb; + } + break; + + default: + SSDFS_WARN("invalid state: %#x\n", + atomic_read(&pebc->migration_state)); + return ERR_PTR(-ERANGE); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "migration_state %#x, migration_phase %#x\n", + pebc->parent_si->seg_id, + pebi->peb_id, + atomic_read(&pebc->migration_state), + atomic_read(&pebc->migration_phase)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return pebi; + +fail_to_get_current_peb: + up_read(&pebc->lock); + return ERR_PTR(err); +} + +/* + * ssdfs_unlock_current_peb() - unlock source and destination PEB objects + * @pebc: pointer on PEB container + */ +void ssdfs_unlock_current_peb(struct ssdfs_peb_container *pebc) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!rwsem_is_locked(&pebc->lock)) { + SSDFS_WARN("PEB container hasn't been locked: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } else + up_read(&pebc->lock); +} + +/* + * ssdfs_get_peb_for_migration_id() - get PEB object for migration ID + * @pebc: pointer on PEB container + */ +struct ssdfs_peb_info * +ssdfs_get_peb_for_migration_id(struct ssdfs_peb_container *pebc, + u8 migration_id) +{ + struct ssdfs_peb_info *pebi = NULL; + int known_migration_id; + u64 src_peb_id, dst_peb_id; + int src_migration_id, dst_migration_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!rwsem_is_locked(&pebc->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&pebc->migration_state)) { + case SSDFS_PEB_NOT_MIGRATING: + pebi = pebc->src_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("source PEB is NULL\n"); + goto fail_to_get_peb; + } + + known_migration_id = ssdfs_get_peb_migration_id_checked(pebi); + + if (migration_id != known_migration_id) { + err = -ERANGE; + SSDFS_WARN("peb %llu, " + "migration_id %u != known_migration_id %d\n", + pebi->peb_id, migration_id, + known_migration_id); + goto fail_to_get_peb; + } + break; + + case SSDFS_PEB_UNDER_MIGRATION: + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: + pebi = pebc->src_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("source PEB is NULL\n"); + goto fail_to_get_peb; + } + + known_migration_id = ssdfs_get_peb_migration_id_checked(pebi); + + if (migration_id != known_migration_id) { + src_peb_id = pebi->peb_id; + src_migration_id = known_migration_id; + + pebi = pebc->dst_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("destination PEB is NULL\n"); + goto fail_to_get_peb; + } + + known_migration_id = + 
ssdfs_get_peb_migration_id_checked(pebi);
+
+			if (migration_id != known_migration_id) {
+				dst_peb_id = pebi->peb_id;
+				dst_migration_id = known_migration_id;
+
+				err = -ERANGE;
+				SSDFS_WARN("fail to find PEB: "
+					   "src_peb_id %llu, "
+					   "src_migration_id %d, "
+					   "dst_peb_id %llu, "
+					   "dst_migration_id %d, "
+					   "migration_id %u\n",
+					   src_peb_id, src_migration_id,
+					   dst_peb_id, dst_migration_id,
+					   migration_id);
+				goto fail_to_get_peb;
+			}
+		}
+		break;
+
+	default:
+		SSDFS_WARN("invalid state: %#x\n",
+			   atomic_read(&pebc->migration_state));
+		return ERR_PTR(-ERANGE);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb %llu, migration_state %#x, "
+		  "migration_phase %#x, migration_id %u\n",
+		  pebc->parent_si->seg_id,
+		  pebi->peb_id,
+		  atomic_read(&pebc->migration_state),
+		  atomic_read(&pebc->migration_phase),
+		  migration_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return pebi;
+
+fail_to_get_peb:
+	return ERR_PTR(err);
+}
+
+/*
+ * ssdfs_peb_get_free_pages() - get PEB's free pages count
+ * @pebc: pointer on PEB container
+ */
+int ssdfs_peb_get_free_pages(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_blk_bmap *seg_blkbmap;
+	struct ssdfs_peb_blk_bmap *peb_blkbmap;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!pebc->parent_si->blk_bmap.peb);
+
+	SSDFS_DBG("pebc %p, peb_index %u, "
+		  "peb_type %#x, log_pages %u\n",
+		  pebc, pebc->peb_index,
+		  pebc->peb_type, pebc->log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	si = pebc->parent_si;
+	seg_blkbmap = &si->blk_bmap;
+
+	if (pebc->peb_index >= seg_blkbmap->pebs_count) {
+		SSDFS_ERR("peb_index %u >= pebs_count %u\n",
+			  pebc->peb_index,
+			  seg_blkbmap->pebs_count);
+		return -ERANGE;
+	}
+
+	peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index];
+
+	return ssdfs_peb_blk_bmap_get_free_pages(peb_blkbmap);
+}
+
+/*
+ * ssdfs_peb_get_used_data_pages() - get PEB's valid pages count
+ * @pebc: pointer on PEB container
+ */
+int ssdfs_peb_get_used_data_pages(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_blk_bmap *seg_blkbmap;
+	struct ssdfs_peb_blk_bmap *peb_blkbmap;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!pebc->parent_si->blk_bmap.peb);
+
+	SSDFS_DBG("pebc %p, peb_index %u, "
+		  "peb_type %#x, log_pages %u\n",
+		  pebc, pebc->peb_index,
+		  pebc->peb_type, pebc->log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	si = pebc->parent_si;
+	seg_blkbmap = &si->blk_bmap;
+
+	if (pebc->peb_index >= seg_blkbmap->pebs_count) {
+		SSDFS_ERR("peb_index %u >= pebs_count %u\n",
+			  pebc->peb_index,
+			  seg_blkbmap->pebs_count);
+		return -ERANGE;
+	}
+
+	peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index];
+
+	return ssdfs_peb_blk_bmap_get_used_pages(peb_blkbmap);
+}
+
+/*
+ * ssdfs_peb_get_invalid_pages() - get PEB's invalid pages count
+ * @pebc: pointer on PEB container
+ */
+int ssdfs_peb_get_invalid_pages(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_blk_bmap *seg_blkbmap;
+	struct ssdfs_peb_blk_bmap *peb_blkbmap;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!pebc->parent_si->blk_bmap.peb);
+
+	SSDFS_DBG("pebc %p, peb_index %u, "
+		  "peb_type %#x, log_pages %u\n",
+		  pebc, pebc->peb_index,
+		  pebc->peb_type, pebc->log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	si = pebc->parent_si;
+	seg_blkbmap = &si->blk_bmap;
+
+	if (pebc->peb_index >= seg_blkbmap->pebs_count) {
SSDFS_ERR("peb_index %u >= pebs_count %u\n", + pebc->peb_index, + seg_blkbmap->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index]; + + return ssdfs_peb_blk_bmap_get_invalid_pages(peb_blkbmap); +} + +/* + * ssdfs_peb_container_invalidate_block() - invalidate PEB's block + * @pebc: pointer on PEB container + * @desc: physical offset descriptor + * + * This method tries to invalidate PEB's block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_peb_container_invalidate_block(struct ssdfs_peb_container *pebc, + struct ssdfs_phys_offset_descriptor *desc) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_info *pebi; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_block_bmap_range range; + u16 peb_index; + u32 peb_page; + u8 peb_migration_id; + int id; + int items_state; + int bmap_index = SSDFS_PEB_BLK_BMAP_INDEX_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !desc); + BUG_ON(!pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->blk_bmap.peb); + + SSDFS_DBG("seg %llu, peb_index %u, peb_migration_id %u, " + "logical_offset %u, logical_blk %u, peb_page %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + desc->blk_state.peb_migration_id, + le32_to_cpu(desc->page_desc.logical_offset), + le16_to_cpu(desc->page_desc.logical_blk), + le16_to_cpu(desc->page_desc.peb_page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_index = pebc->peb_index; + peb_page = le16_to_cpu(desc->page_desc.peb_page); + peb_migration_id = desc->blk_state.peb_migration_id; + + down_read(&pebc->lock); + + items_state = atomic_read(&pebc->items_state); + switch (items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: items_state %#x\n", + items_state); + err = -ERANGE; + goto finish_invalidate_block; + } + bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; + break; + + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: items_state %#x\n", + items_state); + err = -ERANGE; + goto finish_invalidate_block; + } + bmap_index = SSDFS_PEB_BLK_BMAP_DESTINATION; + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: items_state %#x\n", + items_state); + err = -ERANGE; + goto finish_invalidate_block; + } + + bmap_index = SSDFS_PEB_BLK_BMAP_SOURCE; + id = ssdfs_get_peb_migration_id_checked(pebi); + + if (peb_migration_id != id) { + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: " + "items_state %#x\n", + items_state); + err = -ERANGE; + goto finish_invalidate_block; + } + bmap_index = SSDFS_PEB_BLK_BMAP_DESTINATION; + } + break; + + default: + SSDFS_ERR("invalid PEB container's items_state: " + "%#x\n", + items_state); + err = -ERANGE; + goto finish_invalidate_block; + }; + + id = ssdfs_get_peb_migration_id_checked(pebi); + + if (peb_migration_id != id) { + SSDFS_ERR("peb_migration_id %u != pebi->peb_migration_id %u\n", + peb_migration_id, + ssdfs_get_peb_migration_id(pebi)); + err = -ERANGE; + goto finish_invalidate_block; + } + + si = pebc->parent_si; + seg_blkbmap = &si->blk_bmap; + + if (pebc->peb_index >= seg_blkbmap->pebs_count) { + SSDFS_ERR("peb_index %u 
>= pebs_count %u\n",
+			  pebc->peb_index,
+			  seg_blkbmap->pebs_count);
+		err = -ERANGE;
+		goto finish_invalidate_block;
+	}
+
+	peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index];
+
+	range.start = peb_page;
+	range.len = 1;
+
+	err = ssdfs_peb_blk_bmap_invalidate(peb_blkbmap,
+					    bmap_index,
+					    &range);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to invalidate range: "
+			  "peb %llu, "
+			  "range (start %u, len %u), err %d\n",
+			  pebi->peb_id,
+			  range.start, range.len, err);
+		goto finish_invalidate_block;
+	}
+
+finish_invalidate_block:
+	up_read(&pebc->lock);
+
+	return err;
+}
+
+/*
+ * is_peb_joined_into_create_requests_queue() - is PEB joined into create queue?
+ * @pebc: pointer on PEB container
+ */
+bool is_peb_joined_into_create_requests_queue(struct ssdfs_peb_container *pebc)
+{
+	bool is_joined;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pebc->crq_ptr_lock);
+	is_joined = pebc->create_rq != NULL;
+	spin_unlock(&pebc->crq_ptr_lock);
+
+	return is_joined;
+}
+
+/*
+ * ssdfs_peb_join_create_requests_queue() - join to process new page requests
+ * @pebc: pointer on PEB container
+ * @create_rq: pointer on shared new page requests queue
+ *
+ * This function selects the PEB's flush thread for processing new page
+ * requests. Namely, the selected PEB object keeps the pointer on the
+ * shared new page requests queue and joins the wait queue of the
+ * other flush threads.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input value.
+ */
+int ssdfs_peb_join_create_requests_queue(struct ssdfs_peb_container *pebc,
+					 struct ssdfs_requests_queue *create_rq)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc);
+	BUG_ON(!create_rq);
+
+	SSDFS_DBG("seg %llu, peb_index %u, create_rq %p\n",
+		  pebc->parent_si->seg_id,
+		  pebc->peb_index, create_rq);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_peb_joined_into_create_requests_queue(pebc)) {
+		SSDFS_ERR("PEB has already joined the create requests queue: "
+			  "seg %llu, peb_index %u\n",
+			  pebc->parent_si->seg_id, pebc->peb_index);
+		return -EINVAL;
+	}
+
+	if (pebc->thread[SSDFS_PEB_FLUSH_THREAD].task == NULL) {
+		SSDFS_ERR("PEB has no flush thread: "
+			  "seg %llu, peb_index %u\n",
+			  pebc->parent_si->seg_id, pebc->peb_index);
+		return -EINVAL;
+	}
+
+	spin_lock(&pebc->crq_ptr_lock);
+	pebc->create_rq = create_rq;
+	spin_unlock(&pebc->crq_ptr_lock);
+
+	return 0;
+}
+
+/*
+ * ssdfs_peb_forget_create_requests_queue() - forget create requests queue
+ * @pebc: pointer on PEB container
+ */
+void ssdfs_peb_forget_create_requests_queue(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc);
+	WARN_ON(!is_peb_joined_into_create_requests_queue(pebc));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pebc->crq_ptr_lock);
+	pebc->create_rq = NULL;
+	spin_unlock(&pebc->crq_ptr_lock);
+}
+
+/*
+ * ssdfs_peb_container_change_state() - change PEB's state in mapping table
+ * @pebc: pointer on PEB container
+ *
+ * This method tries to change the PEB's state in the mapping table.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
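+ *
+ * Informal summary of the state selection below for a single source
+ * or destination PEB (derived from this function's own logic):
+ *
+ *	free == 0, PEB not exhausted            -> "using" state
+ *	free == 0, exhausted, only used blocks  -> "used" state
+ *	free == 0, exhausted, only invalid ones -> "dirty" state
+ *	free == 0, exhausted, used + invalid    -> "pre-dirty" state
+ *	free > 0, no used, no invalid blocks    -> "clean" state
+ *	any other combination with free > 0     -> "using" state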
+ */ +int ssdfs_peb_container_change_state(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_peb_info *pebi; + struct ssdfs_peb_mapping_table *maptbl; + struct completion *end; + int items_state; + int used_pages, free_pages, invalid_pages; + int new_peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE; + u64 leb_id; + bool is_peb_exhausted = false; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!rwsem_is_locked(&pebc->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + fsi = pebc->parent_si->fsi; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("pebc %p, seg %llu, peb_index %u, " + "peb_type %#x, log_pages %u\n", + pebc, si->seg_id, pebc->peb_index, + pebc->peb_type, pebc->log_pages); +#else + SSDFS_DBG("pebc %p, seg %llu, peb_index %u, " + "peb_type %#x, log_pages %u\n", + pebc, si->seg_id, pebc->peb_index, + pebc->peb_type, pebc->log_pages); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + seg_blkbmap = &si->blk_bmap; + maptbl = fsi->maptbl; + + if (pebc->peb_index >= seg_blkbmap->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + pebc->peb_index, + seg_blkbmap->pebs_count); + return -ERANGE; + } + + peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index]; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, si->seg_id, + pebc->peb_index); + if (leb_id == U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); + return -EINVAL; + } + + items_state = atomic_read(&pebc->items_state); + switch (items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: items_state %#x\n", + items_state); + return -ERANGE; + } + + free_pages = ssdfs_peb_blk_bmap_get_free_pages(peb_blkbmap); + if (free_pages < 0) { + err = free_pages; + SSDFS_ERR("fail to get free pages: err %d\n", + err); + return err; + } + + used_pages = ssdfs_peb_blk_bmap_get_used_pages(peb_blkbmap); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used pages: err %d\n", + err); + return err; + } + + invalid_pages = + ssdfs_peb_blk_bmap_get_invalid_pages(peb_blkbmap); + if (invalid_pages < 0) { + err = invalid_pages; + SSDFS_ERR("fail to get invalid pages: err %d\n", + err); + return err; + } + + ssdfs_peb_current_log_lock(pebi); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); + ssdfs_peb_current_log_unlock(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %d, used_pages %d, " + "invalid_pages %d, is_peb_exhausted %#x\n", + free_pages, used_pages, + invalid_pages, is_peb_exhausted); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_pages == 0) { + if (!is_peb_exhausted) { + new_peb_state = + SSDFS_MAPTBL_USING_PEB_STATE; + } else if (invalid_pages == 0) { + if (used_pages == 0) { + SSDFS_ERR("invalid state: " + "free_pages %d, " + "used_pages %d, " + "invalid_pages %d\n", + free_pages, + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + SSDFS_MAPTBL_USED_PEB_STATE; + } else if (used_pages == 0) { + if (invalid_pages == 0) { + SSDFS_ERR("invalid state: " + "free_pages %d, " + "used_pages %d, " + "invalid_pages %d\n", + free_pages, + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + SSDFS_MAPTBL_DIRTY_PEB_STATE; + } else { + new_peb_state = + SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE; + 
} + } else if (used_pages == 0) { + if (invalid_pages == 0) { + new_peb_state = + SSDFS_MAPTBL_CLEAN_PEB_STATE; + } else { + new_peb_state = + SSDFS_MAPTBL_USING_PEB_STATE; + } + } else { + new_peb_state = + SSDFS_MAPTBL_USING_PEB_STATE; + } + + err = ssdfs_maptbl_change_peb_state(fsi, leb_id, + pebc->peb_type, + new_peb_state, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_change_peb_state(fsi, + leb_id, + pebc->peb_type, + new_peb_state, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change the PEB state: " + "peb_id %llu, new_state %#x, err %d\n", + pebi->peb_id, new_peb_state, err); + return err; + } + break; + + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: items_state %#x\n", + items_state); + return -ERANGE; + } + + free_pages = ssdfs_peb_blk_bmap_get_free_pages(peb_blkbmap); + if (free_pages < 0) { + err = free_pages; + SSDFS_ERR("fail to get free pages: err %d\n", + err); + return err; + } + + used_pages = ssdfs_peb_blk_bmap_get_used_pages(peb_blkbmap); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used pages: err %d\n", + err); + return err; + } + + invalid_pages = + ssdfs_peb_blk_bmap_get_invalid_pages(peb_blkbmap); + if (invalid_pages < 0) { + err = invalid_pages; + SSDFS_ERR("fail to get invalid pages: err %d\n", + err); + return err; + } + + ssdfs_peb_current_log_lock(pebi); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); + ssdfs_peb_current_log_unlock(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %d, used_pages %d, " + "invalid_pages %d, is_peb_exhausted %#x\n", + free_pages, used_pages, + invalid_pages, is_peb_exhausted); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_pages == 0) { + if (!is_peb_exhausted) { + new_peb_state = + SSDFS_MAPTBL_USING_PEB_STATE; + } else if (invalid_pages == 0) { + if (used_pages == 0) { + SSDFS_ERR("invalid state: " + "free_pages %d, " + "used_pages %d, " + "invalid_pages %d\n", + free_pages, + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + SSDFS_MAPTBL_USED_PEB_STATE; + } else if (used_pages == 0) { + if (invalid_pages == 0) { + SSDFS_ERR("invalid state: " + "free_pages %d, " + "used_pages %d, " + "invalid_pages %d\n", + free_pages, + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + SSDFS_MAPTBL_DIRTY_PEB_STATE; + } else { + new_peb_state = + SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE; + } + } else if (used_pages == 0) { + if (invalid_pages == 0) { + new_peb_state = + SSDFS_MAPTBL_CLEAN_PEB_STATE; + } else { + new_peb_state = + SSDFS_MAPTBL_USING_PEB_STATE; + } + } else { + new_peb_state = + SSDFS_MAPTBL_USING_PEB_STATE; + } + + err = ssdfs_maptbl_change_peb_state(fsi, leb_id, + pebc->peb_type, + new_peb_state, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_change_peb_state(fsi, leb_id, + pebc->peb_type, + new_peb_state, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change the PEB state: " + "peb_id %llu, new_state %#x, err %d\n", + pebi->peb_id, new_peb_state, err); + return err; + } + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + pebi 
= pebc->src_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: items_state %#x\n", + items_state); + return -ERANGE; + } + + free_pages = ssdfs_src_blk_bmap_get_free_pages(peb_blkbmap); + if (free_pages < 0) { + err = free_pages; + SSDFS_ERR("fail to get free pages: err %d\n", + err); + return err; + } + + used_pages = ssdfs_src_blk_bmap_get_used_pages(peb_blkbmap); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used pages: err %d\n", + err); + return err; + } + + invalid_pages = + ssdfs_src_blk_bmap_get_invalid_pages(peb_blkbmap); + if (invalid_pages < 0) { + err = invalid_pages; + SSDFS_ERR("fail to get invalid pages: err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("source PEB: free_pages %d, used_pages %d, " + "invalid_pages %d\n", + free_pages, used_pages, invalid_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (invalid_pages == 0) { + if (used_pages == 0) { + SSDFS_ERR("invalid state: " + "used_pages %d, " + "invalid_pages %d\n", + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE; + } else if (used_pages == 0) { + new_peb_state = + SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE; + } else { + new_peb_state = + SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE; + } + + err = ssdfs_maptbl_change_peb_state(fsi, leb_id, + pebc->peb_type, + new_peb_state, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_change_peb_state(fsi, leb_id, + pebc->peb_type, + new_peb_state, + &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change the PEB state: " + "peb_id %llu, new_state %#x, err %d\n", + pebi->peb_id, new_peb_state, err); + return err; + } + + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_ERR("PEB pointer is NULL: " + "items_state %#x\n", + items_state); + return -ERANGE; + } + + free_pages = ssdfs_dst_blk_bmap_get_free_pages(peb_blkbmap); + if (free_pages < 0) { + err = free_pages; + SSDFS_ERR("fail to get free pages: err %d\n", + err); + return err; + } + + used_pages = ssdfs_dst_blk_bmap_get_used_pages(peb_blkbmap); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used pages: err %d\n", + err); + return err; + } + + invalid_pages = + ssdfs_dst_blk_bmap_get_invalid_pages(peb_blkbmap); + if (invalid_pages < 0) { + err = invalid_pages; + SSDFS_ERR("fail to get invalid pages: err %d\n", + err); + return err; + } + + ssdfs_peb_current_log_lock(pebi); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); + ssdfs_peb_current_log_unlock(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("destination PEB: free_pages %d, used_pages %d, " + "invalid_pages %d, is_peb_exhausted %#x\n", + free_pages, used_pages, + invalid_pages, is_peb_exhausted); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_pages == 0) { + if (!is_peb_exhausted) { + new_peb_state = + SSDFS_MAPTBL_MIGRATION_DST_USING_STATE; + } else if (invalid_pages == 0) { + if (used_pages == 0) { + SSDFS_ERR("invalid state: " + "free_pages %d, " + "used_pages %d, " + "invalid_pages %d\n", + free_pages, + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + SSDFS_MAPTBL_MIGRATION_DST_USED_STATE; + } else if (used_pages == 0) { + if (invalid_pages == 0) { + SSDFS_ERR("invalid state: " + "free_pages %d, " + "used_pages %d, " + "invalid_pages %d\n", + free_pages, + used_pages, + invalid_pages); + return -ERANGE; + } + + new_peb_state = + 
SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE;
+			} else {
+				new_peb_state =
+				    SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE;
+			}
+		} else if (used_pages == 0) {
+			if (invalid_pages == 0) {
+				new_peb_state =
+				    SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE;
+			} else {
+				new_peb_state =
+				    SSDFS_MAPTBL_MIGRATION_DST_USING_STATE;
+			}
+		} else {
+			new_peb_state =
+				SSDFS_MAPTBL_MIGRATION_DST_USING_STATE;
+		}
+
+		err = ssdfs_maptbl_change_peb_state(fsi, leb_id,
+						    pebc->peb_type,
+						    new_peb_state, &end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("maptbl init failed: "
+					  "err %d\n", err);
+				return err;
+			}
+
+			err = ssdfs_maptbl_change_peb_state(fsi, leb_id,
+							    pebc->peb_type,
+							    new_peb_state,
+							    &end);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change the PEB state: "
+				  "peb_id %llu, new_state %#x, err %d\n",
+				  pebi->peb_id, new_peb_state, err);
+			return err;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid PEB container's items_state: "
+			  "%#x\n",
+			  items_state);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}

From patchwork Sat Feb 25 01:08:35 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151931
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 24/76] ssdfs: PEB read thread's init logic
Date: Fri, 24 Feb 2023 17:08:35 -0800
Message-Id: <20230225010927.813929-25-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

"Physical" Erase Block (PEB) container has a read thread. This thread
is used to execute:
(1) background PEB initialization logic,
(2) background readahead,
(3) freeing of PEB object's cache memory pages in the background.

This patch implements the finite state machine of the read thread and
declares the main commands that the read thread can process.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_read_thread.c | 3053 ++++++++++++++++++++++++++++++++++++
 1 file changed, 3053 insertions(+)
 create mode 100644 fs/ssdfs/peb_read_thread.c

diff --git a/fs/ssdfs/peb_read_thread.c b/fs/ssdfs/peb_read_thread.c
new file mode 100644
index 000000000000..c5087373df8d
--- /dev/null
+++ b/fs/ssdfs/peb_read_thread.c
@@ -0,0 +1,3053 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_read_thread.c - read thread functionality.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
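+ *
+ * The read thread consumes ssdfs_segment_request items; each request
+ * carries req->private.class, req->private.cmd and req->private.type,
+ * and the thread's finite state machine dispatches on these fields to
+ * drive background PEB initialization, background readahead, and the
+ * freeing of PEB object's cache memory pages.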
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + * Cong Wang + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "compression.h" +#include "page_vector.h" +#include "block_bitmap.h" +#include "peb_block_bitmap.h" +#include "segment_block_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "peb.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" +#include "peb_mapping_table.h" +#include "extents_queue.h" +#include "request_queue.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "diff_on_write.h" +#include "shared_extents_tree.h" +#include "invalidated_extents_tree.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_read_page_leaks; +atomic64_t ssdfs_read_memory_leaks; +atomic64_t ssdfs_read_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_read_cache_leaks_increment(void *kaddr) + * void ssdfs_read_cache_leaks_decrement(void *kaddr) + * void *ssdfs_read_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_read_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_read_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_read_kfree(void *kaddr) + * struct page *ssdfs_read_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_read_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_read_free_page(struct page *page) + * void ssdfs_read_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(read) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(read) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_read_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_read_page_leaks, 0); + atomic64_set(&ssdfs_read_memory_leaks, 0); + atomic64_set(&ssdfs_read_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_read_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_read_page_leaks) != 0) { + SSDFS_ERR("READ THREAD: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_read_page_leaks)); + } + + if (atomic64_read(&ssdfs_read_memory_leaks) != 0) { + SSDFS_ERR("READ THREAD: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_read_memory_leaks)); + } + + if (atomic64_read(&ssdfs_read_cache_leaks) != 0) { + SSDFS_ERR("READ THREAD: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_read_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * struct ssdfs_segbmap_extent - segbmap extent + * @logical_offset: logical offset inside of segbmap's content + * @data_size: requested data size + * @fragment_size: fragment size of segbmap + */ +struct ssdfs_segbmap_extent { + u64 logical_offset; + u32 data_size; + u16 fragment_size; +}; + +static +void ssdfs_prepare_blk_bmap_init_env(struct ssdfs_blk_bmap_init_env *env, + u32 pages_per_peb) +{ + size_t bmap_bytes; + size_t bmap_pages; + + memset(env->bmap_hdr_buf, 0, SSDFS_BLKBMAP_HDR_CAPACITY); + env->bmap_hdr = (struct ssdfs_block_bitmap_header *)env->bmap_hdr_buf; + env->frag_hdr = + (struct ssdfs_block_bitmap_fragment *)(env->bmap_hdr_buf + + sizeof(struct ssdfs_block_bitmap_header)); + env->fragment_index = -1; + + bmap_bytes = 
BLK_BMAP_BYTES(pages_per_peb); + bmap_pages = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE; + ssdfs_page_vector_create(&env->array, bmap_pages); + + env->read_bytes = 0; +} + +static void +ssdfs_prepare_blk2off_table_init_env(struct ssdfs_blk2off_table_init_env *env) +{ + memset(&env->tbl_hdr, 0, sizeof(struct ssdfs_blk2off_table_header)); + pagevec_init(&env->pvec); + env->blk2off_tbl_hdr_off = 0; + env->read_off = 0; + env->write_off = 0; +} + +static void +ssdfs_prepare_blk_desc_table_init_env(struct ssdfs_blk_desc_table_init_env *env) +{ + pagevec_init(&env->pvec); + env->compressed_buf = NULL; + env->buf_size = 0; + env->read_off = 0; + env->write_off = 0; +} + +static +int ssdfs_prepare_read_init_env(struct ssdfs_read_init_env *env, + u32 pages_per_peb) +{ + size_t hdr_size; + size_t footer_buf_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("env %p\n", env); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr_size = sizeof(struct ssdfs_segment_header); + hdr_size = max_t(size_t, hdr_size, (size_t)SSDFS_4KB); + + env->log_hdr = ssdfs_read_kzalloc(hdr_size, GFP_KERNEL); + if (!env->log_hdr) { + SSDFS_ERR("fail to allocate log header buffer\n"); + return -ENOMEM; + } + + env->has_seg_hdr = false; + + footer_buf_size = max_t(size_t, hdr_size, + sizeof(struct ssdfs_log_footer)); + env->footer = ssdfs_read_kzalloc(footer_buf_size, GFP_KERNEL); + if (!env->footer) { + SSDFS_ERR("fail to allocate log footer buffer\n"); + return -ENOMEM; + } + + env->has_footer = false; + + env->cur_migration_id = -1; + env->prev_migration_id = -1; + + env->log_offset = 0; + env->log_pages = U32_MAX; + env->log_bytes = U32_MAX; + + ssdfs_prepare_blk_bmap_init_env(&env->b_init, pages_per_peb); + ssdfs_prepare_blk2off_table_init_env(&env->t_init); + ssdfs_prepare_blk_desc_table_init_env(&env->bdt_init); + + return 0; +} + +static +void ssdfs_destroy_init_env(struct ssdfs_read_init_env *env) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("env %p\n", env); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (env->log_hdr) + ssdfs_read_kfree(env->log_hdr); + + env->log_hdr = NULL; + env->has_seg_hdr = false; + + if (env->footer) + ssdfs_read_kfree(env->footer); + + env->footer = NULL; + env->has_footer = false; + + ssdfs_page_vector_release(&env->b_init.array); + ssdfs_page_vector_destroy(&env->b_init.array); + + ssdfs_read_pagevec_release(&env->t_init.pvec); + ssdfs_read_pagevec_release(&env->bdt_init.pvec); + + if (env->bdt_init.compressed_buf) + ssdfs_read_kfree(env->bdt_init.compressed_buf); +} + +static +int ssdfs_read_blk2off_table_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env); + +/****************************************************************************** + * READ THREAD FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_find_prev_partial_log() - find previous partial log + * @fsi: file system info object + * @pebi: pointer on PEB object + * @env: read operation's init environment [in|out] + * @log_diff: offset for logs processing + * + * This function tries to find a previous partial log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOENT - unable to find any log. 
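+ *
+ * Informal example (the offsets are illustrative): if full logs begin
+ * at page offsets 0 and 64 and env->log_offset is 128, then a call
+ * with log_diff == 0 returns the log at offset 64, while
+ * log_diff == -1 skips one more log backward and returns the log at
+ * offset 0.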
+ */ +static +int ssdfs_find_prev_partial_log(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + int log_diff) +{ + struct ssdfs_signature *magic = NULL; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *pl_hdr = NULL; + struct ssdfs_log_footer *footer = NULL; + struct page *page; + void *kaddr; + size_t hdr_buf_size = sizeof(struct ssdfs_segment_header); + int start_offset; + int skipped_logs = 0; + int i; + int err = -ENOENT; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi || !pebi->pebc || !env); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u, " + "log_offset %u, log_diff %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + env->log_offset, log_diff); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (env->log_offset > fsi->pages_per_peb) { + SSDFS_ERR("log_offset %u > pages_per_peb %u\n", + env->log_offset, fsi->pages_per_peb); + return -ERANGE; + } else if (env->log_offset == fsi->pages_per_peb) + env->log_offset--; + + start_offset = env->log_offset; + + if (log_diff > 0) { + SSDFS_ERR("invalid log_diff %d\n", log_diff); + return -EINVAL; + } + + if (env->log_offset == 0) { + SSDFS_DBG("previous log is absent\n"); + return -ENOENT; + } + + for (i = start_offset; i >= 0; i--) { + page = ssdfs_page_array_get_page_locked(&pebi->cache, i); + if (IS_ERR_OR_NULL(page)) { + if (page == NULL) { + SSDFS_ERR("fail to get page: " + "index %d\n", + i); + return -ERANGE; + } else { + err = PTR_ERR(page); + + if (err == -ENOENT) + continue; + else { + SSDFS_ERR("fail to get page: " + "index %d, err %d\n", + i, err); + return err; + } + } + } + + kaddr = kmap_local_page(page); + ssdfs_memcpy(env->log_hdr, 0, hdr_buf_size, + kaddr, 0, PAGE_SIZE, + hdr_buf_size); + ssdfs_memcpy(env->footer, 0, hdr_buf_size, + kaddr, 0, PAGE_SIZE, + hdr_buf_size); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + magic = (struct ssdfs_signature *)env->log_hdr; + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + + err = ssdfs_check_segment_header(fsi, seg_hdr, + false); + if (unlikely(err)) { + SSDFS_ERR("log header is corrupted: " + "seg %llu, peb %llu, index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + i); + return -EIO; + } + + if (start_offset == i) { + /* + * Requested starting log_offset points out + * on segment header. It needs to skip this + * header because of searching the previous + * log. 
+ */ + continue; + } + + env->has_seg_hdr = true; + env->has_footer = ssdfs_log_has_footer(seg_hdr); + env->log_offset = (u16)i; + + if (skipped_logs > log_diff) { + skipped_logs--; + err = -ENOENT; + continue; + } else { + /* log has been found */ + err = 0; + goto finish_prev_log_search; + } + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + u32 flags; + + pl_hdr = SSDFS_PLH(env->log_hdr); + + err = ssdfs_check_partial_log_header(fsi, pl_hdr, + false); + if (unlikely(err)) { + SSDFS_ERR("partial log header is corrupted: " + "seg %llu, peb %llu, index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + i); + return -EIO; + } + + env->has_seg_hdr = false; + env->has_footer = ssdfs_pl_has_footer(pl_hdr); + + env->log_bytes = + le32_to_cpu(pl_hdr->log_bytes); + + flags = le32_to_cpu(pl_hdr->pl_flags); + + if (flags & SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER) { + /* first partial log */ + err = -ENOENT; + continue; + } else if (flags & SSDFS_LOG_HAS_FOOTER) { + /* last partial log */ + if (start_offset == i) { + /* + * Requested starting log_offset + * points out on segment header. + * It needs to skip this header + * because of searching the previous + * log. + */ + continue; + } + + env->log_offset = (u16)i; + + if (skipped_logs > log_diff) { + skipped_logs--; + err = -ENOENT; + continue; + } else { + /* log has been found */ + err = 0; + goto finish_prev_log_search; + } + } else { + /* intermediate partial log */ + if (start_offset == i) { + /* + * Requested starting log_offset + * points out on segment header. + * It needs to skip this header + * because of searching the previous + * log. + */ + continue; + } + + env->log_offset = (u16)i; + + if (skipped_logs > log_diff) { + skipped_logs--; + err = -ENOENT; + continue; + } else { + /* log has been found */ + err = 0; + goto finish_prev_log_search; + } + } + } else if (__is_ssdfs_log_footer_magic_valid(magic)) { + footer = SSDFS_LF(env->footer); + + env->log_bytes = + le32_to_cpu(footer->log_bytes); + continue; + } else { + err = -ENOENT; + continue; + } + } + +finish_prev_log_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_offset %u, log_bytes %u\n", + env->log_offset, + env->log_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_peb_complete_init_blk2off_table() - init blk2off table's fragment + * @pebi: pointer on PEB object + * @log_diff: offset for logs processing + * @req: read request + * + * This function tries to init blk2off table's fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
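+ *
+ * A condensed sketch of the initialization loop below (every helper
+ * named here is one this function actually calls):
+ *
+ *	pebi->env.log_offset = (u32)last_page_idx + 1;
+ *	do {
+ *		ssdfs_find_prev_partial_log(fsi, pebi, &pebi->env, log_diff);
+ *		ssdfs_pre_fetch_blk2off_table_area(pebi, &pebi->env);
+ *		ssdfs_pre_fetch_blk_desc_table_area(pebi, &pebi->env);
+ *		ssdfs_read_blk2off_table_fragment(pebi, &pebi->env);
+ *		ssdfs_blk2off_table_partial_init(blk2off_table, ...);
+ *		log_diff = 0;
+ *	} while (pebi->env.log_offset > 0);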
+ */ +static +int ssdfs_peb_complete_init_blk2off_table(struct ssdfs_peb_info *pebi, + int log_diff, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_blk2off_table *blk2off_table = NULL; + u64 cno; + unsigned long last_page_idx; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req); + + SSDFS_DBG("seg %llu, peb %llu, log_diff %d, " + "class %#x, cmd %#x, type %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, log_diff, + req->private.class, + req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + blk2off_table = pebi->pebc->parent_si->blk2off_table; + + switch (atomic_read(&blk2off_table->state)) { + case SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk2off table has been initialized: " + "peb_id %llu\n", + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + default: + /* continue to init blk2off table */ + break; + } + + err = ssdfs_prepare_read_init_env(&pebi->env, fsi->pages_per_peb); + if (unlikely(err)) { + SSDFS_ERR("fail to init read environment: err %d\n", + err); + return err; + } + + last_page_idx = ssdfs_page_array_get_last_page_index(&pebi->cache); + + if (last_page_idx >= SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE) { + SSDFS_ERR("empty page array: last_page_idx %lu\n", + last_page_idx); + return -ERANGE; + } + + if (last_page_idx >= fsi->pages_per_peb) { + SSDFS_ERR("corrupted page array: " + "last_page_idx %lu, fsi->pages_per_peb %u\n", + last_page_idx, fsi->pages_per_peb); + return -ERANGE; + } + + pebi->env.log_offset = (u32)last_page_idx + 1; + + do { + err = ssdfs_find_prev_partial_log(fsi, pebi, + &pebi->env, log_diff); + if (err == -ENOENT) { + if (pebi->env.log_offset > 0) { + SSDFS_ERR("fail to find prev log: " + "log_offset %u, err %d\n", + pebi->env.log_offset, err); + goto fail_init_blk2off_table; + } else { + /* no previous log exists */ + err = 0; + SSDFS_DBG("no previous log exists\n"); + goto fail_init_blk2off_table; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find prev log: " + "log_offset %u, err %d\n", + pebi->env.log_offset, err); + goto fail_init_blk2off_table; + } + + err = ssdfs_pre_fetch_blk2off_table_area(pebi, &pebi->env); + if (err == -ENOENT) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk2off table's fragment is absent: " + "seg %llu, peb %llu, log_offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->env.log_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_next_log; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch blk2off_table area: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->env.log_offset, + err); + goto fail_init_blk2off_table; + } + + err = ssdfs_pre_fetch_blk_desc_table_area(pebi, &pebi->env); + if (err == -ENOENT) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("blk desc table's fragment is absent: " + "seg %llu, peb %llu, log_offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->env.log_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_next_log; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch blk desc table area: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->env.log_offset, + err); + goto fail_init_blk2off_table; + } + + err = ssdfs_read_blk2off_table_fragment(pebi, &pebi->env); + if 
(unlikely(err)) { + SSDFS_ERR("fail to read translation table fragments: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + goto fail_init_blk2off_table; + } + + if (pebi->env.has_seg_hdr) { + struct ssdfs_segment_header *seg_hdr = NULL; + + seg_hdr = SSDFS_SEG_HDR(pebi->env.log_hdr); + cno = le64_to_cpu(seg_hdr->cno); + } else { + struct ssdfs_partial_log_header *pl_hdr = NULL; + + pl_hdr = SSDFS_PLH(pebi->env.log_hdr); + cno = le64_to_cpu(pl_hdr->cno); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb %llu, " + "env.log_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->env.log_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_blk2off_table_partial_init(blk2off_table, + &pebi->env.t_init.pvec, + &pebi->env.bdt_init.pvec, + pebi->peb_index, + cno); + if (unlikely(err)) { + SSDFS_ERR("fail to start init of offset table: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + goto fail_init_blk2off_table; + } + +try_next_log: + ssdfs_read_pagevec_release(&pebi->env.t_init.pvec); + ssdfs_read_pagevec_release(&pebi->env.bdt_init.pvec); + log_diff = 0; + } while (pebi->env.log_offset > 0); + +fail_init_blk2off_table: + ssdfs_destroy_init_env(&pebi->env); + return err; +} + +/* + * ssdfs_start_complete_init_blk2off_table() - start to init blk2off table + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to start the initialization of blk2off table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +static +int ssdfs_start_complete_init_blk2off_table(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int log_diff = -1; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req); + + SSDFS_DBG("peb_id %llu, peb_index %u\n", + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&pebi->current_log.state)) { + case SSDFS_LOG_INITIALIZED: + case SSDFS_LOG_CREATED: + case SSDFS_LOG_COMMITTED: + /* + * The last log was processed during initialization of + * "using" or "used" PEB. So, it needs to process the + * log before the last one. + */ + log_diff = -1; + break; + + default: + /* + * It needs to process the last log. + */ + log_diff = 0; + break; + } + + err = ssdfs_peb_complete_init_blk2off_table(pebi, log_diff, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "peb_id %llu, peb_index %u, " + "log_diff %d, err %d\n", + pebi->peb_id, pebi->peb_index, + log_diff, err); + } + + return err; +} + +/* + * ssdfs_finish_complete_init_blk2off_table() - finish to init blk2off table + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to finish the initialization of blk2off table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
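+ *
+ * The LEB-to-PEB conversion uses the mapping table's usual calling
+ * convention (mirroring the body below): -EAGAIN means the affected
+ * fragment is still under initialization, so the caller waits on the
+ * returned completion and retries once:
+ *
+ *	err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id,
+ *					   pebi->pebc->peb_type,
+ *					   &pebr, &end);
+ *	if (err == -EAGAIN) {
+ *		err = SSDFS_WAIT_COMPLETION(end);
+ *		if (!err)
+ *			err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id,
+ *						pebi->pebc->peb_type,
+ *						&pebr, &end);
+ *	}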
+ */ +static +int ssdfs_finish_complete_init_blk2off_table(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_maptbl_peb_relation pebr; + struct completion *end; + struct ssdfs_maptbl_peb_descriptor *ptr; + u64 leb_id; + int log_diff = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req); + BUG_ON(!pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + + SSDFS_DBG("peb_id %llu, peb_index %u\n", + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + pebi->pebc->parent_si->seg_id, + pebi->peb_index); + if (leb_id == U64_MAX) { + SSDFS_ERR("fail to convert PEB index into LEB ID: " + "seg %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_index); + return -ERANGE; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, + pebi->pebc->peb_type, + &pebr, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, + pebi->pebc->peb_type, + &pebr, &end); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, pebi->pebc->peb_type, err); + return err; + } + + ptr = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + if (ptr->peb_id != pebi->peb_id) { + SSDFS_ERR("ptr->peb_id %llu != pebi->peb_id %llu\n", + ptr->peb_id, pebi->peb_id); + return -ERANGE; + } + + switch (ptr->state) { + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore PEB: peb_id %llu, state %#x\n", + pebi->peb_id, ptr->state); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + default: + /* continue logic */ + break; + } + + switch (atomic_read(&pebi->current_log.state)) { + case SSDFS_LOG_INITIALIZED: + case SSDFS_LOG_CREATED: + case SSDFS_LOG_COMMITTED: + /* + * It needs to process the last log of source PEB. + * The destination PEB has been/will be processed + * in a real pair. + */ + log_diff = 0; + break; + + default: + /* + * It needs to process the last log. + */ + log_diff = 0; + break; + } + + err = ssdfs_peb_complete_init_blk2off_table(pebi, log_diff, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "peb_id %llu, peb_index %u, " + "log_diff %d, err %d\n", + pebi->peb_id, pebi->peb_index, log_diff, err); + } + + return err; +} + +/* + * ssdfs_src_peb_complete_init_blk2off_table() - init src PEB's blk2off table + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to init the source PEB's blk2off table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
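+ *
+ * Locking sketch (purely descriptive, condensed from the body below):
+ * the container lock is held shared across the whole initialization,
+ * so the source PEB pointer cannot change underneath:
+ *
+ *	down_read(&pebc->lock);
+ *	err = ssdfs_start_complete_init_blk2off_table(pebc->src_peb, req);
+ *	up_read(&pebc->lock);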
+ */ +static +int ssdfs_src_peb_complete_init_blk2off_table(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_peb_info *pebi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#else + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_src_peb_init_blk2off_table; + } + + err = ssdfs_start_complete_init_blk2off_table(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "seg_id %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + goto finish_src_peb_init_blk2off_table; + } + +finish_src_peb_init_blk2off_table: + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_dst_peb_complete_init_blk2off_table() - init dst PEB's blk2off table + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to init the destination PEB's blk2off table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +static +int ssdfs_dst_peb_complete_init_blk2off_table(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_peb_info *pebi; + int items_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#else + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + down_read(&pebc->lock); + + items_state = atomic_read(&pebc->items_state); + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_WARN("destination PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_peb_init_blk2off_table; + } + + err = ssdfs_start_complete_init_blk2off_table(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "seg_id %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + goto finish_dst_peb_init_blk2off_table; + } + break; + + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_peb_init_blk2off_table; + } + + err = ssdfs_finish_complete_init_blk2off_table(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "seg_id %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + goto finish_dst_peb_init_blk2off_table; + } + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_WARN("destination PEB is NULL\n"); + err = -ERANGE; + goto 
finish_dst_peb_init_blk2off_table; + } + + err = ssdfs_start_complete_init_blk2off_table(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "seg_id %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + goto finish_dst_peb_init_blk2off_table; + } + + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_peb_init_blk2off_table; + } + + err = ssdfs_finish_complete_init_blk2off_table(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to complete blk2off table init: " + "seg_id %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + goto finish_dst_peb_init_blk2off_table; + } + break; + + default: + BUG(); + } + +finish_dst_peb_init_blk2off_table: + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_define_segbmap_seg_index() - define segbmap segment index + * @pebc: pointer on PEB container + * + * RETURN: + * [success] - segbmap segment index + * [failure] - U16_MAX + */ +static +u16 ssdfs_peb_define_segbmap_seg_index(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_bmap *segbmap; + int seg_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->fsi->segbmap); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = pebc->parent_si->fsi->segbmap; + + down_read(&segbmap->resize_lock); + + seg_index = ssdfs_segbmap_seg_id_2_seg_index(segbmap, + pebc->parent_si->seg_id); + if (seg_index < 0) { + SSDFS_ERR("fail to convert seg_id %llu, err %d\n", + pebc->parent_si->seg_id, seg_index); + seg_index = U16_MAX; + } + + up_read(&segbmap->resize_lock); + + return (u16)seg_index; +} + +/* + * ssdfs_peb_define_segbmap_sequence_id() - define fragment's sequence ID + * @pebc: pointer on PEB container + * @seg_index: index of segment in segbmap's segments sequence + * @logical_offset: logical offset + * + * RETURN: + * [success] - sequence ID + * [failure] - U16_MAX + */ +static +u16 ssdfs_peb_define_segbmap_sequence_id(struct ssdfs_peb_container *pebc, + u16 seg_index, + u64 logical_offset) +{ + struct ssdfs_segment_bmap *segbmap; + u16 peb_index; + u16 fragments_per_seg; + u16 fragment_size; + u32 fragments_bytes_per_seg; + u64 seg_logical_offset; + u32 id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->fsi->segbmap); + + SSDFS_DBG("seg_id %llu, seg_index %u, " + "peb_index %u, logical_offset %llu\n", + pebc->parent_si->seg_id, seg_index, + pebc->peb_index, logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = pebc->parent_si->fsi->segbmap; + peb_index = pebc->peb_index; + + down_read(&segbmap->resize_lock); + fragments_per_seg = segbmap->fragments_per_seg; + fragment_size = segbmap->fragment_size; + fragments_bytes_per_seg = + (u32)segbmap->fragments_per_seg * fragment_size; + up_read(&segbmap->resize_lock); + + seg_logical_offset = (u64)seg_index * fragments_bytes_per_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_index %u, seg_logical_offset %llu, " + "logical_offset %llu\n", + seg_index, seg_logical_offset, + logical_offset); + + BUG_ON(seg_logical_offset > logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
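+	/*
+	 * The subtraction below makes the volume-wide logical offset
+	 * local to this segment; the fragment's sequence ID is then the
+	 * fragment index inside the segment plus the first sequence ID
+	 * of this segment's fragments (seg_index * fragments_per_seg).
+	 */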
logical_offset -= seg_logical_offset; + + id = logical_offset / fragment_size; + id += seg_index * fragments_per_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_index %u, fragments_per_seg %u, " + "logical_offset %llu, fragment_size %u, " + "id %u\n", + seg_index, fragments_per_seg, + logical_offset, fragment_size, + id); + + BUG_ON(id >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u16)id; +} + +/* + * ssdfs_peb_define_segbmap_logical_extent() - define logical extent + * @pebc: pointer on PEB container + * @seg_index: index of segment in segment bitmap + * @ptr: pointer on segbmap extent [out] + */ +static +void ssdfs_peb_define_segbmap_logical_extent(struct ssdfs_peb_container *pebc, + u16 seg_index, + struct ssdfs_segbmap_extent *ptr) +{ + struct ssdfs_segment_bmap *segbmap; + u16 peb_index; + u32 fragments_bytes_per_seg; + u32 fragments_bytes_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->fsi->segbmap); + BUG_ON(!ptr); + + SSDFS_DBG("seg_id %llu, seg_index %u, peb_index %u, extent %p\n", + pebc->parent_si->seg_id, seg_index, + pebc->peb_index, ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = pebc->parent_si->fsi->segbmap; + peb_index = pebc->peb_index; + + down_read(&segbmap->resize_lock); + ptr->fragment_size = segbmap->fragment_size; + fragments_bytes_per_seg = + (u32)segbmap->fragments_per_seg * ptr->fragment_size; + fragments_bytes_per_peb = + (u32)segbmap->fragments_per_peb * ptr->fragment_size; + ptr->logical_offset = fragments_bytes_per_seg * seg_index; + ptr->logical_offset += fragments_bytes_per_peb * peb_index; + ptr->data_size = segbmap->fragments_per_peb * ptr->fragment_size; + up_read(&segbmap->resize_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_size %u, fragments_bytes_per_seg %u, " + "fragments_bytes_per_peb %u, seg_index %u, " + "peb_index %u, logical_offset %llu, data_size %u\n", + ptr->fragment_size, + fragments_bytes_per_seg, + fragments_bytes_per_peb, + seg_index, peb_index, + ptr->logical_offset, + ptr->data_size); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_peb_define_segbmap_logical_block() - convert offset into block number + * @pebc: pointer on PEB container + * @seg_index: index of segment in segment bitmap + * @logical_offset: logical offset + * + * RETURN: + * [success] - logical block number + * [failure] - U16_MAX + */ +static +u16 ssdfs_peb_define_segbmap_logical_block(struct ssdfs_peb_container *pebc, + u16 seg_index, + u64 logical_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_bmap *segbmap; + u16 peb_index; + u32 fragments_bytes_per_seg; + u32 fragments_bytes_per_peb; + u32 blks_per_peb; + u64 seg_logical_offset; + u32 peb_blk_off, blk_off; + u32 logical_blk; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->fsi->segbmap); + + SSDFS_DBG("seg_id %llu, peb_index %u, " + "logical_offset %llu\n", + pebc->parent_si->seg_id, pebc->peb_index, + logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + segbmap = fsi->segbmap; + peb_index = pebc->peb_index; + + down_read(&segbmap->resize_lock); + fragments_bytes_per_seg = + (u32)segbmap->fragments_per_seg * segbmap->fragment_size; + fragments_bytes_per_peb = + (u32)segbmap->fragments_per_peb * segbmap->fragment_size; + blks_per_peb = fragments_bytes_per_peb; + blks_per_peb >>= fsi->log_pagesize; + up_read(&segbmap->resize_lock); + + seg_logical_offset = (u64)seg_index * 
fragments_bytes_per_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_index %u, seg_logical_offset %llu, " + "logical_offset %llu\n", + seg_index, seg_logical_offset, + logical_offset); + + BUG_ON(seg_logical_offset > logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_offset -= seg_logical_offset; + + logical_blk = blks_per_peb * peb_index; + peb_blk_off = blks_per_peb * peb_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(peb_blk_off >= U16_MAX); + BUG_ON((logical_offset >> fsi->log_pagesize) >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + blk_off = (u32)(logical_offset >> fsi->log_pagesize); + + if (blk_off < peb_blk_off || blk_off >= (peb_blk_off + blks_per_peb)) { + SSDFS_ERR("invalid logical offset: " + "blk_off %u, peb_blk_off %u, " + "blks_per_peb %u, logical_offset %llu\n", + blk_off, peb_blk_off, + blks_per_peb, logical_offset); + return U16_MAX; + } + + logical_blk = blk_off - peb_blk_off; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_blk_off %u, blk_off %u, " + "logical_blk %u\n", + peb_blk_off, blk_off, + logical_blk); + + BUG_ON(logical_blk >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u16)logical_blk; +} + +/* + * ssdfs_peb_read_segbmap_first_page() - read first page of segbmap + * @pebc: pointer on PEB container + * @seg_index: index of segment in segbmap's segments sequence + * @extent: requested extent for reading + * + * This method tries to read first page of segbmap, to check it + * and to initialize the available fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - no pages for read. + * %-ENOMEM - fail to allocate memory. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_read_segbmap_first_page(struct ssdfs_peb_container *pebc, + u16 seg_index, + struct ssdfs_segbmap_extent *extent) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_request *req; + u16 pages_count = 1; + u16 logical_blk; + u16 sequence_id; + int state; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!extent); + BUG_ON(extent->fragment_size != PAGE_SIZE); + + SSDFS_DBG("seg %llu, peb_index %u, " + "logical_offset %llu, data_size %u, " + "fragment_size %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + extent->logical_offset, extent->data_size, + extent->fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + ssdfs_request_prepare_logical_extent(SSDFS_SEG_BMAP_INO, + extent->logical_offset, + extent->fragment_size, + 0, 0, req); + + err = ssdfs_request_add_allocated_page_locked(req); + if (unlikely(err)) { + SSDFS_ERR("fail allocate memory page: err %d\n", err); + goto fail_read_segbmap_page; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGE, + SSDFS_REQ_SYNC, + req); + + ssdfs_request_define_segment(pebc->parent_si->seg_id, req); + + logical_blk = ssdfs_peb_define_segbmap_logical_block(pebc, + seg_index, + extent->logical_offset); + if (unlikely(logical_blk == U16_MAX)) { + err = -ERANGE; + SSDFS_ERR("fail to define logical block\n"); + goto fail_read_segbmap_page; + } + + if (fsi->pagesize < PAGE_SIZE) + pages_count = PAGE_SIZE >> fsi->log_pagesize; + + ssdfs_request_define_volume_extent(logical_blk, pages_count, req); + + err = ssdfs_peb_read_page(pebc, req, NULL); + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto fail_read_segbmap_page; + } + + if (!err) { + for (i = 0; i < req->result.processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebc, req, i); + } + + if (!ssdfs_segbmap_fragment_has_content(req->result.pvec.pages[0])) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_index %u hasn't segbmap's fragments\n", + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_segbmap_page; + } + + sequence_id = ssdfs_peb_define_segbmap_sequence_id(pebc, seg_index, + extent->logical_offset); + if (unlikely(sequence_id == U16_MAX)) { + err = -ERANGE; + SSDFS_ERR("fail to define sequence_id\n"); + goto fail_read_segbmap_page; + } + + err = ssdfs_segbmap_check_fragment_header(pebc, seg_index, sequence_id, + req->result.pvec.pages[0]); + if (unlikely(err)) { + SSDFS_CRIT("segbmap fragment is corrupted: err %d\n", + err); + } + + if (err) { + state = SSDFS_SEGBMAP_FRAG_INIT_FAILED; + goto fail_read_segbmap_page; + } else + state = SSDFS_SEGBMAP_FRAG_INITIALIZED; + + err = ssdfs_segbmap_fragment_init(pebc, sequence_id, + req->result.pvec.pages[0], + state); + if (unlikely(err)) { + SSDFS_ERR("fail to init fragment: " + "sequence_id %u, err %d\n", + sequence_id, err); + goto fail_read_segbmap_page; + } else + ssdfs_request_unlock_and_remove_page(req, 0); + + extent->logical_offset += extent->fragment_size; + extent->data_size -= extent->fragment_size; + +fail_read_segbmap_page: + ssdfs_request_unlock_and_remove_pages(req); + ssdfs_put_request(req); + ssdfs_request_free(req); + + return err; +} + +/* + * ssdfs_peb_read_segbmap_pages() - read pagevec-based amount of pages + * @pebc: pointer on PEB container + * @seg_index: index of segment in segbmap's segments sequence + * @extent: requested extent for reading + * + * This method tries to read pagevec-based amount of pages of + * segbmap in PEB (excluding the first one) and to initialize all + * available fragments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - no pages for read. + * %-ENOMEM - fail to allocate memory. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_peb_read_segbmap_pages(struct ssdfs_peb_container *pebc, + u16 seg_index, + struct ssdfs_segbmap_extent *extent) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_request *req; + u32 read_bytes; + u16 fragments_count; + u16 pages_count = 1; + u16 logical_blk; + u16 sequence_id; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!extent); + BUG_ON(extent->fragment_size != PAGE_SIZE); + + SSDFS_DBG("seg %llu, peb_index %u, " + "logical_offset %llu, data_size %u, " + "fragment_size %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + extent->logical_offset, extent->data_size, + extent->fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + read_bytes = min_t(u32, PAGEVEC_SIZE * PAGE_SIZE, + extent->data_size); + + ssdfs_request_prepare_logical_extent(SSDFS_SEG_BMAP_INO, + extent->logical_offset, + read_bytes, + 0, 0, req); + + fragments_count = read_bytes + extent->fragment_size - 1; + fragments_count /= extent->fragment_size; + + for (i = 0; i < fragments_count; i++) { + err = ssdfs_request_add_allocated_page_locked(req); + if (unlikely(err)) { + SSDFS_ERR("fail allocate memory page: err %d\n", err); + goto fail_read_segbmap_pages; + } + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGES_READAHEAD, + SSDFS_REQ_SYNC, + req); + + ssdfs_request_define_segment(pebc->parent_si->seg_id, req); + + logical_blk = ssdfs_peb_define_segbmap_logical_block(pebc, + seg_index, + extent->logical_offset); + if (unlikely(logical_blk == U16_MAX)) { + err = -ERANGE; + SSDFS_ERR("fail to define logical block\n"); + goto fail_read_segbmap_pages; + } + + pages_count = (read_bytes + fsi->pagesize - 1) >> PAGE_SHIFT; + ssdfs_request_define_volume_extent(logical_blk, pages_count, req); + + err = ssdfs_peb_readahead_pages(pebc, req, NULL); + if (unlikely(err)) { + SSDFS_ERR("fail to read pages: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto fail_read_segbmap_pages; + } + + for (i = 0; i < req->result.processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebc, req, i); + + sequence_id = ssdfs_peb_define_segbmap_sequence_id(pebc, seg_index, + extent->logical_offset); + if (unlikely(sequence_id == U16_MAX)) { + err = -ERANGE; + SSDFS_ERR("fail to define sequence_id\n"); + goto fail_read_segbmap_pages; + } + + for (i = 0; i < fragments_count; i++) { + int state; + struct page *page = req->result.pvec.pages[i]; + + err = ssdfs_segbmap_check_fragment_header(pebc, seg_index, + sequence_id, page); + if (unlikely(err)) { + SSDFS_CRIT("segbmap fragment is corrupted: " + "sequence_id %u, err %d\n", + sequence_id, err); + } + + if (err) { + state = SSDFS_SEGBMAP_FRAG_INIT_FAILED; + goto fail_read_segbmap_pages; + } else + state = SSDFS_SEGBMAP_FRAG_INITIALIZED; + + err = ssdfs_segbmap_fragment_init(pebc, sequence_id, + page, state); + if (unlikely(err)) { + SSDFS_ERR("fail to init fragment: " + "sequence_id %u, err %d\n", + sequence_id, err); + goto fail_read_segbmap_pages; + } else + ssdfs_request_unlock_and_remove_page(req, i); + + sequence_id++; + } + + extent->logical_offset += read_bytes; + extent->data_size -= read_bytes; + +fail_read_segbmap_pages: + 
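+	/*
+	 * Common exit: the success path falls through this label too,
+	 * so any pages still locked in the request are released here
+	 * together with the request object itself.
+	 */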
ssdfs_request_unlock_and_remove_pages(req); + ssdfs_put_request(req); + ssdfs_request_free(req); + + return err; +} + +/* + * ssdfs_peb_read_segbmap_rest_pages() - read all pages of segbmap in PEB + * @pebc: pointer on PEB container + * @seg_index: index of segment in segbmap's segments sequence + * @extent: requested extent for reading + * + * This method tries to read all pages of segbmap in PEB (excluding + * the first one) and initialize all available fragments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - no pages for read. + */ +static +int ssdfs_peb_read_segbmap_rest_pages(struct ssdfs_peb_container *pebc, + u16 seg_index, + struct ssdfs_segbmap_extent *extent) +{ + int err = 0, err1; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!extent); + BUG_ON(extent->fragment_size != PAGE_SIZE); + + SSDFS_DBG("seg %llu, peb_index %u, " + "logical_offset %llu, data_size %u, " + "fragment_size %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + extent->logical_offset, extent->data_size, + extent->fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (extent->data_size == 0) { + SSDFS_DBG("extent->data_size == 0\n"); + return -ENODATA; + } + + do { + err1 = ssdfs_peb_read_segbmap_pages(pebc, seg_index, + extent); + if (unlikely(err1)) { + SSDFS_ERR("fail to read segbmap's pages: " + "logical_offset %llu, data_bytes %u, " + "err %d\n", + extent->logical_offset, + extent->data_size, + err1); + err = err1; + break; + } + } while (extent->data_size > 0); + + return err; +} + +/* + * ssdfs_peb_init_segbmap_object() - init segment bitmap object + * @pebc: pointer on PEB container + * @req: read request + * + * This method tries to initialize segment bitmap object. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
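+ * %-ENOMEM - fail to allocate memory.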
+ */ +static +int ssdfs_peb_init_segbmap_object(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + u16 seg_index; + struct ssdfs_segbmap_extent extent = {0}; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, + req->private.class, req->private.cmd, + req->private.type); +#else + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, + req->private.class, req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + + seg_index = ssdfs_peb_define_segbmap_seg_index(pebc); + if (seg_index == U16_MAX) { + SSDFS_ERR("fail to determine segment index\n"); + return -ERANGE; + } + + ssdfs_peb_define_segbmap_logical_extent(pebc, seg_index, &extent); + + err = ssdfs_peb_read_segbmap_first_page(pebc, seg_index, &extent); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_index %u hasn't segbmap's content\n", + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read PEB's segbmap first page: " + "err %d\n", err); + return err; + } + + err = ssdfs_peb_read_segbmap_rest_pages(pebc, seg_index, &extent); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_index %u has only one page\n", + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read PEB's segbmap rest pages: " + "err %d\n", err); + return err; + } + + { + int err1 = ssdfs_peb_release_pages(pebc); + if (err1 == -ENODATA) { + SSDFS_DBG("PEB cache is empty\n"); + } else if (unlikely(err1)) { + SSDFS_ERR("fail to release pages: err %d\n", + err1); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_maptbl_fragment_pages_count() - calculate count of pages in fragment + * @fsi: file system info object + * + * This method calculates count of pages in the mapping table's + * fragment. + * + * RETURN: + * [success] - count of pages in fragment + * [failure] - U16_MAX + */ +static inline +u16 ssdfs_maptbl_fragment_pages_count(struct ssdfs_fs_info *fsi) +{ + u32 fragment_pages; + +#ifdef CONFIG_SSDFS_DEBUG + if (fsi->maptbl->fragment_bytes % PAGE_SIZE) { + SSDFS_WARN("invalid fragment_bytes %u\n", + fsi->maptbl->fragment_bytes); + return U16_MAX; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + fragment_pages = fsi->maptbl->fragment_bytes / PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(fragment_pages >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return fragment_pages; +} + +/* + * ssdfs_peb_read_maptbl_fragment() - read mapping table's fragment's pages + * @pebc: pointer on PEB container + * @index: index of fragment in the PEB + * @logical_offset: logical offset of fragment in mapping table + * @logical_blk: starting logical block of fragment + * @fragment_bytes: size of fragment in bytes + * @area: fragment content [out] + * + * This method tries to read mapping table's fragment's pages. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
+ * %-ENODATA - fragment hasn't content. + */ +static +int ssdfs_peb_read_maptbl_fragment(struct ssdfs_peb_container *pebc, + int index, u64 logical_offset, + u16 logical_blk, + u32 fragment_bytes, + struct ssdfs_maptbl_area *area) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_request *req; + u32 pagevec_bytes = (u32)PAGEVEC_SIZE << PAGE_SHIFT; + u32 cur_offset = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi || !area); + + SSDFS_DBG("pebc %p, index %d, logical_offset %llu, " + "logical_blk %u, fragment_bytes %u, area %p\n", + pebc, index, logical_offset, + logical_blk, fragment_bytes, area); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + if (fragment_bytes == 0) { + SSDFS_ERR("invalid fragment_bytes %u\n", + fragment_bytes); + return -ERANGE; + } + + do { + u32 size; + u16 pages_count; + int i; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + if (cur_offset == 0) + size = fsi->pagesize; + else + size = min_t(u32, fragment_bytes, pagevec_bytes); + + ssdfs_request_prepare_logical_extent(SSDFS_MAPTBL_INO, + logical_offset, size, + 0, 0, req); + + pages_count = (size + fsi->pagesize - 1) >> PAGE_SHIFT; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pages_count > PAGEVEC_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < pages_count; i++) { + err = ssdfs_request_add_allocated_page_locked(req); + if (unlikely(err)) { + SSDFS_ERR("fail allocate memory page: err %d\n", + err); + goto fail_read_maptbl_pages; + } + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGES_READAHEAD, + SSDFS_REQ_SYNC, + req); + + ssdfs_request_define_segment(pebc->parent_si->seg_id, req); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_offset %llu, size %u, " + "logical_blk %u, pages_count %u\n", + logical_offset, size, + logical_blk, pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_define_volume_extent((u16)logical_blk, + pages_count, req); + + err = ssdfs_peb_readahead_pages(pebc, req, NULL); + if (unlikely(err)) { + SSDFS_ERR("fail to read pages: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto fail_read_maptbl_pages; + } + + for (i = 0; i < req->result.processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebc, req, i); + + if (cur_offset == 0) { + struct ssdfs_leb_table_fragment_header *hdr; + u16 magic; + void *kaddr; + bool is_fragment_valid = false; + + kaddr = kmap_local_page(req->result.pvec.pages[0]); + hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + magic = le16_to_cpu(hdr->magic); + is_fragment_valid = magic == SSDFS_LEB_TABLE_MAGIC; + area->portion_id = le16_to_cpu(hdr->portion_id); + kunmap_local(kaddr); + + if (!is_fragment_valid) { + err = -ENODATA; + area->portion_id = U16_MAX; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("empty fragment: " + "peb_index %u, index %d\n", + pebc->peb_index, index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_maptbl_pages; + } + } + + ssdfs_maptbl_move_fragment_pages(req, area, pages_count); + ssdfs_request_unlock_and_remove_pages(req); + ssdfs_put_request(req); + ssdfs_request_free(req); + + fragment_bytes -= size; + logical_offset += size; + cur_offset += size; + logical_blk += pages_count; + } while (fragment_bytes > 0); + + return 0; + +fail_read_maptbl_pages: + 
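+	/* Error path only: the success path returns 0 above this label. */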
ssdfs_request_unlock_and_remove_pages(req);
+	ssdfs_put_request(req);
+	ssdfs_request_free(req);
+
+	return err;
+}
+
+/*
+ * ssdfs_peb_init_maptbl_object() - init mapping table's fragment
+ * @pebc: pointer on PEB container
+ * @req: read request
+ *
+ * This method tries to read and init the mapping table's fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ */
+static
+int ssdfs_peb_init_maptbl_object(struct ssdfs_peb_container *pebc,
+				 struct ssdfs_segment_request *req)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_maptbl_area area = {0};
+	u64 logical_offset;
+	u32 logical_blk;
+	u32 fragment_bytes;
+	u32 blks_per_fragment;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi || !req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("seg %llu, peb_index %u, "
+		  "class %#x, cmd %#x, type %#x\n",
+		  pebc->parent_si->seg_id, pebc->peb_index,
+		  req->private.class, req->private.cmd,
+		  req->private.type);
+#else
+	SSDFS_DBG("seg %llu, peb_index %u, "
+		  "class %#x, cmd %#x, type %#x\n",
+		  pebc->parent_si->seg_id, pebc->peb_index,
+		  req->private.class, req->private.cmd,
+		  req->private.type);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	fsi = pebc->parent_si->fsi;
+
+	down_read(&fsi->maptbl->tbl_lock);
+	fragment_bytes = fsi->maptbl->fragment_bytes;
+	area.pages_count = 0;
+	area.pages_capacity = ssdfs_maptbl_fragment_pages_count(fsi);
+	up_read(&fsi->maptbl->tbl_lock);
+
+	if (unlikely(area.pages_capacity >= U16_MAX)) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid fragment pages_capacity\n");
+		goto end_init;
+	}
+
+	area.pages = ssdfs_read_kcalloc(area.pages_capacity,
+					sizeof(struct page *),
+					GFP_KERNEL);
+	if (!area.pages) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate memory: "
+			  "area.pages_capacity %zu\n",
+			  area.pages_capacity);
+		goto end_init;
+	}
+
+	logical_offset = req->extent.logical_offset;
+	logical_blk = req->place.start.blk_index;
+
+	blks_per_fragment =
+		(fragment_bytes + fsi->pagesize - 1) / fsi->pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(blks_per_fragment >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < fsi->maptbl->fragments_per_peb; i++) {
+		/*
+		 * Derive every fragment's position from the request's
+		 * base values; accumulating onto the previous iteration's
+		 * result would make the offset grow quadratically with i.
+		 */
+		logical_offset = req->extent.logical_offset +
+					((u64)fragment_bytes * i);
+		logical_blk = req->place.start.blk_index +
+					(blks_per_fragment * i);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(logical_blk >= U16_MAX);
+
+		SSDFS_DBG("seg %llu, peb_index %d, "
+			  "logical_offset %llu, logical_blk %u\n",
+			  pebc->parent_si->seg_id, i,
+			  logical_offset, logical_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_peb_read_maptbl_fragment(pebc, i,
+						     logical_offset,
+						     (u16)logical_blk,
+						     fragment_bytes,
+						     &area);
+		if (err == -ENODATA) {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("peb_index %u hasn't more maptbl fragments: "
+				  "last index %d\n",
+				  pebc->peb_index, i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto end_init;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to read maptbl fragment: "
+				  "index %d, err %d\n",
+				  i, err);
+			goto end_init;
+		}
+
+		down_read(&fsi->maptbl->tbl_lock);
+		err = ssdfs_maptbl_fragment_init(pebc, &area);
+		up_read(&fsi->maptbl->tbl_lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to init maptbl fragment: "
+				  "index %d, err %d\n",
+				  i, err);
+			goto end_init;
+		}
+	}
+
+end_init:
+	/* area.pages can be NULL if the allocation above failed */
+	for (i = 0; area.pages && i < area.pages_capacity; i++) {
+		if (area.pages[i]) {
+			ssdfs_read_free_page(area.pages[i]);
+			area.pages[i] = NULL;
+		}
+	}
+
ssdfs_read_kfree(area.pages); + + { + int err1 = ssdfs_peb_release_pages(pebc); + if (err1 == -ENODATA) { + SSDFS_DBG("PEB cache is empty\n"); + } else if (unlikely(err1)) { + SSDFS_ERR("fail to release pages: err %d\n", + err1); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_get_last_log_time() - get PEB's last log timestamp + * @fsi: file system info object + * @pebi: pointer on PEB object + * @page_off: page offset to footer's placement + * @peb_create_time: PEB's create timestamp [out] + * @last_log_time: PEB's last log timestamp + * + * This method tries to read the last log footer of PEB + * and retrieve peb_create_time and last_log_time. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no valid log footer. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_get_last_log_time(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + u32 page_off, + u64 *peb_create_time, + u64 *last_log_time) +{ + struct ssdfs_signature *magic = NULL; + struct ssdfs_partial_log_header *plh_hdr = NULL; + struct ssdfs_log_footer *footer = NULL; + struct page *page; + void *kaddr; + u32 bytes_off; + size_t read_bytes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!peb_create_time || !last_log_time); + + SSDFS_DBG("seg %llu, peb_id %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + *peb_create_time = U64_MAX; + *last_log_time = U64_MAX; + + page = ssdfs_page_array_grab_page(&pebi->cache, page_off); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + page_off); + return -ENOMEM; + } + + kaddr = kmap_local_page(page); + + if (PageUptodate(page) || PageDirty(page)) + goto check_footer_magic; + + bytes_off = page_off * fsi->pagesize; + + err = ssdfs_aligned_read_buffer(fsi, pebi->peb_id, + bytes_off, + (u8 *)kaddr, + PAGE_SIZE, + &read_bytes); + if (unlikely(err)) + goto fail_read_footer; + else if (unlikely(read_bytes != PAGE_SIZE)) { + err = -ERANGE; + goto fail_read_footer; + } + + SetPageUptodate(page); + +check_footer_magic: + magic = (struct ssdfs_signature *)kaddr; + + if (!is_ssdfs_magic_valid(magic)) { + err = -ENODATA; + goto fail_read_footer; + } + + if (is_ssdfs_partial_log_header_magic_valid(magic)) { + plh_hdr = SSDFS_PLH(kaddr); + *peb_create_time = le64_to_cpu(plh_hdr->peb_create_time); + *last_log_time = le64_to_cpu(plh_hdr->timestamp); + } else if (__is_ssdfs_log_footer_magic_valid(magic)) { + footer = SSDFS_LF(kaddr); + *peb_create_time = le64_to_cpu(footer->peb_create_time); + *last_log_time = le64_to_cpu(footer->timestamp); + } else { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb %llu, page_off %u\n", + pebi->peb_id, page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_footer; + } + +fail_read_footer: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("valid footer is not detected: " + "seg_id %llu, peb_id %llu, " + "page_off %u\n", + pebi->pebc->parent_si->seg_id, + 
pebi->peb_id, + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read footer: " + "seg %llu, peb %llu, " + "pages_off %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_read_last_log_footer() - read PEB's last log footer + * @pebi: pointer on PEB object + * @req: read request + * + * This method tries to read the last log footer of PEB + * and initialize peb_create_time and last_log_time fields. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no valid log footer. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_read_last_log_footer(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + u32 log_bytes; + u32 pages_per_log; + u32 logs_count; + u32 page_off; + u64 peb_create_time; + u64 last_log_time; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi || !req); + + SSDFS_DBG("seg %llu, peb %llu, " + "class %#x, cmd %#x, type %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->private.class, req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + page_off = 0; + + err = __ssdfs_peb_read_log_header(fsi, pebi, page_off, + &log_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to read log header: " + "seg %llu, peb %llu, page_off %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); + return err; + } + + pages_per_log = log_bytes + fsi->pagesize - 1; + pages_per_log /= fsi->pagesize; + logs_count = fsi->pages_per_peb / pages_per_log; + + for (i = logs_count; i > 0; i--) { + page_off = (i * pages_per_log) - 1; + + err = ssdfs_peb_get_last_log_time(fsi, pebi, + page_off, + &peb_create_time, + &last_log_time); + if (err == -ENODATA) + continue; + else if (unlikely(err)) { + SSDFS_ERR("fail to get last log time: " + "seg %llu, peb %llu, " + "page_off %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); + return err; + } else + break; + } + + if (i <= 0 || err == -ENODATA) { + SSDFS_ERR("fail to get last log time: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + return -ERANGE; + } + + pebi->peb_create_time = peb_create_time; + pebi->current_log.last_log_time = last_log_time; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + peb_create_time, + last_log_time); + + BUG_ON(pebi->peb_create_time > last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_peb_read_src_last_log_footer() - read src PEB's last log footer + * @pebc: pointer on PEB container + * @req: read request + * + * This method tries to read the last log footer of source PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no valid log footer. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_peb_read_src_last_log_footer(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, + req->private.type); +#else + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_read_src_last_log_footer; + } + + err = ssdfs_peb_read_last_log_footer(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to read last log's footer: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + goto finish_read_src_last_log_footer; + } + +finish_read_src_last_log_footer: + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_read_dst_last_log_footer() - read dst PEB's last log footer + * @pebc: pointer on PEB container + * @req: read request + * + * This method tries to read the last log footer of destination PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no valid log footer. + * %-ERANGE - internal error. 
+ */
+static
+int ssdfs_peb_read_dst_last_log_footer(struct ssdfs_peb_container *pebc,
+					struct ssdfs_segment_request *req)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_peb_info *pebi;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi || !req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("seg %llu, peb_index %u, "
+		  "class %#x, cmd %#x, type %#x\n",
+		  pebc->parent_si->seg_id, pebc->peb_index,
+		  req->private.class, req->private.cmd,
+		  req->private.type);
+#else
+	SSDFS_DBG("seg %llu, peb_index %u, "
+		  "class %#x, cmd %#x, type %#x\n",
+		  pebc->parent_si->seg_id, pebc->peb_index,
+		  req->private.class, req->private.cmd,
+		  req->private.type);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	fsi = pebc->parent_si->fsi;
+
+	down_read(&pebc->lock);
+
+	pebi = pebc->dst_peb;
+	if (!pebi) {
+		SSDFS_WARN("destination PEB is NULL\n");
+		err = -ERANGE;
+		goto finish_read_dst_last_log_footer;
+	}
+
+	err = ssdfs_peb_read_last_log_footer(pebi, req);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read last log's footer: "
+			  "peb_id %llu, peb_index %u, err %d\n",
+			  pebi->peb_id, pebi->peb_index, err);
+		goto finish_read_dst_last_log_footer;
+	}
+
+finish_read_dst_last_log_footer:
+	up_read(&pebc->lock);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/*
+ * ssdfs_process_read_request() - process read request
+ * @pebc: pointer on PEB container
+ * @req: read request
+ *
+ * This function detects the command of the read request and
+ * calls the proper function for request processing.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ */ +static +int ssdfs_process_read_request(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !req); + + SSDFS_DBG("req %p, class %#x, cmd %#x, type %#x\n", + req, req->private.class, req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (req->private.cmd < SSDFS_READ_PAGE || + req->private.cmd >= SSDFS_READ_CMD_MAX) { + SSDFS_ERR("unknown read command %d, seg %llu, peb_index %u\n", + req->private.cmd, pebc->parent_si->seg_id, + pebc->peb_index); + atomic_set(&req->result.state, SSDFS_REQ_FAILED); + req->result.err = -EINVAL; + return -EINVAL; + } + + atomic_set(&req->result.state, SSDFS_REQ_STARTED); + + switch (req->private.cmd) { + case SSDFS_READ_PAGE: + err = ssdfs_peb_read_page(pebc, req, NULL); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to read page: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_READ_PAGES_READAHEAD: + err = ssdfs_peb_readahead_pages(pebc, req, NULL); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to read pages: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_READ_SRC_ALL_LOG_HEADERS: + err = ssdfs_peb_read_src_all_log_headers(pebc, req); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to read log headers: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_READ_DST_ALL_LOG_HEADERS: + err = ssdfs_peb_read_dst_all_log_headers(pebc, req); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to read log headers: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_READ_BLK_BMAP_SRC_USING_PEB: + err = ssdfs_src_peb_init_using_metadata_state(pebc, req); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to init source PEB (using state): " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_READ_BLK_BMAP_DST_USING_PEB: + err = ssdfs_dst_peb_init_using_metadata_state(pebc, req); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to init destination PEB (using state): " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_READ_BLK_BMAP_SRC_USED_PEB: + err = ssdfs_src_peb_init_used_metadata_state(pebc, req); + if (unlikely(err)) { + ssdfs_fs_error(pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to init source PEB (used state): " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + 
+	case SSDFS_READ_BLK_BMAP_DST_USED_PEB:
+		err = ssdfs_dst_peb_init_used_metadata_state(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to init destination PEB (used state): "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_READ_BLK2OFF_TABLE_SRC_PEB:
+		err = ssdfs_src_peb_complete_init_blk2off_table(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to finish offset table init: "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_READ_BLK2OFF_TABLE_DST_PEB:
+		err = ssdfs_dst_peb_complete_init_blk2off_table(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to finish offset table init: "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_READ_INIT_SEGBMAP:
+		err = ssdfs_peb_init_segbmap_object(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to init segment bitmap object: "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_READ_INIT_MAPTBL:
+		err = ssdfs_peb_init_maptbl_object(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to init mapping table object: "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_READ_SRC_LAST_LOG_FOOTER:
+		err = ssdfs_peb_read_src_last_log_footer(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to read last log footer: "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	case SSDFS_READ_DST_LAST_LOG_FOOTER:
+		err = ssdfs_peb_read_dst_last_log_footer(pebc, req);
+		if (unlikely(err)) {
+			ssdfs_fs_error(pebc->parent_si->fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"fail to read last log footer: "
+					"seg %llu, peb_index %u, err %d\n",
+					pebc->parent_si->seg_id,
+					pebc->peb_index, err);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	if (unlikely(err))
+		atomic_set(&req->result.state, SSDFS_REQ_FAILED);
+
+	return err;
+}
+
+/*
+ * ssdfs_finish_read_request() - finish read request
+ * @pebc: pointer on PEB container
+ * @req: segment request
+ * @wait: wait queue head
+ * @err: error code (read request failure code)
+ *
+ * This function performs the final processing of the read request.
+ */
+static
+void ssdfs_finish_read_request(struct ssdfs_peb_container *pebc,
+				struct ssdfs_segment_request *req,
+				wait_queue_head_t *wait, int err)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !req);
+
+	SSDFS_DBG("req %p, class %#x, cmd %#x, type %#x, err %d\n",
+		  req, req->private.class, req->private.cmd,
+		  req->private.type, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!err) {
+		for (i = 0; i < req->result.processed_blks; i++)
+			ssdfs_peb_mark_request_block_uptodate(pebc, req, i);
+	}
+
+	req->result.err = err;
+
+	if (err)
+		atomic_set(&req->result.state, SSDFS_REQ_FAILED);
+	else
+		atomic_set(&req->result.state, SSDFS_REQ_FINISHED);
+
+	switch (req->private.type) {
+	case SSDFS_REQ_SYNC:
+		complete(&req->result.wait);
+		wake_up_all(&req->private.wait_queue);
+		break;
+
+	case SSDFS_REQ_ASYNC:
+		complete(&req->result.wait);
+
+		ssdfs_put_request(req);
+		if (atomic_read(&req->private.refs_count) != 0) {
+			err = wait_event_killable_timeout(*wait,
+				atomic_read(&req->private.refs_count) == 0,
+				SSDFS_DEFAULT_TIMEOUT);
+			if (err < 0)
+				WARN_ON(err < 0);
+			else
+				err = 0;
+		}
+
+		wake_up_all(&req->private.wait_queue);
+
+		for (i = 0; i < pagevec_count(&req->result.pvec); i++) {
+			struct page *page = req->result.pvec.pages[i];
+
+			if (!page) {
+				SSDFS_WARN("page %d is NULL\n", i);
+				continue;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			WARN_ON(!PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_unlock_page(page);
+			ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+			SSDFS_DBG("page_index %llu, flags %#lx\n",
+				  (u64)page_index(page), page->flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		ssdfs_request_free(req);
+		break;
+
+	case SSDFS_REQ_ASYNC_NO_FREE:
+		complete(&req->result.wait);
+
+		ssdfs_put_request(req);
+		if (atomic_read(&req->private.refs_count) != 0) {
+			err = wait_event_killable_timeout(*wait,
+				atomic_read(&req->private.refs_count) == 0,
+				SSDFS_DEFAULT_TIMEOUT);
+			if (err < 0)
+				WARN_ON(err < 0);
+			else
+				err = 0;
+		}
+
+		wake_up_all(&req->private.wait_queue);
+
+		for (i = 0; i < pagevec_count(&req->result.pvec); i++) {
+			struct page *page = req->result.pvec.pages[i];
+
+			if (!page) {
+				SSDFS_WARN("page %d is NULL\n", i);
+				continue;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			WARN_ON(!PageLocked(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_unlock_page(page);
+			ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+			SSDFS_DBG("page_index %llu, flags %#lx\n",
+				  (u64)page_index(page), page->flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	ssdfs_peb_finish_read_request_cno(pebc);
+}
+
+#define READ_THREAD_WAKE_CONDITION(pebc) \
+	(kthread_should_stop() || \
+	 !is_ssdfs_requests_queue_empty(READ_RQ_PTR(pebc)))
+#define READ_FAILED_THREAD_WAKE_CONDITION() \
+	(kthread_should_stop())
+#define READ_THREAD_WAKEUP_TIMEOUT	(msecs_to_jiffies(3000))
+
+/*
+ * ssdfs_peb_read_thread_func() - main function of read thread
+ * @data: pointer on data object
+ *
+ * This function is the main function of the read thread.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ */
+int ssdfs_peb_read_thread_func(void *data)
+{
+	struct ssdfs_peb_container *pebc = data;
+	wait_queue_head_t *wait_queue;
+	struct ssdfs_segment_request *req;
+	u64 timeout = READ_THREAD_WAKEUP_TIMEOUT;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (!pebc) {
+		SSDFS_ERR("pointer on PEB container is NULL\n");
+		BUG();
+	}
+
+	SSDFS_DBG("read thread: seg %llu, peb_index %u\n",
+		  pebc->parent_si->seg_id, pebc->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_queue = &pebc->parent_si->wait_queue[SSDFS_PEB_READ_THREAD];
+
+repeat:
+	if (kthread_should_stop()) {
+		complete_all(&pebc->thread[SSDFS_PEB_READ_THREAD].full_stop);
+		return err;
+	}
+
+	if (is_ssdfs_requests_queue_empty(&pebc->read_rq))
+		goto sleep_read_thread;
+
+	do {
+		err = ssdfs_requests_queue_remove_first(&pebc->read_rq,
+							&req);
+		if (err == -ENODATA) {
+			/* empty queue */
+			err = 0;
+			break;
+		} else if (err == -ENOENT) {
+			SSDFS_WARN("request queue contains NULL request\n");
+			err = 0;
+			continue;
+		} else if (unlikely(err < 0)) {
+			SSDFS_CRIT("fail to get request from the queue: "
+				   "err %d\n",
+				   err);
+			goto sleep_failed_read_thread;
+		}
+
+		err = ssdfs_process_read_request(pebc, req);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to process read request: "
+				  "seg %llu, peb_index %u, err %d\n",
+				  pebc->parent_si->seg_id,
+				  pebc->peb_index, err);
+		}
+
+		ssdfs_finish_read_request(pebc, req, wait_queue, err);
+	} while (!is_ssdfs_requests_queue_empty(&pebc->read_rq));
+
+sleep_read_thread:
+	wait_event_interruptible_timeout(*wait_queue,
+					 READ_THREAD_WAKE_CONDITION(pebc),
+					 timeout);
+	if (!is_ssdfs_requests_queue_empty(&pebc->read_rq)) {
+		/* do requests processing */
+		goto repeat;
+	} else {
+		if (is_it_time_free_peb_cache_memory(pebc)) {
+			err = ssdfs_peb_release_pages(pebc);
+			if (err == -ENODATA) {
+				err = 0;
+				timeout = min_t(u64, timeout * 2,
+						(u64)SSDFS_DEFAULT_TIMEOUT);
+			} else if (unlikely(err)) {
+				SSDFS_ERR("fail to release pages: "
+					  "err %d\n", err);
+				err = 0;
+			} else
+				timeout = READ_THREAD_WAKEUP_TIMEOUT;
+		}
+
+		if (!is_ssdfs_requests_queue_empty(&pebc->read_rq) ||
+		    kthread_should_stop())
+			goto repeat;
+		else
+			goto sleep_read_thread;
+	}
+
+sleep_failed_read_thread:
+	ssdfs_peb_release_pages(pebc);
+	wait_event_interruptible(*wait_queue,
+				 READ_FAILED_THREAD_WAKE_CONDITION());
+	goto repeat;
+}

From patchwork Sat Feb 25 01:08:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151929
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 25/76] ssdfs: block bitmap initialization logic
Date: Fri, 24 Feb 2023 17:08:36 -0800
Message-Id: <20230225010927.813929-26-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

An erase block contains a sequence of logs. Every log starts with a
header that represents the metadata describing the content of the log.
Log content can include both metadata and user data payload.

One of the important log metadata structures is the block bitmap. The
block bitmap tracks the state (free, pre-allocated, allocated, invalid)
of logical blocks and accounts for the distribution of physical space
among metadata and user data payload. The header includes metadata that
describes the offset or location of the block bitmap in the log.
Usually, the block bitmap is stored right after the log's header. The
block bitmap can be compressed. Also, a log can store both source and
destination block bitmaps in the case of migration. The initialization
logic requires reading the block bitmap(s) from the volume,
decompressing them, and initializing the PEB block bitmap object.
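As a rough illustration of the lookup step described above, here is a
minimal user-space sketch. All names in it (demo_bmap_desc,
demo_bmap_hdr, DEMO_BMAP_MAGIC, demo_locate_block_bitmap) are
hypothetical simplifications invented for this example, not the real
SSDFS on-disk definitions; the actual patch works with
struct ssdfs_block_bitmap_header located through a metadata descriptor
and verified via is_csum_valid(), as in the code below.

#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the on-disk structures. */
struct demo_bmap_desc {
	uint32_t offset;	/* byte offset of the bitmap header in the PEB */
	uint16_t bytes;		/* size of the block bitmap header */
};

struct demo_bmap_hdr {
	uint16_t magic;		/* expected DEMO_BMAP_MAGIC */
	uint16_t fragments_count;
	uint32_t bytes_count;
	uint8_t flags;		/* e.g. a compression flag */
	uint8_t type;
};

#define DEMO_BMAP_MAGIC		0xB10C	/* placeholder value */
#define DEMO_FLAG_COMPRESSED	0x01

/*
 * Follow the descriptor from the log header to the block bitmap
 * header inside the raw PEB buffer and sanity-check it. A real
 * implementation would also verify the checksum (on-disk fields
 * are little-endian) and decompress the bitmap fragments before
 * initializing the bitmap object.
 */
static int demo_locate_block_bitmap(const uint8_t *peb, size_t pebsize,
				    const struct demo_bmap_desc *desc,
				    struct demo_bmap_hdr *out)
{
	if (desc->offset >= pebsize ||
	    desc->bytes != sizeof(struct demo_bmap_hdr))
		return -EIO;	/* corrupted or foreign descriptor */

	memcpy(out, peb + desc->offset, sizeof(*out));

	if (out->magic != DEMO_BMAP_MAGIC)
		return -EIO;	/* not a block bitmap header */

	if (out->flags & DEMO_FLAG_COMPRESSED) {
		/* decompression would happen here */
	}

	return 0;
}

The same sanity checks appear in the real initialization path below:
the descriptor's byte count is compared against the expected header
size, its offset is bounds-checked against the PEB size, and the
header's magic plus checksum decide whether the fragment can be used.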
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_read_thread.c | 2255 ++++++++++++++++++++++++++++++++++++
 1 file changed, 2255 insertions(+)

diff --git a/fs/ssdfs/peb_read_thread.c b/fs/ssdfs/peb_read_thread.c
index c5087373df8d..f6a5b67612af 100644
--- a/fs/ssdfs/peb_read_thread.c
+++ b/fs/ssdfs/peb_read_thread.c
@@ -245,6 +245,2261 @@ int ssdfs_read_blk2off_table_fragment(struct ssdfs_peb_info *pebi,
  *                        READ THREAD FUNCTIONALITY                          *
  ******************************************************************************/
 
+/*
+ * ssdfs_read_checked_block_bitmap_header() - read and check block bitmap header
+ * @pebi: pointer on PEB object
+ * @env: init environment [in|out]
+ *
+ * This function reads the block bitmap header from the volume and
+ * checks its consistency.
+ *
+ * RETURN:
+ * [success] - block bitmap header has been read in consistent state.
+ * [failure] - error code:
+ *
+ * %-EIO - I/O error.
+ */
+static
+int ssdfs_read_checked_block_bitmap_header(struct ssdfs_peb_info *pebi,
+					   struct ssdfs_read_init_env *env)
+{
+	struct ssdfs_fs_info *fsi;
+	struct page *page;
+	u32 pages_off;
+	u32 area_offset;
+	struct ssdfs_metadata_descriptor *desc = NULL;
+	size_t bmap_hdr_size = sizeof(struct ssdfs_block_bitmap_header);
+	size_t hdr_buf_size = max_t(size_t,
+				sizeof(struct ssdfs_segment_header),
+				sizeof(struct ssdfs_partial_log_header));
+	u32 pebsize;
+	u32 read_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si);
+	BUG_ON(!pebi->pebc->parent_si->fsi);
+	BUG_ON(!env || !env->log_hdr || !env->footer);
+	BUG_ON(env->log_pages >
+		pebi->pebc->parent_si->fsi->pages_per_peb);
+	BUG_ON((env->log_offset) >
+		pebi->pebc->parent_si->fsi->pages_per_peb);
+	BUG_ON(!env->b_init.bmap_hdr);
+
+	SSDFS_DBG("seg %llu, peb %llu, log_offset %u, log_pages %u\n",
+		  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+		  env->log_offset, env->log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebi->pebc->parent_si->fsi;
+	pages_off = env->log_offset;
+	pebsize = fsi->pages_per_peb * fsi->pagesize;
+
+	page = ssdfs_page_array_get_page_locked(&pebi->cache, pages_off);
+	if (IS_ERR_OR_NULL(page)) {
+		SSDFS_ERR("fail to read checked segment header: "
+			  "peb %llu\n", pebi->peb_id);
+		return -ERANGE;
+	} else {
+		ssdfs_memcpy_from_page(env->log_hdr, 0, hdr_buf_size,
+				       page, 0, PAGE_SIZE,
+				       hdr_buf_size);
+
+		ssdfs_unlock_page(page);
+		ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	err = ssdfs_check_log_header(fsi, env);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to check log header: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	if (env->has_seg_hdr)
+		err = ssdfs_get_segment_header_blk_bmap_desc(pebi, env, &desc);
+	else
+		err = ssdfs_get_partial_header_blk_bmap_desc(pebi, env, &desc);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get descriptor: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	if (!desc) {
+		SSDFS_ERR("invalid descriptor pointer\n");
+		return -ERANGE;
+	}
+
+	if (bmap_hdr_size != le16_to_cpu(desc->check.bytes)) {
+		ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+				"bmap_hdr_size %zu != desc->check.bytes %u\n",
+				bmap_hdr_size,
+				le16_to_cpu(desc->check.bytes));
+		return -EIO;
+	}
+
+	if (le32_to_cpu(desc->offset) >= pebsize) {
+		ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+				"desc->offset %u >= pebsize %u\n",
+				le32_to_cpu(desc->offset), pebsize);
+		return -EIO;
+	}
+
+	area_offset = le32_to_cpu(desc->offset);
+	read_bytes = le16_to_cpu(desc->check.bytes);
+
+	err = ssdfs_unaligned_read_cache(pebi,
+					 area_offset, bmap_hdr_size,
+					 env->b_init.bmap_hdr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read block bitmap's header: "
+			  "seg %llu, peb %llu, offset %u, size %zu, err %d\n",
+			  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+			  area_offset, bmap_hdr_size,
+			  err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BLOCK BITMAP HEADER: "
+		  "magic: common %#x, key %#x, version (%u.%u), "
+		  "fragments_count %u, bytes_count %u, "
+		  "flags %#x, type %#x\n",
+		  le32_to_cpu(env->b_init.bmap_hdr->magic.common),
+		  le16_to_cpu(env->b_init.bmap_hdr->magic.key),
+		  env->b_init.bmap_hdr->magic.version.major,
+		  env->b_init.bmap_hdr->magic.version.minor,
+		  le16_to_cpu(env->b_init.bmap_hdr->fragments_count),
+		  le32_to_cpu(env->b_init.bmap_hdr->bytes_count),
+		  env->b_init.bmap_hdr->flags,
+		  env->b_init.bmap_hdr->type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_csum_valid(&desc->check, env->b_init.bmap_hdr, read_bytes)) {
+		ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+				"block bitmap header has invalid checksum\n");
+		return -EIO;
+	}
+
+	env->b_init.read_bytes += read_bytes;
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_checked_block_bitmap() - read and check block bitmap
+ * @pebi: pointer on PEB object
+ * @req: segment request
+ * @env: init environment [in|out]
+ *
+ * This function reads the block bitmap from the volume and
+ * checks its consistency.
+ *
+ * RETURN:
+ * [success] - block bitmap has been read in consistent state.
+ * [failure] - error code:
+ *
+ * %-ENOMEM - fail to allocate memory.
+ * %-EIO - I/O error.
+ */
+static
+int ssdfs_read_checked_block_bitmap(struct ssdfs_peb_info *pebi,
+				    struct ssdfs_segment_request *req,
+				    struct ssdfs_read_init_env *env)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_metadata_descriptor *desc = NULL;
+	size_t hdr_size = sizeof(struct ssdfs_block_bitmap_fragment);
+	size_t desc_size = sizeof(struct ssdfs_fragment_desc);
+	struct ssdfs_fragment_desc *frag_array = NULL;
+	struct ssdfs_block_bitmap_fragment *frag_hdr = NULL;
+	u32 area_offset;
+	void *cdata_buf;
+	u32 chain_compr_bytes, chain_uncompr_bytes;
+	u32 read_bytes, uncompr_bytes;
+	u16 fragments_count;
+	u16 last_free_blk;
+	u32 bmap_bytes = 0;
+	u32 bmap_pages = 0;
+	u32 pages_count;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si);
+	BUG_ON(!pebi->pebc->parent_si->fsi);
+	BUG_ON(!env || !env->log_hdr || !env->footer);
+	BUG_ON(!env->b_init.frag_hdr);
+	BUG_ON(env->log_pages >
+		pebi->pebc->parent_si->fsi->pages_per_peb);
+	BUG_ON(env->log_offset >
+		pebi->pebc->parent_si->fsi->pages_per_peb);
+	BUG_ON(ssdfs_page_vector_count(&env->b_init.array) != 0);
+
+	SSDFS_DBG("seg %llu, peb %llu, log_offset %u, log_pages %u\n",
+		  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+		  env->log_offset, env->log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebi->pebc->parent_si->fsi;
+
+	if (env->has_seg_hdr)
+		err = ssdfs_get_segment_header_blk_bmap_desc(pebi, env, &desc);
+	else
+		err = ssdfs_get_partial_header_blk_bmap_desc(pebi, env, &desc);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get descriptor: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	if (!desc) {
+		SSDFS_ERR("invalid descriptor pointer\n");
+		return -ERANGE;
+	}
+
+	area_offset = le32_to_cpu(desc->offset);
+
+	err = ssdfs_unaligned_read_cache(pebi,
+					 area_offset + env->b_init.read_bytes,
+					 SSDFS_BLKBMAP_FRAG_HDR_CAPACITY,
env->b_init.frag_hdr); + if (unlikely(err)) { + SSDFS_ERR("fail to read fragment's header: " + "seg %llu, peb %llu, offset %u, size %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + area_offset + env->b_init.read_bytes, + (u32)SSDFS_BLKBMAP_FRAG_HDR_CAPACITY, + err); + return err; + } + + cdata_buf = ssdfs_read_kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!cdata_buf) { + SSDFS_ERR("fail to allocate cdata_buf\n"); + return -ENOMEM; + } + + frag_hdr = env->b_init.frag_hdr; + + frag_array = (struct ssdfs_fragment_desc *)((u8 *)frag_hdr + hdr_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BLOCK BITMAP FRAGMENT HEADER: " + "peb_index %u, sequence_id %u, flags %#x, " + "type %#x, last_free_blk %u, " + "metadata_blks %u, invalid_blks %u\n", + le16_to_cpu(frag_hdr->peb_index), + frag_hdr->sequence_id, + frag_hdr->flags, + frag_hdr->type, + le32_to_cpu(frag_hdr->last_free_blk), + le32_to_cpu(frag_hdr->metadata_blks), + le32_to_cpu(frag_hdr->invalid_blks)); + + SSDFS_DBG("FRAGMENT CHAIN HEADER: " + "compr_bytes %u, uncompr_bytes %u, " + "fragments_count %u, desc_size %u, " + "magic %#x, type %#x, flags %#x\n", + le32_to_cpu(frag_hdr->chain_hdr.compr_bytes), + le32_to_cpu(frag_hdr->chain_hdr.uncompr_bytes), + le16_to_cpu(frag_hdr->chain_hdr.fragments_count), + le16_to_cpu(frag_hdr->chain_hdr.desc_size), + frag_hdr->chain_hdr.magic, + frag_hdr->chain_hdr.type, + le16_to_cpu(frag_hdr->chain_hdr.flags)); +#endif /* CONFIG_SSDFS_DEBUG */ + + last_free_blk = le16_to_cpu(frag_hdr->last_free_blk); + + if (last_free_blk >= fsi->pages_per_peb) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "last_free_blk %u is invalid\n", + last_free_blk); + err = -EIO; + goto fail_read_blk_bmap; + } + + if (le16_to_cpu(frag_hdr->metadata_blks) > fsi->pages_per_peb) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "metadata_blks %u is invalid\n", + le16_to_cpu(frag_hdr->metadata_blks)); + err = -EIO; + goto fail_read_blk_bmap; + } + + if (desc_size != le16_to_cpu(frag_hdr->chain_hdr.desc_size)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "desc_size %u is invalid\n", + le16_to_cpu(frag_hdr->chain_hdr.desc_size)); + err = -EIO; + goto fail_read_blk_bmap; + } + + if (frag_hdr->chain_hdr.magic != SSDFS_CHAIN_HDR_MAGIC) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "chain header magic %#x is invalid\n", + frag_hdr->chain_hdr.magic); + err = -EIO; + goto fail_read_blk_bmap; + } + + if (frag_hdr->chain_hdr.type != SSDFS_BLK_BMAP_CHAIN_HDR) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "chain header type %#x is invalid\n", + frag_hdr->chain_hdr.type); + err = -EIO; + goto fail_read_blk_bmap; + } + + if (le16_to_cpu(frag_hdr->chain_hdr.flags) & + ~SSDFS_CHAIN_HDR_FLAG_MASK) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "unknown chain header flags %#x\n", + le16_to_cpu(frag_hdr->chain_hdr.flags)); + err = -EIO; + goto fail_read_blk_bmap; + } + + fragments_count = le16_to_cpu(frag_hdr->chain_hdr.fragments_count); + if (fragments_count == 0) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "block bitmap is corrupted: " + "fragments count is zero\n"); + err = -EIO; + goto fail_read_blk_bmap; + } + + env->b_init.read_bytes += hdr_size + (fragments_count * desc_size); + + chain_compr_bytes = le32_to_cpu(frag_hdr->chain_hdr.compr_bytes); + 
+	chain_uncompr_bytes = le32_to_cpu(frag_hdr->chain_hdr.uncompr_bytes);
+	read_bytes = 0;
+	uncompr_bytes = 0;
+
+	if (last_free_blk == 0) {
+		/* need to process at minimum one page */
+		bmap_pages = 1;
+	} else {
+		bmap_bytes = BLK_BMAP_BYTES(last_free_blk);
+		bmap_pages = (bmap_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+	}
+
+	pages_count = min_t(u32, (u32)fragments_count, bmap_pages);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_free_blk %u, bmap_bytes %u, "
+		  "bmap_pages %u, fragments_count %u, "
+		  "pages_count %u\n",
+		  last_free_blk, bmap_bytes,
+		  bmap_pages, fragments_count,
+		  pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < fragments_count; i++) {
+		struct ssdfs_fragment_desc *frag_desc;
+		struct page *page;
+		u16 sequence_id = i;
+
+		if (read_bytes >= chain_compr_bytes ||
+		    uncompr_bytes >= chain_uncompr_bytes) {
+			ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+					"block bitmap is corrupted: "
+					"fragments header: "
+					"compr_bytes %u, "
+					"uncompr_bytes %u\n",
+					chain_compr_bytes,
+					chain_uncompr_bytes);
+			err = -EIO;
+			goto fail_read_blk_bmap;
+		}
+
+		frag_desc = &frag_array[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("FRAGMENT DESCRIPTOR: index %d, "
+			  "offset %u, compr_size %u, uncompr_size %u, "
+			  "checksum %#x, sequence_id %u, magic %#x, "
+			  "type %#x, flags %#x\n",
+			  i,
+			  le32_to_cpu(frag_desc->offset),
+			  le16_to_cpu(frag_desc->compr_size),
+			  le16_to_cpu(frag_desc->uncompr_size),
+			  le32_to_cpu(frag_desc->checksum),
+			  frag_desc->sequence_id,
+			  frag_desc->magic,
+			  frag_desc->type,
+			  frag_desc->flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (i >= pages_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("account fragment bytes: "
+				  "i %d, pages_count %u\n",
+				  i, pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto account_fragment_bytes;
+		}
+
+		page = ssdfs_page_vector_allocate(&env->b_init.array);
+		if (unlikely(IS_ERR_OR_NULL(page))) {
+			err = !page ? -ENOMEM : PTR_ERR(page);
+			SSDFS_ERR("fail to add pagevec page: "
+				  "sequence_id %u, "
+				  "fragments count %u, err %d\n",
+				  sequence_id, fragments_count, err);
+			goto fail_read_blk_bmap;
+		}
+
+		ssdfs_lock_page(page);
+		err = ssdfs_read_checked_fragment(pebi, req, area_offset,
+						  sequence_id,
+						  frag_desc,
+						  cdata_buf, page);
+		ssdfs_unlock_page(page);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read checked fragment: "
+				  "offset %u, compr_size %u, "
+				  "uncompr_size %u, sequence_id %u, "
+				  "flags %#x, err %d\n",
+				  le32_to_cpu(frag_desc->offset),
+				  le16_to_cpu(frag_desc->compr_size),
+				  le16_to_cpu(frag_desc->uncompr_size),
+				  le16_to_cpu(frag_desc->sequence_id),
+				  le16_to_cpu(frag_desc->flags),
+				  err);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("FRAG ARRAY DUMP: \n");
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     frag_array,
+					     fragments_count * desc_size);
+			SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			goto fail_read_blk_bmap;
+		}
+
+account_fragment_bytes:
+		read_bytes += le16_to_cpu(frag_desc->compr_size);
+		uncompr_bytes += le16_to_cpu(frag_desc->uncompr_size);
+		env->b_init.read_bytes += le16_to_cpu(frag_desc->compr_size);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_free_blk %u, metadata_blks %u, invalid_blks %u\n",
+		  le16_to_cpu(frag_hdr->last_free_blk),
+		  le16_to_cpu(frag_hdr->metadata_blks),
+		  le16_to_cpu(frag_hdr->invalid_blks));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+fail_read_blk_bmap:
+	ssdfs_read_kfree(cdata_buf);
+	return err;
+}
+
+/*
+ * ssdfs_init_block_bitmap_fragment() - init block bitmap fragment
+ * @pebi: pointer on PEB object
+ * @req: segment request
+ * @env: init environment [in|out]
+ *
+ * This function reads the block bitmap's fragment from the volume and
+ * tries to initialize the fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - fail to allocate memory.
+ * %-EIO - I/O error.
+ */ +static +int ssdfs_init_block_bitmap_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_segment_blk_bmap *seg_blkbmap; + u64 cno; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!env || !env->log_hdr || !env->footer); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u, " + "log_offset %u, log_pages %u, " + "fragment_index %d, read_bytes %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + env->log_offset, env->log_pages, + env->b_init.fragment_index, + env->b_init.read_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_vector_init(&env->b_init.array); + if (unlikely(err)) { + SSDFS_ERR("fail to init page vector: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + goto fail_init_blk_bmap_fragment; + } + + err = ssdfs_read_checked_block_bitmap(pebi, req, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read block bitmap: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + goto fail_init_blk_bmap_fragment; + } + + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + + if (env->has_seg_hdr) { + struct ssdfs_segment_header *seg_hdr = NULL; + + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + cno = le64_to_cpu(seg_hdr->cno); + } else { + struct ssdfs_partial_log_header *pl_hdr = NULL; + + pl_hdr = SSDFS_PLH(env->log_hdr); + cno = le64_to_cpu(pl_hdr->cno); + } + + err = ssdfs_segment_blk_bmap_partial_init(seg_blkbmap, + pebi->peb_index, + &env->b_init.array, + env->b_init.frag_hdr, + cno); + if (unlikely(err)) { + SSDFS_ERR("fail to initialize block bitmap: " + "seg %llu, peb %llu, cno %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, cno, err); + goto fail_init_blk_bmap_fragment; + } + +fail_init_blk_bmap_fragment: + ssdfs_page_vector_release(&env->b_init.array); + + return err; +} + +/* + * ssdfs_read_blk2off_table_header() - read blk2off table header + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to read blk2off table header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */ +static +int ssdfs_read_blk2off_table_header(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor *desc = NULL; + struct ssdfs_blk2off_table_header *hdr = NULL; + size_t hdr_size = sizeof(struct ssdfs_blk2off_table_header); + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!env || !env->log_hdr || !env->footer); + BUG_ON(pagevec_count(&env->t_init.pvec) != 0); + + SSDFS_DBG("seg %llu, peb %llu, read_off %u, write_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + env->t_init.read_off, env->t_init.write_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + if (env->has_seg_hdr) { + err = ssdfs_get_segment_header_blk2off_tbl_desc(pebi, env, + &desc); + } else { + err = ssdfs_get_partial_header_blk2off_tbl_desc(pebi, env, + &desc); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to get descriptor: " + "err %d\n", err); + return err; + } + + if (!desc) { + SSDFS_ERR("invalid descriptor pointer\n"); + return -ERANGE; + } + + env->t_init.read_off = le32_to_cpu(desc->offset); + env->t_init.blk2off_tbl_hdr_off = env->t_init.read_off; + env->t_init.write_off = 0; + + err = ssdfs_unaligned_read_cache(pebi, + env->t_init.read_off, hdr_size, + &env->t_init.tbl_hdr); + if (unlikely(err)) { + SSDFS_ERR("fail to read table's header: " + "seg %llu, peb %llu, offset %u, size %zu, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + env->t_init.read_off, hdr_size, err); + return err; + } + + hdr = &env->t_init.tbl_hdr; + + if (le32_to_cpu(hdr->magic.common) != SSDFS_SUPER_MAGIC || + le16_to_cpu(hdr->magic.key) != SSDFS_BLK2OFF_TABLE_HDR_MAGIC) { + SSDFS_ERR("invalid magic of blk2off_table\n"); + return -EIO; + } + + page = ssdfs_read_add_pagevec_page(&env->t_init.pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + hdr, 0, hdr_size, + hdr_size); + ssdfs_unlock_page(page); + + env->t_init.read_off += offsetof(struct ssdfs_blk2off_table_header, + sequence); + env->t_init.write_off += offsetof(struct ssdfs_blk2off_table_header, + sequence); + + return 0; +} + +/* + * ssdfs_read_blk2off_byte_stream() - read blk2off's byte stream + * @pebi: pointer on PEB object + * @read_bytes: amount of bytes for reading + * @env: init environment [in|out] + * + * This function tries to read blk2off table's byte stream. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */ +static +int ssdfs_read_blk2off_byte_stream(struct ssdfs_peb_info *pebi, + u32 read_bytes, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!env); + + SSDFS_DBG("seg %llu, peb %llu, read_bytes %u, " + "read_off %u, write_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + read_bytes, env->t_init.read_off, + env->t_init.write_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + while (read_bytes > 0) { + struct page *page = NULL; + void *kaddr; + pgoff_t page_index = env->t_init.write_off >> PAGE_SHIFT; + u32 capacity = pagevec_count(&env->t_init.pvec) << PAGE_SHIFT; + u32 offset, bytes; + + if (env->t_init.write_off >= capacity) { + page = ssdfs_read_add_pagevec_page(&env->t_init.pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + return err; + } + } else { + page = env->t_init.pvec.pages[page_index]; + if (unlikely(!page)) { + err = -ERANGE; + SSDFS_ERR("fail to get page: err %d\n", + err); + return err; + } + } + + offset = env->t_init.write_off % PAGE_SIZE; + bytes = min_t(u32, read_bytes, PAGE_SIZE - offset); + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = ssdfs_unaligned_read_cache(pebi, + env->t_init.read_off, bytes, + (u8 *)kaddr + offset); + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "seg %llu, peb %llu, offset %u, " + "size %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, env->t_init.read_off, + bytes, err); + return err; + } + + read_bytes -= bytes; + env->t_init.read_off += bytes; + env->t_init.write_off += bytes; + }; + + return 0; +} + +/* + * ssdfs_read_blk2off_table_extents() - read blk2off table's extents + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to read blk2off table's extents. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */ +static +int ssdfs_read_blk2off_table_extents(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + u16 extents_off; + u16 extent_count; + size_t extent_size = sizeof(struct ssdfs_translation_extent); + u32 offset = offsetof(struct ssdfs_blk2off_table_header, sequence); + u32 read_bytes; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!env); +#endif /* CONFIG_SSDFS_DEBUG */ + + extents_off = le16_to_cpu(env->t_init.tbl_hdr.extents_off); + extent_count = le16_to_cpu(env->t_init.tbl_hdr.extents_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "extents_off %u, extent_count %u, " + "read_off %u, write_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + extents_off, extent_count, + env->t_init.read_off, + env->t_init.write_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + if (offset != extents_off) { + SSDFS_ERR("extents_off %u != offset %u\n", + extents_off, offset); + return -EIO; + } + + if (extent_count == 0 || extent_count == U16_MAX) { + SSDFS_ERR("invalid extent_count %u\n", + extent_count); + return -EIO; + } + + read_bytes = extent_size * extent_count; + + err = ssdfs_read_blk2off_byte_stream(pebi, read_bytes, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read byte stream: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_read_blk2off_pot_fragment() - read blk2off table's POT fragment + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to read blk2off table's Physical Offsets Table + * fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */ +static +int ssdfs_read_blk2off_pot_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_phys_offset_table_header hdr; + size_t hdr_size = sizeof(struct ssdfs_phys_offset_table_header); + u32 fragment_start; + u32 next_frag_off; + u32 read_bytes; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!env); + + SSDFS_DBG("seg %llu, peb %llu, " + "read_off %u, write_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + env->t_init.read_off, env->t_init.write_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fragment_start = env->t_init.read_off; + + err = ssdfs_unaligned_read_cache(pebi, + env->t_init.read_off, hdr_size, + &hdr); + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "seg %llu, peb %llu, offset %u, " + "size %zu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, env->t_init.read_off, + hdr_size, err); + return err; + } + + if (le32_to_cpu(hdr.magic) != SSDFS_PHYS_OFF_TABLE_MAGIC) { + SSDFS_ERR("invalid magic\n"); + return -EIO; + } + + read_bytes = le32_to_cpu(hdr.byte_size); + + err = ssdfs_read_blk2off_byte_stream(pebi, read_bytes, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read byte stream: err %d\n", + err); + return err; + } + + next_frag_off = le16_to_cpu(hdr.next_fragment_off); + + if (next_frag_off >= U16_MAX) + goto finish_read_blk2off_pot_fragment; + + env->t_init.read_off = fragment_start + next_frag_off; + +finish_read_blk2off_pot_fragment: + return 0; +} + +/* + * ssdfs_read_blk2off_table_fragment() - read blk2off table's log's fragments + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to read blk2off table's log's fragments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */
+static
+int ssdfs_read_blk2off_table_fragment(struct ssdfs_peb_info *pebi,
+				      struct ssdfs_read_init_env *env)
+{
+	struct ssdfs_fs_info *fsi;
+	u16 fragment_count;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si);
+	BUG_ON(!pebi->pebc->parent_si->fsi);
+	BUG_ON(!env || !env->log_hdr || !env->footer);
+	BUG_ON(pagevec_count(&env->t_init.pvec) != 0);
+
+	SSDFS_DBG("seg %llu, peb %llu\n",
+		  pebi->pebc->parent_si->seg_id, pebi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebi->pebc->parent_si->fsi;
+	env->t_init.read_off = 0;
+	env->t_init.write_off = 0;
+
+	err = ssdfs_read_blk2off_table_header(pebi, env);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read translation table header: "
+			  "seg %llu, peb %llu, "
+			  "read_off %u, write_off %u, err %d\n",
+			  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+			  env->t_init.read_off, env->t_init.write_off,
+			  err);
+		goto fail_read_blk2off_fragments;
+	}
+
+	err = ssdfs_read_blk2off_table_extents(pebi, env);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read translation table's extents: "
+			  "seg %llu, peb %llu, "
+			  "read_off %u, write_off %u, err %d\n",
+			  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+			  env->t_init.read_off, env->t_init.write_off,
+			  err);
+		goto fail_read_blk2off_fragments;
+	}
+
+	fragment_count = le16_to_cpu(env->t_init.tbl_hdr.fragments_count);
+
+	for (i = 0; i < fragment_count; i++) {
+		err = ssdfs_read_blk2off_pot_fragment(pebi, env);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read physical offset table's "
+				  "fragment: seg %llu, peb %llu, "
+				  "fragment_index %d, "
+				  "read_off %u, write_off %u, err %d\n",
+				  pebi->pebc->parent_si->seg_id,
+				  pebi->peb_id,
+				  i, env->t_init.read_off,
+				  env->t_init.write_off, err);
+			goto fail_read_blk2off_fragments;
+		}
+	}
+
+fail_read_blk2off_fragments:
+	return err;
+}
+
+/*
+ * ssdfs_correct_zone_block_bitmap() - set all migrated blocks as invalidated
+ * @pebi: pointer on PEB object
+ *
+ * This function tries to mark all migrated blocks as
+ * invalidated for the case of a source zone. Actually, invalidated
+ * extents will be added into the queue. The invalidation operation
+ * happens after complete initialization of the segment object.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - fail to allocate memory.
+ * %-EIO - I/O error.
+ */ +static +int ssdfs_correct_zone_block_bitmap(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_invextree_info *invextree; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_btree_search *search; + struct ssdfs_raw_extent extent; + struct ssdfs_raw_extent *found; +#ifdef CONFIG_SSDFS_DEBUG + size_t item_size = sizeof(struct ssdfs_raw_extent); +#endif /* CONFIG_SSDFS_DEBUG */ + u32 logical_blk = 0; + u32 len; + u32 count; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebi->pebc->parent_si; + fsi = si->fsi; + len = fsi->pages_per_seg; + + invextree = fsi->invextree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!invextree); +#endif /* CONFIG_SSDFS_DEBUG */ + + shextree = fsi->shextree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!shextree); +#endif /* CONFIG_SSDFS_DEBUG */ + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + do { + extent.seg_id = cpu_to_le64(si->seg_id); + extent.logical_blk = cpu_to_le32(logical_blk); + extent.len = cpu_to_le32(len); + + ssdfs_btree_search_init(search); + err = ssdfs_invextree_find(invextree, &extent, search); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find invalidated extents: " + "seg_id %llu, logical_blk %u, len %u\n", + si->seg_id, logical_blk, len); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find invalidated extents: " + "seg_id %llu, logical_blk %u, len %u\n", + si->seg_id, logical_blk, len); + goto finish_correct_zone_block_bmap; + } + + count = search->result.items_in_buffer; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(count == 0); + BUG_ON((count * item_size) != search->result.buf_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < count; i++) { + found = (struct ssdfs_raw_extent *)search->result.buf; + found += i; + + err = ssdfs_shextree_add_pre_invalid_extent(shextree, + SSDFS_INVALID_EXTENTS_BTREE_INO, + found); + if (unlikely(err)) { + SSDFS_ERR("fail to add pre-invalid extent: " + "seg_id %llu, logical_blk %u, " + "len %u, err %d\n", + le64_to_cpu(found->seg_id), + le32_to_cpu(found->logical_blk), + le32_to_cpu(found->len), + err); + goto finish_correct_zone_block_bmap; + } + } + + found = (struct ssdfs_raw_extent *)search->result.buf; + found += count - 1; + + logical_blk = le32_to_cpu(found->logical_blk) + + le32_to_cpu(found->len); + + if (logical_blk >= fsi->pages_per_seg) + len = 0; + else + len = fsi->pages_per_seg - logical_blk; + } while (len > 0); + + if (err == -ENODATA) { + /* all extents have been processed */ + err = 0; + } + +finish_correct_zone_block_bmap: + ssdfs_btree_search_free(search); + return err; +} + +/* + * ssdfs_peb_init_using_metadata_state() - initialize "using" PEB + * @pebi: pointer on PEB object + * @env: read operation's init environment + * @req: read request + * + * This function tries to initialize last actual metadata state for + * the case of "using" state of PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */ +static +int ssdfs_peb_init_using_metadata_state(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *pl_hdr = NULL; + u16 fragments_count; + u32 bytes_count; + u16 new_log_start_page; + u64 cno; + int sequence_id = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !req); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + req->private.class, req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebi->pebc->parent_si; + fsi = si->fsi; + + /* + * Allow creating thread to continue creation logic. + */ + complete(&req->result.wait); + + err = ssdfs_peb_get_log_pages_count(fsi, pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to define log_pages: " + "seg %llu, peb %llu\n", + si->seg_id, pebi->peb_id); + goto fail_init_using_blk_bmap; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (fsi->pages_per_peb % env->log_pages) { + SSDFS_WARN("fsi->pages_per_peb %u, log_pages %u\n", + fsi->pages_per_peb, env->log_pages); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->log_pages = env->log_pages; + + err = ssdfs_find_last_partial_log(fsi, pebi, env, + &new_log_start_page); + if (unlikely(err)) { + SSDFS_ERR("fail to find last partial log: err %d\n", err); + goto fail_init_using_blk_bmap; + } + + err = ssdfs_pre_fetch_block_bitmap(pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch block bitmap: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_using_blk_bmap; + } + + err = ssdfs_read_checked_block_bitmap_header(pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read block bitmap header: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_using_blk_bmap; + } + + fragments_count = le16_to_cpu(env->b_init.bmap_hdr->fragments_count); + bytes_count = le32_to_cpu(env->b_init.bmap_hdr->bytes_count); + + for (i = 0; i < fragments_count; i++) { + env->b_init.fragment_index = i; + err = ssdfs_init_block_bitmap_fragment(pebi, req, env); + if (unlikely(err)) { + SSDFS_ERR("fail to init block bitmap: " + "peb_id %llu, peb_index %u, " + "log_offset %u, fragment_index %u, " + "read_bytes %u, err %d\n", + pebi->peb_id, pebi->peb_index, + env->log_offset, i, + env->b_init.read_bytes, err); + goto fail_init_using_blk_bmap; + } + } + + if (bytes_count != env->b_init.read_bytes) { + SSDFS_WARN("bytes_count %u != read_bytes %u\n", + bytes_count, env->b_init.read_bytes); + err = -EIO; + goto fail_init_using_blk_bmap; + } + + if (fsi->is_zns_device && + is_ssdfs_peb_containing_user_data(pebi->pebc)) { + err = ssdfs_correct_zone_block_bitmap(pebi); + if (unlikely(err)) { + SSDFS_ERR("fail to correct zone's block bitmap: " + "seg %llu, peb %llu, peb_index %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + err); + goto fail_init_using_blk_bmap; + } + } + + BUG_ON(new_log_start_page >= U16_MAX); + + if (env->has_seg_hdr) { + /* first log */ + sequence_id = 0; + } else { + pl_hdr = SSDFS_PLH(env->log_hdr); + sequence_id = le32_to_cpu(pl_hdr->sequence_id); + } + + BUG_ON((sequence_id + 1) >= INT_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("new_log_start_page %u\n", 
new_log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (new_log_start_page < fsi->pages_per_peb) { + u16 free_pages; + u16 min_log_pages; + + /* + * Set the value of log's start page + * by temporary value. It needs for + * estimation of min_partial_log_pages. + */ + ssdfs_peb_current_log_lock(pebi); + pebi->current_log.start_page = new_log_start_page; + ssdfs_peb_current_log_unlock(pebi); + + free_pages = new_log_start_page % pebi->log_pages; + free_pages = pebi->log_pages - free_pages; + min_log_pages = ssdfs_peb_estimate_min_partial_log_pages(pebi); + sequence_id++; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %u, min_log_pages %u, " + "new_log_start_page %u\n", + free_pages, min_log_pages, + new_log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_pages == pebi->log_pages) { + /* start new full log */ + sequence_id = 0; + } else if (free_pages < min_log_pages) { + SSDFS_WARN("POTENTIAL HOLE: " + "seg %llu, peb %llu, " + "peb_index %u, start_page %u, " + "free_pages %u, min_log_pages %u, " + "new_log_start_page %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + new_log_start_page, + free_pages, min_log_pages, + new_log_start_page + free_pages); + +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + + new_log_start_page += free_pages; + free_pages = pebi->log_pages; + sequence_id = 0; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %u, min_log_pages %u, " + "new_log_start_page %u\n", + free_pages, min_log_pages, + new_log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + bytes_count = le32_to_cpu(env->b_init.bmap_hdr->bytes_count); + ssdfs_peb_current_log_init(pebi, free_pages, + new_log_start_page, + sequence_id, + bytes_count); + } else { + sequence_id = 0; + ssdfs_peb_current_log_init(pebi, + 0, + new_log_start_page, + sequence_id, + U32_MAX); + } + +fail_init_using_blk_bmap: + if (unlikely(err)) + goto fail_init_using_peb; + + err = ssdfs_pre_fetch_blk2off_table_area(pebi, env); + if (err == -ENOENT) { + SSDFS_DBG("blk2off table's fragment is absent\n"); + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch blk2off_table area: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_using_peb; + } + + err = ssdfs_pre_fetch_blk_desc_table_area(pebi, env); + if (err == -ENOENT) { + SSDFS_DBG("blk desc table's fragment is absent\n"); + /* continue logic -> process free extents */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch blk desc table area: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_using_peb; + } + + err = ssdfs_read_blk2off_table_fragment(pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read translation table fragments: " + "seg %llu, peb %llu, err %d\n", + si->seg_id, pebi->peb_id, err); + goto fail_init_using_peb; + } + + if (env->has_seg_hdr) { + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + cno = le64_to_cpu(seg_hdr->cno); + } else { + pl_hdr = SSDFS_PLH(env->log_hdr); + cno = le64_to_cpu(pl_hdr->cno); + } + + err = ssdfs_blk2off_table_partial_init(si->blk2off_table, + &env->t_init.pvec, + &env->bdt_init.pvec, + pebi->peb_index, cno); + if (unlikely(err)) { + SSDFS_ERR("fail to start initialization of offset table: " + "seg %llu, peb %llu, err %d\n", + si->seg_id, pebi->peb_id, err); + goto fail_init_using_peb; + } + +fail_init_using_peb: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + 
return err; +} + +/* + * ssdfs_peb_init_used_metadata_state() - initialize "used" PEB + * @pebi: pointer on PEB object + * @env: read operation's init environment + * @req: read request + * + * This function tries to initialize last actual metadata state for + * the case of "used" state of PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +static +int ssdfs_peb_init_used_metadata_state(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *pl_hdr = NULL; + u16 fragments_count; + u32 bytes_count; + u16 new_log_start_page; + u64 cno; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !req); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + req->private.class, req->private.cmd, + req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebi->pebc->parent_si; + fsi = si->fsi; + + /* + * Allow creating thread to continue creation logic. + */ + complete(&req->result.wait); + + err = ssdfs_peb_get_log_pages_count(fsi, pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to define log_pages: " + "seg %llu, peb %llu\n", + si->seg_id, pebi->peb_id); + goto fail_init_used_blk_bmap; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (fsi->pages_per_peb % env->log_pages) { + SSDFS_WARN("fsi->pages_per_peb %u, log_pages %u\n", + fsi->pages_per_peb, env->log_pages); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->log_pages = env->log_pages; + + err = ssdfs_find_last_partial_log(fsi, pebi, env, + &new_log_start_page); + if (unlikely(err)) { + SSDFS_ERR("fail to find last partial log: err %d\n", err); + goto fail_init_used_blk_bmap; + } + + err = ssdfs_pre_fetch_block_bitmap(pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch block bitmap: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_used_blk_bmap; + } + + err = ssdfs_read_checked_block_bitmap_header(pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read block bitmap header: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_used_blk_bmap; + } + + fragments_count = le16_to_cpu(env->b_init.bmap_hdr->fragments_count); + bytes_count = le32_to_cpu(env->b_init.bmap_hdr->bytes_count); + + for (i = 0; i < fragments_count; i++) { + env->b_init.fragment_index = i; + err = ssdfs_init_block_bitmap_fragment(pebi, req, env); + if (unlikely(err)) { + SSDFS_ERR("fail to init block bitmap: " + "peb_id %llu, peb_index %u, " + "log_offset %u, fragment_index %u, " + "read_bytes %u, err %d\n", + pebi->peb_id, pebi->peb_index, + env->log_offset, i, + env->b_init.read_bytes, err); + goto fail_init_used_blk_bmap; + } + } + + if (bytes_count != env->b_init.read_bytes) { + SSDFS_WARN("bytes_count %u != read_bytes %u\n", + bytes_count, env->b_init.read_bytes); + err = -EIO; + goto fail_init_used_blk_bmap; + } + + if (fsi->is_zns_device && + is_ssdfs_peb_containing_user_data(pebi->pebc)) { + err = ssdfs_correct_zone_block_bitmap(pebi); + if (unlikely(err)) { + SSDFS_ERR("fail to correct zone's block bitmap: " + "seg %llu, peb %llu, peb_index %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, 
pebi->peb_index, + err); + goto fail_init_used_blk_bmap; + } + } + + ssdfs_peb_current_log_init(pebi, 0, fsi->pages_per_peb, 0, U32_MAX); + +fail_init_used_blk_bmap: + if (unlikely(err)) + goto fail_init_used_peb; + + err = ssdfs_pre_fetch_blk2off_table_area(pebi, env); + if (err == -ENOENT) { + SSDFS_DBG("blk2off table's fragment is absent\n"); + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch blk2off_table area: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_used_peb; + } + + err = ssdfs_pre_fetch_blk_desc_table_area(pebi, env); + if (err == -ENOENT) { + SSDFS_DBG("blk desc table's fragment is absent\n"); + /* continue logic -> process free extents */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch blk desc table area: " + "seg %llu, peb %llu, log_offset %u, err %d\n", + si->seg_id, pebi->peb_id, + env->log_offset, err); + goto fail_init_used_peb; + } + + err = ssdfs_read_blk2off_table_fragment(pebi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to read translation table fragments: " + "seg %llu, peb %llu, err %d\n", + si->seg_id, pebi->peb_id, err); + goto fail_init_used_peb; + } + + if (env->has_seg_hdr) { + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + cno = le64_to_cpu(seg_hdr->cno); + } else { + pl_hdr = SSDFS_PLH(env->log_hdr); + cno = le64_to_cpu(pl_hdr->cno); + } + + err = ssdfs_blk2off_table_partial_init(si->blk2off_table, + &env->t_init.pvec, + &env->bdt_init.pvec, + pebi->peb_index, + cno); + if (unlikely(err)) { + SSDFS_ERR("fail to start initialization of offset table: " + "seg %llu, peb %llu, err %d\n", + si->seg_id, pebi->peb_id, err); + goto fail_init_used_peb; + } + +fail_init_used_peb: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; +} + +/* + * ssdfs_src_peb_init_using_metadata_state() - init src "using" PEB container + * @pebc: pointer on PEB container + * @req: read request + * + * This function tries to initialize "using" PEB container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_src_peb_init_using_metadata_state(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi; + int items_state; + int id1, id2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#else + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + + items_state = atomic_read(&pebc->items_state); + switch(items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + /* valid states */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -ERANGE; + }; + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_src_init_using_metadata_state; + } + + err = ssdfs_prepare_read_init_env(&pebi->env, fsi->pages_per_peb); + if (unlikely(err)) { + SSDFS_ERR("fail to init read environment: err %d\n", + err); + goto finish_src_init_using_metadata_state; + } + + err = ssdfs_peb_init_using_metadata_state(pebi, &pebi->env, req); + if (unlikely(err)) { + SSDFS_ERR("fail to init using metadata state: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + ssdfs_segment_blk_bmap_init_failed(&pebc->parent_si->blk_bmap, + pebc->peb_index); + goto finish_src_init_using_metadata_state; + } + + id1 = pebi->env.cur_migration_id; + + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_src_init_using_metadata_state; + } + + id2 = ssdfs_get_peb_migration_id(pebi); + + if (id2 == SSDFS_PEB_UNKNOWN_MIGRATION_ID) { + /* it needs to initialize the migration id */ + ssdfs_set_peb_migration_id(pebi, id1); + } else if (is_peb_migration_id_valid(id2)) { + if (id1 != id2) { + err = -ERANGE; + SSDFS_ERR("migration_id1 %d != migration_id2 %d\n", + id1, id2); + goto finish_src_init_using_metadata_state; + } else { + /* + * Do nothing. + */ + } + } else { + err = -ERANGE; + SSDFS_ERR("invalid migration_id %d\n", id2); + goto finish_src_init_using_metadata_state; + } + + atomic_set(&pebi->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebi->init_end); + +finish_src_init_using_metadata_state: + ssdfs_destroy_init_env(&pebi->env); + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_dst_peb_init_using_metadata_state() - init dst "using" PEB container + * @pebc: pointer on PEB container + * @req: read request + * + * This function tries to initialize "using" PEB container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_dst_peb_init_using_metadata_state(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi; + int items_state; + int id1, id2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#else + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + + items_state = atomic_read(&pebc->items_state); + switch(items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + /* valid states */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -ERANGE; + }; + + down_read(&pebc->lock); + + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_WARN("destination PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_init_using_metadata_state; + } + + err = ssdfs_prepare_read_init_env(&pebi->env, fsi->pages_per_peb); + if (unlikely(err)) { + SSDFS_ERR("fail to init read environment: err %d\n", + err); + goto finish_dst_init_using_metadata_state; + } + + err = ssdfs_peb_init_using_metadata_state(pebi, &pebi->env, req); + if (unlikely(err)) { + SSDFS_ERR("fail to init using metadata state: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + ssdfs_segment_blk_bmap_init_failed(&pebc->parent_si->blk_bmap, + pebc->peb_index); + goto finish_dst_init_using_metadata_state; + } + + id1 = pebi->env.cur_migration_id; + + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_dst_init_using_metadata_state; + } + + ssdfs_set_peb_migration_id(pebc->dst_peb, id1); + + atomic_set(&pebc->dst_peb->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebc->dst_peb->init_end); + + switch (items_state) { + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + if (!pebc->src_peb) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_init_using_metadata_state; + } + + id1 = pebi->env.prev_migration_id; + + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_dst_init_using_metadata_state; + } + + id2 = ssdfs_get_peb_migration_id(pebc->src_peb); + + if (id2 == SSDFS_PEB_UNKNOWN_MIGRATION_ID) { + /* it needs to initialize the migration id */ + ssdfs_set_peb_migration_id(pebc->src_peb, id1); + atomic_set(&pebc->src_peb->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebc->src_peb->init_end); + } else if (is_peb_migration_id_valid(id2)) { + if (id1 != id2) { + err = -ERANGE; + SSDFS_ERR("id1 %d != id2 %d\n", + id1, id2); + goto finish_dst_init_using_metadata_state; + } else { + /* + * Do nothing. 
+ */ + } + } else { + err = -ERANGE; + SSDFS_ERR("invalid migration_id %d\n", id2); + goto finish_dst_init_using_metadata_state; + } + break; + + default: + /* do nothing */ + break; + }; + +finish_dst_init_using_metadata_state: + ssdfs_destroy_init_env(&pebi->env); + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_src_peb_init_used_metadata_state() - init src "used" PEB container + * @pebc: pointer on PEB container + * @req: read request + * + * This function tries to initialize "used" PEB container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_src_peb_init_used_metadata_state(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi; + int items_state; + int id1, id2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#else + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + + items_state = atomic_read(&pebc->items_state); + switch(items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + /* valid states */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -ERANGE; + }; + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_src_init_used_metadata_state; + } + + err = ssdfs_prepare_read_init_env(&pebi->env, fsi->pages_per_peb); + if (unlikely(err)) { + SSDFS_ERR("fail to init read environment: err %d\n", + err); + goto finish_src_init_used_metadata_state; + } + + err = ssdfs_peb_init_used_metadata_state(pebi, &pebi->env, req); + if (unlikely(err)) { + SSDFS_ERR("fail to init used metadata state: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + ssdfs_segment_blk_bmap_init_failed(&pebc->parent_si->blk_bmap, + pebc->peb_index); + goto finish_src_init_used_metadata_state; + } + + id1 = pebi->env.cur_migration_id; + + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_src_init_used_metadata_state; + } + + id2 = ssdfs_get_peb_migration_id(pebi); + + if (id2 == SSDFS_PEB_UNKNOWN_MIGRATION_ID) { + /* it needs to initialize the migration id */ + ssdfs_set_peb_migration_id(pebi, id1); + atomic_set(&pebi->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebi->init_end); + } else if (is_peb_migration_id_valid(id2)) { + if (id1 != id2) { + err = -ERANGE; + SSDFS_ERR("migration_id1 %d != migration_id2 %d\n", + id1, id2); + goto finish_src_init_used_metadata_state; + } else { + /* + * Do nothing. 
+ */ + } + } else { + err = -ERANGE; + SSDFS_ERR("invalid migration_id %d\n", id2); + goto finish_src_init_used_metadata_state; + } + + switch (items_state) { + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + if (!pebc->dst_peb) { + SSDFS_WARN("destination PEB is NULL\n"); + err = -ERANGE; + goto finish_src_init_used_metadata_state; + } + + id1 = __ssdfs_define_next_peb_migration_id(id1); + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_src_init_used_metadata_state; + } + + id2 = ssdfs_get_peb_migration_id(pebc->dst_peb); + + if (id2 == SSDFS_PEB_UNKNOWN_MIGRATION_ID) { + /* it needs to initialize the migration id */ + ssdfs_set_peb_migration_id(pebc->dst_peb, id1); + atomic_set(&pebc->dst_peb->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebc->dst_peb->init_end); + } else if (is_peb_migration_id_valid(id2)) { + if (id1 != id2) { + err = -ERANGE; + SSDFS_ERR("id1 %d != id2 %d\n", + id1, id2); + goto finish_src_init_used_metadata_state; + } else { + /* + * Do nothing. + */ + } + } else { + err = -ERANGE; + SSDFS_ERR("invalid migration_id %d\n", id2); + goto finish_src_init_used_metadata_state; + } + break; + + default: + /* do nothing */ + break; + }; + +finish_src_init_used_metadata_state: + ssdfs_destroy_init_env(&pebi->env); + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_dst_peb_init_used_metadata_state() - init dst "used" PEB container + * @pebc: pointer on PEB container + * @req: read request + * + * This function tries to initialize "used" PEB container. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_dst_peb_init_used_metadata_state(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi; + int items_state; + int id1, id2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#else + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + + items_state = atomic_read(&pebc->items_state); + switch(items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + /* valid states */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -ERANGE; + }; + + down_read(&pebc->lock); + + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_WARN("destination PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_init_used_metadata_state; + } + + err = ssdfs_prepare_read_init_env(&pebi->env, fsi->pages_per_peb); + if (unlikely(err)) { + SSDFS_ERR("fail to init read environment: err %d\n", + err); + goto finish_dst_init_used_metadata_state; + } + + err = ssdfs_peb_init_used_metadata_state(pebi, &pebi->env, req); + if (unlikely(err)) { + SSDFS_ERR("fail to init used metadata state: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + ssdfs_segment_blk_bmap_init_failed(&pebc->parent_si->blk_bmap, + pebc->peb_index); + goto finish_dst_init_used_metadata_state; + } + + id1 = pebi->env.cur_migration_id; + + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_dst_init_used_metadata_state; + } + + id2 = ssdfs_get_peb_migration_id(pebi); + + if (id2 == SSDFS_PEB_UNKNOWN_MIGRATION_ID) { + /* it needs to initialize the migration id */ + ssdfs_set_peb_migration_id(pebi, id1); + } else if (is_peb_migration_id_valid(id2)) { + if (id1 != id2) { + err = -ERANGE; + SSDFS_ERR("migration_id1 %d != migration_id2 %d\n", + id1, id2); + goto finish_dst_init_used_metadata_state; + } else { + /* + * Do nothing. 
+ */ + } + } else { + err = -ERANGE; + SSDFS_ERR("invalid migration_id %d\n", id2); + goto finish_dst_init_used_metadata_state; + } + + switch (items_state) { + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + if (!pebc->src_peb) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_dst_init_used_metadata_state; + } + + id1 = pebi->env.prev_migration_id; + + if (!is_peb_migration_id_valid(id1)) { + err = -EIO; + SSDFS_ERR("invalid peb_migration_id: " + "seg_id %llu, peb_index %u, " + "peb_migration_id %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + id1); + goto finish_dst_init_used_metadata_state; + } + + id2 = ssdfs_get_peb_migration_id(pebc->src_peb); + + if (id2 == SSDFS_PEB_UNKNOWN_MIGRATION_ID) { + /* it needs to initialize the migration id */ + ssdfs_set_peb_migration_id(pebc->src_peb, id1); + atomic_set(&pebc->src_peb->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebc->src_peb->init_end); + } else if (is_peb_migration_id_valid(id2)) { + if (id1 != id2) { + err = -ERANGE; + SSDFS_ERR("id1 %d != id2 %d\n", + id1, id2); + goto finish_dst_init_used_metadata_state; + } else { + /* + * Do nothing. + */ + } + } else { + err = -ERANGE; + SSDFS_ERR("invalid migration_id %d\n", id2); + goto finish_dst_init_used_metadata_state; + } + break; + + default: + /* do nothing */ + break; + }; + + atomic_set(&pebc->dst_peb->state, + SSDFS_PEB_OBJECT_INITIALIZED); + complete_all(&pebc->dst_peb->init_end); + +finish_dst_init_used_metadata_state: + ssdfs_destroy_init_env(&pebi->env); + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + /* * ssdfs_find_prev_partial_log() - find previous partial log * @fsi: file system info object From patchwork Sat Feb 25 01:08:37 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151930
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 26/76] ssdfs: offset translation table initialization logic Date: Fri, 24 Feb 2023 17:08:37 -0800 Message-Id: <20230225010927.813929-27-slava@dubeyko.com> In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> X-Mailing-List: linux-fsdevel@vger.kernel.org The offset translation table is a metadata structure that is stored in every log. Its responsibility is to keep the knowledge of which particular logical blocks are stored in the log's payload and at which offset in the payload the content of every logical block can be accessed and retrieved. The offset translation table can be imagined as a sequence of fragments. Every fragment contains an array of physical offset descriptors that provide the way to convert a logical block ID into the physical offset in the log. The initialization logic requires reading all fragments from the volume, decompressing them, and initializing the offset translation table fragment by fragment.
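For illustration only, the lookup model described above can be sketched in userspace C. The structure names and the blk2off() helper below are hypothetical and exist only for this sketch; they are not the SSDFS kernel implementation.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* hypothetical descriptor: maps one logical block to a payload offset */
struct phys_offset_desc {
	uint32_t logical_blk;	/* logical block ID */
	uint32_t byte_offset;	/* offset inside the log's payload */
};

/* hypothetical fragment: a decompressed array of descriptors */
struct offsets_fragment {
	uint16_t count;
	struct phys_offset_desc *descs;
};

/*
 * Convert a logical block ID into the physical offset in the log by
 * scanning the sequence of fragments; returns UINT32_MAX if absent.
 */
static uint32_t blk2off(struct offsets_fragment *frags, int frags_count,
			uint32_t logical_blk)
{
	int i, j;

	for (i = 0; i < frags_count; i++) {
		for (j = 0; j < frags[i].count; j++) {
			if (frags[i].descs[j].logical_blk == logical_blk)
				return frags[i].descs[j].byte_offset;
		}
	}

	return UINT32_MAX;
}

int main(void)
{
	struct phys_offset_desc descs[] = {
		{ .logical_blk = 0, .byte_offset = 8192 },
		{ .logical_blk = 1, .byte_offset = 12288 },
	};
	struct offsets_fragment frag = { .count = 2, .descs = descs };

	/* logical block 1 resolves to byte offset 12288 in the payload */
	printf("offset: %" PRIu32 "\n", blk2off(&frag, 1, 1));
	return 0;
}

In the real initialization flow, every fragment is first read from the volume and decompressed before its descriptor array can be consulted in this way.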
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_read_thread.c | 2288 ++++++++++++++++++++++++++++++++++++ 1 file changed, 2288 insertions(+) diff --git a/fs/ssdfs/peb_read_thread.c b/fs/ssdfs/peb_read_thread.c index f6a5b67612af..317eef078521 100644 --- a/fs/ssdfs/peb_read_thread.c +++ b/fs/ssdfs/peb_read_thread.c @@ -245,6 +245,2294 @@ int ssdfs_read_blk2off_table_fragment(struct ssdfs_peb_info *pebi, * READ THREAD FUNCTIONALITY * ******************************************************************************/ +/* + * __ssdfs_peb_read_log_footer() - read log's footer + * @fsi: file system info object + * @pebi: PEB object + * @page_off: log's starting page + * @desc: footer's descriptor + * @log_bytes: pointer on value of bytes in the log [out] + * + * This function tries to read log's footer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - valid footer is not found. + */ +static +int __ssdfs_peb_read_log_footer(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + u16 page_off, + struct ssdfs_metadata_descriptor *desc, + u32 *log_bytes) +{ + struct ssdfs_signature *magic = NULL; + struct ssdfs_partial_log_header *plh_hdr = NULL; + struct ssdfs_log_footer *footer = NULL; + u16 footer_off; + u32 bytes_off; + struct page *page; + void *kaddr; + size_t read_bytes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!desc || !log_bytes); + + SSDFS_DBG("seg %llu, peb_id %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + *log_bytes = U32_MAX; + + bytes_off = le32_to_cpu(desc->offset); + footer_off = bytes_off / fsi->pagesize; + + page = ssdfs_page_array_grab_page(&pebi->cache, footer_off); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + footer_off); + return -ENOMEM; + } + + kaddr = kmap_local_page(page); + + if (PageUptodate(page) || PageDirty(page)) + goto check_footer_magic; + + err = ssdfs_aligned_read_buffer(fsi, pebi->peb_id, + bytes_off, + (u8 *)kaddr, + PAGE_SIZE, + &read_bytes); + if (unlikely(err)) + goto fail_read_footer; + else if (unlikely(read_bytes != PAGE_SIZE)) { + err = -ERANGE; + goto fail_read_footer; + } + + SetPageUptodate(page); + +check_footer_magic: + magic = (struct ssdfs_signature *)kaddr; + + if (!is_ssdfs_magic_valid(magic)) { + err = -ENODATA; + goto fail_read_footer; + } + + if (is_ssdfs_partial_log_header_magic_valid(magic)) { + plh_hdr = SSDFS_PLH(kaddr); + *log_bytes = le32_to_cpu(plh_hdr->log_bytes); + } else if (__is_ssdfs_log_footer_magic_valid(magic)) { + footer = SSDFS_LF(kaddr); + *log_bytes = le32_to_cpu(footer->log_bytes); + } else { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log footer is corrupted: " + "peb %llu, page_off %u\n", + pebi->peb_id, page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_footer; + } + +fail_read_footer: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("valid footer is not detected: " + "seg_id %llu, peb_id %llu, " + "page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + footer_off); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if 
(unlikely(err)) { + SSDFS_ERR("fail to read footer: " + "seg %llu, peb %llu, " + "pages_off %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + footer_off, + err); + return err; + } + + return 0; +} + +/* + * __ssdfs_peb_read_log_header() - read log's header + * @fsi: file system info object + * @pebi: PEB object + * @page_off: log's starting page + * @log_bytes: pointer on value of bytes in the log [out] + * + * This function tries to read the log's header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - valid footer is not found. + */ +static +int __ssdfs_peb_read_log_header(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + u16 page_off, + u32 *log_bytes) +{ + struct ssdfs_signature *magic = NULL; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *pl_hdr = NULL; + struct ssdfs_metadata_descriptor *desc = NULL; + struct page *page; + void *kaddr; + size_t read_bytes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!log_bytes); + + SSDFS_DBG("seg %llu, peb_id %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + *log_bytes = U32_MAX; + + page = ssdfs_page_array_grab_page(&pebi->cache, page_off); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + page_off); + return -ENOMEM; + } + + kaddr = kmap_local_page(page); + + if (PageUptodate(page) || PageDirty(page)) + goto check_header_magic; + + err = ssdfs_aligned_read_buffer(fsi, pebi->peb_id, + page_off * PAGE_SIZE, + (u8 *)kaddr, + PAGE_SIZE, + &read_bytes); + if (unlikely(err)) + goto fail_read_log_header; + else if (unlikely(read_bytes != PAGE_SIZE)) { + err = -ERANGE; + goto fail_read_log_header; + } + + SetPageUptodate(page); + +check_header_magic: + magic = (struct ssdfs_signature *)kaddr; + + if (!is_ssdfs_magic_valid(magic)) { + err = -ENODATA; + goto fail_read_log_header; + } + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + seg_hdr = SSDFS_SEG_HDR(kaddr); + + err = ssdfs_check_segment_header(fsi, seg_hdr, + false); + if (unlikely(err)) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log header is corrupted: " + "seg %llu, peb %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_log_header; + } + + desc = &seg_hdr->desc_array[SSDFS_LOG_FOOTER_INDEX]; + + err = __ssdfs_peb_read_log_footer(fsi, pebi, page_off, + desc, log_bytes); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to read footer: " + "seg %llu, peb %llu, page_off %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_log_header; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read footer: " + "seg %llu, peb %llu, page_off %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); + goto fail_read_log_header; + } + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + pl_hdr = SSDFS_PLH(kaddr); + + err = ssdfs_check_partial_log_header(fsi, pl_hdr, + false); + if (unlikely(err)) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("partial log header is corrupted: " + "seg %llu, peb %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + goto 
fail_read_log_header; + } + + desc = &pl_hdr->desc_array[SSDFS_LOG_FOOTER_INDEX]; + + if (ssdfs_pl_has_footer(pl_hdr)) { + err = __ssdfs_peb_read_log_footer(fsi, pebi, page_off, + desc, log_bytes); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to read footer: " + "seg %llu, peb %llu, page_off %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_log_header; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read footer: " + "seg %llu, peb %llu, page_off %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); + goto fail_read_log_header; + } + } else + *log_bytes = le32_to_cpu(pl_hdr->log_bytes); + } else { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log header is corrupted: " + "seg %llu, peb %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + goto fail_read_log_header; + } + +fail_read_log_header: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("valid header is not detected: " + "seg_id %llu, peb_id %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read checked log header: " + "seg %llu, peb %llu, " + "pages_off %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_read_all_log_headers() - read all PEB's log headers + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to read all headers and footers of + * the PEB's logs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
+ */ +static +int ssdfs_peb_read_all_log_headers(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + u32 log_bytes = U32_MAX; + u32 page_off; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x, " + "ino %llu, logical_offset %llu, data_bytes %u\n", + pebi->pebc->parent_si->seg_id, pebi->pebc->peb_index, + req->private.class, req->private.cmd, req->private.type, + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + page_off = 0; + + do { + u32 pages_per_log; + + err = __ssdfs_peb_read_log_header(fsi, pebi, page_off, + &log_bytes); + if (err == -ENODATA) + return 0; + else if (unlikely(err)) { + SSDFS_ERR("fail to read log header: " + "seg %llu, peb %llu, page_off %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + page_off, + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(log_bytes >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + pages_per_log = log_bytes + fsi->pagesize - 1; + pages_per_log /= fsi->pagesize; + page_off += pages_per_log; + } while (page_off < fsi->pages_per_peb); + + return 0; +} + +/* + * ssdfs_peb_read_src_all_log_headers() - read all source PEB's log headers + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to read all headers and footers of + * the source PEB's logs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_peb_read_src_all_log_headers(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_peb_info *pebi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + if (!pebi) { + SSDFS_WARN("source PEB is NULL\n"); + err = -ERANGE; + goto finish_read_src_all_log_headers; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u, peb_id %llu\n", + pebc->parent_si->seg_id, + pebc->peb_index, + pebi->peb_id); +#else + SSDFS_DBG("seg_id %llu, peb_index %u, peb_id %llu\n", + pebc->parent_si->seg_id, + pebc->peb_index, + pebi->peb_id); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_peb_read_all_log_headers(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to read the log's headers: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + goto finish_read_src_all_log_headers; + } + +finish_read_src_all_log_headers: + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_read_dst_all_log_headers() - read all dst PEB's log headers + * @pebi: pointer on PEB object + * @req: read request + * + * This function tries to read all headers and footers of + * the destination PEB's logs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
+ */ +static +int ssdfs_peb_read_dst_all_log_headers(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req) +{ + struct ssdfs_peb_info *pebi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&pebc->lock); + + pebi = pebc->dst_peb; + if (!pebi) { + SSDFS_WARN("destination PEB is NULL\n"); + err = -ERANGE; + goto finish_read_dst_all_log_headers; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg_id %llu, peb_index %u, peb_id %llu\n", + pebc->parent_si->seg_id, + pebc->peb_index, + pebi->peb_id); +#else + SSDFS_DBG("seg_id %llu, peb_index %u, peb_id %llu\n", + pebc->parent_si->seg_id, + pebc->peb_index, + pebi->peb_id); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_peb_read_all_log_headers(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to read the log's headers: " + "peb_id %llu, peb_index %u, err %d\n", + pebi->peb_id, pebi->peb_index, err); + goto finish_read_dst_all_log_headers; + } + +finish_read_dst_all_log_headers: + up_read(&pebc->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_get_log_pages_count() - determine count of pages in the log + * @fsi: file system info object + * @pebi: PEB object + * @env: init environment [in | out] + * + * This function reads the segment header of the first log in + * the segment to retrieve the log_pages field. Also, it initializes + * the current and previous PEB migration IDs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_get_log_pages_count(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_signature *magic; + struct page *page; + size_t hdr_buf_size = sizeof(struct ssdfs_segment_header); + u32 log_pages; + u32 pages_off = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !env || !env->log_hdr); + + SSDFS_DBG("peb %llu, env %p\n", pebi->peb_id, env); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_get_page_locked(&pebi->cache, 0); + if (IS_ERR_OR_NULL(page)) { + err = ssdfs_read_checked_segment_header(fsi, + pebi->peb_id, + 0, + env->log_hdr, + false); + if (err) { + SSDFS_ERR("fail to read checked segment header: " + "peb %llu, err %d\n", + pebi->peb_id, err); + return err; + } + } else { + ssdfs_memcpy_from_page(env->log_hdr, 0, hdr_buf_size, + page, 0, PAGE_SIZE, + hdr_buf_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + magic = (struct ssdfs_signature *)env->log_hdr; + +#ifdef CONFIG_SSDFS_DEBUG + if (!is_ssdfs_magic_valid(magic)) { + SSDFS_ERR("valid magic is not detected\n"); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + struct ssdfs_segment_header *seg_hdr; + + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + log_pages = le16_to_cpu(seg_hdr->log_pages); + env->log_pages = log_pages; + env->cur_migration_id = + seg_hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB]; + env->prev_migration_id = + seg_hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB]; + } else { + SSDFS_ERR("log header is corrupted: " + "peb %llu, pages_off %u\n", + pebi->peb_id, pages_off); + return -ERANGE; + } + +#ifdef
CONFIG_SSDFS_DEBUG + if (fsi->pages_per_peb % log_pages) { + SSDFS_WARN("fsi->pages_per_peb %u, log_pages %u\n", + fsi->pages_per_peb, log_pages); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (log_pages > fsi->pages_per_peb) { + SSDFS_ERR("log_pages %u > fsi->pages_per_peb %u\n", + log_pages, fsi->pages_per_peb); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_find_last_partial_log() - find the last partial log + * @fsi: file system info object + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * @new_log_start_page: pointer on the new log's start page [out] + * + * This function tries to find the last partial log + * in the PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + */ +static +int ssdfs_find_last_partial_log(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + u16 *new_log_start_page) +{ + struct ssdfs_signature *magic = NULL; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *pl_hdr = NULL; + struct ssdfs_log_footer *footer = NULL; + struct page *page; + void *kaddr; + size_t hdr_buf_size = sizeof(struct ssdfs_segment_header); + u32 byte_offset, page_offset; + unsigned long last_page_idx; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi || !pebi->pebc || !env); + BUG_ON(!new_log_start_page); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *new_log_start_page = U16_MAX; + + last_page_idx = ssdfs_page_array_get_last_page_index(&pebi->cache); + + if (last_page_idx >= SSDFS_PAGE_ARRAY_INVALID_LAST_PAGE) { + SSDFS_ERR("empty page array: last_page_idx %lu\n", + last_page_idx); + return -ERANGE; + } + + if (last_page_idx >= fsi->pages_per_peb) { + SSDFS_ERR("corrupted page array: " + "last_page_idx %lu, fsi->pages_per_peb %u\n", + last_page_idx, fsi->pages_per_peb); + return -ERANGE; + } + + for (i = (int)last_page_idx; i >= 0; i--) { + page = ssdfs_page_array_get_page_locked(&pebi->cache, i); + if (IS_ERR_OR_NULL(page)) { + if (page == NULL) { + SSDFS_ERR("fail to get page: " + "index %d\n", + i); + return -ERANGE; + } else { + err = PTR_ERR(page); + + if (err == -ENOENT) + continue; + else { + SSDFS_ERR("fail to get page: " + "index %d, err %d\n", + i, err); + return err; + } + } + } + + kaddr = kmap_local_page(page); + ssdfs_memcpy(env->log_hdr, 0, hdr_buf_size, + kaddr, 0, PAGE_SIZE, + hdr_buf_size); + ssdfs_memcpy(env->footer, 0, hdr_buf_size, + kaddr, 0, PAGE_SIZE, + hdr_buf_size); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %d, page %p, count %d\n", + i, page, page_ref_count(page)); + + SSDFS_DBG("PAGE DUMP: cur_page %u\n", + i); + kaddr = kmap_local_page(page); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + kunmap_local(kaddr); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + magic = (struct ssdfs_signature *)env->log_hdr; + + if (!is_ssdfs_magic_valid(magic)) + continue; + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + + err = ssdfs_check_segment_header(fsi, seg_hdr, + false); + if (unlikely(err)) { + SSDFS_ERR("log header is corrupted: " + "seg %llu, peb %llu, index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + i); + return -EIO; + } + + if (*new_log_start_page >= U16_MAX) { + SSDFS_ERR("invalid 
new_log_start_page\n"); + return -EIO; + } + + byte_offset = i * fsi->pagesize; + byte_offset += env->log_bytes; + byte_offset += fsi->pagesize - 1; + page_offset = byte_offset / fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_offset %u, page_offset %u, " + "new_log_start_page %u\n", + byte_offset, page_offset, *new_log_start_page); + SSDFS_DBG("log_bytes %u\n", env->log_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*new_log_start_page < page_offset) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("correct new log start page: " + "old value %u, new value %u\n", + *new_log_start_page, + page_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + *new_log_start_page = page_offset; + } else if (page_offset != *new_log_start_page) { + SSDFS_ERR("invalid new log start: " + "page_offset %u, " + "new_log_start_page %u\n", + page_offset, + *new_log_start_page); + return -EIO; + } + + env->log_offset = (u16)i; + pebi->peb_create_time = + le64_to_cpu(seg_hdr->peb_create_time); + pebi->current_log.last_log_time = + le64_to_cpu(seg_hdr->timestamp); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_create_time, + pebi->current_log.last_log_time); + + BUG_ON(pebi->peb_create_time > + pebi->current_log.last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_last_log_search; + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + u32 flags; + + pl_hdr = SSDFS_PLH(env->log_hdr); + + err = ssdfs_check_partial_log_header(fsi, pl_hdr, + false); + if (unlikely(err)) { + SSDFS_ERR("partial log header is corrupted: " + "seg %llu, peb %llu, index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + i); + return -EIO; + } + + flags = le32_to_cpu(pl_hdr->pl_flags); + + if (flags & SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER) { + /* first partial log */ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON((i + 1) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + byte_offset = (i + 1) * fsi->pagesize; + byte_offset += fsi->pagesize - 1; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_offset %u, " + "new_log_start_page %u\n", + byte_offset, *new_log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + *new_log_start_page = + (u16)(byte_offset / fsi->pagesize); + env->log_bytes = + le32_to_cpu(pl_hdr->log_bytes); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_bytes %u\n", env->log_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + continue; + } else if (flags & SSDFS_LOG_HAS_FOOTER) { + /* last partial log */ + + env->log_bytes = + le32_to_cpu(pl_hdr->log_bytes); + + byte_offset = i * fsi->pagesize; + byte_offset += env->log_bytes; + byte_offset += fsi->pagesize - 1; + page_offset = byte_offset / fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_offset %u, page_offset %u, " + "new_log_start_page %u\n", + byte_offset, page_offset, *new_log_start_page); + SSDFS_DBG("log_bytes %u\n", env->log_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*new_log_start_page < page_offset) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("correct new log start page: " + "old value %u, " + "new value %u\n", + *new_log_start_page, + page_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + *new_log_start_page = page_offset; + } else if (page_offset != *new_log_start_page) { + SSDFS_ERR("invalid new log start: " + "page_offset %u, " + "new_log_start_page %u\n", + page_offset, + *new_log_start_page); + return -EIO; + } + + env->log_offset = (u16)i; + pebi->peb_create_time = + le64_to_cpu(pl_hdr->peb_create_time); + 
pebi->current_log.last_log_time = + le64_to_cpu(pl_hdr->timestamp); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_create_time, + pebi->current_log.last_log_time); + + BUG_ON(pebi->peb_create_time > + pebi->current_log.last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_last_log_search; + } else { + /* intermediate partial log */ + + env->log_bytes = + le32_to_cpu(pl_hdr->log_bytes); + + byte_offset = i * fsi->pagesize; + byte_offset += env->log_bytes; + byte_offset += fsi->pagesize - 1; + page_offset = byte_offset / fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_offset %u, page_offset %u, " + "new_log_start_page %u\n", + byte_offset, page_offset, *new_log_start_page); + SSDFS_DBG("log_bytes %u\n", env->log_bytes); + + BUG_ON(page_offset >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + *new_log_start_page = (u16)page_offset; + env->log_offset = (u16)i; + pebi->peb_create_time = + le64_to_cpu(pl_hdr->peb_create_time); + pebi->current_log.last_log_time = + le64_to_cpu(pl_hdr->timestamp); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_create_time, + pebi->current_log.last_log_time); + + BUG_ON(pebi->peb_create_time > + pebi->current_log.last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_last_log_search; + } + } else if (__is_ssdfs_log_footer_magic_valid(magic)) { + footer = SSDFS_LF(env->footer); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON((i + 1) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + byte_offset = (i + 1) * fsi->pagesize; + byte_offset += fsi->pagesize - 1; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_offset %u, new_log_start_page %u\n", + byte_offset, *new_log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + *new_log_start_page = + (u16)(byte_offset / fsi->pagesize); + env->log_bytes = + le32_to_cpu(footer->log_bytes); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_bytes %u\n", env->log_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + continue; + } else { + SSDFS_ERR("log header is corrupted: " + "seg %llu, peb %llu, index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + i); + return -ERANGE; + } + } + +finish_last_log_search: + if (env->log_offset >= fsi->pages_per_peb) { + SSDFS_ERR("log_offset %u >= pages_per_peb %u\n", + env->log_offset, fsi->pages_per_peb); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (fsi->erasesize < env->log_bytes) { + SSDFS_WARN("fsi->erasesize %u, log_bytes %u\n", + fsi->erasesize, + env->log_bytes); + } + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u, " + "new_log_start_page %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index, + *new_log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_check_log_header() - check log's header + * @fsi: file system info object + * @env: init environment [in|out] + * + * This function checks the log's header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ENODATA - valid magic is not detected. 
+ */ +static inline +int ssdfs_check_log_header(struct ssdfs_fs_info *fsi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_signature *magic = NULL; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *pl_hdr = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!env || !env->log_hdr || !env->footer); + + SSDFS_DBG("log_offset %u, log_pages %u\n", + env->log_offset, env->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + magic = (struct ssdfs_signature *)env->log_hdr; + + if (!is_ssdfs_magic_valid(magic)) { + SSDFS_DBG("valid magic is not detected\n"); + return -ENODATA; + } + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + + err = ssdfs_check_segment_header(fsi, seg_hdr, + false); + if (unlikely(err)) { + SSDFS_ERR("log header is corrupted\n"); + return -EIO; + } + + env->has_seg_hdr = true; + env->has_footer = ssdfs_log_has_footer(seg_hdr); + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + pl_hdr = SSDFS_PLH(env->log_hdr); + + err = ssdfs_check_partial_log_header(fsi, pl_hdr, + false); + if (unlikely(err)) { + SSDFS_ERR("partial log header is corrupted\n"); + return -EIO; + } + + env->has_seg_hdr = false; + env->has_footer = ssdfs_pl_has_footer(pl_hdr); + } else { + SSDFS_DBG("log header is corrupted\n"); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_get_segment_header_blk_bmap_desc() - get block bitmap's descriptor + * @pebi: pointer on PEB object + * @env: init environment [in] + * @desc: block bitmap's descriptor [out] + * + * This function tries to extract the block bitmap's descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + */ +static +int ssdfs_get_segment_header_blk_bmap_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_metadata_descriptor **desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_header *seg_hdr = NULL; + size_t footer_size = sizeof(struct ssdfs_log_footer); + u32 pages_off; + u32 bytes_off; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + *desc = NULL; + + if (!env->has_seg_hdr) { + SSDFS_ERR("segment header is absent\n"); + return -ERANGE; + } + + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + + if (!ssdfs_seg_hdr_has_blk_bmap(seg_hdr)) { + if (!env->has_footer) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, + __LINE__, + "log hasn't footer\n"); + return -EIO; + } + + *desc = &seg_hdr->desc_array[SSDFS_LOG_FOOTER_INDEX]; + + bytes_off = le32_to_cpu((*desc)->offset); + pages_off = bytes_off / fsi->pagesize; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, + pages_off); + if (IS_ERR_OR_NULL(page)) { + err = ssdfs_read_checked_log_footer(fsi, + env->log_hdr, + pebi->peb_id, + bytes_off, + env->footer, + false); + if (unlikely(err)) { + SSDFS_ERR("fail to read checked log footer: " + "seg %llu, peb %llu, bytes_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, bytes_off); + return err; + } + } else { + ssdfs_memcpy_from_page(env->footer, 0, footer_size, + page, 0, PAGE_SIZE, + footer_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (!ssdfs_log_footer_has_blk_bmap(env->footer)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, + __LINE__, + "log hasn't block 
bitmap\n"); + return -EIO; + } + + *desc = &env->footer->desc_array[SSDFS_BLK_BMAP_INDEX]; + } else + *desc = &seg_hdr->desc_array[SSDFS_BLK_BMAP_INDEX]; + + return 0; +} + +/* + * ssdfs_get_partial_header_blk_bmap_desc() - get block bitmap's descriptor + * @pebi: pointer on PEB object + * @env: init environment [in] + * @desc: block bitmap's descriptor [out] + * + * This function tries to extract the block bitmap's descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + */ +static +int ssdfs_get_partial_header_blk_bmap_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_metadata_descriptor **desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_partial_log_header *pl_hdr = NULL; + size_t footer_size = sizeof(struct ssdfs_log_footer); + u32 pages_off; + u32 bytes_off; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + *desc = NULL; + + if (env->has_seg_hdr) { + SSDFS_ERR("partial log header is absent\n"); + return -ERANGE; + } + + pl_hdr = SSDFS_PLH(env->log_hdr); + + if (!ssdfs_pl_hdr_has_blk_bmap(pl_hdr)) { + if (!env->has_footer) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, + __LINE__, + "log hasn't footer\n"); + return -EIO; + } + + *desc = &pl_hdr->desc_array[SSDFS_LOG_FOOTER_INDEX]; + + bytes_off = le32_to_cpu((*desc)->offset); + pages_off = bytes_off / fsi->pagesize; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, + pages_off); + if (IS_ERR_OR_NULL(page)) { + err = ssdfs_read_checked_log_footer(fsi, + env->log_hdr, + pebi->peb_id, + bytes_off, + env->footer, + false); + if (unlikely(err)) { + SSDFS_ERR("fail to read checked log footer: " + "seg %llu, peb %llu, bytes_off %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, bytes_off); + return err; + } + } else { + ssdfs_memcpy_from_page(env->footer, 0, footer_size, + page, 0, PAGE_SIZE, + footer_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (!ssdfs_log_footer_has_blk_bmap(env->footer)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, + __LINE__, + "log hasn't block bitmap\n"); + return -EIO; + } + + *desc = &env->footer->desc_array[SSDFS_BLK_BMAP_INDEX]; + } else + *desc = &pl_hdr->desc_array[SSDFS_BLK_BMAP_INDEX]; + + return 0; +} + +/* + * ssdfs_pre_fetch_block_bitmap() - pre-fetch block bitmap + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to check the presence of block bitmap + * in the PEB's cache. Otherwise, it tries to read the block + * bitmap from the volume into the PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOMEM - fail to allocate memory. 
+ */ +static +int ssdfs_pre_fetch_block_bitmap(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor *desc = NULL; + struct page *page; + void *kaddr; + u32 pages_off; + u32 bytes_off; + size_t hdr_buf_size = sizeof(struct ssdfs_segment_header); + u32 area_offset, area_size; + u32 cur_page, page_start, page_end; + size_t read_bytes; + size_t bmap_hdr_size = sizeof(struct ssdfs_block_bitmap_header); + u32 pebsize; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + pages_off = env->log_offset; + pebsize = fsi->pages_per_peb * fsi->pagesize; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, pages_off); + if (IS_ERR_OR_NULL(page)) { + err = ssdfs_read_checked_segment_header(fsi, + pebi->peb_id, + pages_off, + env->log_hdr, + false); + if (err) { + SSDFS_ERR("fail to read checked segment header: " + "peb %llu, err %d\n", + pebi->peb_id, err); + return err; + } + } else { + ssdfs_memcpy_from_page(env->log_hdr, 0, hdr_buf_size, + page, 0, PAGE_SIZE, + hdr_buf_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_check_log_header(fsi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to check log header: " + "err %d\n", err); + return err; + } + + if (env->has_seg_hdr) + err = ssdfs_get_segment_header_blk_bmap_desc(pebi, env, &desc); + else + err = ssdfs_get_partial_header_blk_bmap_desc(pebi, env, &desc); + + if (unlikely(err)) { + SSDFS_ERR("fail to get descriptor: " + "err %d\n", err); + return err; + } + + if (!desc) { + SSDFS_ERR("invalid descriptor pointer\n"); + return -ERANGE; + } + + area_offset = le32_to_cpu(desc->offset); + area_size = le32_to_cpu(desc->size); + + if (bmap_hdr_size != le16_to_cpu(desc->check.bytes)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "bmap_hdr_size %zu != desc->check.bytes %u\n", + bmap_hdr_size, + le16_to_cpu(desc->check.bytes)); + return -EIO; + } + + if (area_offset >= pebsize) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "desc->offset %u >= pebsize %u\n", + area_offset, pebsize); + return -EIO; + } + + bytes_off = area_offset; + page_start = bytes_off / fsi->pagesize; + bytes_off += area_size - 1; + page_end = bytes_off / fsi->pagesize; + + for (cur_page = page_start; cur_page <= page_end; cur_page++) { + page = ssdfs_page_array_get_page_locked(&pebi->cache, + cur_page); + if (IS_ERR_OR_NULL(page)) { + page = ssdfs_page_array_grab_page(&pebi->cache, + cur_page); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + cur_page); + return -ENOMEM; + } + + kaddr = kmap_local_page(page); + err = ssdfs_aligned_read_buffer(fsi, pebi->peb_id, + cur_page * PAGE_SIZE, + (u8 *)kaddr, + PAGE_SIZE, + &read_bytes); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to read memory page: " + "index %u, err %d\n", + cur_page, err); + goto finish_read_page; + } else if (unlikely(read_bytes != PAGE_SIZE)) { + err = -ERANGE; + SSDFS_ERR("invalid read_bytes %zu\n", + read_bytes); + goto finish_read_page; + } + + SetPageUptodate(page); + +finish_read_page: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + + return err; +} + +/* + * ssdfs_get_segment_header_blk2off_tbl_desc() - get blk2off tbl's descriptor + * @pebi: pointer on PEB object + * @env: init environment [in] + * @desc: blk2off tbl's descriptor [out] + * + * This function tries to extract the blk2off table's descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOENT - blk2off table's descriptor is absent. + */ +static inline +int ssdfs_get_segment_header_blk2off_tbl_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_metadata_descriptor **desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_header *seg_hdr = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + *desc = NULL; + fsi = pebi->pebc->parent_si->fsi; + + if (!env->has_seg_hdr) { + SSDFS_ERR("segment header is absent\n"); + return -ERANGE; + } + + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + + if (!ssdfs_seg_hdr_has_offset_table(seg_hdr)) { + if (!env->has_footer) { + ssdfs_fs_error(fsi->sb, __FILE__, + __func__, __LINE__, + "log hasn't footer\n"); + return -EIO; + } + + if (!ssdfs_log_footer_has_offset_table(env->footer)) { + SSDFS_DBG("log hasn't blk2off table\n"); + return -ENOENT; + } + + *desc = &env->footer->desc_array[SSDFS_OFF_TABLE_INDEX]; + } else + *desc = &seg_hdr->desc_array[SSDFS_OFF_TABLE_INDEX]; + + return 0; +} + +/* + * ssdfs_get_segment_header_blk_desc_tbl_desc() - get blk desc tbl's descriptor + * @pebi: pointer on PEB object + * @env: init environment [in] + * @desc: blk desc tbl's descriptor [out] + * + * This function tries to extract the block descriptor table's descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOENT - block descriptor table's descriptor is absent. + */ +static inline +int ssdfs_get_segment_header_blk_desc_tbl_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_metadata_descriptor **desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_header *seg_hdr = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + *desc = NULL; + fsi = pebi->pebc->parent_si->fsi; + + if (!env->has_seg_hdr) { + SSDFS_ERR("segment header is absent\n"); + return -ERANGE; + } + + seg_hdr = SSDFS_SEG_HDR(env->log_hdr); + + if (!ssdfs_log_has_blk_desc_chain(seg_hdr)) { + SSDFS_DBG("log hasn't block descriptor table\n"); + return -ENOENT; + } else + *desc = &seg_hdr->desc_array[SSDFS_BLK_DESC_AREA_INDEX]; + + return 0; +} + +/* + * ssdfs_get_partial_header_blk2off_tbl_desc() - get blk2off tbl's descriptor + * @pebi: pointer on PEB object + * @env: init environment [in] + * @desc: blk2off tbl's descriptor [out] + * + * This function tries to extract the blk2off table's descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOENT - blk2off table's descriptor is absent. 
+ */ +static inline +int ssdfs_get_partial_header_blk2off_tbl_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_metadata_descriptor **desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_partial_log_header *pl_hdr = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + *desc = NULL; + fsi = pebi->pebc->parent_si->fsi; + + if (env->has_seg_hdr) { + SSDFS_ERR("partial log header is absent\n"); + return -ERANGE; + } + + pl_hdr = SSDFS_PLH(env->log_hdr); + + if (!ssdfs_pl_hdr_has_offset_table(pl_hdr)) { + if (!env->has_footer) { + SSDFS_DBG("log hasn't blk2off table\n"); + return -ENOENT; + } + + if (!ssdfs_log_footer_has_offset_table(env->footer)) { + SSDFS_DBG("log hasn't blk2off table\n"); + return -ENOENT; + } + + *desc = &env->footer->desc_array[SSDFS_OFF_TABLE_INDEX]; + } else + *desc = &pl_hdr->desc_array[SSDFS_OFF_TABLE_INDEX]; + + return 0; +} + +/* + * ssdfs_get_partial_header_blk_desc_tbl_desc() - get blk desc tbl's descriptor + * @pebi: pointer on PEB object + * @env: init environment [in] + * @desc: blk desc tbl's descriptor [out] + * + * This function tries to extract the block descriptor table's descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOENT - block descriptor table's descriptor is absent. + */ +static inline +int ssdfs_get_partial_header_blk_desc_tbl_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env, + struct ssdfs_metadata_descriptor **desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_partial_log_header *pl_hdr = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env || !desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + *desc = NULL; + fsi = pebi->pebc->parent_si->fsi; + + if (env->has_seg_hdr) { + SSDFS_ERR("partial log header is absent\n"); + return -ERANGE; + } + + pl_hdr = SSDFS_PLH(env->log_hdr); + + if (!ssdfs_pl_has_blk_desc_chain(pl_hdr)) { + SSDFS_DBG("log hasn't block descriptor table\n"); + return -ENOENT; + } else + *desc = &pl_hdr->desc_array[SSDFS_BLK_DESC_AREA_INDEX]; + + return 0; +} + +/* + * ssdfs_pre_fetch_metadata_area() - pre-fetch metadata area + * @pebi: pointer on PEB object + * @desc: metadata area's descriptor + * + * This function tries to check the presence of metadata area + * in the PEB's cache. Otherwise, it tries to read the metadata area + * from the volume into the PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOMEM - fail to allocate memory. 
+ */ +static +int ssdfs_pre_fetch_metadata_area(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *desc) +{ + struct ssdfs_fs_info *fsi; + struct page *page; + void *kaddr; + u32 bytes_off; + u32 area_offset, area_size; + u32 cur_page, page_start, page_end; + size_t read_bytes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !desc); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + area_offset = le32_to_cpu(desc->offset); + area_size = le32_to_cpu(desc->size); + + bytes_off = area_offset; + page_start = bytes_off / fsi->pagesize; + bytes_off += area_size - 1; + page_end = bytes_off / fsi->pagesize; + + for (cur_page = page_start; cur_page <= page_end; cur_page++) { + page = ssdfs_page_array_get_page_locked(&pebi->cache, + cur_page); + if (IS_ERR_OR_NULL(page)) { + page = ssdfs_page_array_grab_page(&pebi->cache, + cur_page); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + cur_page); + return -ENOMEM; + } + + kaddr = kmap_local_page(page); + err = ssdfs_aligned_read_buffer(fsi, pebi->peb_id, + cur_page * PAGE_SIZE, + (u8 *)kaddr, + PAGE_SIZE, + &read_bytes); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to read memory page: " + "index %u, err %d\n", + cur_page, err); + goto finish_read_page; + } else if (unlikely(read_bytes != PAGE_SIZE)) { + err = -ERANGE; + SSDFS_ERR("invalid read_bytes %zu\n", + read_bytes); + goto finish_read_page; + } + + SetPageUptodate(page); + +finish_read_page: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + + return err; +} + +/* + * ssdfs_pre_fetch_blk2off_table_area() - pre-fetch blk2off table + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to check the presence of blk2off table + * in the PEB's cache. Otherwise, it tries to read the blk2off table + * from the volume into the PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOMEM - fail to allocate memory. + * %-ENOENT - blk2off table is absent. 
+ */ +static +int ssdfs_pre_fetch_blk2off_table_area(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor *desc = NULL; + struct page *page; + u32 pages_off; + size_t hdr_buf_size = sizeof(struct ssdfs_segment_header); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + pages_off = env->log_offset; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, pages_off); + if (IS_ERR_OR_NULL(page)) { + err = ssdfs_read_checked_segment_header(fsi, + pebi->peb_id, + pages_off, + env->log_hdr, + false); + if (err) { + SSDFS_ERR("fail to read checked segment header: " + "peb %llu, err %d\n", + pebi->peb_id, err); + return err; + } + } else { + ssdfs_memcpy_from_page(env->log_hdr, 0, hdr_buf_size, + page, 0, PAGE_SIZE, + hdr_buf_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_check_log_header(fsi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to check log header: " + "err %d\n", err); + return err; + } + + if (env->has_seg_hdr) { + err = ssdfs_get_segment_header_blk2off_tbl_desc(pebi, env, + &desc); + } else { + err = ssdfs_get_partial_header_blk2off_tbl_desc(pebi, env, + &desc); + } + + if (err == -ENOENT) { + SSDFS_DBG("blk2off table is absent\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get descriptor: " + "err %d\n", err); + return err; + } + + if (!desc) { + SSDFS_ERR("invalid descriptor pointer\n"); + return -ERANGE; + } + + err = ssdfs_pre_fetch_metadata_area(pebi, desc); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch a metadata area: " + "err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_read_blk_desc_byte_stream() - read blk desc's byte stream + * @pebi: pointer on PEB object + * @read_bytes: amount of bytes for reading + * @env: init environment [in|out] + * + * This function tries to read blk desc table's byte stream. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. + */ +static +int ssdfs_read_blk_desc_byte_stream(struct ssdfs_peb_info *pebi, + u32 read_bytes, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!env); + + SSDFS_DBG("seg %llu, peb %llu, read_bytes %u, " + "read_off %u, write_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + read_bytes, env->bdt_init.read_off, + env->bdt_init.write_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + while (read_bytes > 0) { + struct page *page = NULL; + void *kaddr; + pgoff_t page_index = env->bdt_init.write_off >> PAGE_SHIFT; + u32 capacity = pagevec_count(&env->bdt_init.pvec) << PAGE_SHIFT; + u32 offset, bytes; + + if (env->bdt_init.write_off >= capacity) { + if (pagevec_space(&env->bdt_init.pvec) == 0) { + /* + * Block descriptor table byte stream could be + * bigger than page vector capacity. + * As a result, not complete byte stream will + * read and initialization will be done only + * partially. 
The rest byte stream will be + * extracted and be used for initialization + * for request of particular logical block. + */ + SSDFS_DBG("pagevec is full\n"); + return 0; + } + + page = ssdfs_read_add_pagevec_page(&env->bdt_init.pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + return err; + } + } else { + page = env->bdt_init.pvec.pages[page_index]; + if (unlikely(!page)) { + err = -ERANGE; + SSDFS_ERR("fail to get page: err %d\n", + err); + return err; + } + } + + offset = env->bdt_init.write_off % PAGE_SIZE; + bytes = min_t(u32, read_bytes, PAGE_SIZE - offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %u, bytes %u\n", + offset, bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = ssdfs_unaligned_read_cache(pebi, + env->bdt_init.read_off, bytes, + (u8 *)kaddr + offset); + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "seg %llu, peb %llu, offset %u, " + "size %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, env->bdt_init.read_off, + bytes, err); + return err; + } + + read_bytes -= bytes; + env->bdt_init.read_off += bytes; + env->bdt_init.write_off += bytes; + }; + + return 0; +} + +/* + * ssdfs_read_blk_desc_compressed_byte_stream() - read blk desc's byte stream + * @pebi: pointer on PEB object + * @read_bytes: amount of bytes for reading + * @env: init environment [in|out] + * + * This function tries to read blk desc table's byte stream. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EIO - I/O error. 
+ */ +static +int ssdfs_read_blk_desc_compressed_byte_stream(struct ssdfs_peb_info *pebi, + u32 read_bytes, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_area_block_table table; + struct ssdfs_fragment_desc *frag; + struct page *page = NULL; + void *kaddr; + size_t tbl_size = sizeof(struct ssdfs_area_block_table); + u32 area_offset; + u16 fragments_count; + u16 frag_uncompr_size; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!env); + + SSDFS_DBG("seg %llu, peb %llu, read_bytes %u, " + "read_off %u, write_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + read_bytes, env->bdt_init.read_off, + env->bdt_init.write_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + area_offset = env->bdt_init.read_off; + + err = ssdfs_unaligned_read_cache(pebi, area_offset, tbl_size, &table); + if (unlikely(err)) { + SSDFS_ERR("fail to read area block table: " + "area_offset %u, tbl_size %zu, err %d\n", + area_offset, tbl_size, err); + return err; + } + + if (table.chain_hdr.magic != SSDFS_CHAIN_HDR_MAGIC) { + SSDFS_ERR("corrupted area block table: " + "magic (expected %#x, found %#x)\n", + SSDFS_CHAIN_HDR_MAGIC, + table.chain_hdr.magic); + return -EIO; + } + + switch (table.chain_hdr.type) { + case SSDFS_BLK_DESC_ZLIB_CHAIN_HDR: + case SSDFS_BLK_DESC_LZO_CHAIN_HDR: + /* expected type */ + break; + + default: + SSDFS_ERR("unexpected area block table's type %#x\n", + table.chain_hdr.type); + return -EIO; + } + + fragments_count = le16_to_cpu(table.chain_hdr.fragments_count); + + for (i = 0; i < fragments_count; i++) { + frag = &table.blk[i]; + + if (frag->magic != SSDFS_FRAGMENT_DESC_MAGIC) { + SSDFS_ERR("corrupted area block table: " + "magic (expected %#x, found %#x)\n", + SSDFS_FRAGMENT_DESC_MAGIC, + frag->magic); + return -EIO; + } + + switch (frag->type) { + case SSDFS_DATA_BLK_DESC_ZLIB: + case SSDFS_DATA_BLK_DESC_LZO: + /* expected type */ + break; + + default: + SSDFS_ERR("unexpected fragment's type %#x\n", + frag->type); + return -EIO; + } + + frag_uncompr_size = le16_to_cpu(frag->uncompr_size); + + page = ssdfs_read_add_pagevec_page(&env->bdt_init.pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = __ssdfs_decompress_blk_desc_fragment(pebi, frag, + area_offset, + kaddr, PAGE_SIZE); + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "seg %llu, peb %llu, offset %u, " + "size %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, env->bdt_init.read_off, + frag_uncompr_size, err); + return err; + } + + env->bdt_init.read_off += frag_uncompr_size; + env->bdt_init.write_off += frag_uncompr_size; + } + + return err; +} + +/* + * ssdfs_pre_fetch_blk_desc_table_area() - pre-fetch blk desc table + * @pebi: pointer on PEB object + * @env: init environment [in|out] + * + * This function tries to check the presence of blk desc table + * in the PEB's cache. Otherwise, it tries to read the blk desc table + * from the volume into the PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - I/O error. + * %-ENOMEM - fail to allocate memory. + * %-ENOENT - blk desc table is absent. 
+ */ +static +int ssdfs_pre_fetch_blk_desc_table_area(struct ssdfs_peb_info *pebi, + struct ssdfs_read_init_env *env) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor *desc = NULL; + struct page *page; + u32 pages_off; + size_t hdr_buf_size = sizeof(struct ssdfs_segment_header); + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !env); + + SSDFS_DBG("seg %llu, peb %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + pages_off = env->log_offset; + env->bdt_init.read_off = 0; + env->bdt_init.write_off = 0; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, pages_off); + if (IS_ERR_OR_NULL(page)) { + err = ssdfs_read_checked_segment_header(fsi, + pebi->peb_id, + pages_off, + env->log_hdr, + false); + if (err) { + SSDFS_ERR("fail to read checked segment header: " + "peb %llu, err %d\n", + pebi->peb_id, err); + return err; + } + } else { + ssdfs_memcpy_from_page(env->log_hdr, 0, hdr_buf_size, + page, 0, PAGE_SIZE, + hdr_buf_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_check_log_header(fsi, env); + if (unlikely(err)) { + SSDFS_ERR("fail to check log header: " + "err %d\n", err); + return err; + } + + if (env->has_seg_hdr) { + err = ssdfs_get_segment_header_blk_desc_tbl_desc(pebi, env, + &desc); + } else { + err = ssdfs_get_partial_header_blk_desc_tbl_desc(pebi, env, + &desc); + } + + if (err == -ENOENT) { + SSDFS_DBG("blk descriptor table is absent\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get descriptor: " + "err %d\n", err); + return err; + } + + if (!desc) { + SSDFS_ERR("invalid descriptor pointer\n"); + return -ERANGE; + } + + env->bdt_init.read_off = le32_to_cpu(desc->offset); + + err = ssdfs_pre_fetch_metadata_area(pebi, desc); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-fetch a metadata area: " + "err %d\n", err); + return err; + } + + flags = le16_to_cpu(desc->check.flags); + + if ((flags & SSDFS_ZLIB_COMPRESSED) && (flags & SSDFS_LZO_COMPRESSED)) { + SSDFS_ERR("invalid set of flags: " + "flags %#x\n", + flags); + return -ERANGE; + } + + if ((flags & SSDFS_ZLIB_COMPRESSED) || (flags & SSDFS_LZO_COMPRESSED)) { + err = ssdfs_read_blk_desc_compressed_byte_stream(pebi, + le32_to_cpu(desc->size), + env); + } else { + u32 read_bytes = le32_to_cpu(desc->size); + size_t area_tbl_size = sizeof(struct ssdfs_area_block_table); + + env->bdt_init.read_off += area_tbl_size; + + if (read_bytes <= area_tbl_size) { + SSDFS_ERR("corrupted area blocks table: " + "read_bytes %u, area_tbl_size %zu\n", + read_bytes, area_tbl_size); + return -EIO; + } + + read_bytes -= area_tbl_size; + + err = ssdfs_read_blk_desc_byte_stream(pebi, read_bytes, env); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to prepare block descriptor table: " + "err %d\n", err); + return err; + } + + return 0; +} + /* * ssdfs_read_checked_block_bitmap_header() - read and check block bitmap header * @pebi: pointer on PEB object From patchwork Sat Feb 25 01:08:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151932 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org 
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 27/76] ssdfs: read/readahead logic of PEB's thread
Date: Fri, 24 Feb 2023 17:08:38 -0800
Message-Id: <20230225010927.813929-28-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This patch implements read and readahead logic.
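In outline, servicing a read request involves several steps: (1) read the log's header and extract the array of area descriptors; (2) find the block descriptor by means of the physical offset descriptor, decompressing the block descriptor area into the PEB's read buffer when it is stored compressed; (3) read the base block state either from the main area or as a chain of checksummed fragments; (4) if the request prepares a diff, extract the diff blobs on top of the base state.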
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_read_thread.c | 3076 ++++++++++++++++++++++++++++++++++++
 1 file changed, 3076 insertions(+)

diff --git a/fs/ssdfs/peb_read_thread.c b/fs/ssdfs/peb_read_thread.c
index 317eef078521..764f4fdf5b0c 100644
--- a/fs/ssdfs/peb_read_thread.c
+++ b/fs/ssdfs/peb_read_thread.c
@@ -245,6 +245,3082 @@ int ssdfs_read_blk2off_table_fragment(struct ssdfs_peb_info *pebi,
  *                       READ THREAD FUNCTIONALITY                            *
  ******************************************************************************/

+/*
+ * __ssdfs_peb_release_pages() - release memory pages
+ * @pebi: pointer on PEB object
+ *
+ * This function tries to release the used pages from the page
+ * array once initialization has finished.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int __ssdfs_peb_release_pages(struct ssdfs_peb_info *pebi)
+{
+	u16 last_log_start_page = U16_MAX;
+	u16 log_pages = 0;
+	pgoff_t start, end;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(!rwsem_is_locked(&pebi->pebc->lock));
+
+	SSDFS_DBG("seg_id %llu, peb_index %u, peb_id %llu\n",
+		  pebi->pebc->parent_si->seg_id,
+		  pebi->pebc->peb_index,
+		  pebi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&pebi->current_log.state)) {
+	case SSDFS_LOG_INITIALIZED:
+	case SSDFS_LOG_CREATED:
+	case SSDFS_LOG_COMMITTED:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid current log's state: "
+			  "%#x\n",
+			  atomic_read(&pebi->current_log.state));
+		return -ERANGE;
+	}
+
+	ssdfs_peb_current_log_lock(pebi);
+	last_log_start_page = pebi->current_log.start_page;
+	log_pages = pebi->log_pages;
+	ssdfs_peb_current_log_unlock(pebi);
+
+	if (last_log_start_page > 0 && last_log_start_page <= log_pages) {
+		start = 0;
+		end = last_log_start_page - 1;
+
+		err = ssdfs_page_array_release_pages(&pebi->cache,
+						     &start, end);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to release pages: "
+				  "seg_id %llu, peb_id %llu, "
+				  "start %lu, end %lu, err %d\n",
+				  pebi->pebc->parent_si->seg_id,
+				  pebi->peb_id, start, end, err);
+		}
+	}
+
+	if (!err && is_ssdfs_page_array_empty(&pebi->cache)) {
+		err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cache is empty: "
+			  "seg_id %llu, peb_index %u, peb_id %llu\n",
+			  pebi->pebc->parent_si->seg_id,
+			  pebi->pebc->peb_index,
+			  pebi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_peb_release_pages() - release memory pages
+ * @pebc: pointer on PEB container
+ *
+ * This function tries to release the used pages from the page
+ * array once initialization has finished.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */ +static +int ssdfs_peb_release_pages(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_peb_info *pebi; + int err1 = 0, err2 = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&pebc->lock); + + pebi = pebc->src_peb; + if (pebi) { + err1 = __ssdfs_peb_release_pages(pebi); + if (err1 == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache is empty: " + "seg_id %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err1)) { + SSDFS_ERR("fail to release source PEB pages: " + "seg_id %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err1); + } + } + + pebi = pebc->dst_peb; + if (pebi) { + err2 = __ssdfs_peb_release_pages(pebi); + if (err2 == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache is empty: " + "seg_id %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err2)) { + SSDFS_ERR("fail to release dest PEB pages: " + "seg_id %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err2); + } + } + + up_write(&pebc->lock); + + if (err1 || err2) { + if (err1 == -ENODATA && err2 == -ENODATA) + return -ENODATA; + else if (!err1) { + if (err2 != -ENODATA) + return err2; + else + return 0; + } else if (!err2) { + if (err1 != -ENODATA) + return err1; + else + return 0; + } else + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_unaligned_read_cache() - unaligned read from PEB's cache + * @pebi: pointer on PEB object + * @area_offset: offset from the log's beginning + * @area_size: size of the data portion + * @buf: buffer for read + * + * This function tries to read some data portion from + * the PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
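+ *
+ * The read is split at the page boundaries of the PEB's page cache.
+ * Per iteration the arithmetic below boils down to:
+ *
+ *	bytes_off = area_offset + read_bytes;
+ *	page_off = bytes_off / PAGE_SIZE;
+ *	offset = bytes_off % PAGE_SIZE;
+ *	iter_read_bytes = min_t(size_t, area_size - read_bytes,
+ *				PAGE_SIZE - offset);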
+ */ +int ssdfs_unaligned_read_cache(struct ssdfs_peb_info *pebi, + u32 area_offset, u32 area_size, + void *buf) +{ + struct ssdfs_fs_info *fsi; + struct page *page; + u32 page_off; + u32 bytes_off; + size_t read_bytes = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si || !buf); + + SSDFS_DBG("seg %llu, peb %llu, " + "area_offset %u, area_size %u, buf %p\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + area_offset, area_size, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + do { + size_t iter_read_bytes; + size_t offset; + + bytes_off = area_offset + read_bytes; + page_off = bytes_off / PAGE_SIZE; + offset = bytes_off % PAGE_SIZE; + + iter_read_bytes = min_t(size_t, + (size_t)(area_size - read_bytes), + (size_t)(PAGE_SIZE - offset)); + + page = ssdfs_page_array_get_page_locked(&pebi->cache, page_off); + if (IS_ERR_OR_NULL(page)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to get page: index %u\n", + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + err = ssdfs_memcpy_from_page(buf, read_bytes, area_size, + page, offset, PAGE_SIZE, + iter_read_bytes); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "read_bytes %zu, offset %zu, " + "iter_read_bytes %zu, err %d\n", + read_bytes, offset, + iter_read_bytes, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + read_bytes += iter_read_bytes; + } while (read_bytes < area_size); + + return 0; +} + +/* + * ssdfs_peb_read_log_hdr_desc_array() - read log's header area's descriptors + * @pebi: pointer on PEB object + * @req: request + * @log_start_page: starting page of the log + * @array: array of area's descriptors [out] + * @array_size: count of items into array + * + * This function tries to read log's header area's descriptors. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENOENT - cache hasn't the requested page. 
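+ *
+ * A typical caller fetches the whole array and then indexes it by
+ * area type (illustrative sketch; @array is always of size
+ * SSDFS_SEG_HDR_DESC_MAX):
+ *
+ *	struct ssdfs_metadata_descriptor array[SSDFS_SEG_HDR_DESC_MAX];
+ *
+ *	err = ssdfs_peb_read_log_hdr_desc_array(pebi, req,
+ *						log_start_page, array,
+ *						SSDFS_SEG_HDR_DESC_MAX);
+ *	area_index = SSDFS_AREA_TYPE2INDEX(blk_state->log_area);
+ *	area_offset = le32_to_cpu(array[area_index].offset);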
+ */ +int ssdfs_peb_read_log_hdr_desc_array(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u16 log_start_page, + struct ssdfs_metadata_descriptor *array, + size_t array_size) +{ + struct ssdfs_fs_info *fsi; + struct page *page; + void *kaddr; + struct ssdfs_signature *magic = NULL; + struct ssdfs_segment_header *seg_hdr = NULL; + struct ssdfs_partial_log_header *plh_hdr = NULL; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t array_bytes = array_size * desc_size; + u32 page_off; + size_t read_bytes; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!array); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + + SSDFS_DBG("seg %llu, peb %llu, log_start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + page_off = log_start_page; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, page_off); + if (unlikely(IS_ERR_OR_NULL(page))) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get page: index %u\n", + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (req->private.flags & SSDFS_REQ_READ_ONLY_CACHE) + return -ENOENT; + + page = ssdfs_page_array_grab_page(&pebi->cache, page_off); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + page_off); + return -ENOMEM; + } + + kaddr = kmap_local_page(page); + + err = ssdfs_aligned_read_buffer(fsi, pebi->peb_id, + (page_off * PAGE_SIZE), + (u8 *)kaddr, + PAGE_SIZE, + &read_bytes); + if (unlikely(err)) + goto fail_copy_desc_array; + else if (unlikely(read_bytes != (PAGE_SIZE))) { + err = -ERANGE; + goto fail_copy_desc_array; + } + + SetPageUptodate(page); + flush_dcache_page(page); + } else + kaddr = kmap_local_page(page); + + magic = (struct ssdfs_signature *)kaddr; + + if (!is_ssdfs_magic_valid(magic)) { + err = -ERANGE; + SSDFS_ERR("valid magic is not detected\n"); + goto fail_copy_desc_array; + } + + if (__is_ssdfs_segment_header_magic_valid(magic)) { + seg_hdr = SSDFS_SEG_HDR(kaddr); + ssdfs_memcpy(array, 0, array_bytes, + seg_hdr->desc_array, 0, array_bytes, + array_bytes); + } else if (is_ssdfs_partial_log_header_magic_valid(magic)) { + plh_hdr = SSDFS_PLH(kaddr); + ssdfs_memcpy(array, 0, array_bytes, + plh_hdr->desc_array, 0, array_bytes, + array_bytes); + } else { + err = -EIO; + SSDFS_ERR("log header is corrupted: " + "seg %llu, peb %llu, log_start_page %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + log_start_page); + goto fail_copy_desc_array; + } + +fail_copy_desc_array: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("fail to read checked segment header: " + "seg %llu, peb %llu, pages_off %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + page_off, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_read_page_locked() - read locked page into PEB's cache + * @pebi: pointer on PEB object + * @req: request + * @page_off: page index + * + * This function tries to read locked page into PEB's cache. 
+ */ +static +struct page *ssdfs_peb_read_page_locked(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 page_off) +{ + struct ssdfs_fs_info *fsi; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + + SSDFS_DBG("seg %llu, peb %llu, page_off %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, page_off); + if (unlikely(IS_ERR_OR_NULL(page))) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get page: index %u\n", + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (req->private.flags & SSDFS_REQ_READ_ONLY_CACHE) + return ERR_PTR(-ENOENT); + + page = ssdfs_page_array_grab_page(&pebi->cache, page_off); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + page_off); + return NULL; + } + + if (PageUptodate(page) || PageDirty(page)) + goto finish_page_read; + + err = ssdfs_read_page_from_volume(fsi, pebi->peb_id, + page_off << PAGE_SHIFT, + page); + + /* + * ->readpage() unlock the page + * But caller expects that page is locked + */ + ssdfs_lock_page(page); + + if (unlikely(err)) + goto fail_read_page; + + SetPageUptodate(page); + } + +finish_page_read: + return page; + +fail_read_page: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + SSDFS_ERR("fail to read locked page: " + "seg %llu, peb %llu, page_off %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + page_off, err); + + return NULL; +} + +/* + * __ssdfs_decompress_blk_desc_fragment() - decompress blk desc fragment + * @pebi: pointer on PEB object + * @frag: fragment descriptor + * @area_offset: area offset in bytes + * @read_buffer: buffer to read [out] + * @buf_size: size of buffer in bytes + * + * This function tries to decompress block descriptor fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
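+ *
+ * The fragment is processed in three steps (simplified sketch of the
+ * body below; error handling is omitted):
+ *
+ *	ssdfs_unaligned_read_cache(pebi, area_offset + frag_offset,
+ *				   compr_size, cdata_buf);
+ *	ssdfs_decompress(compr_type, cdata_buf, read_buffer,
+ *			 compr_size, uncompr_size);
+ *	if (frag->flags & SSDFS_FRAGMENT_HAS_CSUM)
+ *		checksum = ssdfs_crc32_le(read_buffer, uncompr_size);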
+ */ +static +int __ssdfs_decompress_blk_desc_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_fragment_desc *frag, + u32 area_offset, + void *read_buffer, size_t buf_size) +{ + void *cdata_buf = NULL; + u32 frag_offset; + u16 compr_size; + u16 uncompr_size; + int compr_type = SSDFS_COMPR_NONE; + __le32 checksum = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!frag || !read_buffer); + + SSDFS_DBG("seg %llu, peb %llu, area_offset %u, buf_size %zu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, area_offset, buf_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + frag_offset = le32_to_cpu(frag->offset); + compr_size = le16_to_cpu(frag->compr_size); + uncompr_size = le16_to_cpu(frag->uncompr_size); + + if (buf_size < uncompr_size) { + SSDFS_ERR("invalid request: buf_size %zu < uncompr_size %u\n", + buf_size, uncompr_size); + return -E2BIG; + } + + cdata_buf = ssdfs_read_kzalloc(compr_size, GFP_KERNEL); + if (!cdata_buf) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate cdata_buf\n"); + goto free_buf; + } + + err = ssdfs_unaligned_read_cache(pebi, + area_offset + frag_offset, + compr_size, + cdata_buf); + if (unlikely(err)) { + SSDFS_ERR("fail to read blk desc fragment: " + "frag_offset %u, compr_size %u, " + "err %d\n", + frag_offset, compr_size, err); + goto free_buf; + } + + switch (frag->type) { + case SSDFS_DATA_BLK_DESC_ZLIB: + compr_type = SSDFS_COMPR_ZLIB; + break; + + case SSDFS_DATA_BLK_DESC_LZO: + compr_type = SSDFS_COMPR_LZO; + break; + + default: + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("compr_type %#x, cdata_buf %px, read_buffer %px, " + "buf_size %zu, compr_size %u, uncompr_size %u\n", + compr_type, cdata_buf, read_buffer, + buf_size, compr_size, uncompr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_decompress(compr_type, + cdata_buf, read_buffer, + compr_size, uncompr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to decompress fragment: " + "seg %llu, peb %llu, " + "compr_size %u, uncompr_size %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + compr_size, uncompr_size, + err); + goto free_buf; + } + + if (frag->flags & SSDFS_FRAGMENT_HAS_CSUM) { + checksum = ssdfs_crc32_le(read_buffer, uncompr_size); + if (checksum != frag->checksum) { + err = -EIO; + SSDFS_ERR("invalid checksum: " + "(calculated %#x, csum %#x)\n", + le32_to_cpu(checksum), + le32_to_cpu(frag->checksum)); + goto free_buf; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BLK DESC FRAGMENT DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + read_buffer, buf_size); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + +free_buf: + if (cdata_buf) + ssdfs_read_kfree(cdata_buf); + + return err; +} + +/* + * ssdfs_decompress_blk_desc_fragment() - decompress blk desc fragment + * @pebi: pointer on PEB object + * @frag: fragment descriptor + * @area_offset: area offset in bytes + * + * This function tries to decompress block descriptor fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
+ */ +static +int ssdfs_decompress_blk_desc_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_fragment_desc *frag, + u32 area_offset) +{ + struct ssdfs_peb_read_buffer *buf; + u16 uncompr_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!frag); + BUG_ON(!rwsem_is_locked(&pebi->read_buffer.lock)); + + SSDFS_DBG("seg %llu, peb %llu, area_offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + area_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + buf = &pebi->read_buffer.blk_desc; + uncompr_size = le16_to_cpu(frag->uncompr_size); + + if (buf->size < uncompr_size) { + err = ssdfs_peb_realloc_read_buffer(buf, uncompr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc read buffer: " + "old_size %zu, new_size %u, err %d\n", + buf->size, uncompr_size, err); + return err; + } + } + + return __ssdfs_decompress_blk_desc_fragment(pebi, frag, area_offset, + buf->ptr, buf->size); +} + +/* + * ssdfs_peb_decompress_blk_desc_fragment() - decompress blk desc fragment + * @pebi: pointer on PEB object + * @meta_desc: area descriptor + * @offset: offset in bytes to read block descriptor + * + * This function tries to decompress block descriptor fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_peb_decompress_blk_desc_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *meta_desc, + u32 offset) +{ + struct ssdfs_area_block_table table; + size_t tbl_size = sizeof(struct ssdfs_area_block_table); + u32 area_offset; + u32 area_size; + u32 tbl_offset = 0; + u32 compr_bytes = 0; + u32 uncompr_bytes = 0; + u16 flags; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!meta_desc); + BUG_ON(!rwsem_is_locked(&pebi->read_buffer.lock)); + + SSDFS_DBG("seg %llu, peb %llu, offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + area_offset = le32_to_cpu(meta_desc->offset); + area_size = le32_to_cpu(meta_desc->size); + +try_read_area_block_table: + if ((tbl_offset + tbl_size) > area_size) { + SSDFS_ERR("area block table out of area: " + "tbl_offset %u, tbl_size %zu, area_size %u\n", + tbl_offset, tbl_size, area_size); + return -ERANGE; + } + + err = ssdfs_unaligned_read_cache(pebi, + area_offset + tbl_offset, + tbl_size, + &table); + if (unlikely(err)) { + SSDFS_ERR("fail to read area block table: " + "area_offset %u, area_size %u, " + "tbl_offset %u, tbl_size %zu, err %d\n", + area_offset, area_size, + tbl_offset, tbl_size, err); + return err; + } + + if (table.chain_hdr.magic != SSDFS_CHAIN_HDR_MAGIC) { + SSDFS_ERR("corrupted area block table: " + "magic (expected %#x, found %#x)\n", + SSDFS_CHAIN_HDR_MAGIC, + table.chain_hdr.magic); + return -EIO; + } + + switch (table.chain_hdr.type) { + case SSDFS_BLK_DESC_ZLIB_CHAIN_HDR: + case SSDFS_BLK_DESC_LZO_CHAIN_HDR: + /* expected type */ + break; + + default: + SSDFS_ERR("unexpected area block table's type %#x\n", + table.chain_hdr.type); + return -EIO; + } + + compr_bytes = le32_to_cpu(table.chain_hdr.compr_bytes); + uncompr_bytes += le32_to_cpu(table.chain_hdr.uncompr_bytes); + + if (offset < uncompr_bytes) { + struct ssdfs_fragment_desc *frag; + u16 fragments_count; + u16 frag_uncompr_size; + int i; + + uncompr_bytes -= 
le32_to_cpu(table.chain_hdr.uncompr_bytes); + fragments_count = le16_to_cpu(table.chain_hdr.fragments_count); + + for (i = 0; i < fragments_count; i++) { + frag = &table.blk[i]; + + if (frag->magic != SSDFS_FRAGMENT_DESC_MAGIC) { + SSDFS_ERR("corrupted area block table: " + "magic (expected %#x, found %#x)\n", + SSDFS_FRAGMENT_DESC_MAGIC, + frag->magic); + return -EIO; + } + + switch (frag->type) { + case SSDFS_DATA_BLK_DESC_ZLIB: + case SSDFS_DATA_BLK_DESC_LZO: + /* expected type */ + break; + + default: + SSDFS_ERR("unexpected fragment's type %#x\n", + frag->type); + return -EIO; + } + + frag_uncompr_size = le16_to_cpu(frag->uncompr_size); + uncompr_bytes += frag_uncompr_size; + + if (offset < uncompr_bytes) { + err = ssdfs_decompress_blk_desc_fragment(pebi, + frag, area_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to decompress: " + "err %d\n", err); + return err; + } + + break; + } + } + + if (i >= fragments_count) { + SSDFS_ERR("corrupted area block table: " + "i %d >= fragments_count %u\n", + i, fragments_count); + return -EIO; + } + } else { + flags = le16_to_cpu(table.chain_hdr.flags); + + if (!(flags & SSDFS_MULTIPLE_HDR_CHAIN)) { + SSDFS_ERR("corrupted area block table: " + "invalid flags set %#x\n", + flags); + return -EIO; + } + + tbl_offset += compr_bytes; + goto try_read_area_block_table; + } + + return 0; +} + +/* + * ssdfs_peb_read_block_descriptor() - read block descriptor + * @pebi: pointer on PEB object + * @meta_desc: area descriptor + * @offset: offset in bytes to read block descriptor + * @blk_desc: block descriptor [out] + * + * This function tries to read block descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
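+ *
+ * For compressed areas the result of the last decompression is kept
+ * in pebi->read_buffer and reused while @offset stays inside the
+ * cached window (sketch of the check performed below):
+ *
+ *	lower_bound = buf->blk_desc.offset;
+ *	upper_bound = buf->blk_desc.offset + buf->blk_desc.size;
+ *	if (offset < lower_bound || offset >= upper_bound)
+ *		err = ssdfs_peb_decompress_blk_desc_fragment(pebi,
+ *							     meta_desc,
+ *							     offset);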
+ */ +static +int ssdfs_peb_read_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *meta_desc, + u32 offset, + struct ssdfs_block_descriptor *blk_desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_temp_read_buffers *buf; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + int compr_type = SSDFS_COMPR_NONE; + u32 lower_bound = U32_MAX; + u32 upper_bound = U32_MAX; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!meta_desc || !blk_desc); + + SSDFS_DBG("seg %llu, peb %llu, offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + flags = le16_to_cpu(meta_desc->check.flags); + + if ((flags & SSDFS_ZLIB_COMPRESSED) && (flags & SSDFS_LZO_COMPRESSED)) { + SSDFS_ERR("invalid set of flags: " + "flags %#x\n", + flags); + return -ERANGE; + } + + if (flags & SSDFS_ZLIB_COMPRESSED) + compr_type = SSDFS_COMPR_ZLIB; + else if (flags & SSDFS_LZO_COMPRESSED) + compr_type = SSDFS_COMPR_LZO; + + if (compr_type != SSDFS_COMPR_NONE) { + buf = &pebi->read_buffer; + + down_write(&buf->lock); + + if (!buf->blk_desc.ptr) { + err = -ENOMEM; + SSDFS_ERR("buffer is not allocated\n"); + goto finish_decompress; + } + + lower_bound = buf->blk_desc.offset; + upper_bound = buf->blk_desc.offset + buf->blk_desc.size; + + if (buf->blk_desc.offset >= U32_MAX) { + err = ssdfs_peb_decompress_blk_desc_fragment(pebi, + meta_desc, + offset); + if (unlikely(err)) { + SSDFS_ERR("fail to decompress: err %d\n", + err); + goto finish_decompress; + } + } else if (offset < lower_bound || offset >= upper_bound) { + err = ssdfs_peb_decompress_blk_desc_fragment(pebi, + meta_desc, + offset); + if (unlikely(err)) { + SSDFS_ERR("fail to decompress: err %d\n", + err); + goto finish_decompress; + } + } + +finish_decompress: + downgrade_write(&buf->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to decompress portion: " + "err %d\n", err); + goto finish_read_compressed_blk_desc; + } + + err = ssdfs_memcpy(blk_desc, + 0, blk_desc_size, + buf->blk_desc.ptr, + offset - lower_bound, buf->blk_desc.size, + blk_desc_size); + if (unlikely(err)) { + SSDFS_ERR("invalid buffer state: " + "offset %u, buffer (offset %u, size %zu)\n", + offset, + buf->blk_desc.offset, + buf->blk_desc.size); + goto finish_read_compressed_blk_desc; + } + +finish_read_compressed_blk_desc: + up_read(&buf->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to read compressed block descriptor: " + "offset %u, err %d\n", + offset, err); + return err; + } + } else { + err = ssdfs_unaligned_read_cache(pebi, offset, + blk_desc_size, + blk_desc); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to read block descriptor: " + "seg %llu, peb %llu, " + "offset %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + } + + return 0; +} + +/* + * ssdfs_peb_find_block_descriptor() - find block descriptor + * @pebi: pointer on PEB object + * @req: request + * @array: array of area's descriptors + * @array_size: count of items into array + * @desc_off: descriptor of physical offset + * @blk_desc: block descriptor [out] + * + * This function tries to get block descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
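+ *
+ * The physical offset descriptor names the log's start page, the
+ * area and the byte offset of the block descriptor; the lookup
+ * reduces to (simplified sketch):
+ *
+ *	area_index = SSDFS_AREA_TYPE2INDEX(blk_state->log_area);
+ *	area_offset = le32_to_cpu(array[area_index].offset);
+ *	blk_desc_off = le32_to_cpu(blk_state->byte_offset);
+ *	err = ssdfs_peb_read_block_descriptor(pebi, &array[area_index],
+ *					      area_offset + blk_desc_off,
+ *					      blk_desc);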
+ */ +static +int ssdfs_peb_find_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *array, + size_t array_size, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_block_descriptor *blk_desc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_blk_state_offset *blk_state; + struct page *page; + struct pagevec pvec; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + int area_index; + u32 area_offset; + u32 area_size; + u32 blk_desc_off; + u64 calculated; + u32 page_off; + u32 pages_count; + u32 i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req || !array || !desc_off || !blk_desc); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + + SSDFS_DBG("seg %llu, peb %llu, " + "log_start_page %u, log_area %#x, " + "peb_migration_id %u, byte_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + le16_to_cpu(desc_off->blk_state.log_start_page), + desc_off->blk_state.log_area, + desc_off->blk_state.peb_migration_id, + le32_to_cpu(desc_off->blk_state.byte_offset)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + blk_state = &desc_off->blk_state; + + err = ssdfs_peb_read_log_hdr_desc_array(pebi, req, + le16_to_cpu(blk_state->log_start_page), + array, array_size); + if (unlikely(err)) { + SSDFS_ERR("fail to read log's header desc array: " + "seg %llu, peb %llu, log_start_page %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + le16_to_cpu(blk_state->log_start_page), + err); + return err; + } + + area_index = SSDFS_AREA_TYPE2INDEX(blk_state->log_area); + + if (area_index >= SSDFS_SEG_HDR_DESC_MAX) { + SSDFS_ERR("invalid area index %#x\n", area_index); + return -ERANGE; + } + + area_offset = le32_to_cpu(array[area_index].offset); + area_size = le32_to_cpu(array[area_index].size); + blk_desc_off = le32_to_cpu(blk_state->byte_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area_offset %u, blk_desc_off %u\n", + area_offset, blk_desc_off); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_read_block_descriptor(pebi, &array[area_index], + area_offset + blk_desc_off, + blk_desc); + if (err) { + page_off = (area_offset + blk_desc_off) / PAGE_SIZE; + pages_count = (area_size + PAGE_SIZE - 1) / PAGE_SIZE; + pages_count = min_t(u32, pages_count, PAGEVEC_SIZE); + + pagevec_init(&pvec); + + for (i = 0; i < pages_count; i++) { + page = ssdfs_page_array_grab_page(&pebi->cache, + page_off + i); + if (unlikely(IS_ERR_OR_NULL(page))) { + SSDFS_ERR("fail to grab page: index %u\n", + page_off); + return -ENOMEM; + } + + if (PageUptodate(page) || PageDirty(page)) + break; + + pagevec_add(&pvec, page); + } + + err = ssdfs_read_pagevec_from_volume(fsi, pebi->peb_id, + page_off << PAGE_SHIFT, + &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to read pagevec: " + "peb_id %llu, page_off %u, " + "pages_count %u, err %d\n", + pebi->peb_id, page_off, + pages_count, err); + return err; + } + + for (i = 0; i < pagevec_count(&pvec); i++) { + page = pvec.pages[i]; + + if (!page) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %d is NULL\n", i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + pvec.pages[i] = NULL; + } + + pagevec_reinit(&pvec); + + err = ssdfs_peb_read_block_descriptor(pebi, &array[area_index], + area_offset + blk_desc_off, + blk_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to read block descriptor: " + "peb %llu, area_offset %u, byte_offset %u, " + "buf_size %zu, err %d\n", + 
pebi->peb_id, area_offset, blk_desc_off, + blk_desc_size, err); + return err; + } + } + + if (le64_to_cpu(blk_desc->ino) != req->extent.ino) { + SSDFS_ERR("seg %llu, peb %llu, " + "blk_desc->ino %llu != req->extent.ino %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + le64_to_cpu(blk_desc->ino), req->extent.ino); + return -ERANGE; + } + + calculated = (u64)req->result.processed_blks * fsi->pagesize; + + if (calculated >= req->extent.data_bytes) { + SSDFS_ERR("calculated %llu >= req->extent.data_bytes %u\n", + calculated, req->extent.data_bytes); + return -ERANGE; + } + + return 0; +} + +/* + * __ssdfs_peb_get_block_state_desc() - get block state descriptor + * @pebi: pointer on PEB object + * @req: segment request + * @area_desc: area descriptor + * @desc: block state descriptor [out] + * @cno: checkpoint ID [out] + * @parent_snapshot: parent snapshot ID [out] + * + * This function tries to get block state descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +int __ssdfs_peb_get_block_state_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *area_desc, + struct ssdfs_block_state_descriptor *desc, + u64 *cno, u64 *parent_snapshot) +{ + struct ssdfs_fs_info *fsi; + size_t state_desc_size = sizeof(struct ssdfs_block_state_descriptor); + u32 area_offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!area_desc || !desc); + BUG_ON(!cno || !parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + area_offset = le32_to_cpu(area_desc->offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, area_offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, area_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_unaligned_read_cache(pebi, + area_offset, + state_desc_size, + desc); + if (err) { + SSDFS_DBG("cache hasn't requested page\n"); + + if (req->private.flags & SSDFS_REQ_READ_ONLY_CACHE) + return -ENOENT; + + err = ssdfs_unaligned_read_buffer(fsi, pebi->peb_id, + area_offset, + desc, state_desc_size); + if (unlikely(err)) { + SSDFS_ERR("fail to read buffer: " + "peb %llu, area_offset %u, " + "buf_size %zu, err %d\n", + pebi->peb_id, area_offset, + state_desc_size, err); + return err; + } + } + + if (desc->chain_hdr.magic != SSDFS_CHAIN_HDR_MAGIC) { + SSDFS_ERR("chain header magic invalid\n"); + return -EIO; + } + + if (desc->chain_hdr.type != SSDFS_BLK_STATE_CHAIN_HDR) { + SSDFS_ERR("chain header type invalid\n"); + return -EIO; + } + + if (le16_to_cpu(desc->chain_hdr.desc_size) != + sizeof(struct ssdfs_fragment_desc)) { + SSDFS_ERR("fragment descriptor size is invalid\n"); + return -EIO; + } + + *cno = le64_to_cpu(desc->cno); + *parent_snapshot = le64_to_cpu(desc->parent_snapshot); + + return 0; +} + +/* + * ssdfs_peb_get_block_state_desc() - get block state descriptor + * @pebi: pointer on PEB object + * @req: request + * @area_desc: area descriptor + * @desc: block state descriptor [out] + * + * This function tries to get block state descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
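+ *
+ * Besides reading the descriptor, this wrapper validates it against
+ * the request (the checks performed below):
+ *
+ *	if (req->extent.cno != cno)
+ *		return -EIO;
+ *	if (req->extent.parent_snapshot != parent_snapshot)
+ *		return -EIO;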
+ */ +static +int ssdfs_peb_get_block_state_desc(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *area_desc, + struct ssdfs_block_state_descriptor *desc) +{ + u64 cno; + u64 parent_snapshot; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req || !area_desc || !desc); + + SSDFS_DBG("seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_peb_get_block_state_desc(pebi, req, area_desc, + desc, &cno, &parent_snapshot); + if (err == -ENOENT) { + SSDFS_DBG("cache hasn't requested page\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get block state descriptor: " + "err %d\n", err); + return err; + } + + if (req->extent.cno != cno) { + SSDFS_ERR("req->extent.cno %llu != cno %llu\n", + req->extent.cno, cno); + return -EIO; + } + + if (req->extent.parent_snapshot != parent_snapshot) { + SSDFS_ERR("req->extent.parent_snapshot %llu != " + "parent_snapshot %llu\n", + req->extent.parent_snapshot, + parent_snapshot); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_peb_get_fragment_desc_array() - get fragment descriptors array + * @pebi: pointer on PEB object + * @req: segment request + * @array_offset: offset of array from the log's beginning + * @array: array of fragment descriptors [out] + * @array_size: count of items into array + * + * This function tries to get array of fragment descriptors. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_peb_get_fragment_desc_array(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 array_offset, + struct ssdfs_fragment_desc *array, + size_t array_size) +{ + struct ssdfs_fs_info *fsi; + u32 page_index, page_off; + struct page *page; + size_t frag_desc_size = sizeof(struct ssdfs_fragment_desc); + size_t array_bytes = frag_desc_size * array_size; + size_t size = array_bytes; + size_t read_size = 0; + u32 buf_off = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!array); + + SSDFS_DBG("seg %llu, peb %llu, " + "array_offset %u, array_size %zu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + array_offset, array_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + +read_next_page: + page_off = array_offset % PAGE_SIZE; + read_size = min_t(size_t, size, PAGE_SIZE - page_off); + + page_index = array_offset >> PAGE_SHIFT; + page = ssdfs_peb_read_page_locked(pebi, req, page_index); + if (IS_ERR_OR_NULL(page)) { + err = IS_ERR(page) ? 
PTR_ERR(page) : -ERANGE; + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't page: index %u\n", + page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("fail to read locked page: index %u\n", + page_index); + } + return err; + } + + err = ssdfs_memcpy_from_page(array, buf_off, array_bytes, + page, page_off, PAGE_SIZE, + read_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "page_off %u, buf_off %u, " + "read_size %zu, size %zu, err %d\n", + page_off, buf_off, + read_size, array_bytes, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + size -= read_size; + buf_off += read_size; + array_offset += read_size; + + if (size != 0) + goto read_next_page; + + return 0; +} + +/* + * ssdfs_peb_unaligned_read_fragment() - unaligned read fragment + * @pebi: pointer on PEB object + * @req: request + * @byte_off: offset in bytes from PEB's begin + * @size: size of fragment in bytes + * @buf: buffer pointer + * + * This function tries to read fragment. + * + * RETURN: + * [success] - fragment has been read successfully. + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_unaligned_read_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 byte_off, + size_t size, + void *buf) +{ + u32 page_index, page_off; + struct page *page; + size_t read_size = 0; + u32 buf_off = 0; + size_t array_bytes = size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(byte_off > pebi->pebc->parent_si->fsi->erasesize); + BUG_ON(size > PAGE_SIZE); + WARN_ON(size == 0); + BUG_ON(!buf); + + SSDFS_DBG("seg %llu, peb %llu, " + "offset %u, size %zu, buf %p\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + byte_off, size, buf); +#endif /* CONFIG_SSDFS_DEBUG */ + +read_next_page: + if (byte_off > pebi->pebc->parent_si->fsi->erasesize) { + SSDFS_ERR("offset %u > erasesize %u\n", + byte_off, + pebi->pebc->parent_si->fsi->erasesize); + return -ERANGE; + } + + page_off = byte_off % PAGE_SIZE; + read_size = min_t(size_t, size, PAGE_SIZE - page_off); + + page_index = byte_off >> PAGE_SHIFT; + page = ssdfs_peb_read_page_locked(pebi, req, page_index); + if (IS_ERR_OR_NULL(page)) { + err = IS_ERR(page) ? 
PTR_ERR(page) : -ERANGE; + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't page: page_off %u\n", + page_off); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("fail to read locked page: index %u\n", + page_off); + } + return err; + } + + err = ssdfs_memcpy_from_page(buf, buf_off, array_bytes, + page, page_off, PAGE_SIZE, + read_size); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "page_off %u, buf_off %u, " + "read_size %zu, size %zu, err %d\n", + page_off, buf_off, + read_size, array_bytes, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + size -= read_size; + buf_off += read_size; + byte_off += read_size; + + if (size != 0) + goto read_next_page; + + return 0; +} + +/* + * ssdfs_read_checked_fragment() - read and check data fragment + * @pebi: pointer on PEB object + * @req: segment request + * @area_offset: offset in bytes from log's begin + * @sequence_id: fragment identification number + * @desc: fragment descriptor + * @cdata_buf: compressed data buffer + * @page: buffer for uncompressed data + * + * This function reads data fragment, uncompressed it + * (if neccessary) and check fragment's checksum. + * + * RETURN: + * [success] - fragment has been read successfully. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal calculation error. + * %-EIO - I/O error. + */ +static +int ssdfs_read_checked_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 area_offset, + int sequence_id, + struct ssdfs_fragment_desc *desc, + void *cdata_buf, + struct page *page) +{ + struct ssdfs_fs_info *fsi; + u32 pebsize; + u32 offset; + size_t compr_size, uncompr_size; + bool is_compressed; + void *kaddr; + __le32 checksum; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!desc || !cdata_buf || !page); + + SSDFS_DBG("seg %llu, peb %llu, area_offset %u, sequence_id %u, " + "offset %u, compr_size %u, uncompr_size %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + area_offset, + le16_to_cpu(desc->sequence_id), + le32_to_cpu(desc->offset), + le16_to_cpu(desc->compr_size), + le16_to_cpu(desc->uncompr_size)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + if (sequence_id != le16_to_cpu(desc->sequence_id)) { + SSDFS_ERR("sequence_id %d != desc->sequence_id %u\n", + sequence_id, le16_to_cpu(desc->sequence_id)); + return -EINVAL; + } + + pebsize = fsi->pages_per_peb * fsi->pagesize; + offset = area_offset + le32_to_cpu(desc->offset); + compr_size = le16_to_cpu(desc->compr_size); + uncompr_size = le16_to_cpu(desc->uncompr_size); + + if (offset >= pebsize) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "desc->offset %u >= pebsize %u\n", + offset, pebsize); + return -EIO; + } + + if (uncompr_size > PAGE_SIZE) { + SSDFS_ERR("uncompr_size %zu > PAGE_SIZE %lu\n", + uncompr_size, PAGE_SIZE); + return -ERANGE; + } + + is_compressed = (desc->type == SSDFS_FRAGMENT_ZLIB_BLOB || + desc->type == SSDFS_FRAGMENT_LZO_BLOB); + + if (desc->type == SSDFS_FRAGMENT_UNCOMPR_BLOB) { + if (compr_size != uncompr_size) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "compr_size %zu != uncompr_size %zu\n", + compr_size, uncompr_size); + return -EIO; + } + + if (uncompr_size > PAGE_SIZE) { + ssdfs_fs_error(fsi->sb, __FILE__, 
__func__, __LINE__, + "uncompr_size %zu > PAGE_CACHE %lu\n", + uncompr_size, PAGE_SIZE); + return -EIO; + } + + kaddr = kmap_local_page(page); + err = ssdfs_peb_unaligned_read_fragment(pebi, req, offset, + uncompr_size, + kaddr); + if (!err) + checksum = ssdfs_crc32_le(kaddr, uncompr_size); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu, offset %u, size %zu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset, uncompr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read fragment: " + "seg %llu, peb %llu, offset %u, size %zu, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset, uncompr_size, err); + return err; + } + } else if (is_compressed) { + int type; + + err = ssdfs_peb_unaligned_read_fragment(pebi, req, offset, + compr_size, + cdata_buf); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu, offset %u, size %zu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset, uncompr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read fragment: " + "seg %llu, peb %llu, offset %u, size %zu, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset, compr_size, err); + return err; + } + + if (desc->type == SSDFS_FRAGMENT_ZLIB_BLOB) + type = SSDFS_COMPR_ZLIB; + else if (desc->type == SSDFS_FRAGMENT_LZO_BLOB) + type = SSDFS_COMPR_LZO; + else + BUG(); + + kaddr = kmap_local_page(page); + err = ssdfs_decompress(type, cdata_buf, kaddr, + compr_size, uncompr_size); + if (!err) + checksum = ssdfs_crc32_le(kaddr, uncompr_size); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to decompress fragment: " + "seg %llu, peb %llu, offset %u, " + "compr_size %zu, uncompr_size %zu" + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + offset, compr_size, uncompr_size, err); + return err; + } + } else + BUG(); + + if (desc->checksum != checksum) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "desc->checksum %#x != checksum %#x\n", + le32_to_cpu(desc->checksum), + le32_to_cpu(checksum)); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_peb_read_main_area_page() - read main area's page + * @pebi: pointer on PEB object + * @req: request + * @array: array of area's descriptors + * @array_size: count of items into array + * @blk_state_off: block state offset + * + * This function tries to read main area's page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
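+ *
+ * The destination page is taken from the request's pagevec and the
+ * payload is copied with an unaligned read (simplified sketch):
+ *
+ *	page = req->result.pvec.pages[page_index];
+ *	kaddr = kmap_local_page(page);
+ *	err = ssdfs_peb_unaligned_read_fragment(pebi, req,
+ *						area_offset + byte_offset,
+ *						data_bytes, kaddr);
+ *	kunmap_local(kaddr);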
+ */ +static +int ssdfs_peb_read_main_area_page(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *array, + size_t array_size, + struct ssdfs_blk_state_offset *blk_state_off) +{ + struct ssdfs_fs_info *fsi; + u8 area_index; + u32 area_offset; + u32 data_bytes; + u32 read_bytes; + u32 byte_offset; + int page_index; + struct page *page; + void *kaddr; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req || !array || !blk_state_off); + + SSDFS_DBG("seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + area_index = SSDFS_AREA_TYPE2INDEX(blk_state_off->log_area); + if (area_index >= array_size) { + SSDFS_ERR("area_index %u >= array_size %zu\n", + area_index, array_size); + return -EIO; + } + + read_bytes = req->result.processed_blks * fsi->pagesize; + + if (read_bytes > req->extent.data_bytes) { + SSDFS_ERR("read_bytes %u > req->extent.data_bytes %u\n", + read_bytes, req->extent.data_bytes); + return -ERANGE; + } else if (read_bytes == req->extent.data_bytes) { + SSDFS_WARN("read_bytes %u == req->extent.data_bytes %u\n", + read_bytes, req->extent.data_bytes); + return -ERANGE; + } + + data_bytes = req->extent.data_bytes - read_bytes; + + if (fsi->pagesize > PAGE_SIZE) + data_bytes = min_t(u32, data_bytes, fsi->pagesize); + else + data_bytes = min_t(u32, data_bytes, PAGE_SIZE); + + area_offset = le32_to_cpu(array[area_index].offset); + byte_offset = le32_to_cpu(blk_state_off->byte_offset); + + page_index = (int)(read_bytes >> PAGE_SHIFT); + BUG_ON(page_index >= U16_MAX); + + if (req->private.flags & SSDFS_REQ_PREPARE_DIFF) { + if (pagevec_count(&req->result.old_state) <= page_index) { + SSDFS_ERR("page_index %d >= pagevec_count %u\n", + page_index, + pagevec_count(&req->result.old_state)); + return -EIO; + } + + page = req->result.old_state.pages[page_index]; + } else { + if (pagevec_count(&req->result.pvec) <= page_index) { + SSDFS_ERR("page_index %d >= pagevec_count %u\n", + page_index, + pagevec_count(&req->result.pvec)); + return -EIO; + } + + page = req->result.pvec.pages[page_index]; + } + + kaddr = kmap_local_page(page); + err = ssdfs_peb_unaligned_read_fragment(pebi, req, + area_offset + byte_offset, + data_bytes, + kaddr); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to read page: " + "seg %llu, peb %llu, offset %u, size %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + area_offset + byte_offset, data_bytes, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_read_area_fragment() - read area's fragment + * @pebi: pointer on PEB object + * @req: request + * @array: array of area's descriptors + * @array_size: count of items into array + * @blk_state_off: block state offset + * + * This function tries to read area's fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
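+ *
+ * The block state is stored as a chain of checksummed fragments;
+ * in outline (simplified sketch, error handling omitted):
+ *
+ *	ssdfs_peb_get_block_state_desc(pebi, req, &array[area_index],
+ *				       &found_blk_state);
+ *	fragments = le16_to_cpu(found_blk_state.chain_hdr.fragments_count);
+ *	ssdfs_peb_get_fragment_desc_array(pebi, req, full_offset,
+ *					  frag_descs, fragments);
+ *	for (i = 0; i < fragments; i++)
+ *		ssdfs_read_checked_fragment(pebi, req, area_offset, i,
+ *					    &frag_descs[i], cdata_buf,
+ *					    page);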
+ */ +static +int ssdfs_peb_read_area_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *array, + size_t array_size, + struct ssdfs_blk_state_offset *blk_state_off) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_block_state_descriptor found_blk_state; + size_t state_desc_size = sizeof(struct ssdfs_block_state_descriptor); + struct ssdfs_fragment_desc *frag_descs = NULL; + size_t frag_desc_size = sizeof(struct ssdfs_fragment_desc); + void *cdata_buf = NULL; + u8 area_index; + u32 area_offset; + u32 frag_desc_offset; + u32 full_offset; + u32 data_bytes; + u32 read_bytes; + int page_index; + u16 fragments; + u32 uncompr_bytes; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req || !array || !blk_state_off); + + SSDFS_DBG("seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + area_index = SSDFS_AREA_TYPE2INDEX(blk_state_off->log_area); + if (area_index >= array_size) { + SSDFS_ERR("area_index %u >= array_size %zu\n", + area_index, array_size); + return -EIO; + } + + read_bytes = req->result.processed_blks * fsi->pagesize; + + if (read_bytes > req->extent.data_bytes) { + SSDFS_ERR("read_bytes %u > req->extent.data_bytes %u\n", + read_bytes, req->extent.data_bytes); + return -ERANGE; + } else if (read_bytes == req->extent.data_bytes) { + SSDFS_WARN("read_bytes %u == req->extent.data_bytes %u\n", + read_bytes, req->extent.data_bytes); + return -ERANGE; + } + + data_bytes = req->extent.data_bytes - read_bytes; + + if (fsi->pagesize > PAGE_SIZE) + data_bytes = min_t(u32, data_bytes, fsi->pagesize); + else + data_bytes = min_t(u32, data_bytes, PAGE_SIZE); + + err = ssdfs_peb_get_block_state_desc(pebi, req, &array[area_index], + &found_blk_state); + if (err == -ENOENT) { + SSDFS_DBG("cache hasn't requested page\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get block state descriptor: " + "area_offset %u, err %d\n", + le32_to_cpu(array[area_index].offset), + err); + return err; + } + + uncompr_bytes = le32_to_cpu(found_blk_state.chain_hdr.uncompr_bytes); + if (data_bytes > uncompr_bytes) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("data_bytes %u > uncompr_bytes %u\n", + data_bytes, uncompr_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + req->extent.data_bytes -= data_bytes - uncompr_bytes; + data_bytes = uncompr_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("CORRECTED VALUE: data_bytes %u\n", + data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + fragments = le16_to_cpu(found_blk_state.chain_hdr.fragments_count); + if (fragments == 0 || fragments > SSDFS_FRAGMENTS_CHAIN_MAX) { + SSDFS_ERR("invalid fragments count %u\n", fragments); + return -EIO; + } + + frag_descs = ssdfs_read_kcalloc(fragments, frag_desc_size, GFP_KERNEL); + if (!frag_descs) { + SSDFS_ERR("fail to allocate fragment descriptors array\n"); + return -ENOMEM; + } + + area_offset = le32_to_cpu(array[area_index].offset); + frag_desc_offset = le32_to_cpu(blk_state_off->byte_offset); + frag_desc_offset += state_desc_size; + full_offset = area_offset + frag_desc_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area_offset %u, blk_state_off->byte_offset %u, " + "state_desc_size %zu, frag_desc_offset %u, " + "full_offset %u\n", + area_offset, le32_to_cpu(blk_state_off->byte_offset), + state_desc_size, frag_desc_offset, full_offset); +#endif /* CONFIG_SSDFS_DEBUG 
*/ + + err = ssdfs_peb_get_fragment_desc_array(pebi, req, full_offset, + frag_descs, fragments); + if (err == -ENOENT) { + SSDFS_DBG("cache hasn't requested page\n"); + goto free_bufs; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get fragment descriptor array: " + "offset %u, fragments %u, err %d\n", + full_offset, fragments, err); + goto free_bufs; + } + + cdata_buf = ssdfs_read_kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!cdata_buf) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate cdata_buf\n"); + goto free_bufs; + } + + page_index = (int)(read_bytes >> PAGE_SHIFT); + BUG_ON(page_index >= U16_MAX); + + for (i = 0; i < fragments; i++) { + struct pagevec *pvec; + struct page *page; + struct ssdfs_fragment_desc *cur_desc; + u32 compr_size; + + if (req->private.flags & SSDFS_REQ_PREPARE_DIFF) { + pvec = &req->result.old_state; + + if (pagevec_count(pvec) <= i) { + err = -EIO; + SSDFS_ERR("page_index %d >= pagevec_count %u\n", + i, pagevec_count(pvec)); + goto free_bufs; + } + } else { + pvec = &req->result.pvec; + + if (pagevec_count(pvec) <= (page_index + i)) { + err = -EIO; + SSDFS_ERR("page_index %d >= pagevec_count %u\n", + page_index + i, + pagevec_count(pvec)); + goto free_bufs; + } + } + + cur_desc = &frag_descs[i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FRAGMENT DESC DUMP: index %d\n", i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + cur_desc, + sizeof(struct ssdfs_fragment_desc)); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cur_desc->magic != SSDFS_FRAGMENT_DESC_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid fragment descriptor magic\n"); + goto free_bufs; + } + + if (cur_desc->type < SSDFS_FRAGMENT_UNCOMPR_BLOB || + cur_desc->type > SSDFS_FRAGMENT_LZO_BLOB) { + err = -EIO; + SSDFS_ERR("invalid fragment descriptor type\n"); + goto free_bufs; + } + + if (cur_desc->sequence_id != i) { + err = -EIO; + SSDFS_ERR("invalid fragment's sequence id\n"); + goto free_bufs; + } + + compr_size = le16_to_cpu(cur_desc->compr_size); + + if (compr_size > PAGE_SIZE) { + err = -EIO; + SSDFS_ERR("compr_size %u > PAGE_SIZE %lu\n", + compr_size, PAGE_SIZE); + goto free_bufs; + } + + if (req->private.flags & SSDFS_REQ_PREPARE_DIFF) + page = pvec->pages[i]; + else + page = pvec->pages[page_index + i]; + + err = ssdfs_read_checked_fragment(pebi, req, area_offset, + i, cur_desc, + cdata_buf, + page); + if (err == -ENOENT) { + SSDFS_DBG("cache hasn't requested page\n"); + goto free_bufs; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read fragment: " + "index %d, err %d\n", + i, err); + goto free_bufs; + } + } + +free_bufs: + ssdfs_read_kfree(frag_descs); + ssdfs_read_kfree(cdata_buf); + + return err; +} + +/* + * ssdfs_peb_read_base_block_state() - read base state of block + * @pebi: pointer on PEB object + * @req: request + * @array: array of area's descriptors + * @array_size: count of items into array + * @offset: block state offset + * + * This function tries to extract a base state of block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENOENT - cache hasn't requested page. 
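+ *
+ * The base state lives either in the main area or in one of the
+ * fragment-based areas (the dispatch performed below):
+ *
+ *	if (offset->log_area == SSDFS_LOG_MAIN_AREA)
+ *		err = ssdfs_peb_read_main_area_page(pebi, req, array,
+ *						    array_size, offset);
+ *	else
+ *		err = ssdfs_peb_read_area_fragment(pebi, req, array,
+ *						   array_size, offset);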
+ */ +static +int ssdfs_peb_read_base_block_state(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *array, + size_t array_size, + struct ssdfs_blk_state_offset *offset) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!req || !array || !offset); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + + SSDFS_DBG("seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_read_log_hdr_desc_array(pebi, req, + le16_to_cpu(offset->log_start_page), + array, array_size); + if (err == -ENOENT) { + SSDFS_DBG("cache hasn't requested page\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read log's header desc array: " + "seg %llu, peb %llu, log_start_page %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + le16_to_cpu(offset->log_start_page), + err); + return err; + } + + if (offset->log_area == SSDFS_LOG_MAIN_AREA) { + err = ssdfs_peb_read_main_area_page(pebi, req, + array, array_size, + offset); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu, " + "ino %llu, logical_offset %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read main area's page: " + "seg %llu, peb %llu, " + "ino %llu, logical_offset %llu, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset, + err); + return err; + } + } else { + err = ssdfs_peb_read_area_fragment(pebi, req, + array, array_size, + offset); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu, " + "ino %llu, logical_offset %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read area's fragment: " + "seg %llu, peb %llu, " + "ino %llu, logical_offset %llu, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset, + err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_peb_read_area_diff_fragment() - read diff fragment + * @pebi: pointer on PEB object + * @req: request + * @array: array of area's descriptors + * @array_size: count of items into array + * @blk_state_off: block state offset + * @page: page with current diff blob + * @sequence_id: sequence ID of the fragment + * + * This function tries to extract a diff blob into @page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
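+ *
+ * The caller is expected to pass a locked page from the
+ * req->result.diffs pagevec, as ssdfs_peb_read_diff_block_state()
+ * below does (illustrative sketch only):
+ *
+ *   page = ssdfs_request_allocate_and_add_diff_page(req);
+ *   ssdfs_lock_page(page);
+ *   sequence_id = pagevec_count(&req->result.diffs) - 1;
+ *   err = ssdfs_peb_read_area_diff_fragment(pebi, req,
+ *                                           array, array_size,
+ *                                           offset, page,
+ *                                           sequence_id);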
+ */ +static +int ssdfs_peb_read_area_diff_fragment(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *array, + size_t array_size, + struct ssdfs_blk_state_offset *blk_state_off, + struct page *page, + int sequence_id) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_block_state_descriptor found_blk_state; + size_t state_desc_size = sizeof(struct ssdfs_block_state_descriptor); + struct ssdfs_fragment_desc frag_desc = {0}; + void *cdata_buf = NULL; + u8 area_index; + u32 area_offset; + u32 frag_desc_offset; + u32 full_offset; + u16 fragments; + u64 cno; + u64 parent_snapshot; + u32 compr_size; +#ifdef CONFIG_SSDFS_DEBUG + void *kaddr; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!array || !blk_state_off || !page); + + SSDFS_DBG("seg %llu, peb %llu, sequence_id %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + err = ssdfs_peb_read_log_hdr_desc_array(pebi, req, + le16_to_cpu(blk_state_off->log_start_page), + array, array_size); + if (unlikely(err)) { + SSDFS_ERR("fail to read log's header desc array: " + "seg %llu, peb %llu, log_start_page %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + le16_to_cpu(blk_state_off->log_start_page), + err); + return err; + } + + area_index = SSDFS_AREA_TYPE2INDEX(blk_state_off->log_area); + if (area_index >= array_size) { + SSDFS_ERR("area_index %u >= array_size %zu\n", + area_index, array_size); + return -EIO; + } + + err = __ssdfs_peb_get_block_state_desc(pebi, req, &array[area_index], + &found_blk_state, + &cno, &parent_snapshot); + if (unlikely(err)) { + SSDFS_ERR("fail to get block state descriptor: " + "area_offset %u, err %d\n", + le32_to_cpu(array[area_index].offset), + err); + return err; + } + + fragments = le16_to_cpu(found_blk_state.chain_hdr.fragments_count); + if (fragments == 0 || fragments > 1) { + SSDFS_ERR("invalid fragments count %u\n", fragments); + return -EIO; + } + + area_offset = le32_to_cpu(array[area_index].offset); + frag_desc_offset = le32_to_cpu(blk_state_off->byte_offset); + frag_desc_offset += state_desc_size; + full_offset = area_offset + frag_desc_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area_offset %u, blk_state_off->byte_offset %u, " + "state_desc_size %zu, frag_desc_offset %u, " + "full_offset %u\n", + area_offset, le32_to_cpu(blk_state_off->byte_offset), + state_desc_size, frag_desc_offset, full_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_get_fragment_desc_array(pebi, req, full_offset, + &frag_desc, 1); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get fragment descriptor array: " + "offset %u, fragments %u, err %d\n", + full_offset, fragments, err); + return err; + } + + cdata_buf = ssdfs_read_kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!cdata_buf) { + SSDFS_ERR("fail to allocate cdata_buf\n"); + return -ENOMEM; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FRAGMENT DESC DUMP: index %d\n", sequence_id); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + &frag_desc, + sizeof(struct ssdfs_fragment_desc)); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (frag_desc.magic != 
SSDFS_FRAGMENT_DESC_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid fragment descriptor magic\n"); + goto free_bufs; + } + + if (frag_desc.type < SSDFS_FRAGMENT_UNCOMPR_BLOB || + frag_desc.type > SSDFS_FRAGMENT_LZO_BLOB) { + err = -EIO; + SSDFS_ERR("invalid fragment descriptor type\n"); + goto free_bufs; + } + + compr_size = le16_to_cpu(frag_desc.compr_size); + + if (compr_size > PAGE_SIZE) { + err = -EIO; + SSDFS_ERR("compr_size %u > PAGE_SIZE %lu\n", + compr_size, PAGE_SIZE); + goto free_bufs; + } + + err = ssdfs_read_checked_fragment(pebi, req, area_offset, + 0, &frag_desc, + cdata_buf, + page); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto free_bufs; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read fragment: " + "index %d, err %d\n", + sequence_id, err); + goto free_bufs; + } + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(page); + SSDFS_DBG("DIFF DUMP: index %d\n", + sequence_id); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + +free_bufs: + if (cdata_buf) + ssdfs_read_kfree(cdata_buf); + + return err; +} + +/* + * ssdfs_peb_read_diff_block_state() - read diff blob + * @pebi: pointer on PEB object + * @req: request + * @array: array of area's descriptors + * @array_size: count of items into array + * @offset: block state offset + * + * This function tries to extract a diff blob. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_peb_read_diff_block_state(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_metadata_descriptor *array, + size_t array_size, + struct ssdfs_blk_state_offset *offset) +{ + struct page *page = NULL; + int sequence_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!array || !offset); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + + SSDFS_DBG("seg %llu, peb %llu, pagevec_size %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pagevec_count(&req->result.diffs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_request_allocate_and_add_diff_page(req); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + + sequence_id = pagevec_count(&req->result.diffs) - 1; + err = ssdfs_peb_read_area_diff_fragment(pebi, req, array, array_size, + offset, page, sequence_id); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read area's fragment: " + "seg %llu, peb %llu, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + return err; + } + + return 0; +} + +/* + * ssdfs_blk_desc_buffer_init() - init block descriptor buffer + * @pebc: pointer on PEB container + * @req: request + * @desc_off: block descriptor offset + * @pos: offset position + * @array: array of area's descriptors + * @array_size: count of items into array + * + * This function tries to init block descriptor buffer. 
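+ *
+ * The buffer state machine is simple: a buffer in the
+ * SSDFS_BLK_DESC_BUF_UNKNOWN_STATE or SSDFS_BLK_DESC_BUF_ALLOCATED
+ * state is filled by ssdfs_peb_find_block_descriptor(), cached in the
+ * blk2off table by ssdfs_blk2off_table_blk_desc_init(), and moved into
+ * the SSDFS_BLK_DESC_BUF_INITIALIZED state; an already initialized
+ * buffer is left untouched.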
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +int ssdfs_blk_desc_buffer_init(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + struct ssdfs_metadata_descriptor *array, + size_t array_size) +{ + struct ssdfs_peb_info *pebi = NULL; + struct ssdfs_blk2off_table *table; + u8 peb_migration_id; + u16 logical_blk; +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_blk_state_offset *state_off; + int j; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!desc_off || !pos); + + SSDFS_DBG("seg %llu, peb_index %u, blk_desc.status %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + pos->blk_desc.status); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (pos->blk_desc.status) { + case SSDFS_BLK_DESC_BUF_UNKNOWN_STATE: + case SSDFS_BLK_DESC_BUF_ALLOCATED: + peb_migration_id = desc_off->blk_state.peb_migration_id; + + pebi = ssdfs_get_peb_for_migration_id(pebc, peb_migration_id); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto finish_blk_desc_buffer_init; + } + + err = ssdfs_peb_find_block_descriptor(pebi, req, + array, array_size, + desc_off, + &pos->blk_desc.buf); + if (unlikely(err)) { + SSDFS_ERR("fail to find block descriptor: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto finish_blk_desc_buffer_init; + } + + pos->blk_desc.status = SSDFS_BLK_DESC_BUF_INITIALIZED; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("status %#x, ino %llu, " + "logical_offset %u, peb_index %u, peb_page %u\n", + pos->blk_desc.status, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page)); + + for (j = 0; j < SSDFS_BLK_STATE_OFF_MAX; j++) { + state_off = &pos->blk_desc.buf.state[j]; + + SSDFS_DBG("BLK STATE OFFSET %d: " + "log_start_page %u, log_area %#x, " + "byte_offset %u, peb_migration_id %u\n", + j, + le16_to_cpu(state_off->log_start_page), + state_off->log_area, + le32_to_cpu(state_off->byte_offset), + state_off->peb_migration_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + table = pebi->pebc->parent_si->blk2off_table; + logical_blk = req->place.start.blk_index + + req->result.processed_blks; + + err = ssdfs_blk2off_table_blk_desc_init(table, logical_blk, + pos); + if (unlikely(err)) { + SSDFS_ERR("fail to init blk desc: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto finish_blk_desc_buffer_init; + } + break; + + case SSDFS_BLK_DESC_BUF_INITIALIZED: + /* do nothing */ + SSDFS_DBG("descriptor buffer is initialized already\n"); + break; + + default: + SSDFS_ERR("pos->blk_desc.status %#x\n", + pos->blk_desc.status); + BUG(); + } + +finish_blk_desc_buffer_init: + return err; +} + +/* + * ssdfs_peb_read_block_state() - read state of the block + * @pebc: pointer on PEB container + * @req: request + * @desc_off: block descriptor offset + * @pos: offset position + * @array: array of area's descriptors + * @array_size: count of items into array + * + * This function tries to read block state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - I/O error. 
+ * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +int ssdfs_peb_read_block_state(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + struct ssdfs_metadata_descriptor *array, + size_t array_size) +{ + struct ssdfs_peb_info *pebi = NULL; + struct ssdfs_blk_state_offset *offset = NULL; + u64 ino; + u32 logical_offset; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req || !array); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + + SSDFS_DBG("seg %llu, peb_index %u, processed_blks %d\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_blk_desc_buffer_init(pebc, req, desc_off, pos, + array, array_size); + if (unlikely(err)) { + SSDFS_ERR("fail to init blk desc buffer: err %d\n", + err); + goto finish_prepare_pvec; + } + + ino = le64_to_cpu(pos->blk_desc.buf.ino); + logical_offset = le32_to_cpu(pos->blk_desc.buf.logical_offset); + + offset = &pos->blk_desc.buf.state[0]; + + if (IS_SSDFS_BLK_STATE_OFFSET_INVALID(offset)) { + err = -ERANGE; + SSDFS_ERR("block state offset invalid\n"); + SSDFS_ERR("log_start_page %u, log_area %u, " + "peb_migration_id %u, byte_offset %u\n", + le16_to_cpu(offset->log_start_page), + offset->log_area, + offset->peb_migration_id, + le32_to_cpu(offset->byte_offset)); + goto finish_prepare_pvec; + } + + pebi = ssdfs_get_peb_for_migration_id(pebc, offset->peb_migration_id); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto finish_prepare_pvec; + } + +#ifdef CONFIG_SSDFS_DEBUG + DEBUG_BLOCK_DESCRIPTOR(pebi->pebc->parent_si->seg_id, + pebi->peb_id, &pos->blk_desc.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_read_base_block_state(pebi, req, + array, array_size, + offset); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to read block state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + ino, logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_prepare_pvec; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read block state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + ino, logical_offset, + err); + goto finish_prepare_pvec; + } + + for (i = 0; i < SSDFS_BLK_STATE_OFF_MAX; i++) { + offset = &pos->blk_desc.buf.state[i]; + + if (i == 0) { + /* + * base block state has been read already + */ + continue; + } else { + if (IS_SSDFS_BLK_STATE_OFFSET_INVALID(offset)) + goto finish_prepare_pvec; + + pebi = ssdfs_get_peb_for_migration_id(pebc, + offset->peb_migration_id); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + goto finish_prepare_pvec; + } + + err = ssdfs_peb_read_diff_block_state(pebi, + req, + array, + array_size, + offset); + } + + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache hasn't requested page: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + ino, logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_prepare_pvec; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read block state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + ino, logical_offset, err); + goto finish_prepare_pvec; + } + } + +finish_prepare_pvec: + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to read the block state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_read_block_state; + } else if (unlikely(err)) { + SSDFS_ERR("fail to read the block state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page), + err); + goto finish_read_block_state; + } + + if (pagevec_count(&req->result.diffs) == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("diffs pagevec is empty: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_read_block_state; + } + + switch (pebi->pebc->peb_type) { + case SSDFS_MAPTBL_DATA_PEB_TYPE: + err = ssdfs_user_data_apply_diffs(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to apply diffs on base state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page), + err); + goto finish_read_block_state; + } + break; + + case SSDFS_MAPTBL_LNODE_PEB_TYPE: + case SSDFS_MAPTBL_HNODE_PEB_TYPE: + case SSDFS_MAPTBL_IDXNODE_PEB_TYPE: + err = ssdfs_btree_node_apply_diffs(pebi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to apply diffs on base state: " + "seg %llu, peb_index %u, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + le64_to_cpu(pos->blk_desc.buf.ino), + le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page), + err); + goto finish_read_block_state; + } + break; + + default: + err = -EOPNOTSUPP; + SSDFS_ERR("diff-on-write is not supported: " + "seg %llu, peb_index %u, peb_type %#x, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + pebi->pebc->peb_type, + le64_to_cpu(pos->blk_desc.buf.ino), + 
le32_to_cpu(pos->blk_desc.buf.logical_offset), + le16_to_cpu(pos->blk_desc.buf.peb_index), + le16_to_cpu(pos->blk_desc.buf.peb_page), + err); + goto finish_read_block_state; + } + +finish_read_block_state: + if (!err && !(req->private.flags & SSDFS_REQ_PREPARE_DIFF)) + req->result.processed_blks++; + + if (err) + ssdfs_request_unlock_and_remove_old_state(req); + + ssdfs_request_unlock_and_remove_diffs(req); + + return err; +} + +/* + * ssdfs_peb_read_page() - read page from PEB + * @pebc: pointer on PEB container + * @req: request [in|out] + * @end: pointer on waiting queue [out] + * + * This function tries to read PEB's page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - PEB object is not initialized yet. + */ +int ssdfs_peb_read_page(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + struct completion **end) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_blk2off_table *table; + struct ssdfs_phys_offset_descriptor *desc_off = NULL; + struct ssdfs_blk_state_offset *blk_state = NULL; + u16 logical_blk; + u16 log_start_page; + struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX]; + u8 peb_migration_id; + u16 peb_index; + int migration_state = SSDFS_LBLOCK_UNKNOWN_STATE; + struct ssdfs_offset_position pos = {0}; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x, " + "ino %llu, logical_offset %llu, data_bytes %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, req->private.type, + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + if (req->extent.data_bytes == 0) { + SSDFS_WARN("empty read request: ino %llu, logical_offset %llu\n", + req->extent.ino, req->extent.logical_offset); + return 0; + } + + table = pebc->parent_si->blk2off_table; + logical_blk = req->place.start.blk_index + req->result.processed_blks; + + desc_off = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, + &migration_state, + &pos); + if (IS_ERR(desc_off) && PTR_ERR(desc_off) == -EAGAIN) { + struct completion *init_end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + desc_off = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, + &migration_state, + &pos); + } + + if (IS_ERR_OR_NULL(desc_off)) { + err = (desc_off == NULL ? 
-ERANGE : PTR_ERR(desc_off)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + peb_migration_id = desc_off->blk_state.peb_migration_id; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, peb_index %u, " + "logical_offset %u, logical_blk %u, peb_page %u, " + "log_start_page %u, log_area %u, " + "peb_migration_id %u, byte_offset %u\n", + logical_blk, pebc->peb_index, + le32_to_cpu(desc_off->page_desc.logical_offset), + le16_to_cpu(desc_off->page_desc.logical_blk), + le16_to_cpu(desc_off->page_desc.peb_page), + le16_to_cpu(desc_off->blk_state.log_start_page), + desc_off->blk_state.log_area, + desc_off->blk_state.peb_migration_id, + le32_to_cpu(desc_off->blk_state.byte_offset)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_logical_block_migrating(migration_state)) { + err = ssdfs_blk2off_table_get_block_state(table, req); + if (err == -EAGAIN) { + desc_off = ssdfs_blk2off_table_convert(table, + logical_blk, + &peb_index, + &migration_state, + &pos); + if (IS_ERR_OR_NULL(desc_off)) { + err = (desc_off == NULL ? + -ERANGE : PTR_ERR(desc_off)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to get migrating block state: " + "logical_blk %u, peb_index %u, err %d\n", + logical_blk, pebc->peb_index, err); + return err; + } else + return 0; + } + + down_read(&pebc->lock); + + blk_state = &desc_off->blk_state; + log_start_page = le16_to_cpu(blk_state->log_start_page); + + if (log_start_page >= fsi->pages_per_peb) { + err = -ERANGE; + SSDFS_ERR("invalid log_start_page %u\n", log_start_page); + goto finish_read_page; + } + + err = ssdfs_peb_read_block_state(pebc, req, + desc_off, &pos, + desc_array, + SSDFS_SEG_HDR_DESC_MAX); + if (unlikely(err)) { + SSDFS_ERR("fail to read block state: " + "seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x, " + "ino %llu, logical_offset %llu, " + "data_bytes %u, migration_state %#x, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + req->private.class, req->private.cmd, + req->private.type, + req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, + migration_state, + err); + + SSDFS_ERR("seg_id %llu, peb_index %u, ino %llu, " + "logical_offset %u, peb_index %u, " + "peb_page %u\n", + pebc->parent_si->seg_id, + pebc->peb_index, + le64_to_cpu(pos.blk_desc.buf.ino), + le32_to_cpu(pos.blk_desc.buf.logical_offset), + le16_to_cpu(pos.blk_desc.buf.peb_index), + le16_to_cpu(pos.blk_desc.buf.peb_page)); + + for (i = 0; i < SSDFS_BLK_STATE_OFF_MAX; i++) { + blk_state = &pos.blk_desc.buf.state[i]; + + SSDFS_ERR("BLK STATE OFFSET %d: " + "log_start_page %u, log_area %#x, " + "byte_offset %u, peb_migration_id %u\n", + i, + le16_to_cpu(blk_state->log_start_page), + blk_state->log_area, + le32_to_cpu(blk_state->byte_offset), + blk_state->peb_migration_id); + } + + goto finish_read_page; + } + +finish_read_page: + up_read(&pebc->lock); + + return err; +} + +/* + * ssdfs_peb_readahead_pages() - read-ahead pages from PEB + * @pebc: pointer on PEB container + * @req: request [in|out] + * @end: pointer on waiting queue [out] + * + * This function tries to read-ahead PEB's pages. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
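+ *
+ * The number of pages to read ahead is the extent's length rounded up
+ * to the page size:
+ *
+ *   pages_count = (req->extent.data_bytes + fsi->pagesize - 1) >>
+ *			fsi->log_pagesize;
+ *
+ * For example, data_bytes == 10000 with 4K pages gives 3 pages.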
+ */
+int ssdfs_peb_readahead_pages(struct ssdfs_peb_container *pebc,
+				struct ssdfs_segment_request *req,
+				struct completion **end)
+{
+	struct ssdfs_fs_info *fsi;
+	u32 pages_count;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!req);
+
+	SSDFS_DBG("seg %llu, peb_index %u, "
+		  "class %#x, cmd %#x, type %#x, "
+		  "ino %llu, logical_offset %llu, data_bytes %u\n",
+		  pebc->parent_si->seg_id, pebc->peb_index,
+		  req->private.class, req->private.cmd, req->private.type,
+		  req->extent.ino, req->extent.logical_offset,
+		  req->extent.data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebc->parent_si->fsi;
+
+	if (req->extent.data_bytes == 0) {
+		SSDFS_WARN("empty read request: ino %llu, logical_offset %llu\n",
+			   req->extent.ino, req->extent.logical_offset);
+		return 0;
+	}
+
+	pages_count = req->extent.data_bytes + fsi->pagesize - 1;
+	pages_count >>= fsi->log_pagesize;
+
+	for (i = req->result.processed_blks; i < pages_count; i++) {
+		int err = ssdfs_peb_read_page(pebc, req, end);
+		if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to process page %d\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to process page %d, err %d\n",
+				  i, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * __ssdfs_peb_read_log_footer() - read log's footer
  * @fsi: file system info object

From patchwork Sat Feb 25 01:08:39 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151933
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 28/76] ssdfs: PEB flush thread's finite state machine
Date: Fri, 24 Feb 2023 17:08:39 -0800
Message-Id: <20230225010927.813929-29-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Every "Physical" Erase Block (PEB) has a flush thread that implements
the logic of log preparation and commit/flush. The segment object has a
create queue that can contain requests for adding new data or metadata.
The PEB container object has an update queue that contains requests to
update existing data. The caller is responsible for allocating,
preparing, and adding request(s) into the particular queue (create or
update), and then the flush thread needs to be woken up. The flush
thread checks the state of the queues, takes requests one by one, and
executes them. If a full log has been prepared in memory or a commit has
been requested, then the flush thread commits the prepared log to the
storage device to make it persistent.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_flush_thread.c | 3364 +++++++++++++++++++++++++++++++++++
 1 file changed, 3364 insertions(+)
 create mode 100644 fs/ssdfs/peb_flush_thread.c

diff --git a/fs/ssdfs/peb_flush_thread.c b/fs/ssdfs/peb_flush_thread.c
new file mode 100644
index 000000000000..6a9032762ea6
--- /dev/null
+++ b/fs/ssdfs/peb_flush_thread.c
@@ -0,0 +1,3364 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_flush_thread.c - flush thread functionality.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + * Cong Wang + */ + +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "compression.h" +#include "page_vector.h" +#include "block_bitmap.h" +#include "peb_block_bitmap.h" +#include "segment_block_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "peb.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" +#include "current_segment.h" +#include "peb_mapping_table.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "extents_tree.h" +#include "diff_on_write.h" +#include "invalidated_extents_tree.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_flush_page_leaks; +atomic64_t ssdfs_flush_memory_leaks; +atomic64_t ssdfs_flush_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_flush_cache_leaks_increment(void *kaddr) + * void ssdfs_flush_cache_leaks_decrement(void *kaddr) + * void *ssdfs_flush_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_flush_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_flush_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_flush_kfree(void *kaddr) + * struct page *ssdfs_flush_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_flush_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_flush_free_page(struct page *page) + * void ssdfs_flush_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(flush) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(flush) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_flush_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_flush_page_leaks, 0); + atomic64_set(&ssdfs_flush_memory_leaks, 0); + atomic64_set(&ssdfs_flush_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_flush_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_flush_page_leaks) != 0) { + SSDFS_ERR("FLUSH THREAD: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_flush_page_leaks)); + } + + if (atomic64_read(&ssdfs_flush_memory_leaks) != 0) { + SSDFS_ERR("FLUSH THREAD: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_flush_memory_leaks)); + } + + if (atomic64_read(&ssdfs_flush_cache_leaks) != 0) { + SSDFS_ERR("FLUSH THREAD: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_flush_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * FLUSH THREAD FUNCTIONALITY * + ******************************************************************************/ + +/* + * __ssdfs_finish_request() - common logic of request's finishing + * @pebc: pointer on PEB container + * @req: request + * @wait: wait queue head + * @err: error of processing request + */ +static +void __ssdfs_finish_request(struct ssdfs_peb_container *pebc, + struct ssdfs_segment_request *req, + wait_queue_head_t *wait, + int err) +{ + u32 pagesize; + u32 processed_bytes_max; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req); + + SSDFS_DBG("req %p, cmd %#x, type %#x, err %d\n", + req, 
req->private.cmd, req->private.type, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	pagesize = pebc->parent_si->fsi->pagesize;
+	processed_bytes_max = req->result.processed_blks * pagesize;
+
+	if (req->extent.data_bytes > processed_bytes_max) {
+		SSDFS_WARN("data_bytes %u > processed_bytes_max %u\n",
+			   req->extent.data_bytes,
+			   processed_bytes_max);
+	}
+
+	req->result.err = err;
+
+	switch (req->private.type) {
+	case SSDFS_REQ_SYNC:
+		/* do nothing */
+		break;
+
+	case SSDFS_REQ_ASYNC:
+		ssdfs_free_flush_request_pages(req);
+		pagevec_reinit(&req->result.pvec);
+		break;
+
+	case SSDFS_REQ_ASYNC_NO_FREE:
+		ssdfs_free_flush_request_pages(req);
+		pagevec_reinit(&req->result.pvec);
+		break;
+
+	default:
+		BUG();
+	};
+
+	switch (req->private.type) {
+	case SSDFS_REQ_SYNC:
+		if (err) {
+			SSDFS_DBG("failure: req %p, err %d\n", req, err);
+			atomic_set(&req->result.state, SSDFS_REQ_FAILED);
+		} else
+			atomic_set(&req->result.state, SSDFS_REQ_FINISHED);
+
+		complete(&req->result.wait);
+		wake_up_all(&req->private.wait_queue);
+		break;
+
+	case SSDFS_REQ_ASYNC:
+		ssdfs_put_request(req);
+
+		if (err) {
+			SSDFS_DBG("failure: req %p, err %d\n", req, err);
+			atomic_set(&req->result.state, SSDFS_REQ_FAILED);
+		} else
+			atomic_set(&req->result.state, SSDFS_REQ_FINISHED);
+
+		complete(&req->result.wait);
+
+		if (atomic_read(&req->private.refs_count) != 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("start waiting: refs_count %d\n",
+				  atomic_read(&req->private.refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = wait_event_killable_timeout(*wait,
+				atomic_read(&req->private.refs_count) == 0,
+				SSDFS_DEFAULT_TIMEOUT);
+			if (err < 0)
+				WARN_ON(err < 0);
+			else
+				err = 0;
+		}
+
+		wake_up_all(&req->private.wait_queue);
+		ssdfs_request_free(req);
+		break;
+
+	case SSDFS_REQ_ASYNC_NO_FREE:
+		ssdfs_put_request(req);
+
+		if (err) {
+			SSDFS_DBG("failure: req %p, err %d\n", req, err);
+			atomic_set(&req->result.state, SSDFS_REQ_FAILED);
+		} else
+			atomic_set(&req->result.state, SSDFS_REQ_FINISHED);
+
+		complete(&req->result.wait);
+
+		if (atomic_read(&req->private.refs_count) != 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("start waiting: refs_count %d\n",
+				  atomic_read(&req->private.refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = wait_event_killable_timeout(*wait,
+				atomic_read(&req->private.refs_count) == 0,
+				SSDFS_DEFAULT_TIMEOUT);
+			if (err < 0)
+				WARN_ON(err < 0);
+			else
+				err = 0;
+		}
+
+		wake_up_all(&req->private.wait_queue);
+		break;
+
+	default:
+		atomic_set(&req->result.state, SSDFS_REQ_FAILED);
+		BUG();
+	};
+}
+
+/*
+ * ssdfs_finish_pre_allocate_request() - finish pre-allocate request
+ * @pebc: pointer on PEB container
+ * @req: request
+ * @wait: wait queue head
+ * @err: error of processing request
+ *
+ * This function finishes pre-allocate request processing. If the
+ * attempt to pre-allocate an extent fails with the %-EAGAIN error,
+ * then the function returns the request into the create queue for
+ * final processing.
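+ *
+ * Requeueing keeps the request alive instead of completing it:
+ *
+ *   atomic_set(&req->result.state, SSDFS_REQ_CREATED);
+ *   spin_lock(&pebc->crq_ptr_lock);
+ *   ssdfs_requests_queue_add_head(pebc->create_rq, req);
+ *   spin_unlock(&pebc->crq_ptr_lock);
+ *
+ * so the flush thread sees the request again at the head of the
+ * create queue on its next iteration.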
+ */
+static
+void ssdfs_finish_pre_allocate_request(struct ssdfs_peb_container *pebc,
+					struct ssdfs_segment_request *req,
+					wait_queue_head_t *wait,
+					int err)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p, cmd %#x, type %#x, err %d\n",
+		  req, req->private.cmd, req->private.type, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!err) {
+		WARN_ON(pagevec_count(&req->result.pvec) != 0);
+		ssdfs_flush_pagevec_release(&req->result.pvec);
+	}
+
+	if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("return request into queue: "
+			  "seg %llu, peb_index %u, "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, parent_snapshot %llu, "
+			  "seg %llu, logical_block %u, cmd %#x, type %#x, "
+			  "processed_blks %d\n",
+			  pebc->parent_si->seg_id, pebc->peb_index,
+			  req->extent.ino, req->extent.logical_offset,
+			  req->extent.data_bytes, req->extent.cno,
+			  req->extent.parent_snapshot,
+			  req->place.start.seg_id, req->place.start.blk_index,
+			  req->private.cmd, req->private.type,
+			  req->result.processed_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		atomic_set(&req->result.state, SSDFS_REQ_CREATED);
+
+		spin_lock(&pebc->crq_ptr_lock);
+		ssdfs_requests_queue_add_head(pebc->create_rq, req);
+		spin_unlock(&pebc->crq_ptr_lock);
+	} else
+		__ssdfs_finish_request(pebc, req, wait, err);
+}
+
+/*
+ * ssdfs_finish_create_request() - finish create request
+ * @pebc: pointer on PEB container
+ * @req: request
+ * @wait: wait queue head
+ * @err: error of processing request
+ *
+ * This function finishes create request processing. If the attempt
+ * to add a data block fails with the %-EAGAIN error, then the
+ * function returns the request into the create queue for final
+ * processing.
+ */
+static
+void ssdfs_finish_create_request(struct ssdfs_peb_container *pebc,
+				 struct ssdfs_segment_request *req,
+				 wait_queue_head_t *wait,
+				 int err)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p, cmd %#x, type %#x, err %d\n",
+		  req, req->private.cmd, req->private.type, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!err) {
+		for (i = 0; i < req->result.processed_blks; i++)
+			ssdfs_peb_mark_request_block_uptodate(pebc, req, i);
+	}
+
+	if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("return request into queue: "
+			  "seg %llu, peb_index %u, "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, parent_snapshot %llu, "
+			  "seg %llu, logical_block %u, cmd %#x, type %#x, "
+			  "processed_blks %d\n",
+			  pebc->parent_si->seg_id, pebc->peb_index,
+			  req->extent.ino, req->extent.logical_offset,
+			  req->extent.data_bytes, req->extent.cno,
+			  req->extent.parent_snapshot,
+			  req->place.start.seg_id, req->place.start.blk_index,
+			  req->private.cmd, req->private.type,
+			  req->result.processed_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		atomic_set(&req->result.state, SSDFS_REQ_CREATED);
+
+		spin_lock(&pebc->crq_ptr_lock);
+		ssdfs_requests_queue_add_head(pebc->create_rq, req);
+		spin_unlock(&pebc->crq_ptr_lock);
+	} else
+		__ssdfs_finish_request(pebc, req, wait, err);
+}
+
+/*
+ * ssdfs_finish_update_request() - finish update request
+ * @pebc: pointer on PEB container
+ * @req: request
+ * @wait: wait queue head
+ * @err: error of processing request
+ *
+ * This function finishes update request processing.
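+ *
+ * On success, every processed block is marked up-to-date before the
+ * common completion logic runs:
+ *
+ *   for (i = 0; i < req->result.processed_blks; i++)
+ *           ssdfs_peb_mark_request_block_uptodate(pebc, req, i);
+ *   __ssdfs_finish_request(pebc, req, wait, err);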
+ */
+static
+void ssdfs_finish_update_request(struct ssdfs_peb_container *pebc,
+				 struct ssdfs_segment_request *req,
+				 wait_queue_head_t *wait,
+				 int err)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p, cmd %#x, type %#x, err %d\n",
+		  req, req->private.cmd, req->private.type, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!err) {
+		for (i = 0; i < req->result.processed_blks; i++)
+			ssdfs_peb_mark_request_block_uptodate(pebc, req, i);
+	}
+
+	__ssdfs_finish_request(pebc, req, wait, err);
+}
+
+/*
+ * ssdfs_finish_flush_request() - finish flush request
+ * @pebc: pointer on PEB container
+ * @req: request
+ * @wait: wait queue head
+ * @err: error of processing request
+ */
+static inline
+void ssdfs_finish_flush_request(struct ssdfs_peb_container *pebc,
+				struct ssdfs_segment_request *req,
+				wait_queue_head_t *wait,
+				int err)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi);
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p, cmd %#x, type %#x, err %d\n",
+		  req, req->private.cmd, req->private.type, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (req->private.class) {
+	case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ:
+	case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ:
+	case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ:
+	case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ:
+		ssdfs_finish_pre_allocate_request(pebc, req, wait, err);
+		break;
+
+	case SSDFS_PEB_CREATE_DATA_REQ:
+	case SSDFS_PEB_CREATE_LNODE_REQ:
+	case SSDFS_PEB_CREATE_HNODE_REQ:
+	case SSDFS_PEB_CREATE_IDXNODE_REQ:
+		ssdfs_finish_create_request(pebc, req, wait, err);
+		break;
+
+	case SSDFS_PEB_UPDATE_REQ:
+	case SSDFS_PEB_PRE_ALLOC_UPDATE_REQ:
+	case SSDFS_PEB_DIFF_ON_WRITE_REQ:
+	case SSDFS_PEB_COLLECT_GARBAGE_REQ:
+	case SSDFS_ZONE_USER_DATA_MIGRATE_REQ:
+		ssdfs_finish_update_request(pebc, req, wait, err);
+		break;
+
+	default:
+		BUG();
+	};
+
+	ssdfs_forget_user_data_flush_request(pebc->parent_si);
+	ssdfs_segment_finish_request_cno(pebc->parent_si);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("flush_reqs %lld\n",
+		  atomic64_read(&pebc->parent_si->fsi->flush_reqs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	WARN_ON(atomic64_dec_return(&pebc->parent_si->fsi->flush_reqs) < 0);
+}
+
+/*
+ * ssdfs_peb_clear_current_log_pages() - clear dirty pages of current log
+ * @pebi: pointer on PEB object
+ */
+static inline
+void ssdfs_peb_clear_current_log_pages(struct ssdfs_peb_info *pebi)
+{
+	struct ssdfs_page_array *area_pages;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+	SSDFS_DBG("seg %llu, peb %llu\n",
+		  pebi->pebc->parent_si->seg_id,
+		  pebi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < SSDFS_LOG_AREA_MAX; i++) {
+		area_pages = &pebi->current_log.area[i].array;
+		err = ssdfs_page_array_clear_all_dirty_pages(area_pages);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to clear dirty pages: "
+				  "area_type %#x, err %d\n",
+				  i, err);
+		}
+	}
+}
+
+/*
+ * ssdfs_peb_clear_cache_dirty_pages() - clear dirty pages of PEB's cache
+ * @pebi: pointer on PEB object
+ */
+static inline
+void ssdfs_peb_clear_cache_dirty_pages(struct ssdfs_peb_info *pebi)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+	SSDFS_DBG("seg %llu, peb %llu\n",
+		  pebi->pebc->parent_si->seg_id,
+		  pebi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_page_array_clear_all_dirty_pages(&pebi->cache);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to
clear dirty pages: " + "err %d\n", + err); + } +} + +/* + * ssdfs_peb_commit_log_on_thread_stop() - commit log on thread stopping + * @pebi: pointer on PEB object + * @cur_segs: current segment IDs array + * @size: size of segment IDs array size in bytes + */ +static +int ssdfs_peb_commit_log_on_thread_stop(struct ssdfs_peb_info *pebi, + __le64 *cur_segs, size_t size) +{ + struct ssdfs_fs_info *fsi; + u64 reserved_new_user_data_pages; + u64 updated_user_data_pages; + u64 flushing_user_data_requests; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb_index %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + if (ssdfs_peb_has_dirty_pages(pebi)) { + /* + * Unexpected situation. + * Try to commit anyway. + */ + + spin_lock(&fsi->volume_state_lock); + reserved_new_user_data_pages = + fsi->reserved_new_user_data_pages; + updated_user_data_pages = + fsi->updated_user_data_pages; + flushing_user_data_requests = + fsi->flushing_user_data_requests; + spin_unlock(&fsi->volume_state_lock); + + SSDFS_WARN("PEB has dirty pages: " + "seg %llu, peb %llu, peb_type %#x, " + "global_fs_state %#x, " + "reserved_new_user_data_pages %llu, " + "updated_user_data_pages %llu, " + "flushing_user_data_requests %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, pebi->pebc->peb_type, + atomic_read(&fsi->global_fs_state), + reserved_new_user_data_pages, + updated_user_data_pages, + flushing_user_data_requests); + + err = ssdfs_peb_commit_log(pebi, cur_segs, size); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit log: " + "seg %llu, peb_index %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_index, err); + + ssdfs_peb_clear_current_log_pages(pebi); + ssdfs_peb_clear_cache_dirty_pages(pebi); + } + } + + return err; +} + +/* + * ssdfs_peb_get_current_log_state() - get state of PEB's current log + * @pebc: pointer on PEB container + */ +static +int ssdfs_peb_get_current_log_state(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi = NULL; + bool is_peb_exhausted; + int state; + int err = 0; + + fsi = pebc->parent_si->fsi; + +try_get_current_state: + down_read(&pebc->lock); + + switch (atomic_read(&pebc->migration_state)) { + case SSDFS_PEB_NOT_MIGRATING: + pebi = pebc->src_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("source PEB is NULL\n"); + goto finish_get_current_log_state; + } + state = atomic_read(&pebi->current_log.state); + break; + + case SSDFS_PEB_UNDER_MIGRATION: + pebi = pebc->src_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("source PEB is NULL\n"); + goto finish_get_current_log_state; + } + + ssdfs_peb_current_log_lock(pebi); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); + ssdfs_peb_current_log_unlock(pebi); + + if (is_peb_exhausted) { + pebi = pebc->dst_peb; + if (!pebi) { + err = -ERANGE; + SSDFS_WARN("destination PEB is NULL\n"); + goto finish_get_current_log_state; + } + } + + state = atomic_read(&pebi->current_log.state); + break; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: + err = -EAGAIN; + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid state: %#x\n", + atomic_read(&pebc->migration_state)); + goto finish_get_current_log_state; + break; + } + +finish_get_current_log_state: + up_read(&pebc->lock); + + if (err == -EAGAIN) { + DEFINE_WAIT(wait); + + err = 0; + 
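+		/*
+		 * The PEB is in a transient migration state
+		 * (preparation or finishing); sleep on the
+		 * migration wait queue and retry reading the
+		 * current log state after a wake-up.
+		 */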
prepare_to_wait(&pebc->migration_wq, &wait, + TASK_UNINTERRUPTIBLE); + schedule(); + finish_wait(&pebc->migration_wq, &wait); + goto try_get_current_state; + } else if (unlikely(err)) + state = SSDFS_LOG_UNKNOWN; + + return state; +} + +bool is_ssdfs_peb_exhausted(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi) +{ + bool is_exhausted = false; + u16 start_page; + u32 pages_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi); + BUG_ON(!mutex_is_locked(&pebi->current_log.lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_page = pebi->current_log.start_page; + pages_per_peb = min_t(u32, fsi->leb_pages_capacity, + fsi->peb_pages_capacity); + + switch (atomic_read(&pebi->current_log.state)) { + case SSDFS_LOG_INITIALIZED: + case SSDFS_LOG_COMMITTED: + case SSDFS_LOG_CREATED: + is_exhausted = start_page >= pages_per_peb; + break; + + default: + is_exhausted = false; + break; + }; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_id %llu, start_page %u, " + "pages_per_peb %u, is_exhausted %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, start_page, + pages_per_peb, is_exhausted); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_exhausted; +} + +bool is_ssdfs_peb_ready_to_exhaust(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi) +{ + bool is_ready_to_exhaust = false; + u16 start_page; + u32 pages_per_peb; + u16 free_data_pages; + u16 reserved_pages; + u16 min_partial_log_pages; + int empty_pages; + int migration_state; + int migration_phase; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(!mutex_is_locked(&pebi->current_log.lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + migration_state = atomic_read(&pebi->pebc->migration_state); + migration_phase = atomic_read(&pebi->pebc->migration_phase); + + switch (migration_state) { + case SSDFS_PEB_NOT_MIGRATING: + /* continue logic */ + break; + + case SSDFS_PEB_UNDER_MIGRATION: + switch (migration_phase) { + case SSDFS_SRC_PEB_NOT_EXHAUSTED: + is_ready_to_exhaust = false; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb under migration: " + "src_peb %llu is not exhausted\n", + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return is_ready_to_exhaust; + + default: + /* continue logic */ + break; + } + break; + + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_FINISHING_MIGRATION: + is_ready_to_exhaust = true; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb is going to migrate: " + "src_peb %llu is exhausted\n", + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return is_ready_to_exhaust; + + default: + SSDFS_WARN("migration_state %#x, migration_phase %#x\n", + migration_state, migration_phase); + BUG(); + break; + } + + start_page = pebi->current_log.start_page; + pages_per_peb = min_t(u32, fsi->leb_pages_capacity, + fsi->peb_pages_capacity); + empty_pages = pages_per_peb - start_page; + free_data_pages = pebi->current_log.free_data_pages; + reserved_pages = pebi->current_log.reserved_pages; + min_partial_log_pages = ssdfs_peb_estimate_min_partial_log_pages(pebi); + + switch (atomic_read(&pebi->current_log.state)) { + case SSDFS_LOG_INITIALIZED: + case SSDFS_LOG_COMMITTED: + case SSDFS_LOG_CREATED: + if (empty_pages > min_partial_log_pages) + is_ready_to_exhaust = false; + else if (reserved_pages == 0) { + if (free_data_pages <= min_partial_log_pages) + is_ready_to_exhaust = true; + else + is_ready_to_exhaust = false; + } else { + if (free_data_pages < min_partial_log_pages) + is_ready_to_exhaust = true; + else + is_ready_to_exhaust = false; + } + break; + + default: + 
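+		/*
+		 * Any other log state provides nothing to estimate,
+		 * so treat the PEB as not ready to exhaust.
+		 */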
is_ready_to_exhaust = false; + break; + }; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_id %llu, free_data_pages %u, " + "reserved_pages %u, min_partial_log_pages %u, " + "is_ready_to_exhaust %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, free_data_pages, + reserved_pages, min_partial_log_pages, + is_ready_to_exhaust); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_ready_to_exhaust; +} + +static inline +bool ssdfs_peb_has_partial_empty_log(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi) +{ + bool has_partial_empty_log = false; + u16 start_page; + u32 pages_per_peb; + u16 log_pages; + int empty_pages; + u16 min_partial_log_pages; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi); + BUG_ON(!mutex_is_locked(&pebi->current_log.lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_page = pebi->current_log.start_page; + pages_per_peb = min_t(u32, fsi->leb_pages_capacity, + fsi->peb_pages_capacity); + log_pages = pebi->log_pages; + min_partial_log_pages = ssdfs_peb_estimate_min_partial_log_pages(pebi); + + switch (atomic_read(&pebi->current_log.state)) { + case SSDFS_LOG_INITIALIZED: + case SSDFS_LOG_COMMITTED: + case SSDFS_LOG_CREATED: + empty_pages = pages_per_peb - start_page; + if (empty_pages < 0) + has_partial_empty_log = false; + else if (empty_pages < min_partial_log_pages) + has_partial_empty_log = true; + else + has_partial_empty_log = false; + break; + + default: + has_partial_empty_log = false; + break; + }; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_id %llu, start_page %u, " + "pages_per_peb %u, log_pages %u, " + "min_partial_log_pages %u, " + "has_partial_empty_log %#x\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, start_page, + pages_per_peb, log_pages, + min_partial_log_pages, + has_partial_empty_log); +#endif /* CONFIG_SSDFS_DEBUG */ + + return has_partial_empty_log; +} + +static inline +bool has_commit_log_now_requested(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_request *req = NULL; + bool commit_log_now = false; + int err; + + if (is_ssdfs_requests_queue_empty(&pebc->update_rq)) + return false; + + err = ssdfs_requests_queue_remove_first(&pebc->update_rq, &req); + if (err || !req) + return false; + + commit_log_now = req->private.cmd == SSDFS_COMMIT_LOG_NOW; + ssdfs_requests_queue_add_head(&pebc->update_rq, req); + return commit_log_now; +} + +static inline +bool has_start_migration_now_requested(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_request *req = NULL; + bool start_migration_now = false; + int err; + + if (is_ssdfs_requests_queue_empty(&pebc->update_rq)) + return false; + + err = ssdfs_requests_queue_remove_first(&pebc->update_rq, &req); + if (err || !req) + return false; + + start_migration_now = req->private.cmd == SSDFS_START_MIGRATION_NOW; + ssdfs_requests_queue_add_head(&pebc->update_rq, req); + return start_migration_now; +} + +static inline +void ssdfs_peb_check_update_queue(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_request *req = NULL; + int err; + + if (is_ssdfs_requests_queue_empty(&pebc->update_rq)) { + SSDFS_DBG("update request queue is empty\n"); + return; + } + + err = ssdfs_requests_queue_remove_first(&pebc->update_rq, &req); + if (err || !req) + return; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
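+	/*
+	 * The request has only been inspected: putting it back
+	 * at the head of the queue makes the remove-first/add-head
+	 * pair behave like a non-destructive peek.
+	 */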
ssdfs_requests_queue_add_head(&pebc->update_rq, req);
+	return;
+}
+
+static inline
+int __ssdfs_peb_finish_migration(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_segment_info *si = pebc->parent_si;
+	struct ssdfs_segment_request *req;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb_index %u\n",
+		  pebc->parent_si->seg_id, pebc->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_peb_finish_migration(pebc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to finish migration: "
+			  "seg %llu, peb_index %u\n",
+			  pebc->parent_si->seg_id,
+			  pebc->peb_index);
+		return err;
+	}
+
+	/*
+	 * The responsibility of the finish migration code
+	 * is to copy the state of the valid blocks of the
+	 * source erase block into buffers. This buffered
+	 * state of the valid blocks should be committed
+	 * as soon as possible. Sending the COMMIT_LOG_NOW
+	 * command guarantees that the valid blocks will be
+	 * flushed onto the volume.
+	 */
+
+	req = ssdfs_request_alloc();
+	if (IS_ERR_OR_NULL(req)) {
+		err = (req == NULL ? -ENOMEM : PTR_ERR(req));
+		SSDFS_ERR("fail to allocate request: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	ssdfs_request_init(req);
+	ssdfs_get_request(req);
+
+	err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC,
+					      pebc->peb_index, req);
+	if (unlikely(err)) {
+		SSDFS_ERR("commit log request failed: "
+			  "err %d\n", err);
+		ssdfs_put_request(req);
+		ssdfs_request_free(req);
+		return err;
+	}
+
+	return 0;
+}
+
+static inline
+bool need_wait_next_create_data_request(struct ssdfs_peb_info *pebi)
+{
+	struct ssdfs_segment_info *si = pebi->pebc->parent_si;
+	struct ssdfs_fs_info *fsi = si->fsi;
+	bool has_pending_pages = false;
+	bool has_reserved_pages = false;
+	int state;
+	bool is_current_seg = false;
+	u64 reserved_pages = 0;
+	u64 pending_pages = 0;
+	bool need_wait = false;
+
+	if (!is_ssdfs_peb_containing_user_data(pebi->pebc))
+		goto finish_check;
+
+	spin_lock(&si->pending_lock);
+	pending_pages = si->pending_new_user_data_pages;
+	has_pending_pages = si->pending_new_user_data_pages > 0;
+	spin_unlock(&si->pending_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pending_pages %llu\n", pending_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (has_pending_pages) {
+		need_wait = true;
+		goto finish_check;
+	}
+
+	spin_lock(&fsi->volume_state_lock);
+	reserved_pages = fsi->reserved_new_user_data_pages;
+	has_reserved_pages = fsi->reserved_new_user_data_pages > 0;
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("reserved_pages %llu\n", reserved_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state = atomic_read(&si->obj_state);
+	is_current_seg = (state == SSDFS_CURRENT_SEG_OBJECT);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("is_current_seg %#x, has_reserved_pages %#x\n",
+		  is_current_seg, has_reserved_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	need_wait = is_current_seg && has_reserved_pages;
+
+finish_check:
+	return need_wait;
+}
+
+static inline
+bool need_wait_next_update_request(struct ssdfs_peb_info *pebi)
+{
+	struct ssdfs_segment_info *si = pebi->pebc->parent_si;
+	struct ssdfs_fs_info *fsi = si->fsi;
+	bool has_pending_pages = false;
+	bool has_updated_pages = false;
+	u64 updated_pages = 0;
+	u64 pending_pages = 0;
+	bool need_wait = false;
+
+	if (!is_ssdfs_peb_containing_user_data(pebi->pebc))
+		goto finish_check;
+
+	spin_lock(&pebi->pebc->pending_lock);
+	pending_pages = pebi->pebc->pending_updated_user_data_pages;
+	has_pending_pages = pending_pages > 0;
+	spin_unlock(&pebi->pebc->pending_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
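+	/* Report the number of pending updated user data pages. */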
SSDFS_DBG("pending_pages %llu\n", pending_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (has_pending_pages) { + need_wait = true; + goto finish_check; + } + + spin_lock(&fsi->volume_state_lock); + updated_pages = fsi->updated_user_data_pages; + has_updated_pages = fsi->updated_user_data_pages > 0; + spin_unlock(&fsi->volume_state_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("updated_pages %llu\n", updated_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + need_wait = has_updated_pages; + +finish_check: + return need_wait; +} + +static inline +bool no_more_updated_pages(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + bool has_updated_pages = false; + + if (!is_ssdfs_peb_containing_user_data(pebc)) + return true; + + spin_lock(&fsi->volume_state_lock); + has_updated_pages = fsi->updated_user_data_pages > 0; + spin_unlock(&fsi->volume_state_lock); + + return !has_updated_pages; +} + +static inline +bool is_regular_fs_operations(struct ssdfs_peb_container *pebc) +{ + int state; + + state = atomic_read(&pebc->parent_si->fsi->global_fs_state); + return state == SSDFS_REGULAR_FS_OPERATIONS; +} + +/* Flush thread possible states */ +enum { + SSDFS_FLUSH_THREAD_ERROR, + SSDFS_FLUSH_THREAD_FREE_SPACE_ABSENT, + SSDFS_FLUSH_THREAD_RO_STATE, + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG, + SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION, + SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST, + SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST, + SSDFS_FLUSH_THREAD_PROCESS_CREATE_REQUEST, + SSDFS_FLUSH_THREAD_PROCESS_UPDATE_REQUEST, + SSDFS_FLUSH_THREAD_PROCESS_INVALIDATED_EXTENT, + SSDFS_FLUSH_THREAD_CHECK_MIGRATION_NEED, + SSDFS_FLUSH_THREAD_START_MIGRATION_NOW, + SSDFS_FLUSH_THREAD_COMMIT_LOG, + SSDFS_FLUSH_THREAD_DELEGATE_CREATE_ROLE, +}; + +#define FLUSH_THREAD_WAKE_CONDITION(pebc) \ + (kthread_should_stop() || have_flush_requests(pebc)) +#define FLUSH_FAILED_THREAD_WAKE_CONDITION() \ + (kthread_should_stop()) +#define FLUSH_THREAD_CUR_SEG_WAKE_CONDITION(pebc) \ + (kthread_should_stop() || have_flush_requests(pebc) || \ + !is_regular_fs_operations(pebc) || \ + atomic_read(&pebc->parent_si->obj_state) != SSDFS_CURRENT_SEG_OBJECT) +#define FLUSH_THREAD_UPDATE_WAKE_CONDITION(pebc) \ + (kthread_should_stop() || have_flush_requests(pebc) || \ + no_more_updated_pages(pebc) || !is_regular_fs_operations(pebc)) +#define FLUSH_THREAD_INVALIDATE_WAKE_CONDITION(pebc) \ + (kthread_should_stop() || have_flush_requests(pebc) || \ + !is_regular_fs_operations(pebc)) + +static +int wait_next_create_request(struct ssdfs_peb_container *pebc, + int *thread_state) +{ + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + struct ssdfs_peb_info *pebi; + u64 reserved_pages = 0; + bool has_reserved_pages = false; + int state; + bool is_current_seg = false; + bool has_dirty_pages = false; + bool need_commit_log = false; + wait_queue_head_t *wq = NULL; + struct ssdfs_segment_request *req; + int err; + + if (!is_ssdfs_peb_containing_user_data(pebc)) { + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return -ERANGE; + } + + spin_lock(&fsi->volume_state_lock); + reserved_pages = fsi->reserved_new_user_data_pages; + has_reserved_pages = reserved_pages > 0; + spin_unlock(&fsi->volume_state_lock); + + state = atomic_read(&si->obj_state); + is_current_seg = (state == SSDFS_CURRENT_SEG_OBJECT); + + if (is_current_seg && has_reserved_pages) { + wq = &fsi->pending_wq; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("wait next data request: reserved_pages %llu\n", + 
reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = wait_event_killable_timeout(*wq, + FLUSH_THREAD_CUR_SEG_WAKE_CONDITION(pebc), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + } + + state = atomic_read(&si->obj_state); + is_current_seg = (state == SSDFS_CURRENT_SEG_OBJECT); + + pebi = ssdfs_get_current_peb_locked(pebc); + if (!IS_ERR_OR_NULL(pebi)) { + ssdfs_peb_current_log_lock(pebi); + has_dirty_pages = ssdfs_peb_has_dirty_pages(pebi); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + } + + if (!is_regular_fs_operations(pebc)) + need_commit_log = true; + else if (!is_current_seg) + need_commit_log = true; + else if (!have_flush_requests(pebc) && has_dirty_pages) + need_commit_log = true; + + if (need_commit_log) { + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate request: " + "err %d\n", err); + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC, + pebc->peb_index, req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "err %d\n", err); + ssdfs_put_request(req); + ssdfs_request_free(req); + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("request commit log now: reserved_pages %llu\n", + reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("get next create request: reserved_pages %llu\n", + reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + *thread_state = SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + return 0; +} + +static +int wait_next_update_request(struct ssdfs_peb_container *pebc, + int *thread_state) +{ + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_peb_info *pebi; + struct ssdfs_fs_info *fsi = si->fsi; + u64 updated_pages = 0; + bool has_updated_pages = false; + bool has_dirty_pages = false; + bool need_commit_log = false; + wait_queue_head_t *wq = NULL; + struct ssdfs_segment_request *req; + int err; + + if (!is_ssdfs_peb_containing_user_data(pebc)) { + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return -ERANGE; + } + + spin_lock(&fsi->volume_state_lock); + updated_pages = fsi->updated_user_data_pages; + has_updated_pages = updated_pages > 0; + spin_unlock(&fsi->volume_state_lock); + + if (has_updated_pages) { + wq = &fsi->pending_wq; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("wait next update request: updated_pages %llu\n", + updated_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = wait_event_killable_timeout(*wq, + FLUSH_THREAD_UPDATE_WAKE_CONDITION(pebc), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (!IS_ERR_OR_NULL(pebi)) { + ssdfs_peb_current_log_lock(pebi); + has_dirty_pages = ssdfs_peb_has_dirty_pages(pebi); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + } + + if (!is_regular_fs_operations(pebc)) + need_commit_log = true; + else if (no_more_updated_pages(pebc)) + need_commit_log = true; + else if (!have_flush_requests(pebc) && has_dirty_pages) + need_commit_log = true; + + if (need_commit_log) { + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate request: " + "err %d\n", err); + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC, + pebc->peb_index, req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "err %d\n", err); + ssdfs_put_request(req); + ssdfs_request_free(req); + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("request commit log now: updated_pages %llu\n", + updated_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("get next create request: updated_pages %llu\n", + updated_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + *thread_state = SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + return 0; +} + +static +int wait_next_invalidate_request(struct ssdfs_peb_container *pebc, + int *thread_state) +{ + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + int state; + wait_queue_head_t *wq = NULL; + struct ssdfs_segment_request *req; + int err; + + if (!is_ssdfs_peb_containing_user_data(pebc)) { + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return -ERANGE; + } + + if (!is_user_data_pages_invalidated(si)) { + *thread_state = SSDFS_FLUSH_THREAD_ERROR; + return -ERANGE; + } + + state = atomic_read(&fsi->global_fs_state); + switch(state) { + case SSDFS_REGULAR_FS_OPERATIONS: + wq = &fsi->pending_wq; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("wait next invalidate request\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = wait_event_killable_timeout(*wq, + FLUSH_THREAD_INVALIDATE_WAKE_CONDITION(pebc), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + if (have_flush_requests(pebc)) + SSDFS_DBG("get next create request\n"); + else + goto request_commit_log_now; + break; + + case SSDFS_METADATA_GOING_FLUSHING: +request_commit_log_now: + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req));
+			SSDFS_ERR("fail to allocate request: "
+				  "err %d\n", err);
+			*thread_state = SSDFS_FLUSH_THREAD_ERROR;
+			return err;
+		}
+
+		ssdfs_request_init(req);
+		ssdfs_get_request(req);
+
+		err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC,
+						      pebc->peb_index, req);
+		if (unlikely(err)) {
+			SSDFS_ERR("commit log request failed: "
+				  "err %d\n", err);
+			ssdfs_put_request(req);
+			ssdfs_request_free(req);
+			*thread_state = SSDFS_FLUSH_THREAD_ERROR;
+			return err;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("request commit log now\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected global FS state %#x\n",
+			  state);
+		*thread_state = SSDFS_FLUSH_THREAD_ERROR;
+		return -ERANGE;
+	}
+
+	*thread_state = SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST;
+	return 0;
+}
+
+static inline
+int ssdfs_check_peb_init_state(u64 seg_id, u64 peb_id, int state,
+			       struct completion *init_end)
+{
+	int res;
+
+	if (peb_id >= U64_MAX ||
+	    state == SSDFS_PEB_OBJECT_INITIALIZED ||
+	    !init_end) {
+		/* do nothing */
+		return 0;
+	}
+
+	res = wait_for_completion_timeout(init_end, SSDFS_DEFAULT_TIMEOUT);
+	if (res == 0) {
+		SSDFS_ERR("PEB init failed: "
+			  "seg %llu, peb %llu\n",
+			  seg_id, peb_id);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_check_src_peb_init_state(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_peb_info *pebi = NULL;
+	struct completion *init_end = NULL;
+	u64 peb_id = U64_MAX;
+	int state = SSDFS_PEB_OBJECT_UNKNOWN_STATE;
+
+	down_read(&pebc->lock);
+	pebi = pebc->src_peb;
+	if (pebi) {
+		init_end = &pebi->init_end;
+		peb_id = pebi->peb_id;
+		state = atomic_read(&pebi->state);
+	}
+	up_read(&pebc->lock);
+
+	return ssdfs_check_peb_init_state(pebc->parent_si->seg_id,
+					  peb_id, state, init_end);
+}
+
+static inline
+int ssdfs_check_dst_peb_init_state(struct ssdfs_peb_container *pebc)
+{
+	struct ssdfs_peb_info *pebi = NULL;
+	struct completion *init_end = NULL;
+	u64 peb_id = U64_MAX;
+	int state = SSDFS_PEB_OBJECT_UNKNOWN_STATE;
+
+	down_read(&pebc->lock);
+	pebi = pebc->dst_peb;
+	if (pebi) {
+		init_end = &pebi->init_end;
+		peb_id = pebi->peb_id;
+		state = atomic_read(&pebi->state);
+	}
+	up_read(&pebc->lock);
+
+	return ssdfs_check_peb_init_state(pebc->parent_si->seg_id,
+					  peb_id, state, init_end);
+}
+
+static inline
+int ssdfs_check_peb_container_init_state(struct ssdfs_peb_container *pebc)
+{
+	int err;
+
+	err = ssdfs_check_src_peb_init_state(pebc);
+	if (!err)
+		err = ssdfs_check_dst_peb_init_state(pebc);
+
+	return err;
+}
+
+/*
+ * ssdfs_peb_flush_thread_func() - main function of flush thread
+ * @data: pointer to data object
+ *
+ * This function implements the main logic of the flush thread.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
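+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.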
+ */ +int ssdfs_peb_flush_thread_func(void *data) +{ + struct ssdfs_peb_container *pebc = data; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + struct ssdfs_peb_mapping_table *maptbl = fsi->maptbl; + wait_queue_head_t *wait_queue; + struct ssdfs_segment_request *req; + struct ssdfs_segment_request *postponed_req = NULL; + struct ssdfs_peb_info *pebi = NULL; + u64 peb_id = U64_MAX; + int state; + int thread_state = SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + __le64 cur_segs[SSDFS_CUR_SEGS_COUNT]; + size_t size = sizeof(__le64) * SSDFS_CUR_SEGS_COUNT; + bool is_peb_exhausted = false; + bool is_peb_ready_to_exhaust = false; + bool peb_has_dirty_pages = false; + bool has_partial_empty_log = false; + bool skip_finish_flush_request = false; + bool need_create_log = true; + bool has_migration_check_requested = false; + bool has_extent_been_invalidated = false; + bool is_user_data = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + if (!pebc) { + SSDFS_ERR("pointer on PEB container is NULL\n"); + BUG(); + } + + SSDFS_DBG("flush thread: seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + wait_queue = &pebc->parent_si->wait_queue[SSDFS_PEB_FLUSH_THREAD]; + +repeat: + if (err) + thread_state = SSDFS_FLUSH_THREAD_ERROR; + + if (thread_state != SSDFS_FLUSH_THREAD_ERROR && + thread_state != SSDFS_FLUSH_THREAD_FREE_SPACE_ABSENT) { + if (fsi->sb->s_flags & SB_RDONLY) + thread_state = SSDFS_FLUSH_THREAD_RO_STATE; + } + +next_partial_step: + switch (thread_state) { + case SSDFS_FLUSH_THREAD_ERROR: + BUG_ON(err == 0); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("[FLUSH THREAD STATE] ERROR\n"); + SSDFS_DBG("thread after-error state: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, pebc->peb_index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (have_flush_requests(pebc)) { + ssdfs_requests_queue_remove_all(&pebc->update_rq, + -EROFS); + } + + if (is_peb_joined_into_create_requests_queue(pebc)) + ssdfs_peb_find_next_log_creation_thread(pebc); + + /* + * Check that we've delegated log creation role. + * Otherwise, simply forget about creation queue. + */ + if (is_peb_joined_into_create_requests_queue(pebc)) { + spin_lock(&pebc->crq_ptr_lock); + ssdfs_requests_queue_remove_all(pebc->create_rq, + -EROFS); + spin_unlock(&pebc->crq_ptr_lock); + + ssdfs_peb_forget_create_requests_queue(pebc); + } + +check_necessity_to_stop_thread: + if (kthread_should_stop()) { + struct completion *ptr; + +stop_flush_thread: + ptr = &pebc->thread[SSDFS_PEB_FLUSH_THREAD].full_stop; + complete_all(ptr); + return err; + } else + goto sleep_failed_flush_thread; + break; + + case SSDFS_FLUSH_THREAD_FREE_SPACE_ABSENT: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("[FLUSH THREAD STATE] FREE SPACE ABSENT: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_joined_into_create_requests_queue(pebc)) { + err = ssdfs_peb_find_next_log_creation_thread(pebc); + if (err == -ENOSPC) + err = 0; + else if (unlikely(err)) { + SSDFS_WARN("fail to delegate log creation role:" + " seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + /* + * Check that we've delegated log creation role. + * Otherwise, simply forget about creation queue. 
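+		 * The create queue is shared, so any remaining
+		 * requests are failed with -EROFS error before
+		 * the queue is forgotten.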
+ */ + if (is_peb_joined_into_create_requests_queue(pebc)) { + spin_lock(&pebc->crq_ptr_lock); + ssdfs_requests_queue_remove_all(pebc->create_rq, + -EROFS); + spin_unlock(&pebc->crq_ptr_lock); + + ssdfs_peb_forget_create_requests_queue(pebc); + } + + if (err == -ENOSPC && have_flush_requests(pebc)) { + err = 0; + + if (is_peb_under_migration(pebc)) { + err = __ssdfs_peb_finish_migration(pebc); + if (unlikely(err)) + goto finish_process_free_space_absence; + } + + err = ssdfs_peb_start_migration(pebc); + if (unlikely(err)) + goto finish_process_free_space_absence; + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto finish_process_free_space_absence; + } + + if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) { + SSDFS_WARN("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } + + err = ssdfs_peb_container_change_state(pebc); + ssdfs_unlock_current_peb(pebc); + +finish_process_free_space_absence: + if (unlikely(err)) { + SSDFS_WARN("fail to start PEB's migration: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + ssdfs_requests_queue_remove_all(&pebc->update_rq, + -ENOSPC); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + thread_state = SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + goto next_partial_step; + } else if (have_flush_requests(pebc)) { + ssdfs_requests_queue_remove_all(&pebc->update_rq, + -ENOSPC); + } + + goto check_necessity_to_stop_thread; + break; + + case SSDFS_FLUSH_THREAD_RO_STATE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("[FLUSH THREAD STATE] READ-ONLY STATE: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_current_segment_ids(fsi, cur_segs, size); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare current segments IDs: " + "err %d\n", + err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + if (!(fsi->sb->s_flags & SB_RDONLY)) { + /* + * File system state was changed. + * Now file system has RW state. 
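+			 * If the volume is in an error state then the
+			 * dirty pages of the current log are dropped.
+			 * Otherwise, the regular log processing logic
+			 * is resumed.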
+ */ + if (fsi->fs_state == SSDFS_ERROR_FS) { + ssdfs_peb_current_log_lock(pebi); + if (ssdfs_peb_has_dirty_pages(pebi)) + ssdfs_peb_clear_current_log_pages(pebi); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + goto check_necessity_to_stop_thread; + } else { + state = ssdfs_peb_get_current_log_state(pebc); + if (state <= SSDFS_LOG_UNKNOWN || + state >= SSDFS_LOG_STATE_MAX) { + err = -ERANGE; + SSDFS_WARN("invalid log state: " + "state %#x\n", + state); + ssdfs_unlock_current_peb(pebc); + goto repeat; + } + + if (state != SSDFS_LOG_CREATED) { + thread_state = + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + ssdfs_unlock_current_peb(pebc); + goto next_partial_step; + } + + thread_state = + SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION; + ssdfs_unlock_current_peb(pebc); + goto repeat; + } + } + + ssdfs_peb_current_log_lock(pebi); + if (ssdfs_peb_has_dirty_pages(pebi)) { + if (fsi->fs_state == SSDFS_ERROR_FS) + ssdfs_peb_clear_current_log_pages(pebi); + else { + mutex_lock(&pebc->migration_lock); + err = ssdfs_peb_commit_log(pebi, + cur_segs, size); + mutex_unlock(&pebc->migration_lock); + + if (unlikely(err)) { + SSDFS_CRIT("fail to commit log: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, + err); + ssdfs_peb_clear_current_log_pages(pebi); + ssdfs_peb_clear_cache_dirty_pages(pebi); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + } + } + } + ssdfs_peb_current_log_unlock(pebi); + + if (!err) { + if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) { + SSDFS_WARN("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } + + err = ssdfs_peb_container_change_state(pebc); + if (unlikely(err)) { + SSDFS_CRIT("fail to change peb state: " + "err %d\n", err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + } + } + + ssdfs_unlock_current_peb(pebc); + + goto check_necessity_to_stop_thread; + break; + + case SSDFS_FLUSH_THREAD_NEED_CREATE_LOG: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] NEED CREATE LOG: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] NEED CREATE LOG: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (fsi->sb->s_flags & SB_RDONLY) { + thread_state = SSDFS_FLUSH_THREAD_RO_STATE; + goto repeat; + } + + if (kthread_should_stop()) { + if (have_flush_requests(pebc)) { + SSDFS_WARN("discovered unprocessed requests: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } else { + thread_state = + SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION; + goto repeat; + } + } + + if (!has_ssdfs_segment_blk_bmap_initialized(&si->blk_bmap, + pebc)) { + err = ssdfs_segment_blk_bmap_wait_init_end(&si->blk_bmap, + pebc); + if (unlikely(err)) { + SSDFS_ERR("block bitmap init failed: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + err = ssdfs_check_peb_container_init_state(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to check init state: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb_index %u, migration_state %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, + atomic_read(&pebc->migration_state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_current_log_lock(pebi); + peb_id = pebi->peb_id; + peb_has_dirty_pages = ssdfs_peb_has_dirty_pages(pebi); + need_create_log = peb_has_dirty_pages || + have_flush_requests(pebc); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, ssdfs_peb_has_dirty_pages %#x, " + "have_flush_requests %#x, need_create_log %#x, " + "is_peb_exhausted %#x\n", + peb_id, peb_has_dirty_pages, + have_flush_requests(pebc), + need_create_log, is_peb_exhausted); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_peb_current_log_unlock(pebi); + + if (!need_create_log) { + ssdfs_unlock_current_peb(pebc); + thread_state = SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + goto sleep_flush_thread; + } + + if (has_commit_log_now_requested(pebc) && + is_create_requests_queue_empty(pebc)) { + /* + * If no other commands in the queue + * then ignore the log creation now. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Don't create log: " + "COMMIT_LOG_NOW command: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_unlock_current_peb(pebc); + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } + + if (has_start_migration_now_requested(pebc)) { + /* + * No necessity to create log + * for START_MIGRATION_NOW command. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Don't create log: " + "START_MIGRATION_NOW command: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_unlock_current_peb(pebc); + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } + + if (is_peb_exhausted) { + ssdfs_unlock_current_peb(pebc); + + if (is_ssdfs_maptbl_under_flush(fsi)) { + if (is_ssdfs_peb_containing_user_data(pebc)) { + /* + * Continue logic for user data. 
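+					 * The mapping table's flush does not
+					 * block the flush of user data. Only
+					 * metadata has to wait here.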
+ */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore mapping table's " + "flush for user data\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (have_flush_requests(pebc)) { + SSDFS_ERR("maptbl is flushing: " + "unprocessed requests: " + "seg %llu, peb %llu\n", + pebc->parent_si->seg_id, + peb_id); + +#ifdef CONFIG_SSDFS_DEBUG + ssdfs_peb_check_update_queue(pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + BUG(); + } else { + SSDFS_ERR("maptbl is flushing\n"); + thread_state = + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + goto sleep_flush_thread; + } + } + + if (is_peb_under_migration(pebc)) { + err = __ssdfs_peb_finish_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to finish migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + if (!has_peb_migration_done(pebc)) { + SSDFS_ERR("migration is not finished: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + err = ssdfs_peb_start_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to start migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) { + SSDFS_WARN("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } + + err = ssdfs_peb_container_change_state(pebc); + if (unlikely(err)) { + ssdfs_unlock_current_peb(pebc); + SSDFS_ERR("fail to change peb state: " + "err %d\n", err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("is_peb_under_migration %#x, " + "has_peb_migration_done %#x\n", + is_peb_under_migration(pebc), + has_peb_migration_done(pebc)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_under_migration(pebc) && + has_peb_migration_done(pebc)) { + ssdfs_unlock_current_peb(pebc); + + err = __ssdfs_peb_finish_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to finish migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) { + SSDFS_WARN("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } + + err = ssdfs_peb_container_change_state(pebc); + if (unlikely(err)) { + ssdfs_unlock_current_peb(pebc); + SSDFS_ERR("fail to change peb state: " + "err %d\n", err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + mutex_lock(&pebc->migration_lock); + err = ssdfs_peb_create_log(pebi); + mutex_unlock(&pebc->migration_lock); + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (err == -EAGAIN) { + if (kthread_should_stop()) + goto fail_create_log; + else { + err = 0; + goto sleep_flush_thread; + } + } else if (err == -ENOSPC) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB hasn't free space: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + thread_state = SSDFS_FLUSH_THREAD_FREE_SPACE_ABSENT; + } else if (unlikely(err)) { +fail_create_log: + SSDFS_CRIT("fail to create log: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + } else + thread_state = SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION; + goto repeat; + break; + + case SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] CHECK NECESSITY TO STOP: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] CHECK NECESSITY TO STOP: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (kthread_should_stop()) { + if (have_flush_requests(pebc)) { + state = ssdfs_peb_get_current_log_state(pebc); + if (state <= SSDFS_LOG_UNKNOWN || + state >= SSDFS_LOG_STATE_MAX) { + err = -ERANGE; + SSDFS_WARN("invalid log state: " + "state %#x\n", + state); + goto repeat; + } + + if (state != SSDFS_LOG_CREATED) { + thread_state = + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + goto next_partial_step; + } else + goto process_flush_requests; + } + + err = ssdfs_prepare_current_segment_ids(fsi, + cur_segs, + size); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare current seg IDs: " + "err %d\n", + err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + mutex_lock(&pebc->migration_lock); + err = ssdfs_peb_commit_log_on_thread_stop(pebi, + cur_segs, + size); + mutex_unlock(&pebc->migration_lock); + ssdfs_peb_current_log_unlock(pebi); + + if (unlikely(err)) { + SSDFS_CRIT("fail to commit log: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + } + + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + goto stop_flush_thread; + } else { +process_flush_requests: + state = ssdfs_peb_get_current_log_state(pebc); + if (state <= SSDFS_LOG_UNKNOWN || + state >= SSDFS_LOG_STATE_MAX) { + err = -ERANGE; + SSDFS_WARN("invalid log state: " + "state %#x\n", + state); + goto repeat; + } + + if (state != SSDFS_LOG_CREATED) { + thread_state = + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + } else { + thread_state = + SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + } + goto repeat; + } + break; + + case SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] GET CREATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] GET CREATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!have_flush_requests(pebc)) { + thread_state = SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION; + if (kthread_should_stop()) + goto repeat; + else + goto sleep_flush_thread; + } + + if (!is_peb_joined_into_create_requests_queue(pebc) || + is_create_requests_queue_empty(pebc)) { + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } + + spin_lock(&pebc->crq_ptr_lock); + err = ssdfs_requests_queue_remove_first(pebc->create_rq, &req); + spin_unlock(&pebc->crq_ptr_lock); + + if (err == -ENODATA) { + SSDFS_DBG("empty create queue\n"); + err = 0; + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } else if (err == -ENOENT) { + SSDFS_WARN("request queue contains NULL request\n"); + err = 0; + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } else if (unlikely(err < 0)) { + SSDFS_CRIT("fail to get request from create queue: " + "err %d\n", + err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + thread_state = SSDFS_FLUSH_THREAD_PROCESS_CREATE_REQUEST; + goto next_partial_step; + break; + + case SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] GET UPDATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] GET UPDATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (is_ssdfs_requests_queue_empty(&pebc->update_rq)) { + if (have_flush_requests(pebc)) { + thread_state = + SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + goto next_partial_step; + } else { 
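+			/*
+			 * Both the create and update queues are
+			 * empty: check the stop condition and
+			 * then sleep.
+			 */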
+ thread_state = + SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION; + goto sleep_flush_thread; + } + } + + err = ssdfs_requests_queue_remove_first(&pebc->update_rq, &req); + if (err == -ENODATA) { + SSDFS_DBG("empty update queue\n"); + err = 0; + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } else if (err == -ENOENT) { + SSDFS_WARN("request queue contains NULL request\n"); + err = 0; + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto repeat; + } else if (unlikely(err < 0)) { + SSDFS_CRIT("fail to get request from update queue: " + "err %d\n", + err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + thread_state = SSDFS_FLUSH_THREAD_PROCESS_UPDATE_REQUEST; + goto next_partial_step; + break; + + case SSDFS_FLUSH_THREAD_PROCESS_CREATE_REQUEST: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] PROCESS CREATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + SSDFS_ERR("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#else + SSDFS_DBG("[FLUSH THREAD STATE] PROCESS CREATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + is_user_data = is_ssdfs_peb_containing_user_data(pebc); + + if (!has_ssdfs_segment_blk_bmap_initialized(&si->blk_bmap, + pebc)) { + err = ssdfs_segment_blk_bmap_wait_init_end(&si->blk_bmap, + pebc); + if (unlikely(err)) { + SSDFS_ERR("block bitmap init failed: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + + mutex_lock(&pebc->migration_lock); + err = ssdfs_process_create_request(pebi, req); + mutex_unlock(&pebc->migration_lock); + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to process create request: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_finish_flush_request(pebc, req, wait_queue, err); + thread_state = SSDFS_FLUSH_THREAD_FREE_SPACE_ABSENT; + goto finish_create_request_processing; + } else if (err == -EAGAIN) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to process create request : " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + spin_lock(&pebc->crq_ptr_lock); + ssdfs_requests_queue_add_head(pebc->create_rq, req); + spin_unlock(&pebc->crq_ptr_lock); + req = NULL; + skip_finish_flush_request = true; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_create_request_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to process create request: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + ssdfs_finish_flush_request(pebc, req, wait_queue, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto finish_create_request_processing; + } + + if (req->private.type == SSDFS_REQ_SYNC) { + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_create_request_processing; + } + + /* SSDFS_REQ_ASYNC */ + if (is_full_log_ready(pebi)) { + skip_finish_flush_request = false; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_create_request_processing; + } else if (should_partial_log_being_commited(pebi)) { + skip_finish_flush_request = false; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_create_request_processing; + } else if (!have_flush_requests(pebc)) { + if (need_wait_next_create_data_request(pebi)) { + ssdfs_account_user_data_flush_request(si); + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + + err = wait_next_create_request(pebc, + &thread_state); + ssdfs_forget_user_data_flush_request(si); + skip_finish_flush_request = false; + goto finish_wait_next_create_request; + } else if (is_user_data) { + skip_finish_flush_request = false; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_create_request_processing; + } else { + goto get_next_update_request; + } + } else { +get_next_update_request: + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + thread_state = + SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + } + +finish_create_request_processing: + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + +finish_wait_next_create_request: + if (thread_state == SSDFS_FLUSH_THREAD_COMMIT_LOG) + goto next_partial_step; + else + goto repeat; + break; + + case SSDFS_FLUSH_THREAD_PROCESS_UPDATE_REQUEST: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] PROCESS UPDATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + SSDFS_ERR("req->private.class %#x, req->private.cmd 
%#x\n", + req->private.class, req->private.cmd); +#else + SSDFS_DBG("[FLUSH THREAD STATE] PROCESS UPDATE REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + is_user_data = is_ssdfs_peb_containing_user_data(pebc); + + if (!has_ssdfs_segment_blk_bmap_initialized(&si->blk_bmap, + pebc)) { + err = ssdfs_segment_blk_bmap_wait_init_end(&si->blk_bmap, + pebc); + if (unlikely(err)) { + SSDFS_ERR("block bitmap init failed: " + "seg %llu, peb_index %u, err %d\n", + si->seg_id, pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + + mutex_lock(&pebc->migration_lock); + err = ssdfs_process_update_request(pebi, req); + mutex_unlock(&pebc->migration_lock); + + if (err == -EAGAIN) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to process update request : " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_requests_queue_add_head(&pebc->update_rq, req); + req = NULL; + skip_finish_flush_request = true; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_update_request_processing; + } else if (err == -ENOENT && + req->private.cmd == SSDFS_BTREE_NODE_DIFF) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to process update request : " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + req = NULL; + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto finish_update_request_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to process update request: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + ssdfs_finish_flush_request(pebc, req, wait_queue, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto finish_update_request_processing; + } + + switch (req->private.cmd) { + case SSDFS_EXTENT_WAS_INVALIDATED: + /* log has to be committed */ + has_extent_been_invalidated = true; + thread_state = + SSDFS_FLUSH_THREAD_PROCESS_INVALIDATED_EXTENT; + goto finish_update_request_processing; + + case SSDFS_START_MIGRATION_NOW: + thread_state = SSDFS_FLUSH_THREAD_START_MIGRATION_NOW; + goto finish_update_request_processing; + + case SSDFS_COMMIT_LOG_NOW: + if (has_commit_log_now_requested(pebc)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Ignore current COMMIT_LOG_NOW: " + "seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + thread_state = + SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + } else if (have_flush_requests(pebc)) { + ssdfs_requests_queue_add_tail(&pebc->update_rq, + req); + req = NULL; + + state = atomic_read(&pebi->current_log.state); + if (state == SSDFS_LOG_COMMITTED) { + thread_state = + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + } else { + thread_state = + SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + } + } else if (has_extent_been_invalidated) { + if (is_user_data_pages_invalidated(si) && + 
is_regular_fs_operations(pebc)) { + ssdfs_account_user_data_flush_request(si); + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + + err = wait_next_invalidate_request(pebc, + &thread_state); + ssdfs_forget_user_data_flush_request(si); + skip_finish_flush_request = false; + goto finish_wait_next_data_request; + } else { + thread_state = + SSDFS_FLUSH_THREAD_COMMIT_LOG; + } + } else if (ssdfs_peb_has_dirty_pages(pebi)) { + if (need_wait_next_create_data_request(pebi)) { + ssdfs_account_user_data_flush_request(si); + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + + err = wait_next_create_request(pebc, + &thread_state); + ssdfs_forget_user_data_flush_request(si); + skip_finish_flush_request = false; + goto finish_wait_next_data_request; + } else if (need_wait_next_update_request(pebi)) { + ssdfs_account_user_data_flush_request(si); + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + + err = wait_next_update_request(pebc, + &thread_state); + ssdfs_forget_user_data_flush_request(si); + skip_finish_flush_request = false; + goto finish_wait_next_data_request; + + } else { + thread_state = + SSDFS_FLUSH_THREAD_COMMIT_LOG; + } + } else { + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + + state = atomic_read(&pebi->current_log.state); + if (state == SSDFS_LOG_COMMITTED) { + thread_state = + SSDFS_FLUSH_THREAD_NEED_CREATE_LOG; + } else { + thread_state = + SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + } + } + goto finish_update_request_processing; + + default: + /* do nothing */ + break; + } + + if (req->private.type == SSDFS_REQ_SYNC) { + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_update_request_processing; + } else if (has_migration_check_requested) { + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_update_request_processing; + } else if (is_full_log_ready(pebi)) { + skip_finish_flush_request = false; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_update_request_processing; + } else if (should_partial_log_being_commited(pebi)) { + skip_finish_flush_request = false; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_update_request_processing; + } else if (!have_flush_requests(pebc)) { + if (need_wait_next_update_request(pebi)) { + ssdfs_account_user_data_flush_request(si); + ssdfs_finish_flush_request(pebc, req, + wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + + err = wait_next_update_request(pebc, + &thread_state); + ssdfs_forget_user_data_flush_request(si); + skip_finish_flush_request = false; + goto finish_wait_next_data_request; + } else if (is_user_data && + ssdfs_peb_has_dirty_pages(pebi)) { + skip_finish_flush_request = false; + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto finish_update_request_processing; + } else + goto get_next_create_request; + } else { +get_next_create_request: + ssdfs_finish_flush_request(pebc, req, wait_queue, err); + thread_state = SSDFS_FLUSH_THREAD_GET_CREATE_REQUEST; + goto finish_update_request_processing; + } + +finish_update_request_processing: + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* 
CONFIG_SSDFS_TRACK_API_CALL */ + +finish_wait_next_data_request: + if (thread_state == SSDFS_FLUSH_THREAD_COMMIT_LOG || + thread_state == SSDFS_FLUSH_THREAD_START_MIGRATION_NOW) { + goto next_partial_step; + } else if (thread_state == SSDFS_FLUSH_THREAD_NEED_CREATE_LOG) + goto sleep_flush_thread; + else + goto repeat; + break; + + case SSDFS_FLUSH_THREAD_PROCESS_INVALIDATED_EXTENT: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] PROCESS INVALIDATED EXTENT: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] PROCESS INVALIDATED EXTENT: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + if (is_peb_under_migration(pebc) && + has_peb_migration_done(pebc)) { + ssdfs_unlock_current_peb(pebc); + + err = __ssdfs_peb_finish_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to finish migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) { + SSDFS_WARN("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + } + + err = ssdfs_peb_container_change_state(pebc); + if (unlikely(err)) { + ssdfs_unlock_current_peb(pebc); + SSDFS_ERR("fail to change peb state: " + "err %d\n", err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + } + + ssdfs_peb_current_log_lock(pebi); + ssdfs_finish_flush_request(pebc, req, wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto next_partial_step; + break; + + case SSDFS_FLUSH_THREAD_START_MIGRATION_NOW: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] START MIGRATION REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] START MIGRATION REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + is_peb_exhausted = is_ssdfs_peb_exhausted(fsi, pebi); + is_peb_ready_to_exhaust = + is_ssdfs_peb_ready_to_exhaust(fsi, pebi); + has_partial_empty_log = + ssdfs_peb_has_partial_empty_log(fsi, pebi); + ssdfs_peb_current_log_unlock(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("is_peb_exhausted %#x, " + "is_peb_ready_to_exhaust %#x\n", + is_peb_exhausted, is_peb_ready_to_exhaust); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_exhausted || is_peb_ready_to_exhaust) { + ssdfs_unlock_current_peb(pebc); + + if (is_peb_under_migration(pebc)) { + /* + * START_MIGRATION_NOW is requested during + * the flush operation of PEB mapping table, + * segment bitmap or any btree. It is the first + * step to initiate the migration. + * Then, fragments or nodes will be flushed. + * And final step is the COMMIT_LOG_NOW + * request. So, it doesn't need to request + * the COMMIT_LOG_NOW here. + */ + err = ssdfs_peb_finish_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to finish migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto process_migration_failure; + } + } + + if (!has_peb_migration_done(pebc)) { + SSDFS_ERR("migration is not finished: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto process_migration_failure; + } + + err = ssdfs_peb_start_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to start migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto process_migration_failure; + } + +process_migration_failure: + pebi = ssdfs_get_current_peb_locked(pebc); + if (err) { + if (IS_ERR_OR_NULL(pebi)) { + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + ssdfs_finish_flush_request(pebc, req, + wait_queue, + err); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } else if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi));
+				SSDFS_ERR("fail to get PEB object: "
+					  "seg %llu, peb_index %u, err %d\n",
+					  pebc->parent_si->seg_id,
+					  pebc->peb_index, err);
+				thread_state = SSDFS_FLUSH_THREAD_ERROR;
+				goto repeat;
+			}
+
+			if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) {
+				SSDFS_WARN("seg %llu, peb_index %u\n",
+					   pebc->parent_si->seg_id,
+					   pebc->peb_index);
+			}
+
+			err = ssdfs_peb_container_change_state(pebc);
+			if (unlikely(err)) {
+				ssdfs_unlock_current_peb(pebc);
+				SSDFS_ERR("fail to change peb state: "
+					  "err %d\n", err);
+				thread_state = SSDFS_FLUSH_THREAD_ERROR;
+				goto repeat;
+			}
+		} else if (has_partial_empty_log) {
+			/*
+			 * TODO: implement the logic of processing
+			 * the partially empty log here.
+			 */
+			SSDFS_WARN("log is partially empty\n");
+		}
+
+		ssdfs_peb_current_log_lock(pebi);
+		ssdfs_finish_flush_request(pebc, req, wait_queue, err);
+		ssdfs_peb_current_log_unlock(pebi);
+		ssdfs_unlock_current_peb(pebc);
+
+		state = ssdfs_peb_get_current_log_state(pebc);
+		if (state <= SSDFS_LOG_UNKNOWN ||
+		    state >= SSDFS_LOG_STATE_MAX) {
+			err = -ERANGE;
+			SSDFS_WARN("invalid log state: "
+				   "state %#x\n",
+				   state);
+			goto repeat;
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("finished\n");
+#else
+		SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (state != SSDFS_LOG_CREATED) {
+			thread_state =
+				SSDFS_FLUSH_THREAD_NEED_CREATE_LOG;
+			goto sleep_flush_thread;
+		} else {
+			thread_state =
+				SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST;
+		}
+		goto next_partial_step;
+		break;
+
+	case SSDFS_FLUSH_THREAD_COMMIT_LOG:
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("[FLUSH THREAD STATE] COMMIT LOG: "
+			  "seg_id %llu, peb_index %u\n",
+			  pebc->parent_si->seg_id,
+			  pebc->peb_index);
+#else
+		SSDFS_DBG("[FLUSH THREAD STATE] COMMIT LOG: "
+			  "seg_id %llu, peb_index %u\n",
+			  pebc->parent_si->seg_id,
+			  pebc->peb_index);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (postponed_req) {
+			req = postponed_req;
+			postponed_req = NULL;
+			has_migration_check_requested = false;
+		} else if (req != NULL) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("req->private.class %#x, "
+				  "req->private.cmd %#x\n",
+				  req->private.class,
+				  req->private.cmd);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			switch (req->private.class) {
+			case SSDFS_PEB_COLLECT_GARBAGE_REQ:
+				/* ignore this case */
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("ignore request class %#x\n",
+					  req->private.class);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto make_log_commit;
+
+			default:
+				/* Try to stimulate the migration */
+				break;
+			}
+
+			if (is_peb_under_migration(pebc) &&
+			    !has_migration_check_requested) {
+				SSDFS_DBG("Try to stimulate the migration\n");
+				thread_state =
+					SSDFS_FLUSH_THREAD_CHECK_MIGRATION_NEED;
+				has_migration_check_requested = true;
+				postponed_req = req;
+				req = NULL;
+				goto next_partial_step;
+			} else
+				has_migration_check_requested = false;
+		}
+
+make_log_commit:
+		err = ssdfs_prepare_current_segment_ids(fsi, cur_segs, size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare current segments IDs: "
+				  "err %d\n",
+				  err);
+			thread_state = SSDFS_FLUSH_THREAD_ERROR;
+			goto repeat;
+		}
+
+		pebi = ssdfs_get_current_peb_locked(pebc);
+		if (IS_ERR_OR_NULL(pebi)) {
+			err = pebi == NULL ?
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + mutex_lock(&pebc->migration_lock); + peb_id = pebi->peb_id; + err = ssdfs_peb_commit_log(pebi, cur_segs, size); + mutex_unlock(&pebc->migration_lock); + + if (err) { + ssdfs_peb_clear_current_log_pages(pebi); + ssdfs_peb_clear_cache_dirty_pages(pebi); + ssdfs_requests_queue_remove_all(&pebc->update_rq, + -EROFS); + + ssdfs_fs_error(fsi->sb, + __FILE__, __func__, __LINE__, + "fail to commit log: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + } + ssdfs_peb_current_log_unlock(pebi); + + if (!err) { + has_extent_been_invalidated = false; + + if (is_ssdfs_maptbl_going_to_be_destroyed(maptbl)) { + SSDFS_WARN("mapping table is near destroy: " + "seg %llu, peb_index %u, " + "peb_id %llu, peb_type %#x, " + "req->private.class %#x, " + "req->private.cmd %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, + peb_id, + pebc->peb_type, + req->private.class, + req->private.cmd); + } + + ssdfs_forget_invalidated_user_data_pages(si); + + err = ssdfs_peb_container_change_state(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to change peb state: " + "err %d\n", err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + } + } + + ssdfs_peb_current_log_lock(pebi); + if (skip_finish_flush_request) + skip_finish_flush_request = false; + else + ssdfs_finish_flush_request(pebc, req, wait_queue, err); + ssdfs_peb_current_log_unlock(pebi); + + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_DEBUG + ssdfs_peb_check_update_queue(pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (unlikely(err)) + goto repeat; + + thread_state = SSDFS_FLUSH_THREAD_DELEGATE_CREATE_ROLE; + goto next_partial_step; + break; + + case SSDFS_FLUSH_THREAD_CHECK_MIGRATION_NEED: +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("[FLUSH THREAD STATE] CHECK MIGRATION NEED REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#else + SSDFS_DBG("[FLUSH THREAD STATE] CHECK MIGRATION NEED REQUEST: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (is_peb_under_migration(pebc)) { + u32 free_space1, free_space2; + u16 free_data_pages; + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } + + ssdfs_peb_current_log_lock(pebi); + free_space1 = ssdfs_area_free_space(pebi, + SSDFS_LOG_JOURNAL_AREA); + free_space2 = ssdfs_area_free_space(pebi, + SSDFS_LOG_DIFFS_AREA); + free_data_pages = pebi->current_log.free_data_pages; + peb_has_dirty_pages = ssdfs_peb_has_dirty_pages(pebi); + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_space1 %u, free_space2 %u, " + "free_data_pages %u, peb_has_dirty_pages %#x\n", + free_space1, free_space2, + free_data_pages, peb_has_dirty_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!peb_has_dirty_pages) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB has no dirty pages: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check_migration_need; + } + + if (free_data_pages == 0) { + /* + * No free space for shadow migration. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("No free space for shadow migration: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check_migration_need; + } + + if (free_space1 < (PAGE_SIZE / 2) && + free_space2 < (PAGE_SIZE / 2)) { + /* + * No free space for shadow migration. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("No free space for shadow migration: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check_migration_need; + } + + if (!has_ssdfs_source_peb_valid_blocks(pebc)) { + /* + * No used blocks in the source PEB. 
+ */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("No used blocks in the source PEB: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check_migration_need; + } + + mutex_lock(&pebc->migration_lock); + + if (free_space1 >= (PAGE_SIZE / 2)) { + err = ssdfs_peb_prepare_range_migration(pebc, 1, + SSDFS_BLK_PRE_ALLOCATED); + if (err == -ENODATA) { + err = 0; + SSDFS_DBG("unable to migrate: " + "no pre-allocated blocks\n"); + } else + goto stimulate_migration_done; + } + + if (free_space2 >= (PAGE_SIZE / 2)) { + err = ssdfs_peb_prepare_range_migration(pebc, 1, + SSDFS_BLK_VALID); + if (err == -ENODATA) { + SSDFS_DBG("unable to migrate: " + "no valid blocks\n"); + } + } + +stimulate_migration_done: + mutex_unlock(&pebc->migration_lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished: err %d\n", err); +#else + SSDFS_DBG("finished: err %d\n", err); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (err == -ENODATA) { + err = 0; + goto finish_check_migration_need; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare range migration: " + "err %d\n", err); + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto next_partial_step; + } + + thread_state = SSDFS_FLUSH_THREAD_GET_UPDATE_REQUEST; + goto next_partial_step; + } else { +finish_check_migration_need: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no migration necessary: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + thread_state = SSDFS_FLUSH_THREAD_COMMIT_LOG; + goto next_partial_step; + } + break; + + case SSDFS_FLUSH_THREAD_DELEGATE_CREATE_ROLE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("[FLUSH THREAD STATE] DELEGATE CREATE ROLE: " + "seg_id %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_peb_joined_into_create_requests_queue(pebc)) { +finish_delegation: + if (err) { + thread_state = SSDFS_FLUSH_THREAD_ERROR; + goto repeat; + } else { + thread_state = + SSDFS_FLUSH_THREAD_CHECK_STOP_CONDITION; + goto sleep_flush_thread; + } + } + + err = ssdfs_peb_find_next_log_creation_thread(pebc); + if (unlikely(err)) { + SSDFS_WARN("fail to delegate log creation role: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + } + goto finish_delegation; + break; + + default: + BUG(); + }; + +/* + * Every thread should be added into one wait queue only. + * Segment object should have several queues: + * (1) read threads waiting queue; + * (2) flush threads waiting queue; + * (3) GC threads waiting queue. + * The wakeup operation should be the operation under group + * of threads of the same type. Thread function should check + * several condition in the case of wakeup. 
+ */
+
+sleep_flush_thread:
+#ifdef CONFIG_SSDFS_DEBUG
+	if (is_ssdfs_peb_containing_user_data(pebc)) {
+		pebi = ssdfs_get_current_peb_locked(pebc);
+		if (!IS_ERR_OR_NULL(pebi)) {
+			ssdfs_peb_current_log_lock(pebi);
+
+			if (ssdfs_peb_has_dirty_pages(pebi)) {
+				u64 reserved_new_user_data_pages;
+				u64 updated_user_data_pages;
+				u64 flushing_user_data_requests;
+
+				spin_lock(&fsi->volume_state_lock);
+				reserved_new_user_data_pages =
+					fsi->reserved_new_user_data_pages;
+				updated_user_data_pages =
+					fsi->updated_user_data_pages;
+				flushing_user_data_requests =
+					fsi->flushing_user_data_requests;
+				spin_unlock(&fsi->volume_state_lock);
+
+				SSDFS_WARN("seg %llu, peb %llu, peb_type %#x, "
+					   "global_fs_state %#x, "
+					   "reserved_new_user_data_pages %llu, "
+					   "updated_user_data_pages %llu, "
+					   "flushing_user_data_requests %llu\n",
+					   pebi->pebc->parent_si->seg_id,
+					   pebi->peb_id, pebi->pebc->peb_type,
+					   atomic_read(&fsi->global_fs_state),
+					   reserved_new_user_data_pages,
+					   updated_user_data_pages,
+					   flushing_user_data_requests);
+			}
+
+			ssdfs_peb_current_log_unlock(pebi);
+			ssdfs_unlock_current_peb(pebc);
+		}
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_event_interruptible(*wait_queue,
+				 FLUSH_THREAD_WAKE_CONDITION(pebc));
+	goto repeat;
+
+sleep_failed_flush_thread:
+	wait_event_interruptible(*wait_queue,
+				 FLUSH_FAILED_THREAD_WAKE_CONDITION());
+	goto repeat;
+}
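The sleep/wakeup contract spelled out in the comment above is the standard kernel wait-queue idiom: each thread sleeps on exactly one wait queue and re-evaluates its wake condition after every wakeup, since a group wakeup may target several threads of the same type. A minimal, self-contained sketch of that pattern follows (illustrative only; struct flush_ctx and have_new_requests() are hypothetical stand-ins for the PEB container and FLUSH_THREAD_WAKE_CONDITION()):

#include <linux/wait.h>
#include <linux/kthread.h>

struct flush_ctx {
	wait_queue_head_t wait_queue;	/* one queue per thread type */
	/* ... request queues, thread state ... */
};

static bool have_new_requests(struct flush_ctx *ctx);	/* hypothetical */

static int flush_thread_fn(void *data)
{
	struct flush_ctx *ctx = data;

	while (!kthread_should_stop()) {
		/* ... process pending requests ... */

		/*
		 * Sleep until stopped or new work arrives; the condition
		 * is re-checked on every wakeup, so spurious or group
		 * wakeups are harmless.
		 */
		wait_event_interruptible(ctx->wait_queue,
					 kthread_should_stop() ||
					 have_new_requests(ctx));
	}
	return 0;
}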
From patchwork Sat Feb 25 01:08:40 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151935
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 29/76] ssdfs: commit log logic
Date: Fri, 24 Feb 2023 17:08:40 -0800
Message-Id: <20230225010927.813929-30-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

SSDFS stores any data or metadata in the form of logs in an append-only
manner. The mkfs tool defines the size of a full log (for example, 32
logical blocks or NAND flash pages) for user data and various metadata.
If the flush thread has enough requests to prepare a full log, then the
log starts with a segment header, continues with the payload, and
finishes with a footer. Otherwise, the flush thread needs to commit a
partial log. A partial log starts with a partial log header, continues
with the payload, and has no footer. Potentially, the area of one full
log can contain a sequence of partial logs. The latest log in such a
sequence is expected to finish with a footer.

The header contains metadata that describes the location of the block
bitmap and the offset translation table in the log's payload. The block
bitmap tracks the state (free, pre-allocated, allocated, invalidated) of
logical blocks in the erase block. The offset translation table converts
a logical block ID into the offset of a piece of data in the log's
payload.
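As a rough illustration of the strategy selection described above, the
sketch below models how a commit path can pick a strategy from the state
of the current log. The SSDFS_*_LOG names come from this patch; the
helper and its parameters are hypothetical simplifications of what
is_log_partial() computes from the current log's free pages:

/*
 * Illustrative model only: in the patch itself the decision is made by
 * is_log_partial() from the current log's state.
 */
#include <linux/types.h>

enum log_commit_strategy {
	SSDFS_START_FULL_LOG,		/* log is still empty */
	SSDFS_START_PARTIAL_LOG,	/* first partial log in the full log's area */
	SSDFS_CONTINUE_PARTIAL_LOG,	/* next partial log: header only, no footer */
	SSDFS_FINISH_PARTIAL_LOG,	/* last partial log: footer ends the sequence */
	SSDFS_FINISH_FULL_LOG,		/* full log: header + payload + footer */
};

static enum log_commit_strategy
choose_commit_strategy(u32 payload_pages, u32 written_pages,
		       u32 log_pages_budget)
{
	u32 total = written_pages + payload_pages;

	if (total == 0)
		return SSDFS_START_FULL_LOG;	/* nothing to commit yet */

	if (total < log_pages_budget) {
		/* the log's area is not exhausted: commit a partial log */
		return written_pages == 0 ? SSDFS_START_PARTIAL_LOG
					  : SSDFS_CONTINUE_PARTIAL_LOG;
	}

	/* the log's area is exhausted: finish it with a footer */
	return written_pages == 0 ? SSDFS_FINISH_FULL_LOG
				  : SSDFS_FINISH_PARTIAL_LOG;
}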
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_flush_thread.c | 1745 +++++++++++++++++++++++++++++++++++ 1 file changed, 1745 insertions(+) diff --git a/fs/ssdfs/peb_flush_thread.c b/fs/ssdfs/peb_flush_thread.c index 6a9032762ea6..d9352804f6b9 100644 --- a/fs/ssdfs/peb_flush_thread.c +++ b/fs/ssdfs/peb_flush_thread.c @@ -113,6 +113,1751 @@ void ssdfs_flush_check_memory_leaks(void) * FLUSH THREAD FUNCTIONALITY * ******************************************************************************/ +/* + * ssdfs_peb_define_next_log_start() - define start of the next log + * @pebi: pointer on PEB object + * @log_strategy: strategy in log creation + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + */ +static +void ssdfs_peb_define_next_log_start(struct ssdfs_peb_info *pebi, + int log_strategy, + pgoff_t *cur_page, u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + u16 pages_diff; + u16 rest_phys_free_pages = 0; + u32 pages_per_peb; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!cur_page || !write_offset); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, log_strategy %#x, " + "current_log.start_page %u, " + "cur_page %lu, write_offset %u, " + "current_log.free_data_pages %u, " + "sequence_id %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + log_strategy, + pebi->current_log.start_page, + *cur_page, *write_offset, + pebi->current_log.free_data_pages, + atomic_read(&pebi->current_log.sequence_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + pages_per_peb = min_t(u32, fsi->leb_pages_capacity, + fsi->peb_pages_capacity); + + switch (log_strategy) { + case SSDFS_START_PARTIAL_LOG: + case SSDFS_CONTINUE_PARTIAL_LOG: + pebi->current_log.start_page = *cur_page; + rest_phys_free_pages = pebi->log_pages - + (*cur_page % pebi->log_pages); + pebi->current_log.free_data_pages = rest_phys_free_pages; + atomic_inc(&pebi->current_log.sequence_id); + WARN_ON(pebi->current_log.free_data_pages == 0); + break; + + case SSDFS_FINISH_PARTIAL_LOG: + case SSDFS_FINISH_FULL_LOG: + if (*cur_page % pebi->log_pages) { + *cur_page += pebi->log_pages - 1; + *cur_page = + (*cur_page / pebi->log_pages) * pebi->log_pages; + } + + pebi->current_log.start_page = *cur_page; + + if (pebi->current_log.start_page >= pages_per_peb) { + pebi->current_log.free_data_pages = 0; + } else { + pages_diff = pages_per_peb; + pages_diff -= pebi->current_log.start_page; + + pebi->current_log.free_data_pages = + min_t(u16, pebi->log_pages, pages_diff); + } + + atomic_set(&pebi->current_log.sequence_id, 0); + break; + + default: + BUG(); + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebi->current_log.start_page %u, " + "current_log.free_data_pages %u, " + "sequence_id %d\n", + pebi->current_log.start_page, + pebi->current_log.free_data_pages, + atomic_read(&pebi->current_log.sequence_id)); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_peb_store_pl_header_like_footer() - store partial log's header + * @pebi: pointer on PEB object + * @flags: partial log header's flags + * @hdr_desc: partial log header's metadata descriptor in segment header + * @plh_desc: partial log header's metadata descriptors array + * @array_size: count of items in array + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * 
+ * This function tries to store the partial log's header + * in the end of the log (instead of footer). + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_store_pl_header_like_footer(struct ssdfs_peb_info *pebi, + u32 flags, + struct ssdfs_metadata_descriptor *hdr_desc, + struct ssdfs_metadata_descriptor *plh_desc, + size_t array_size, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_partial_log_header *pl_hdr; + u32 log_pages; + struct page *page; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t array_bytes = desc_size * array_size; + u32 area_offset, area_size; + u16 seg_type; + int sequence_id; + u64 last_log_time; + u64 last_log_cno; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!hdr_desc || !plh_desc); + BUG_ON(!cur_page || !write_offset); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + seg_type = pebi->pebc->parent_si->seg_type; + + sequence_id = atomic_read(&pebi->current_log.sequence_id); + if (sequence_id < 0 || sequence_id >= INT_MAX) { + SSDFS_ERR("invalid sequence_id %d\n", sequence_id); + return -ERANGE; + } + + area_offset = *write_offset; + area_size = sizeof(struct ssdfs_partial_log_header); + + *write_offset += max_t(u32, PAGE_SIZE, area_size); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(((*write_offset + PAGE_SIZE - 1) >> fsi->log_pagesize) > + pebi->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = (*write_offset + fsi->pagesize - 1) / fsi->pagesize; + + page = ssdfs_page_array_grab_page(&pebi->cache, *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + + pl_hdr = kmap_local_page(page); + memset(pl_hdr, 0xFF, PAGE_SIZE); + ssdfs_memcpy(pl_hdr->desc_array, 0, array_bytes, + plh_desc, 0, array_bytes, + array_bytes); + + pl_hdr->peb_create_time = cpu_to_le64(pebi->peb_create_time); + + last_log_time = pebi->current_log.last_log_time; + last_log_cno = pebi->current_log.last_log_cno; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_create_time, + last_log_time); + + BUG_ON(pebi->peb_create_time > last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_partial_log_header_for_commit(fsi, + sequence_id, + log_pages, + seg_type, flags, + last_log_time, + last_log_cno, + pl_hdr); + + if (!err) { + hdr_desc->offset = cpu_to_le32(area_offset + + (pebi->current_log.start_page * fsi->pagesize)); + hdr_desc->size = cpu_to_le32(area_size); + + ssdfs_memcpy(&hdr_desc->check, + 0, sizeof(struct ssdfs_metadata_check), + &pl_hdr->check, + 0, sizeof(struct ssdfs_metadata_check), + sizeof(struct ssdfs_metadata_check)); + } + + flush_dcache_page(page); + kunmap_local(pl_hdr); + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: " + "page_index %lu, err 
%d\n", + *cur_page, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_CRIT("fail to store partial log header: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + return err; + } + + pebi->current_log.seg_flags |= + SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER | + SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER; + + (*cur_page)++; + + return 0; +} + +/* + * ssdfs_peb_store_pl_header_like_header() - store partial log's header + * @pebi: pointer on PEB object + * @flags: partial log header's flags + * @plh_desc: partial log header's metadata descriptors array + * @array_size: count of items in array + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store the partial log's header + * in the beginning of the log. + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_store_pl_header_like_header(struct ssdfs_peb_info *pebi, + u32 flags, + struct ssdfs_metadata_descriptor *plh_desc, + size_t array_size, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_partial_log_header *pl_hdr; + struct page *page; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t array_bytes = desc_size * array_size; + u32 seg_flags; + u32 log_pages; + u16 seg_type; + int sequence_id; + u64 last_log_time; + u64 last_log_cno; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!plh_desc); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pebi->pebc->parent_si->seg_type > SSDFS_LAST_KNOWN_SEG_TYPE); + BUG_ON(*write_offset % fsi->pagesize); + BUG_ON((*write_offset >> fsi->log_pagesize) > pebi->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + sequence_id = atomic_read(&pebi->current_log.sequence_id); + if (sequence_id < 0 || sequence_id >= INT_MAX) { + SSDFS_ERR("invalid sequence_id %d\n", sequence_id); + return -ERANGE; + } + + seg_type = pebi->pebc->parent_si->seg_type; + seg_flags = pebi->current_log.seg_flags; + + log_pages = (*write_offset + fsi->pagesize - 1) / fsi->pagesize; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, + pebi->current_log.start_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %u\n", + pebi->current_log.start_page); + return -ERANGE; + } + + pl_hdr = kmap_local_page(page); + + ssdfs_memcpy(pl_hdr->desc_array, 0, array_bytes, + plh_desc, 0, array_bytes, + array_bytes); + + pl_hdr->peb_create_time = cpu_to_le64(pebi->peb_create_time); + + last_log_time = pebi->current_log.last_log_time; + last_log_cno = pebi->current_log.last_log_cno; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_create_time, + 
last_log_time); + + BUG_ON(pebi->peb_create_time > last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_partial_log_header_for_commit(fsi, + sequence_id, + log_pages, + seg_type, + flags | seg_flags, + last_log_time, + last_log_cno, + pl_hdr); + if (unlikely(err)) + goto finish_pl_header_preparation; + +finish_pl_header_preparation: + flush_dcache_page(page); + kunmap_local(pl_hdr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_CRIT("fail to store partial log header: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_store_partial_log_header() - store partial log's header + * @pebi: pointer on PEB object + * @flags: partial log header's flags + * @hdr_desc: partial log header's metadata descriptor in segment header + * @plh_desc: partial log header's metadata descriptors array + * @array_size: count of items in array + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store the partial log's header. + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_store_partial_log_header(struct ssdfs_peb_info *pebi, u32 flags, + struct ssdfs_metadata_descriptor *hdr_desc, + struct ssdfs_metadata_descriptor *plh_desc, + size_t array_size, + pgoff_t *cur_page, + u32 *write_offset) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!plh_desc); + BUG_ON(!cur_page || !write_offset); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (hdr_desc) { + return ssdfs_peb_store_pl_header_like_footer(pebi, flags, + hdr_desc, + plh_desc, + array_size, + cur_page, + write_offset); + } else { + return ssdfs_peb_store_pl_header_like_header(pebi, flags, + plh_desc, + array_size, + cur_page, + write_offset); + } +} + +/* + * ssdfs_peb_commit_first_partial_log() - commit first partial log + * @pebi: pointer on PEB object + * + * This function tries to commit the first partial log. + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_peb_commit_first_partial_log(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor plh_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor *cur_hdr_desc; + u32 flags; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + pgoff_t cur_page = pebi->current_log.start_page; + u32 write_offset = 0; + bool log_has_data = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + memset(hdr_desc, 0, desc_size * SSDFS_SEG_HDR_DESC_MAX); + memset(plh_desc, 0, desc_size * SSDFS_SEG_HDR_DESC_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0001: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_reserve_segment_header(pebi, &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to reserve segment header: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0002: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_commit_log_payload(pebi, hdr_desc, &log_has_data, + &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit payload: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + + if (!log_has_data) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current log hasn't data: start_page %u\n", + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0003: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX]; + flags = SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER | + SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER; + err = ssdfs_peb_store_partial_log_header(pebi, flags, cur_hdr_desc, + plh_desc, + SSDFS_SEG_HDR_DESC_MAX, + &cur_page, + &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's partial header: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0004: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_store_log_header(pebi, hdr_desc, + SSDFS_SEG_HDR_DESC_MAX, + write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's header: " + "seg %llu, peb %llu, write_offset %u," + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + write_offset, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0005: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_flush_current_log_dirty_pages(pebi, write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to flush current log: " + "seg %llu, peb %llu, current_log.start_page %u, 
" + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0006: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_define_next_log_start(pebi, SSDFS_START_PARTIAL_LOG, + &cur_page, &write_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0007: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.reserved_pages = 0; + pebi->current_log.seg_flags = 0; + + ssdfs_peb_set_current_log_state(pebi, SSDFS_LOG_COMMITTED); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log commited: seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_commit_log: + return err; +} + +/* + * ssdfs_peb_commit_next_partial_log() - commit next partial log + * @pebi: pointer on PEB object + * + * This function tries to commit the next partial log. + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_commit_next_partial_log(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor plh_desc[SSDFS_SEG_HDR_DESC_MAX]; + u32 flags; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + pgoff_t cur_page = pebi->current_log.start_page; + u32 write_offset = 0; + bool log_has_data = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + memset(plh_desc, 0, desc_size * SSDFS_SEG_HDR_DESC_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0001: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_reserve_partial_log_header(pebi, &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to reserve partial log's header: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0002: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_commit_log_payload(pebi, plh_desc, &log_has_data, + &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit payload: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + + if (!log_has_data) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current log hasn't data: start_page %u\n", + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + flags = SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER; + err = ssdfs_peb_store_partial_log_header(pebi, flags, NULL, + plh_desc, + SSDFS_SEG_HDR_DESC_MAX, + &cur_page, + &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's partial header: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0003: cur_page 
%lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_flush_current_log_dirty_pages(pebi, write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to flush current log: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0004: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_define_next_log_start(pebi, SSDFS_CONTINUE_PARTIAL_LOG, + &cur_page, &write_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0005: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.reserved_pages = 0; + pebi->current_log.seg_flags = 0; + + ssdfs_peb_set_current_log_state(pebi, SSDFS_LOG_COMMITTED); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log commited: seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_commit_log: + return err; +} + +/* + * ssdfs_peb_commit_last_partial_log() - commit last partial log + * @pebi: pointer on PEB object + * @cur_segs: current segment IDs array + * @cur_segs_size: size of segment IDs array size in bytes + * + * This function tries to commit the last partial log. + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_commit_last_partial_log(struct ssdfs_peb_info *pebi, + __le64 *cur_segs, + size_t cur_segs_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor plh_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor lf_desc[SSDFS_LOG_FOOTER_DESC_MAX]; + struct ssdfs_metadata_descriptor *cur_hdr_desc; + u32 flags; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + pgoff_t cur_page = pebi->current_log.start_page; + pgoff_t cur_page_offset; + u32 write_offset = 0; + bool log_has_data = false; + int log_strategy = SSDFS_FINISH_PARTIAL_LOG; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + memset(plh_desc, 0, desc_size * SSDFS_SEG_HDR_DESC_MAX); + memset(lf_desc, 0, desc_size * SSDFS_LOG_FOOTER_DESC_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0001: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_reserve_partial_log_header(pebi, &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to reserve partial log's header: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0002: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_commit_log_payload(pebi, plh_desc, &log_has_data, + &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit payload: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + + if 
(!log_has_data) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current log hasn't data: start_page %u\n", + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0003: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_page_offset = cur_page % pebi->log_pages; + + if (cur_page_offset == 0) { + /* + * There is no space for log footer. + * So, full log will be without footer. + */ + SSDFS_DBG("There is no space for log footer.\n"); + + flags = SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER; + log_strategy = SSDFS_FINISH_PARTIAL_LOG; + } else if ((pebi->log_pages - cur_page_offset) == 1) { + cur_hdr_desc = &plh_desc[SSDFS_LOG_FOOTER_INDEX]; + flags = SSDFS_PARTIAL_LOG_FOOTER | SSDFS_ENDING_LOG_FOOTER; + err = ssdfs_peb_store_log_footer(pebi, flags, cur_hdr_desc, + lf_desc, + SSDFS_LOG_FOOTER_DESC_MAX, + cur_segs, cur_segs_size, + &cur_page, + &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's footer: " + "seg %llu, peb %llu, cur_page %lu, " + "write_offset %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + + flags = SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER | + SSDFS_LOG_HAS_FOOTER; + log_strategy = SSDFS_FINISH_PARTIAL_LOG; + } else { + /* + * It is possible to add another log. + */ + flags = SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER; + log_strategy = SSDFS_CONTINUE_PARTIAL_LOG; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0004: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_store_partial_log_header(pebi, flags, NULL, + plh_desc, + SSDFS_SEG_HDR_DESC_MAX, + &cur_page, + &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's partial header: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0005: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_flush_current_log_dirty_pages(pebi, write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to flush current log: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0006: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_peb_define_next_log_start(pebi, log_strategy, + &cur_page, &write_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0007: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.reserved_pages = 0; + pebi->current_log.seg_flags = 0; + + ssdfs_peb_set_current_log_state(pebi, SSDFS_LOG_COMMITTED); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log commited: seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_commit_log: + return err; +} + +/* + * ssdfs_peb_commit_full_log() - commit full current log + * @pebi: pointer on PEB object + * @cur_segs: current segment IDs array + * @cur_segs_size: size of segment IDs array size in bytes + * + * This function tries to commit the current log. + * + * RETURN: + * [success] + * [failure] - error code. 
+ * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_commit_full_log(struct ssdfs_peb_info *pebi, + __le64 *cur_segs, + size_t cur_segs_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor plh_desc[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_metadata_descriptor lf_desc[SSDFS_LOG_FOOTER_DESC_MAX]; + struct ssdfs_metadata_descriptor *cur_hdr_desc; + int log_strategy = SSDFS_FINISH_FULL_LOG; + u32 flags; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + pgoff_t cur_page = pebi->current_log.start_page; + pgoff_t cur_page_offset; + u32 write_offset = 0; + bool log_has_data = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + memset(hdr_desc, 0, desc_size * SSDFS_SEG_HDR_DESC_MAX); + memset(lf_desc, 0, desc_size * SSDFS_LOG_FOOTER_DESC_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0001: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_reserve_segment_header(pebi, &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to reserve segment header: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0002: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_commit_log_payload(pebi, hdr_desc, &log_has_data, + &cur_page, &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit payload: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + cur_page, write_offset, err); + goto finish_commit_log; + } + + if (!log_has_data) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current log hasn't data: start_page %u\n", + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + goto define_next_log_start; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0003: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_page_offset = cur_page % pebi->log_pages; + if (cur_page_offset == 0) { + SSDFS_WARN("There is no space for log footer.\n"); + } + + if ((pebi->log_pages - cur_page_offset) > 1) { + log_strategy = SSDFS_START_PARTIAL_LOG; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start partial log: " + "cur_page_offset %lu, pebi->log_pages %u\n", + cur_page_offset, pebi->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX]; + flags = SSDFS_LOG_IS_PARTIAL | + SSDFS_LOG_HAS_PARTIAL_HEADER | + SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER; + err = ssdfs_peb_store_partial_log_header(pebi, flags, + cur_hdr_desc, + plh_desc, + SSDFS_SEG_HDR_DESC_MAX, + &cur_page, + &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's partial header: " + "seg %llu, peb %llu, cur_page %lu, " + "write_offset %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, cur_page, + write_offset, err); + goto finish_commit_log; + } + } else { + log_strategy = SSDFS_FINISH_FULL_LOG; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finish 
full log: " + "cur_page_offset %lu, pebi->log_pages %u\n", + cur_page_offset, pebi->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX]; + flags = 0; + err = ssdfs_peb_store_log_footer(pebi, flags, cur_hdr_desc, + lf_desc, + SSDFS_LOG_FOOTER_DESC_MAX, + cur_segs, cur_segs_size, + &cur_page, + &write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's footer: " + "seg %llu, peb %llu, cur_page %lu, " + "write_offset %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, cur_page, + write_offset, err); + goto finish_commit_log; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0004: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_store_log_header(pebi, hdr_desc, + SSDFS_SEG_HDR_DESC_MAX, + write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store log's header: " + "seg %llu, peb %llu, write_offset %u," + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + write_offset, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0005: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_flush_current_log_dirty_pages(pebi, write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to flush current log: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + goto finish_commit_log; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0006: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + +define_next_log_start: + ssdfs_peb_define_next_log_start(pebi, log_strategy, + &cur_page, &write_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0007: cur_page %lu, write_offset %u\n", + cur_page, write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.reserved_pages = 0; + pebi->current_log.seg_flags = 0; + + ssdfs_peb_set_current_log_state(pebi, SSDFS_LOG_COMMITTED); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log commited: seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_commit_log: + return err; +} + +/* + * ssdfs_peb_calculate_reserved_metapages() - calculate reserved metapages + * @page_size: size of page in bytes + * @data_pages: number of allocated data pages + * @pebs_per_seg: number of PEBs in one segment + * @log_strategy: stategy of log commiting + */ +u16 ssdfs_peb_calculate_reserved_metapages(u32 page_size, + u32 data_pages, + u32 pebs_per_seg, + int log_strategy) +{ + size_t seg_hdr_size = sizeof(struct ssdfs_segment_header); + size_t lf_hdr_size = sizeof(struct ssdfs_log_footer); + u32 blk_bmap_bytes = 0; + u32 blk2off_tbl_bytes = 0; + u32 blk_desc_tbl_bytes = 0; + u32 reserved_bytes = 0; + u32 reserved_pages = 0; + + /* segment header */ + reserved_bytes += seg_hdr_size; + + /* block bitmap */ + blk_bmap_bytes = __ssdfs_peb_estimate_blk_bmap_bytes(data_pages, true); + reserved_bytes += blk_bmap_bytes; + + /* blk2off table */ + blk2off_tbl_bytes = __ssdfs_peb_estimate_blk2off_bytes(data_pages, + pebs_per_seg); + reserved_bytes += blk2off_tbl_bytes; + + /* block descriptor table */ + blk_desc_tbl_bytes = + __ssdfs_peb_estimate_blk_desc_tbl_bytes(data_pages); + reserved_bytes += blk_desc_tbl_bytes; + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + + switch (log_strategy) { + case SSDFS_START_FULL_LOG: + case 
SSDFS_FINISH_PARTIAL_LOG: + case SSDFS_FINISH_FULL_LOG: + /* log footer header */ + reserved_bytes += lf_hdr_size; + + /* block bitmap */ + reserved_bytes += blk_bmap_bytes; + + /* blk2off table */ + reserved_bytes += blk2off_tbl_bytes; + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + + reserved_pages = reserved_bytes / page_size; + break; + + case SSDFS_START_PARTIAL_LOG: + case SSDFS_CONTINUE_PARTIAL_LOG: + /* do nothing */ + break; + + default: + SSDFS_CRIT("unexpected log strategy %#x\n", + log_strategy); + return U16_MAX; + } + + reserved_pages = reserved_bytes / page_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("data_pages %u, log_strategy %#x, " + "blk_bmap_bytes %u, blk2off_tbl_bytes %u, " + "blk_desc_tbl_bytes %u, reserved_bytes %u, " + "reserved_pages %u\n", + data_pages, log_strategy, + blk_bmap_bytes, blk2off_tbl_bytes, + blk_desc_tbl_bytes, reserved_bytes, + reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + BUG_ON(reserved_pages >= U16_MAX); + + return (u16)reserved_pages; +} + +/* + * ssdfs_peb_commit_log() - commit current log + * @pebi: pointer on PEB object + * @cur_segs: current segment IDs array + * @cur_segs_size: size of segment IDs array size in bytes + * + * This function tries to commit the current log. + * + * RETURN: + * [success] + * [failure] - error code. + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_commit_log(struct ssdfs_peb_info *pebi, + __le64 *cur_segs, size_t cur_segs_size) +{ + struct ssdfs_segment_info *si; + struct ssdfs_blk2off_table *table; + int log_state; + int log_strategy; + u32 page_size; + u32 pebs_per_seg; + u32 pages_per_peb; + int used_pages; + int invalid_pages; + u32 data_pages; + u16 reserved_pages; + u16 diff; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_state = atomic_read(&pebi->current_log.state); + + switch (log_state) { + case SSDFS_LOG_UNKNOWN: + case SSDFS_LOG_PREPARED: + case SSDFS_LOG_INITIALIZED: + SSDFS_WARN("peb %llu current log can't be commited\n", + pebi->peb_id); + return -EINVAL; + + case SSDFS_LOG_CREATED: + /* do function's work */ + break; + + case SSDFS_LOG_COMMITTED: + SSDFS_WARN("peb %llu current log has been commited\n", + pebi->peb_id); + return 0; + + default: + BUG(); + }; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb %llu, current_log.start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#else + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + si = pebi->pebc->parent_si; + log_strategy = is_log_partial(pebi); + page_size = pebi->pebc->parent_si->fsi->pagesize; + pebs_per_seg = pebi->pebc->parent_si->fsi->pebs_per_seg; + pages_per_peb = pebi->pebc->parent_si->fsi->pages_per_peb; + + used_pages = ssdfs_peb_get_used_data_pages(pebi->pebc); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used data pages count: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + return err; + } + + invalid_pages = ssdfs_peb_get_invalid_pages(pebi->pebc); + if (invalid_pages < 0) { + err = invalid_pages; + SSDFS_ERR("fail to get invalid pages count: " + "seg %llu, peb %llu, err %d\n", + 
pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + return err; + } + + data_pages = used_pages + invalid_pages; + + if (data_pages == 0) { + SSDFS_ERR("invalid data pages count: " + "used_pages %d, invalid_pages %d, " + "data_pages %u\n", + used_pages, invalid_pages, data_pages); + return -ERANGE; + } + + reserved_pages = ssdfs_peb_calculate_reserved_metapages(page_size, + data_pages, + pebs_per_seg, + log_strategy); + if (reserved_pages > pages_per_peb) { + SSDFS_ERR("reserved_pages %u > pages_per_peb %u\n", + reserved_pages, pages_per_peb); + return -ERANGE; + } + + if (reserved_pages > pebi->current_log.reserved_pages) { + diff = reserved_pages - pebi->current_log.reserved_pages; + + err = ssdfs_segment_blk_bmap_reserve_metapages(&si->blk_bmap, + pebi->pebc, + diff); + if (err == -ENOSPC) { + /* ignore error */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve metadata pages: " + "count %u, err %d\n", + diff, err); + return err; + } + + pebi->current_log.reserved_pages += diff; + if (diff > pebi->current_log.free_data_pages) + pebi->current_log.free_data_pages = 0; + else + pebi->current_log.free_data_pages -= diff; + } else if (reserved_pages < pebi->current_log.reserved_pages) { + diff = pebi->current_log.reserved_pages - reserved_pages; + + err = ssdfs_segment_blk_bmap_free_metapages(&si->blk_bmap, + pebi->pebc, + diff); + if (unlikely(err)) { + SSDFS_ERR("fail to free metadata pages: " + "count %u, err %d\n", + diff, err); + return err; + } + + pebi->current_log.reserved_pages -= diff; + pebi->current_log.free_data_pages += diff; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("data_pages %u, " + "current_log (reserved_pages %u, free_data_pages %u)\n", + data_pages, + pebi->current_log.reserved_pages, + pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.last_log_time = ssdfs_current_timestamp(); + pebi->current_log.last_log_cno = ssdfs_current_cno(si->fsi->sb); + + log_strategy = is_log_partial(pebi); + + switch (log_strategy) { + case SSDFS_START_FULL_LOG: + SSDFS_CRIT("log contains nothing: " + "seg %llu, peb %llu, " + "free_data_pages %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->current_log.free_data_pages); + return -ERANGE; + + case SSDFS_START_PARTIAL_LOG: + err = ssdfs_peb_commit_first_partial_log(pebi); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit first partial log: " + "err %d\n", err); + return err; + } + break; + + case SSDFS_CONTINUE_PARTIAL_LOG: + err = ssdfs_peb_commit_next_partial_log(pebi); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit next partial log: " + "err %d\n", err); + return err; + } + break; + + case SSDFS_FINISH_PARTIAL_LOG: + err = ssdfs_peb_commit_last_partial_log(pebi, cur_segs, + cur_segs_size); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit last partial log: " + "err %d\n", err); + return err; + } + break; + + case SSDFS_FINISH_FULL_LOG: + err = ssdfs_peb_commit_full_log(pebi, cur_segs, + cur_segs_size); + if (unlikely(err)) { + SSDFS_CRIT("fail to commit full log: " + "err %d\n", err); + return err; + } + break; + + default: + SSDFS_CRIT("unexpected log strategy %#x\n", + log_strategy); + return -ERANGE; + } + + table = pebi->pebc->parent_si->blk2off_table; + + err = ssdfs_blk2off_table_revert_migration_state(table, + pebi->peb_index); + if (unlikely(err)) { + SSDFS_ERR("fail to revert migration state: " + "seg %llu, peb %llu, peb_index %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_index, + err); + } + +#ifdef 
CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_peb_remain_log_creation_thread() - remain as log creation thread + * @pebc: pointer on PEB container + * + * This function check that PEB's flush thread can work + * as thread that creates logs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - PEB hasn't free space. + */ +static +int ssdfs_peb_remain_log_creation_thread(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_info *si; + int peb_free_pages; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + + SSDFS_DBG("seg %llu, peb_index %d\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + + peb_free_pages = ssdfs_peb_get_free_pages(pebc); + if (unlikely(peb_free_pages < 0)) { + err = peb_free_pages; + SSDFS_ERR("fail to calculate PEB's free pages: " + "seg %llu, peb index %d, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_free_pages %d\n", peb_free_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_free_pages == 0) { + SSDFS_DBG("PEB hasn't free space\n"); + return -ENOSPC; + } + + if (!is_peb_joined_into_create_requests_queue(pebc)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_WARN("peb_index %u hasn't creation role\n", + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_join_create_requests_queue(pebc, + &si->create_rq); + if (unlikely(err)) { + SSDFS_ERR("fail to join create requests queue: " + "seg %llu, peb_index %d, err %d\n", + si->seg_id, pebc->peb_index, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_peb_delegate_log_creation_role() - try to delegate log creation role + * @pebc: pointer on PEB container + * @found_peb_index: index of PEB candidate + * + * This function tries to delegate the role of logs creation to + * PEB with @found_peb_index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EAGAIN - it needs to search another candidate. 
+ */ +static +int ssdfs_peb_delegate_log_creation_role(struct ssdfs_peb_container *pebc, + int found_peb_index) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *found_pebc; + int peb_free_pages; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + BUG_ON(found_peb_index >= pebc->parent_si->pebs_count); + + SSDFS_DBG("seg %llu, peb_index %d, found_peb_index %d\n", + pebc->parent_si->seg_id, pebc->peb_index, + found_peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + + if (found_peb_index == pebc->peb_index) { + err = ssdfs_peb_remain_log_creation_thread(pebc); + if (err == -ENOSPC) { + SSDFS_WARN("PEB has no free space\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to remain log creation thread: " + "seg %llu, peb_index %d, " + "err %d\n", + si->seg_id, pebc->peb_index, err); + return err; + } + + return 0; + } + + found_pebc = &si->peb_array[found_peb_index]; + + peb_free_pages = ssdfs_peb_get_free_pages(found_pebc); + if (unlikely(peb_free_pages < 0)) { + err = peb_free_pages; + SSDFS_ERR("fail to calculate PEB's free pages: " + "seg %llu, peb index %d, err %d\n", + found_pebc->parent_si->seg_id, + found_pebc->peb_index, err); + return err; + } + + if (peb_free_pages == 0) + return -EAGAIN; + + if (is_peb_joined_into_create_requests_queue(found_pebc)) { + SSDFS_WARN("PEB is creating log: " + "seg %llu, peb_index %d\n", + found_pebc->parent_si->seg_id, + found_pebc->peb_index); + return -EAGAIN; + } + + ssdfs_peb_forget_create_requests_queue(pebc); + + err = ssdfs_peb_join_create_requests_queue(found_pebc, + &si->create_rq); + if (unlikely(err)) { + SSDFS_ERR("fail to join create requests queue: " + "seg %llu, peb_index %d, err %d\n", + si->seg_id, found_pebc->peb_index, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_find_next_log_creation_thread() - search PEB for log creation + * @pebc: pointer on PEB container + * + * This function tries to find another PEB's flush thread and to delegate + * the log creation role to it. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - fail to find another PEB.
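+ * %-ENOSPC - PEB has no free space.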
+ */ +static +int ssdfs_peb_find_next_log_creation_thread(struct ssdfs_peb_container *pebc) +{ + struct ssdfs_segment_info *si; + int start_pos; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si); + + SSDFS_DBG("seg %llu, peb_index %d\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebc->parent_si; + + if (!is_peb_joined_into_create_requests_queue(pebc)) { + SSDFS_WARN("peb_index %u doesn't have creation role\n", + pebc->peb_index); + return -EINVAL; + } + + start_pos = pebc->peb_index + si->create_threads; + + if (start_pos >= si->pebs_count) + start_pos = pebc->peb_index % si->create_threads; + + if (start_pos == pebc->peb_index) { + err = ssdfs_peb_remain_log_creation_thread(pebc); + if (err == -ENOSPC) { + SSDFS_DBG("PEB has no free space\n"); + return 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to remain log creation thread: " + "seg %llu, peb_index %d, " + "err %d\n", + si->seg_id, pebc->peb_index, err); + return err; + } else + return 0; + } + + if (start_pos < pebc->peb_index) + goto search_from_begin; + + for (i = start_pos; i < si->pebs_count; i += si->create_threads) { + err = ssdfs_peb_delegate_log_creation_role(pebc, i); + if (err == -EAGAIN) + continue; + else if (err == -ENOSPC) { + SSDFS_WARN("PEB has no free space\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to delegate log creation role: " + "seg %llu, peb_index %d, " + "found_peb_index %d, err %d\n", + si->seg_id, pebc->peb_index, + i, err); + return err; + } else + return 0; + } + + start_pos = pebc->peb_index % si->create_threads; + +search_from_begin: + for (i = start_pos; i <= pebc->peb_index; i += si->create_threads) { + err = ssdfs_peb_delegate_log_creation_role(pebc, i); + if (err == -EAGAIN) + continue; + if (err == -ENOSPC) { + SSDFS_WARN("PEB has no free space\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to delegate log creation role: " + "seg %llu, peb_index %d, " + "found_peb_index %d, err %d\n", + si->seg_id, pebc->peb_index, + i, err); + return err; + } else + return 0; + } + + SSDFS_ERR("fail to delegate log creation role: " + "seg %llu, peb_index %d\n", + si->seg_id, pebc->peb_index); + return -ERANGE; +} + /* * __ssdfs_finish_request() - common logic of request's finishing * @pebc: pointer on PEB container From patchwork Sat Feb 25 01:08:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151936 From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 30/76] ssdfs: commit log payload Date: Fri, 24 Feb 2023 17:08:41 -0800 Message-Id: <20230225010927.813929-31-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org A log can be imagined as a container that keeps user data or the content of SSDFS metadata structures. Every log starts with a header (and can be finished with a footer). The header and footer are the log's metadata structures that describe the structure of the log's payload. Initially, the flush thread processes create and update requests; the payload of these requests is compressed and then compacted (or aggregated) into a contiguous sequence of memory pages. When a full log is ready or a commit request has been received, the flush thread executes the commit log logic. This logic includes the following steps: (1) reserve space for header, (2) define commit strategy (full or partial log), (3) store block bitmap, (4) store offset translation table, (5) copy content of main, journal, or diff-on-write areas into log's payload, (6) create and store log's footer (optional logic), (7) create and store header into reserved space. The flow is sketched below.
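For orientation, the numbered steps above can be sketched in C pseudocode (an editorial illustration, not part of the patch: the wrapper name commit_log_sketch is hypothetical, the helper names and argument lists are taken from the functions added by this patch, and the type of the cur_segs buffer is abbreviated to void *):

static int commit_log_sketch(struct ssdfs_peb_info *pebi,
			     void *cur_segs, size_t cur_segs_size)
{
	pgoff_t cur_page = pebi->current_log.start_page;
	u32 write_offset = 0;
	int err;

	/* (1) reserve space for the header at the log's beginning */
	err = ssdfs_reserve_segment_header(pebi, &cur_page, &write_offset);
	if (err)
		return err;

	/* (2) define commit strategy: full or partial log */
	switch (is_log_partial(pebi)) {
	case SSDFS_START_PARTIAL_LOG:
		return ssdfs_peb_commit_first_partial_log(pebi);
	case SSDFS_CONTINUE_PARTIAL_LOG:
		return ssdfs_peb_commit_next_partial_log(pebi);
	case SSDFS_FINISH_PARTIAL_LOG:
		return ssdfs_peb_commit_last_partial_log(pebi, cur_segs,
							 cur_segs_size);
	case SSDFS_FINISH_FULL_LOG:
		/*
		 * (3)-(7): the commit helper stores the block bitmap
		 * and the offset translation table, copies the content
		 * of the main, journal, and diff-on-write areas into
		 * the payload, optionally adds the footer, and finally
		 * writes the header into the space reserved at (1).
		 */
		return ssdfs_peb_commit_full_log(pebi, cur_segs,
						 cur_segs_size);
	default:
		/* SSDFS_START_FULL_LOG means the log contains nothing */
		return -ERANGE;
	}
}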
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_flush_thread.c | 3341 +++++++++++++++++++++++++++++++++++ 1 file changed, 3341 insertions(+) diff --git a/fs/ssdfs/peb_flush_thread.c b/fs/ssdfs/peb_flush_thread.c index d9352804f6b9..2de4bb806678 100644 --- a/fs/ssdfs/peb_flush_thread.c +++ b/fs/ssdfs/peb_flush_thread.c @@ -113,6 +113,3347 @@ void ssdfs_flush_check_memory_leaks(void) * FLUSH THREAD FUNCTIONALITY * ******************************************************************************/ +/* + * ssdfs_peb_has_dirty_pages() - check that PEB has dirty pages + * @pebi: pointer on PEB object + */ +bool ssdfs_peb_has_dirty_pages(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_page_array *area_pages; + bool is_peb_dirty = false; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_LOG_AREA_MAX; i++) { + area_pages = &pebi->current_log.area[i].array; + + if (atomic_read(&area_pages->state) == SSDFS_PAGE_ARRAY_DIRTY) { + is_peb_dirty = true; + break; + } + } + + return is_peb_dirty; +} + +/* + * is_full_log_ready() - check that full log is ready + * @pebi: pointer on PEB object + */ +static inline +bool is_full_log_ready(struct ssdfs_peb_info *pebi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, free_data_pages %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + return pebi->current_log.free_data_pages == 0; +} + +/* + * should_partial_log_being_commited() - check that it's time to commit + * @pebi: pointer on PEB object + */ +static inline +bool should_partial_log_being_commited(struct ssdfs_peb_info *pebi) +{ + u16 free_data_pages; + u16 min_partial_log_pages; + int log_strategy; + bool time_to_commit = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + free_data_pages = pebi->current_log.free_data_pages; + min_partial_log_pages = ssdfs_peb_estimate_min_partial_log_pages(pebi); + + log_strategy = is_log_partial(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, log_strategy %#x, " + "free_data_pages %u, min_partial_log_pages %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, log_strategy, + free_data_pages, min_partial_log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (log_strategy) { + case SSDFS_START_FULL_LOG: + case SSDFS_START_PARTIAL_LOG: + if (free_data_pages <= min_partial_log_pages) { + time_to_commit = true; + } else { + time_to_commit = false; + } + break; + + case SSDFS_CONTINUE_PARTIAL_LOG: + case SSDFS_FINISH_PARTIAL_LOG: + case SSDFS_FINISH_FULL_LOG: + /* do nothing */ + time_to_commit = false; + break; + + default: + SSDFS_CRIT("unexpected log strategy %#x\n", + log_strategy); + time_to_commit = false; + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("time_to_commit %#x\n", time_to_commit); +#endif /* CONFIG_SSDFS_DEBUG */ + + return time_to_commit; +} + +/* + * ssdfs_reserve_segment_header() - reserve space for segment header + * @pebi: pointer on PEB object + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function reserves space for segment header in PEB's cache. 
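+ * The reserved space is filled with the 0xFF pattern; the actual
+ * header is written into this space at commit time.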
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - fail to allocate page. + */ +static +int ssdfs_reserve_segment_header(struct ssdfs_peb_info *pebi, + pgoff_t *cur_page, u32 *write_offset) +{ + struct page *page; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!cur_page || !write_offset); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); + + if (*cur_page != pebi->current_log.start_page) { + SSDFS_ERR("cur_page %lu != start_page %u\n", + *cur_page, pebi->current_log.start_page); + return -EINVAL; + } + + if (*write_offset != 0) { + SSDFS_ERR("write_offset %u != 0\n", + *write_offset); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(&pebi->cache, *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to grab cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + + /* prepare header space */ + ssdfs_memset_page(page, 0, PAGE_SIZE, 0xFF, PAGE_SIZE); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *write_offset = offsetof(struct ssdfs_segment_header, payload); + + return 0; +} + +/* + * ssdfs_reserve_partial_log_header() - reserve space for partial log's header + * @pebi: pointer on PEB object + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function reserves space for partial log's header in PEB's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - fail to allocate page. 
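+ *
+ * The logic mirrors ssdfs_reserve_segment_header(); only the size of
+ * the reserved space (the offset of the payload field) differs.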
+ */ +static +int ssdfs_reserve_partial_log_header(struct ssdfs_peb_info *pebi, + pgoff_t *cur_page, u32 *write_offset) +{ + struct page *page; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!cur_page || !write_offset); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); + + if (*cur_page != pebi->current_log.start_page) { + SSDFS_ERR("cur_page %lu != start_page %u\n", + *cur_page, pebi->current_log.start_page); + return -EINVAL; + } + + if (*write_offset != 0) { + SSDFS_ERR("write_offset %u != 0\n", + *write_offset); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(&pebi->cache, *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to grab cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + + /* prepare header space */ + ssdfs_memset_page(page, 0, PAGE_SIZE, 0xFF, PAGE_SIZE); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *write_offset = offsetof(struct ssdfs_partial_log_header, payload); + + return 0; +} + +/* + * ssdfs_peb_store_pagevec() - store pagevec into page cache + * @desc: descriptor of pagevec environment + * + * This function tries to store pagevec into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_store_pagevec(struct ssdfs_pagevec_descriptor *desc) +{ + struct ssdfs_fs_info *fsi; + struct page *src_page, *dst_page; + unsigned char *kaddr; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc); + BUG_ON(!desc->pebi || !desc->pebi->pebc->parent_si); + BUG_ON(!desc->pebi->pebc->parent_si->fsi); + BUG_ON(!desc->page_vec || !desc->desc_array); + BUG_ON(!desc->cur_page || !desc->write_offset); + + switch (desc->compression_type) { + case SSDFS_FRAGMENT_UNCOMPR_BLOB: + case SSDFS_FRAGMENT_ZLIB_BLOB: + case SSDFS_FRAGMENT_LZO_BLOB: + /* valid type */ + break; + + default: + SSDFS_WARN("invalid compression %#x\n", + desc->compression_type); + return -EINVAL; + } + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + desc->pebi->pebc->parent_si->seg_id, + desc->pebi->peb_id, + desc->pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = desc->pebi->pebc->parent_si->fsi; + desc->compr_size = 0; + desc->uncompr_size = 0; + desc->fragments_count = 0; + + for (i = 0; i < ssdfs_page_vector_count(desc->page_vec); i++) { + size_t iter_bytes; + size_t dst_page_off; + size_t dst_free_space; + struct ssdfs_fragment_source from; + struct ssdfs_fragment_destination to; + + BUG_ON(i >= desc->array_capacity); + + if (desc->uncompr_size > desc->bytes_count) { + SSDFS_WARN("uncompr_size %u > bytes_count %zu\n", + desc->uncompr_size, + desc->bytes_count); + break; + } else if (desc->uncompr_size == desc->bytes_count) + break; + + iter_bytes = min_t(size_t, PAGE_SIZE, + desc->bytes_count - desc->uncompr_size); + + src_page = desc->page_vec->pages[i]; + +try_get_next_page: + dst_page = ssdfs_page_array_grab_page(&desc->pebi->cache, + *desc->cur_page); + if (IS_ERR_OR_NULL(dst_page)) { + SSDFS_ERR("fail to grab cache page: index %lu\n", + *desc->cur_page); + return -ENOMEM; + } + + dst_page_off = 
*(desc->write_offset) % PAGE_SIZE; + dst_free_space = PAGE_SIZE - dst_page_off; + + kaddr = kmap_local_page(dst_page); + + from.page = src_page; + from.start_offset = 0; + from.data_bytes = iter_bytes; + from.sequence_id = desc->start_sequence_id + i; + from.fragment_type = desc->compression_type; + from.fragment_flags = SSDFS_FRAGMENT_HAS_CSUM; + + to.area_offset = desc->area_offset; + to.write_offset = *desc->write_offset; + to.store = kaddr + dst_page_off; + to.free_space = dst_free_space; + to.compr_size = 0; + to.desc = &desc->desc_array[i]; + + err = ssdfs_peb_store_fragment(&from, &to); + + flush_dcache_page(dst_page); + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(dst_page, 0); + SetPageUptodate(dst_page); + + err = + ssdfs_page_array_set_page_dirty(&desc->pebi->cache, + *desc->cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + *desc->cur_page, err); + } + } + + ssdfs_unlock_page(dst_page); + ssdfs_put_page(dst_page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + dst_page, page_ref_count(dst_page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to get next page: " + "write_offset %u, dst_free_space %zu\n", + *desc->write_offset, + dst_free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + *desc->write_offset += dst_free_space; + (*desc->cur_page)++; + goto try_get_next_page; + } + + if (unlikely(err)) { + SSDFS_ERR("fail to store fragment: " + "sequence_id %u, write_offset %u, err %d\n", + desc->start_sequence_id + i, + *desc->write_offset, err); + return err; + } + + desc->uncompr_size += iter_bytes; + *desc->write_offset += to.compr_size; + desc->compr_size += to.compr_size; + desc->fragments_count++; + } + + return 0; +} + +/* + * ssdfs_peb_store_blk_bmap_fragment() - store fragment of block bitmap + * @desc: descriptor of block bitmap fragment environment + * @bmap_hdr_offset: offset of header from log's beginning + * + * This function tries to store block bitmap fragment + * into PEB's log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
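+ * %-ENOMEM - fail to allocate memory.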
+ */ +static +int ssdfs_peb_store_blk_bmap_fragment(struct ssdfs_bmap_descriptor *desc, + u32 bmap_hdr_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_block_bitmap_fragment *frag_hdr = NULL; + struct ssdfs_fragment_desc *frag_desc_array = NULL; + size_t frag_hdr_size = sizeof(struct ssdfs_block_bitmap_fragment); + size_t frag_desc_size = sizeof(struct ssdfs_fragment_desc); + size_t allocation_size = 0; + u32 frag_hdr_off; + struct ssdfs_pagevec_descriptor pvec_desc; + u32 pages_per_peb; + struct page *page; + pgoff_t index; + u32 page_off; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc); + BUG_ON(!desc->pebi || !desc->cur_page || !desc->write_offset); + BUG_ON(ssdfs_page_vector_count(desc->snapshot) == 0); + + switch (desc->compression_type) { + case SSDFS_BLK_BMAP_NOCOMPR_TYPE: + case SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE: + case SSDFS_BLK_BMAP_LZO_COMPR_TYPE: + /* valid type */ + break; + + default: + SSDFS_WARN("invalid compression %#x\n", + desc->compression_type); + return -EINVAL; + } + + SSDFS_DBG("peb_id %llu, peb_index %u, " + "cur_page %lu, write_offset %u, " + "desc->compression_type %#x\n", + desc->pebi->peb_id, + desc->pebi->peb_index, + *(desc->cur_page), *(desc->write_offset), + desc->compression_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = desc->pebi->pebc->parent_si->fsi; + + allocation_size = frag_hdr_size; + allocation_size += + ssdfs_page_vector_count(desc->snapshot) * frag_desc_size; + + frag_hdr = ssdfs_flush_kzalloc(allocation_size, GFP_KERNEL); + if (!frag_hdr) { + SSDFS_ERR("unable to allocate block bmap header\n"); + return -ENOMEM; + } + + frag_hdr_off = *(desc->write_offset); + *(desc->write_offset) += allocation_size; + + frag_desc_array = (struct ssdfs_fragment_desc *)((u8 *)frag_hdr + + frag_hdr_size); + + switch (desc->compression_type) { + case SSDFS_BLK_BMAP_NOCOMPR_TYPE: + pvec_desc.compression_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + break; + + case SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE: + pvec_desc.compression_type = SSDFS_FRAGMENT_ZLIB_BLOB; + break; + + case SSDFS_BLK_BMAP_LZO_COMPR_TYPE: + pvec_desc.compression_type = SSDFS_FRAGMENT_LZO_BLOB; + break; + + default: + SSDFS_WARN("invalid compression %#x\n", + desc->compression_type); + return -EINVAL; + } + + pvec_desc.pebi = desc->pebi; + pvec_desc.start_sequence_id = 0; + pvec_desc.area_offset = bmap_hdr_offset; + pvec_desc.page_vec = desc->snapshot; + pvec_desc.bytes_count = desc->bytes_count; + pvec_desc.desc_array = frag_desc_array; + pvec_desc.array_capacity = SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX; + pvec_desc.cur_page = desc->cur_page; + pvec_desc.write_offset = desc->write_offset; + + err = ssdfs_peb_store_pagevec(&pvec_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to store block bitmap in the log: " + "seg %llu, peb %llu, write_offset %u, " + "err %d\n", + desc->pebi->pebc->parent_si->seg_id, + desc->pebi->peb_id, + *(desc->write_offset), err); + goto fail_store_bmap_fragment; + } + + frag_hdr->peb_index = cpu_to_le16(desc->peb_index); + frag_hdr->sequence_id = *(desc->frag_id); + *(desc->frag_id) += 1; + frag_hdr->flags = desc->flags; + frag_hdr->type = desc->type; + + pages_per_peb = fsi->pages_per_peb; + + if (desc->last_free_blk >= pages_per_peb) { + SSDFS_ERR("last_free_page %u >= pages_per_peb %u\n", + desc->last_free_blk, pages_per_peb); + err = -ERANGE; + goto fail_store_bmap_fragment; + } + + if ((desc->invalid_blks + desc->metadata_blks) > pages_per_peb) { + SSDFS_ERR("invalid descriptor state: " + "invalid_blks %u, metadata_blks %u, " + "pages_per_peb %u\n", + 
desc->invalid_blks, + desc->metadata_blks, + pages_per_peb); + err = -ERANGE; + goto fail_store_bmap_fragment; + } + + frag_hdr->last_free_blk = cpu_to_le32(desc->last_free_blk); + frag_hdr->metadata_blks = cpu_to_le32(desc->metadata_blks); + frag_hdr->invalid_blks = cpu_to_le32(desc->invalid_blks); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(pvec_desc.compr_size > pvec_desc.uncompr_size); + WARN_ON(pvec_desc.compr_size > + desc->pebi->pebc->parent_si->fsi->segsize); +#endif /* CONFIG_SSDFS_DEBUG */ + frag_hdr->chain_hdr.compr_bytes = cpu_to_le32(pvec_desc.compr_size); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(pvec_desc.uncompr_size > + desc->pebi->pebc->parent_si->fsi->segsize); +#endif /* CONFIG_SSDFS_DEBUG */ + frag_hdr->chain_hdr.uncompr_bytes = cpu_to_le32(pvec_desc.uncompr_size); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(pvec_desc.fragments_count > SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + frag_hdr->chain_hdr.fragments_count = + cpu_to_le16(pvec_desc.fragments_count); + + frag_hdr->chain_hdr.desc_size = cpu_to_le16(frag_desc_size); + frag_hdr->chain_hdr.magic = SSDFS_CHAIN_HDR_MAGIC; + frag_hdr->chain_hdr.type = SSDFS_BLK_BMAP_CHAIN_HDR; + frag_hdr->chain_hdr.flags = 0; + + index = ssdfs_write_offset_to_mem_page_index(fsi, + desc->pebi->current_log.start_page, + frag_hdr_off); + + page = ssdfs_page_array_get_page_locked(&desc->pebi->cache, index); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", index); + err = -ENOMEM; + goto fail_store_bmap_fragment; + } + + page_off = frag_hdr_off % PAGE_SIZE; + + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + frag_hdr, 0, allocation_size, + allocation_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "page_off %u, allocation_size %zu, err %d\n", + page_off, allocation_size, err); + goto finish_copy; + } + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&desc->pebi->cache, index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + index, err); + } + +finish_copy: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +fail_store_bmap_fragment: + ssdfs_block_bmap_forget_snapshot(desc->snapshot); + ssdfs_flush_kfree(frag_hdr); + return err; +} + +/* + * ssdfs_peb_store_dst_blk_bmap() - store destination block bitmap + * @pebi: pointer on PEB object + * @items_state: PEB container's items state + * @compression: compression type + * @bmap_hdr_off: offset from log's beginning to bitmap header + * @frag_id: pointer on fragments counter [in|out] + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store destination block bitmap + * into destination PEB's log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
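+ * %-EINVAL - invalid input.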
+ */ +static +int ssdfs_peb_store_dst_blk_bmap(struct ssdfs_peb_info *pebi, + int items_state, + u8 compression, + u32 bmap_hdr_off, + u8 *frag_id, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_block_bmap *bmap; + struct ssdfs_bmap_descriptor desc; + int buffers_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!frag_id || !cur_page || !write_offset); + BUG_ON(!rwsem_is_locked(&pebi->pebc->lock)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + switch (items_state) { + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + /* valid state */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -EINVAL; + } + + switch (compression) { + case SSDFS_BLK_BMAP_NOCOMPR_TYPE: + case SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE: + case SSDFS_BLK_BMAP_LZO_COMPR_TYPE: + /* valid type */ + break; + + default: + SSDFS_WARN("invalid compression %#x\n", + compression); + return -EINVAL; + } + + SSDFS_DBG("seg %llu, peb_index %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_index, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc.compression_type = compression; + desc.flags = SSDFS_PEB_HAS_RELATION | SSDFS_MIGRATING_BLK_BMAP; + desc.type = SSDFS_DST_BLK_BMAP; + desc.frag_id = frag_id; + desc.cur_page = cur_page; + desc.write_offset = write_offset; + + desc.snapshot = &pebi->current_log.bmap_snapshot; + + if (!pebi->pebc->src_peb || !pebi->pebc->dst_peb) { + SSDFS_WARN("empty src or dst PEB pointer\n"); + return -ERANGE; + } + + if (pebi == pebi->pebc->src_peb) + desc.pebi = pebi->pebc->src_peb; + else + desc.pebi = pebi->pebc->dst_peb; + + if (!desc.pebi) { + SSDFS_WARN("destination PEB doesn't exist\n"); + return -ERANGE; + } + + desc.peb_index = desc.pebi->peb_index; + + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + peb_blkbmap = &seg_blkbmap->peb[pebi->pebc->peb_index]; + + err = ssdfs_page_vector_init(desc.snapshot); + if (unlikely(err)) { + SSDFS_ERR("fail to init page vector: " + "err %d\n", err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + SSDFS_ERR("PEB's block bitmap isn't initialized\n"); + return -ERANGE; + } + + down_read(&peb_blkbmap->lock); + + buffers_state = atomic_read(&peb_blkbmap->buffers_state); + switch (buffers_state) { + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid buffers_state %#x\n", + buffers_state); + goto finish_store_dst_blk_bmap; + } + + bmap = peb_blkbmap->dst; + if (!bmap) { + err = -ERANGE; + SSDFS_WARN("destination bitmap doesn't exist\n"); + goto finish_store_dst_blk_bmap; + } + + err = ssdfs_block_bmap_lock(bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_store_dst_blk_bmap; + } + + err = ssdfs_block_bmap_snapshot(bmap, desc.snapshot, + &desc.last_free_blk, + &desc.metadata_blks, + &desc.invalid_blks, + &desc.bytes_count); + + ssdfs_block_bmap_unlock(bmap); + + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot block bitmap: " + "seg %llu, peb_index %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->pebc->peb_index, err); + goto finish_store_dst_blk_bmap; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, DST: last_free_blk %u, " + 
"metadata_blks %u, invalid_blks %u\n", + pebi->peb_id, desc.last_free_blk, + desc.metadata_blks, desc.invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ssdfs_page_vector_count(desc.snapshot) == 0) { + err = -ERANGE; + SSDFS_ERR("empty block bitmap\n"); + goto finish_store_dst_blk_bmap; + } + +finish_store_dst_blk_bmap: + up_read(&peb_blkbmap->lock); + + if (unlikely(err)) + return err; + + return ssdfs_peb_store_blk_bmap_fragment(&desc, bmap_hdr_off); +} + +/* + * ssdfs_peb_store_source_blk_bmap() - store source block bitmap + * @pebi: pointer on PEB object + * @items_state: PEB container's items state + * @compression: compression type + * @bmap_hdr_off: offset from log's beginning to bitmap header + * @frag_id: pointer on fragments counter [in|out] + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store source block bitmap + * into destination PEB's log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_store_source_blk_bmap(struct ssdfs_peb_info *pebi, + int items_state, + u8 compression, + u32 bmap_hdr_off, + u8 *frag_id, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_block_bmap *bmap; + struct ssdfs_bmap_descriptor desc; + int buffers_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!frag_id || !cur_page || !write_offset); + BUG_ON(!pebi); + BUG_ON(!rwsem_is_locked(&pebi->pebc->lock)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + switch (items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + /* valid state */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -EINVAL; + } + + switch (compression) { + case SSDFS_BLK_BMAP_NOCOMPR_TYPE: + case SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE: + case SSDFS_BLK_BMAP_LZO_COMPR_TYPE: + /* valid type */ + break; + + default: + SSDFS_WARN("invalid compression %#x\n", + compression); + return -EINVAL; + } + + SSDFS_DBG("seg %llu, peb_index %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_index, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc.compression_type = compression; + desc.frag_id = frag_id; + desc.cur_page = cur_page; + desc.write_offset = write_offset; + + desc.snapshot = &pebi->current_log.bmap_snapshot; + + switch (items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + desc.flags = 0; + desc.type = SSDFS_SRC_BLK_BMAP; + desc.pebi = pebi->pebc->src_peb; + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + if (!pebi->pebc->src_peb || !pebi->pebc->dst_peb) { + SSDFS_WARN("empty src or dst PEB pointer\n"); + return -ERANGE; + } + + desc.flags = SSDFS_PEB_HAS_RELATION | + SSDFS_MIGRATING_BLK_BMAP; + desc.type = SSDFS_SRC_BLK_BMAP; + + if (pebi == pebi->pebc->src_peb) + desc.pebi = pebi->pebc->src_peb; + else + desc.pebi = pebi->pebc->dst_peb; + break; + + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + desc.flags = SSDFS_MIGRATING_BLK_BMAP; + desc.type = SSDFS_DST_BLK_BMAP; + /* log could be created in destintaion 
PEB only */ + desc.pebi = pebi->pebc->dst_peb; + break; + + default: + BUG(); + } + + if (!desc.pebi) { + SSDFS_WARN("destination PEB doesn't exist\n"); + return -ERANGE; + } + + desc.peb_index = desc.pebi->peb_index; + + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + peb_blkbmap = &seg_blkbmap->peb[pebi->peb_index]; + + err = ssdfs_page_vector_init(desc.snapshot); + if (unlikely(err)) { + SSDFS_ERR("fail to init page vector: " + "err %d\n", err); + return err; + } + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + SSDFS_ERR("PEB's block bitmap isn't initialized\n"); + return -ERANGE; + } + + down_read(&peb_blkbmap->lock); + + buffers_state = atomic_read(&peb_blkbmap->buffers_state); + switch (buffers_state) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + case SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST: + case SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST: + /* valid state */ + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid buffers_state %#x\n", + buffers_state); + goto finish_store_src_blk_bmap; + } + + bmap = peb_blkbmap->src; + if (!bmap) { + err = -ERANGE; + SSDFS_WARN("source bitmap doesn't exist\n"); + goto finish_store_src_blk_bmap; + } + + err = ssdfs_block_bmap_lock(bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_store_src_blk_bmap; + } + + err = ssdfs_block_bmap_snapshot(bmap, desc.snapshot, + &desc.last_free_blk, + &desc.metadata_blks, + &desc.invalid_blks, + &desc.bytes_count); + + ssdfs_block_bmap_unlock(bmap); + + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot block bitmap: " + "seg %llu, peb_index %u, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->pebc->peb_index, err); + goto finish_store_src_blk_bmap; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, SRC: last_free_blk %u, " + "metadata_blks %u, invalid_blks %u\n", + pebi->peb_id, desc.last_free_blk, + desc.metadata_blks, desc.invalid_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (desc.metadata_blks == 0) { + SSDFS_WARN("peb_id %llu, SRC: last_free_blk %u, " + "metadata_blks %u, invalid_blks %u\n", + pebi->peb_id, desc.last_free_blk, + desc.metadata_blks, desc.invalid_blks); + BUG(); + } + + if (ssdfs_page_vector_count(desc.snapshot) == 0) { + err = -ERANGE; + SSDFS_ERR("empty block bitmap\n"); + goto finish_store_src_blk_bmap; + } + +finish_store_src_blk_bmap: + up_read(&peb_blkbmap->lock); + + if (unlikely(err)) + return err; + + return ssdfs_peb_store_blk_bmap_fragment(&desc, bmap_hdr_off); +} + +/* + * ssdfs_peb_store_dependent_blk_bmap() - store dependent source bitmaps + * @pebi: pointer on PEB object + * @items_state: PEB container's items state + * @compression: compression type + * @bmap_hdr_off: offset from log's beginning to bitmap header + * @frag_id: pointer on fragments counter [in|out] + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store dependent source block bitmaps + * of migrating PEBs into destination PEB's log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
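+ * %-EINVAL - invalid input.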
+ */ +static +int ssdfs_peb_store_dependent_blk_bmap(struct ssdfs_peb_info *pebi, + int items_state, + u8 compression, + u32 bmap_hdr_off, + u8 *frag_id, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_block_bmap *bmap; + struct ssdfs_bmap_descriptor desc; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!frag_id || !cur_page || !write_offset); + BUG_ON(!rwsem_is_locked(&pebi->pebc->lock)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + switch (items_state) { + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + /* valid state */ + break; + + default: + SSDFS_WARN("invalid items_state %#x\n", + items_state); + return -EINVAL; + } + + switch (compression) { + case SSDFS_BLK_BMAP_NOCOMPR_TYPE: + case SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE: + case SSDFS_BLK_BMAP_LZO_COMPR_TYPE: + /* valid type */ + break; + + default: + SSDFS_WARN("invalid compression %#x\n", + compression); + return -EINVAL; + } + + SSDFS_DBG("seg %llu, peb_index %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_index, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc.compression_type = compression; + desc.frag_id = frag_id; + desc.cur_page = cur_page; + desc.write_offset = write_offset; + + desc.snapshot = &pebi->current_log.bmap_snapshot; + + switch (items_state) { + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + desc.flags = SSDFS_PEB_HAS_EXT_PTR | SSDFS_MIGRATING_BLK_BMAP; + desc.type = SSDFS_SRC_BLK_BMAP; + desc.pebi = pebi->pebc->dst_peb; + break; + + default: + BUG(); + } + + if (!desc.pebi) { + SSDFS_WARN("destination PEB doesn't exist\n"); + return -ERANGE; + } + + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + + for (i = 0; i < pebi->pebc->parent_si->pebs_count; i++) { + struct ssdfs_peb_container *cur_pebc; + struct ssdfs_peb_info *dst_peb; + int buffers_state; + + cur_pebc = &pebi->pebc->parent_si->peb_array[i]; + + switch (atomic_read(&cur_pebc->items_state)) { + case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER: + case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER: + /* do nothing here */ + break; + + default: + continue; + }; + + down_read(&cur_pebc->lock); + dst_peb = cur_pebc->dst_peb; + up_read(&cur_pebc->lock); + + if (dst_peb == NULL) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dst_peb is NULL: " + "peb_index %u\n", + i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (dst_peb != pebi->pebc->dst_peb) + continue; + + peb_blkbmap = &seg_blkbmap->peb[i]; + + err = ssdfs_page_vector_init(desc.snapshot); + if (unlikely(err)) { + SSDFS_ERR("fail to init page vector: " + "err %d\n", err); + return err; + } + + desc.peb_index = (u16)i; + + if (!ssdfs_peb_blk_bmap_initialized(peb_blkbmap)) { + SSDFS_ERR("PEB's block bitmap isn't initialized\n"); + return -ERANGE; + } + + down_read(&peb_blkbmap->lock); + + buffers_state = atomic_read(&peb_blkbmap->buffers_state); + switch (buffers_state) { + case SSDFS_PEB_BMAP1_SRC: + case SSDFS_PEB_BMAP2_SRC: + /* valid state */ + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid buffers_state %#x\n", + buffers_state); + goto finish_store_dependent_blk_bmap; + } + + bmap = peb_blkbmap->src; + if (!bmap) { + err = 
-ERANGE; + SSDFS_WARN("source bitmap doesn't exist\n"); + goto finish_store_dependent_blk_bmap; + } + + err = ssdfs_block_bmap_lock(bmap); + if (unlikely(err)) { + SSDFS_ERR("fail to lock block bitmap: err %d\n", err); + goto finish_store_dependent_blk_bmap; + } + + err = ssdfs_block_bmap_snapshot(bmap, desc.snapshot, + &desc.last_free_blk, + &desc.metadata_blks, + &desc.invalid_blks, + &desc.bytes_count); + + ssdfs_block_bmap_unlock(bmap); + + if (unlikely(err)) { + SSDFS_ERR("fail to snapshot block bitmap: " + "seg %llu, peb_index %u, err %d\n", + cur_pebc->parent_si->seg_id, + cur_pebc->peb_index, err); + goto finish_store_dependent_blk_bmap; + } + + if (ssdfs_page_vector_count(desc.snapshot) == 0) { + err = -ERANGE; + SSDFS_ERR("empty block bitmap\n"); + goto finish_store_dependent_blk_bmap; + } + +finish_store_dependent_blk_bmap: + up_read(&peb_blkbmap->lock); + + if (unlikely(err)) + return err; + + err = ssdfs_peb_store_blk_bmap_fragment(&desc, bmap_hdr_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block bitmap fragment: " + "peb_index %u, err %d\n", + i, err); + return err; + } + + ssdfs_block_bmap_forget_snapshot(desc.snapshot); + } + + return 0; +} + +static inline +void ssdfs_prepare_blk_bmap_options(struct ssdfs_fs_info *fsi, + u16 *flags, u8 *compression) +{ + u8 type; + + *flags = fsi->metadata_options.blk_bmap.flags; + type = fsi->metadata_options.blk_bmap.compression; + + *compression = SSDFS_BLK_BMAP_UNCOMPRESSED_BLOB; + + if (*flags & SSDFS_BLK_BMAP_MAKE_COMPRESSION) { + switch (type) { + case SSDFS_BLK_BMAP_NOCOMPR_TYPE: + *compression = SSDFS_BLK_BMAP_UNCOMPRESSED_BLOB; + break; + + case SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE: + *compression = SSDFS_BLK_BMAP_ZLIB_BLOB; + break; + + case SSDFS_BLK_BMAP_LZO_COMPR_TYPE: + *compression = SSDFS_BLK_BMAP_LZO_BLOB; + break; + } + } +} + +/* + * ssdfs_peb_store_block_bmap() - store block bitmap into page cache + * @pebi: pointer on PEB object + * @desc: block bitmap descriptor [out] + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store block bitmap into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
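+ * %-ENOMEM - fail to get cache page.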
+ */ +static +int ssdfs_peb_store_block_bmap(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *desc, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_block_bitmap_header *bmap_hdr = NULL; + size_t bmap_hdr_size = sizeof(struct ssdfs_block_bitmap_header); + int items_state; + u8 frag_id = 0; + u32 bmap_hdr_off; + u32 pages_per_peb; + u16 log_start_page = 0; + u16 flags = 0; + u8 compression = SSDFS_BLK_BMAP_UNCOMPRESSED_BLOB; + struct page *page; + pgoff_t index; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(flags & ~SSDFS_BLK_BMAP_FLAG_MASK); + BUG_ON(!desc || !cur_page || !write_offset); + + SSDFS_DBG("seg %llu, peb_index %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_index, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + + pages_per_peb = min_t(u32, fsi->leb_pages_capacity, + fsi->peb_pages_capacity); + + ssdfs_prepare_blk_bmap_options(fsi, &flags, &compression); + + bmap_hdr_off = *write_offset; + *write_offset += bmap_hdr_size; + + items_state = atomic_read(&pebi->pebc->items_state); + switch (items_state) { + case SSDFS_PEB1_SRC_CONTAINER: + case SSDFS_PEB2_SRC_CONTAINER: + /* Prepare source bitmap only */ + err = ssdfs_peb_store_source_blk_bmap(pebi, items_state, + compression, + bmap_hdr_off, + &frag_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store source bitmap: " + "cur_page %lu, write_offset %u, " + "err %d\n", + *cur_page, *write_offset, err); + goto finish_store_block_bitmap; + } + break; + + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + if (!pebi->pebc->src_peb || !pebi->pebc->dst_peb) { + err = -ERANGE; + SSDFS_WARN("invalid src or dst PEB pointer\n"); + goto finish_store_block_bitmap; + } + + /* + * Prepare + * (1) destination bitmap + * (2) source bitmap + * (3) all dependent bitmaps + */ + err = ssdfs_peb_store_dst_blk_bmap(pebi, items_state, + compression, + bmap_hdr_off, + &frag_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store destination bitmap: " + "cur_page %lu, write_offset %u, " + "err %d\n", + *cur_page, *write_offset, err); + goto finish_store_block_bitmap; + } + + err = ssdfs_peb_store_source_blk_bmap(pebi, items_state, + compression, + bmap_hdr_off, + &frag_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store source bitmap: " + "cur_page %lu, write_offset %u, " + "err %d\n", + *cur_page, *write_offset, err); + goto finish_store_block_bitmap; + } + + err = ssdfs_peb_store_dependent_blk_bmap(pebi, items_state, + compression, + bmap_hdr_off, + &frag_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store dependent bitmaps: " + "cur_page %lu, write_offset %u, " + "err %d\n", + *cur_page, *write_offset, err); + goto finish_store_block_bitmap; + } + break; + + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + /* + * Prepare + * (1) source bitmap + * (2) all dependent bitmaps + */ + err = ssdfs_peb_store_source_blk_bmap(pebi, items_state, + compression, + bmap_hdr_off, + &frag_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store source bitmap: " + "cur_page %lu, write_offset %u, " + "err %d\n", + *cur_page, 
*write_offset, err); + goto finish_store_block_bitmap; + } + + err = ssdfs_peb_store_dependent_blk_bmap(pebi, items_state, + compression, + bmap_hdr_off, + &frag_id, + cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to store dependent bitmaps: " + "cur_page %lu, write_offset %u, " + "err %d\n", + *cur_page, *write_offset, err); + goto finish_store_block_bitmap; + } + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid items_state %#x\n", + items_state); + break; + } + + if (pebi->current_log.start_page >= pages_per_peb) { + err = -ERANGE; + SSDFS_ERR("log_start_page %u >= pages_per_peb %u\n", + log_start_page, pages_per_peb); + goto finish_store_block_bitmap; + } + + desc->offset = cpu_to_le32(bmap_hdr_off + + (pebi->current_log.start_page * fsi->pagesize)); + + index = ssdfs_write_offset_to_mem_page_index(fsi, + pebi->current_log.start_page, + bmap_hdr_off); + + page = ssdfs_page_array_get_page_locked(&pebi->cache, index); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", index); + err = -ENOMEM; + goto finish_store_block_bitmap; + } + + kaddr = kmap_local_page(page); + + bmap_hdr = SSDFS_BLKBMP_HDR((u8 *)kaddr + + (bmap_hdr_off % PAGE_SIZE)); + + bmap_hdr->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + bmap_hdr->magic.key = cpu_to_le16(SSDFS_BLK_BMAP_MAGIC); + bmap_hdr->magic.version.major = SSDFS_MAJOR_REVISION; + bmap_hdr->magic.version.minor = SSDFS_MINOR_REVISION; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(frag_id == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + bmap_hdr->fragments_count = cpu_to_le16(frag_id); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(*write_offset <= bmap_hdr_off); + BUG_ON(*write_offset <= (bmap_hdr_off + bmap_hdr_size)); +#endif /* CONFIG_SSDFS_DEBUG */ + bmap_hdr->bytes_count = cpu_to_le32(*write_offset - bmap_hdr_off); + desc->size = bmap_hdr->bytes_count; + + pebi->current_log.prev_log_bmap_bytes = + le32_to_cpu(bmap_hdr->bytes_count); + + bmap_hdr->flags = flags; + bmap_hdr->type = compression; + + desc->check.bytes = cpu_to_le16(bmap_hdr_size); + + switch (compression) { + case SSDFS_BLK_BMAP_ZLIB_BLOB: + desc->check.flags = cpu_to_le16(SSDFS_CRC32 | + SSDFS_ZLIB_COMPRESSED); + break; + + case SSDFS_BLK_BMAP_LZO_BLOB: + desc->check.flags = cpu_to_le16(SSDFS_CRC32 | + SSDFS_LZO_COMPRESSED); + break; + + default: + desc->check.flags = cpu_to_le16(SSDFS_CRC32); + break; + } + + err = ssdfs_calculate_csum(&desc->check, bmap_hdr, bmap_hdr_size); + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", err); + goto finish_bmap_hdr_preparation; + } + + pebi->current_log.seg_flags |= SSDFS_SEG_HDR_HAS_BLK_BMAP; + +finish_bmap_hdr_preparation: + flush_dcache_page(page); + kunmap_local(kaddr); + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + index, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_store_block_bitmap: + return err; +} + +/* + * is_peb_area_empty() - check that PEB's area is empty + * @pebi: pointer on PEB object + * @area_type: type of area + */ +static inline +bool is_peb_area_empty(struct ssdfs_peb_info *pebi, int area_type) +{ + struct ssdfs_peb_area *area; + size_t blk_table_size = sizeof(struct ssdfs_area_block_table); + bool is_empty = false; + +#ifdef CONFIG_SSDFS_DEBUG + 
BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + area = &pebi->current_log.area[area_type]; + + if (area->has_metadata) + is_empty = area->write_offset == blk_table_size; + else + is_empty = area->write_offset == 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area_type %#x, write_offset %u, is_empty %d\n", + area_type, area->write_offset, (int)is_empty); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_empty; +} + +/* + * ssdfs_peb_copy_area_pages_into_cache() - copy area pages into cache + * @pebi: pointer on PEB object + * @area_type: type of area + * @desc: descriptor of metadata area + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to copy area pages into log's page cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - area is empty. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_copy_area_pages_into_cache(struct ssdfs_peb_info *pebi, + int area_type, + struct ssdfs_metadata_descriptor *desc, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_area *area; + size_t blk_table_size = sizeof(struct ssdfs_area_block_table); + struct pagevec pvec; + struct ssdfs_page_array *smap, *dmap; + pgoff_t page_index, end, pages_count, range_len; + struct page *page; + u32 area_offset, area_size = 0; + u16 log_start_page; + u32 read_bytes = 0; + u32 area_write_offset = 0; + u16 flags; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); + BUG_ON(!desc || !cur_page || !write_offset); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + area = &pebi->current_log.area[area_type]; + log_start_page = pebi->current_log.start_page; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "area_type %#x, area->write_offset %u, " + "area->compressed_offset %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + area_type, area->write_offset, + area->compressed_offset, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_area_empty(pebi, area_type)) { + SSDFS_DBG("area %#x is empty\n", area_type); + return -ENODATA; + } + + smap = &area->array; + dmap = &pebi->cache; + + switch (area_type) { + case SSDFS_LOG_BLK_DESC_AREA: + flags = fsi->metadata_options.blk2off_tbl.flags; + if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) + area_write_offset = area->compressed_offset; + else + area_write_offset = area->write_offset; + break; + + default: + area_write_offset = area->write_offset; + break; + } + + area_offset = *write_offset; + area_size = area_write_offset; + + desc->offset = cpu_to_le32(area_offset + + (log_start_page * fsi->pagesize)); + desc->size = cpu_to_le32(area_size); + + if (area->has_metadata) { + void *kaddr; + u8 compression = fsi->metadata_options.blk2off_tbl.compression; + u16 metadata_flags = SSDFS_CRC32; + + switch (area_type) { + case SSDFS_LOG_BLK_DESC_AREA: + flags = fsi->metadata_options.blk2off_tbl.flags; + if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) { + switch (compression) { + case SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE: + metadata_flags |= SSDFS_ZLIB_COMPRESSED; + break; + + case SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE: + metadata_flags |= SSDFS_LZO_COMPRESSED; 
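+ /* metadata_flags is stored into desc->check.flags below */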
+ break; + + default: + /* do nothing */ + break; + } + } + break; + + default: + /* do nothing */ + break; + } + + page = ssdfs_page_array_get_page_locked(smap, 0); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get page of area %#x\n", + area_type); + return -ERANGE; + } + + kaddr = kmap_local_page(page); + desc->check.bytes = cpu_to_le16(blk_table_size); + desc->check.flags = cpu_to_le16(metadata_flags); + err = ssdfs_calculate_csum(&desc->check, kaddr, blk_table_size); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", + err); + return err; + } + + err = ssdfs_page_array_set_page_dirty(smap, 0); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: err %d\n", + err); + return err; + } + } + + pagevec_init(&pvec); + + page_index = 0; + pages_count = area_write_offset + PAGE_SIZE - 1; + pages_count >>= PAGE_SHIFT; + + while (page_index < pages_count) { + int i; + + range_len = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)(pages_count - page_index)); + end = page_index + range_len - 1; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %lu, pages_count %lu\n", + page_index, pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_array_lookup_range(smap, &page_index, end, + SSDFS_DIRTY_PAGE_TAG, + PAGEVEC_SIZE, + &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to find any dirty pages: err %d\n", + err); + return err; + } + + for (i = 0; i < pagevec_count(&pvec); i++) { + struct page *page1 = pvec.pages[i], *page2; + pgoff_t src_index = page1->index; + u32 src_len, dst_len, copy_len; + u32 src_off, dst_off; + u32 rest_len = PAGE_SIZE; + + if (read_bytes == area_size) + goto finish_pagevec_copy; + else if (read_bytes > area_size) { + err = -E2BIG; + SSDFS_ERR("too many pages: " + "pages_count %u, area_size %u\n", + pagevec_count(&pvec), + area_size); + goto finish_current_copy; + } + + src_off = 0; + +try_copy_area_data: + ssdfs_lock_page(page1); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page1, page_ref_count(page1)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*write_offset >= PAGE_SIZE) + dst_off = *write_offset % PAGE_SIZE; + else + dst_off = *write_offset; + + src_len = min_t(u32, area_size - read_bytes, rest_len); + dst_len = min_t(u32, PAGE_SIZE, PAGE_SIZE - dst_off); + copy_len = min_t(u32, src_len, dst_len); + + page2 = ssdfs_page_array_grab_page(dmap, *cur_page); + if (unlikely(IS_ERR_OR_NULL(page2))) { + err = -ENOMEM; + SSDFS_ERR("fail to grab page: index %lu\n", + *cur_page); + goto unlock_page1; + } + + err = ssdfs_memcpy_page(page2, dst_off, PAGE_SIZE, + page1, src_off, PAGE_SIZE, + copy_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "src_off %u, dst_off %u, " + "copy_len %u\n", + src_off, dst_off, copy_len); + goto unlock_page2; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_off %u, dst_off %u, src_len %u, " + "dst_len %u, copy_len %u, " + "write_offset %u, cur_page %lu, " + "page_index %d\n", + src_off, dst_off, src_len, dst_len, copy_len, + *write_offset, *cur_page, i); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (PageDirty(page1)) { + err = ssdfs_page_array_set_page_dirty(dmap, + *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: " + "page_index %lu, err %d\n", + *cur_page, err); + goto unlock_page2; + } + } else { + err = -ERANGE; + SSDFS_ERR("page %d is not 
dirty\n", i); + goto unlock_page2; + } + +unlock_page2: + ssdfs_unlock_page(page2); + ssdfs_put_page(page2); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page2, page_ref_count(page2)); +#endif /* CONFIG_SSDFS_DEBUG */ + +unlock_page1: + ssdfs_unlock_page(page1); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page1, page_ref_count(page1)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_current_copy: + if (unlikely(err)) { + SSDFS_ERR("fail to copy page: " + " from %lu to %lu, err %d\n", + src_index, *cur_page, err); + goto fail_copy_area_pages; + } + + read_bytes += copy_len; + *write_offset += copy_len; + rest_len -= copy_len; + + if ((dst_off + copy_len) >= PAGE_SIZE) + ++(*cur_page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("read_bytes %u, area_size %u, " + "write_offset %u, copy_len %u, rest_len %u\n", + read_bytes, area_size, + *write_offset, copy_len, rest_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (read_bytes == area_size) { + err = ssdfs_page_array_clear_dirty_page(smap, + page_index + i); + if (unlikely(err)) { + SSDFS_ERR("fail to mark page clean: " + "page_index %lu\n", + page_index + i); + goto fail_copy_area_pages; + } else + goto finish_pagevec_copy; + } else if ((src_off + copy_len) < PAGE_SIZE) { + src_off += copy_len; + goto try_copy_area_data; + } else { + err = ssdfs_page_array_clear_dirty_page(smap, + page_index + i); + if (unlikely(err)) { + SSDFS_ERR("fail to mark page clean: " + "page_index %lu\n", + page_index + i); + goto fail_copy_area_pages; + } + } + } + +finish_pagevec_copy: + page_index += PAGEVEC_SIZE; + + for (i = 0; i < pagevec_count(&pvec); i++) { + page = pvec.pages[i]; + ssdfs_put_page(page); + } + + pagevec_reinit(&pvec); + cond_resched(); + }; + + err = ssdfs_page_array_release_all_pages(smap); + if (unlikely(err)) { + SSDFS_ERR("fail to release area's pages: " + "err %d\n", err); + goto finish_copy_area_pages; + } + + pebi->current_log.seg_flags |= SSDFS_AREA_TYPE2FLAG(area_type); + + return 0; + +fail_copy_area_pages: + for (i = 0; i < pagevec_count(&pvec); i++) { + page = pvec.pages[i]; + ssdfs_put_page(page); + } + + pagevec_reinit(&pvec); + +finish_copy_area_pages: + return err; +} + +/* + * ssdfs_peb_move_area_pages_into_cache() - move area pages into cache + * @pebi: pointer on PEB object + * @area_type: type of area + * @desc: descriptor of metadata area + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to move area pages into log's page cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - area is empty. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_peb_move_area_pages_into_cache(struct ssdfs_peb_info *pebi, + int area_type, + struct ssdfs_metadata_descriptor *desc, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_area *area; + size_t blk_table_size = sizeof(struct ssdfs_area_block_table); + struct pagevec pvec; + struct ssdfs_page_array *smap, *dmap; + pgoff_t page_index, end, pages_count, range_len; + struct page *page; + u32 area_offset, area_size; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); + BUG_ON(!desc || !cur_page || !write_offset); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + area = &pebi->current_log.area[area_type]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "area_type %#x, area->write_offset %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + area_type, area->write_offset, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_area_empty(pebi, area_type)) { + SSDFS_DBG("area %#x is empty\n", area_type); + return -ENODATA; + } + + smap = &area->array; + dmap = &pebi->cache; + + area_offset = *write_offset; + area_size = area->write_offset; + + desc->offset = cpu_to_le32(area_offset + + (pebi->current_log.start_page * fsi->pagesize)); + + desc->size = cpu_to_le32(area_size); + + if (area->has_metadata) { + page = ssdfs_page_array_get_page_locked(smap, 0); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get page of area %#x\n", + area_type); + return -ERANGE; + } + + kaddr = kmap_local_page(page); + desc->check.bytes = cpu_to_le16(blk_table_size); + desc->check.flags = cpu_to_le16(SSDFS_CRC32); + err = ssdfs_calculate_csum(&desc->check, kaddr, blk_table_size); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", + err); + return err; + } + + err = ssdfs_page_array_set_page_dirty(smap, 0); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: err %d\n", + err); + return err; + } + } + + pagevec_init(&pvec); + + page_index = 0; + pages_count = area->write_offset + PAGE_SIZE - 1; + pages_count >>= PAGE_SHIFT; + + while (page_index < pages_count) { + int i; + + range_len = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)(pages_count - page_index)); + end = page_index + range_len - 1; + + err = ssdfs_page_array_lookup_range(smap, &page_index, end, + SSDFS_DIRTY_PAGE_TAG, + PAGEVEC_SIZE, + &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to find any dirty pages: err %d\n", + err); + return err; + } + + for (i = 0; i < pagevec_count(&pvec); i++) { + struct page *page = pvec.pages[i], *page2; + pgoff_t src_off = page->index; + + ssdfs_lock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page2 = ssdfs_page_array_delete_page(smap, src_off); + if (IS_ERR_OR_NULL(page2)) { + err = !page2 ? 
-ERANGE : PTR_ERR(page2); + SSDFS_ERR("fail to delete page %lu: err %d\n", + src_off, err); + goto finish_current_move; + } + + WARN_ON(page2 != page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_page %lu, write_offset %u, " + "i %d, pvec_count %u\n", + *cur_page, *write_offset, + i, pagevec_count(&pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page->index = *cur_page; + + err = ssdfs_page_array_add_page(dmap, page, *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to add page %lu: err %d\n", + *cur_page, err); + goto finish_current_move; + } + + if (PageDirty(page)) { + err = ssdfs_page_array_set_page_dirty(dmap, + *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: " + "page_index %lu, err %d\n", + *cur_page, err); + goto finish_current_move; + } + } else { + err = -ERANGE; + SSDFS_ERR("page %d is not dirty\n", i); + goto finish_current_move; + } + + pvec.pages[i] = NULL; + +finish_current_move: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + for (i = 0; i < pagevec_count(&pvec); i++) { + page = pvec.pages[i]; + if (!page) + continue; + ssdfs_put_page(page); + } + + pagevec_reinit(&pvec); + SSDFS_ERR("fail to move page: " + " from %lu to %lu, err %d\n", + src_off, *cur_page, err); + return err; + } + + (*cur_page)++; + *write_offset += PAGE_SIZE; + } + + page_index += PAGEVEC_SIZE; + + pagevec_reinit(&pvec); + cond_resched(); + }; + + pebi->current_log.seg_flags |= SSDFS_AREA_TYPE2FLAG(area_type); + + return 0; +} + +/* + * ssdfs_peb_store_blk_desc_table() - try to store block descriptor table + * @pebi: pointer on PEB object + * @desc: descriptor of metadata area + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store block descriptor into log's page cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - area is empty. + * %-ERANGE - internal error. 
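+ *
+ * NOTE: when SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION is set, the latest
+ * fragment is checksummed and compressed before the area block table
+ * is stored and the area's pages are copied into the page cache.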
+ */
+static
+int ssdfs_peb_store_blk_desc_table(struct ssdfs_peb_info *pebi,
+				   struct ssdfs_metadata_descriptor *desc,
+				   pgoff_t *cur_page,
+				   u32 *write_offset)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_peb_area *area;
+	struct ssdfs_fragment_desc *meta_desc;
+	struct ssdfs_fragments_chain_header *chain_hdr;
+	struct ssdfs_peb_temp_buffer *buf;
+	int area_type = SSDFS_LOG_BLK_DESC_AREA;
+	u16 flags;
+	size_t uncompr_size;
+	size_t compr_size = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(!desc || !cur_page || !write_offset);
+	BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebi->pebc->parent_si->fsi;
+	area = &pebi->current_log.area[area_type];
+	chain_hdr = &area->metadata.area.blk_desc.table.chain_hdr;
+	buf = &area->metadata.area.blk_desc.flush_buf;
+	flags = fsi->metadata_options.blk2off_tbl.flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, "
+		  "area->write_offset %u, "
+		  "cur_page %lu, write_offset %u\n",
+		  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+		  pebi->current_log.start_page,
+		  area->write_offset,
+		  *cur_page, *write_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_peb_area_empty(pebi, area_type)) {
+		SSDFS_DBG("area %#x is empty\n", area_type);
+		return -ENODATA;
+	}
+
+	meta_desc = ssdfs_peb_get_area_cur_frag_desc(pebi, area_type);
+	if (IS_ERR(meta_desc)) {
+		SSDFS_ERR("fail to get current fragment descriptor: "
+			  "err %d\n",
+			  (int)PTR_ERR(meta_desc));
+		return PTR_ERR(meta_desc);
+	} else if (!meta_desc) {
+		err = -ERANGE;
+		SSDFS_ERR("fail to get current fragment descriptor: "
+			  "err %d\n",
+			  err);
+		return err;
+	}
+
+	uncompr_size = le16_to_cpu(meta_desc->uncompr_size);
+
+	if (uncompr_size == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("latest fragment of blk desc table is empty: "
+			  "seg %llu, peb %llu, current_log.start_page %u, "
+			  "area->write_offset %u, "
+			  "cur_page %lu, write_offset %u\n",
+			  pebi->pebc->parent_si->seg_id, pebi->peb_id,
+			  pebi->current_log.start_page,
+			  area->write_offset,
+			  *cur_page, *write_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto copy_area_pages_into_cache;
+	}
+
+	if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) {
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!buf->ptr);
+
+		if (buf->write_offset >= buf->size) {
+			SSDFS_ERR("invalid request: "
+				  "buf->write_offset %u, buf->size %zu\n",
+				  buf->write_offset, buf->size);
+			return -ERANGE;
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		meta_desc->flags = SSDFS_FRAGMENT_HAS_CSUM;
+
+		if (uncompr_size > buf->size) {
+			SSDFS_ERR("invalid state: "
+				  "uncompr_size %zu > buf->size %zu\n",
+				  uncompr_size, buf->size);
+			return -ERANGE;
+		}
+
+		meta_desc->checksum = ssdfs_crc32_le(buf->ptr, uncompr_size);
+
+		if (le32_to_cpu(meta_desc->checksum) == 0) {
+			SSDFS_WARN("checksum is invalid: "
+				   "seg %llu, peb %llu, bytes_count %zu\n",
+				   pebi->pebc->parent_si->seg_id,
+				   pebi->peb_id,
+				   uncompr_size);
+			return -ERANGE;
+		}
+
+		err = ssdfs_peb_compress_blk_descs_fragment(pebi,
+							    uncompr_size,
+							    &compr_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to compress blk desc fragment: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		meta_desc->offset = cpu_to_le32(area->compressed_offset);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		WARN_ON(compr_size > U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+		meta_desc->compr_size = cpu_to_le16((u16)compr_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %u, compr_size %u, "
+			  "uncompr_size %u, 
checksum %#x\n", + le32_to_cpu(meta_desc->offset), + le16_to_cpu(meta_desc->compr_size), + le16_to_cpu(meta_desc->uncompr_size), + le32_to_cpu(meta_desc->checksum)); +#endif /* CONFIG_SSDFS_DEBUG */ + + area->compressed_offset += compr_size; + le32_add_cpu(&chain_hdr->compr_bytes, compr_size); + } + + err = ssdfs_peb_store_area_block_table(pebi, area_type, 0); + if (unlikely(err)) { + SSDFS_ERR("fail to store area's block table: " + "area %#x, err %d\n", + area_type, err); + return err; + } + +copy_area_pages_into_cache: + err = ssdfs_peb_copy_area_pages_into_cache(pebi, area_type, + desc, cur_page, + write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to move pages in the cache: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_store_log_footer() - store log footer + * @pebi: pointer on PEB object + * @flags: log footer's flags + * @hdr_desc: log footer's metadata descriptor in header + * @lf_desc: log footer's metadata descriptors array + * @array_size: count of items in array + * @cur_segs: current segment IDs array + * @cur_segs_size: size of segment IDs array size in bytes + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + * + * This function tries to store log footer into PEB's page cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory page. + */ +static +int ssdfs_peb_store_log_footer(struct ssdfs_peb_info *pebi, + u32 flags, + struct ssdfs_metadata_descriptor *hdr_desc, + struct ssdfs_metadata_descriptor *lf_desc, + size_t array_size, + __le64 *cur_segs, + size_t cur_segs_size, + pgoff_t *cur_page, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_log_footer *footer; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t array_bytes = desc_size * array_size; + int padding; + u32 log_pages; + struct page *page; + u32 area_offset, area_size; + u64 last_log_time; + u64 last_log_cno; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!hdr_desc || !lf_desc || !cur_segs); + BUG_ON(!cur_page || !write_offset); + BUG_ON(array_size != SSDFS_LOG_FOOTER_DESC_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + area_offset = *write_offset; + area_size = sizeof(struct ssdfs_log_footer); + + *write_offset += max_t(u32, PAGE_SIZE, area_size); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(flags & ~SSDFS_LOG_FOOTER_FLAG_MASK); + BUG_ON(((*write_offset + fsi->pagesize - 1) >> fsi->log_pagesize) > + pebi->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = (*write_offset + fsi->pagesize - 1) / fsi->pagesize; + + padding = *cur_page % pebi->log_pages; + padding = pebi->log_pages - padding; + padding--; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area_offset %u, write_offset %u, " + "log_pages %u, padding %d, " + "cur_page %lu\n", + area_offset, *write_offset, + log_pages, padding, + *cur_page); + + if (padding > 1) { + SSDFS_WARN("padding is big: " + "seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u, " + "padding %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->current_log.start_page, + 
*cur_page, *write_offset, + padding); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (padding > 0) { + /* + * Align the log_pages and log_bytes. + */ + log_pages += padding; + *write_offset = log_pages * fsi->pagesize; + area_offset = *write_offset - fsi->pagesize; + + for (i = 0; i < padding; i++) { + page = ssdfs_page_array_grab_page(&pebi->cache, + *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + + ssdfs_memset_page(page, 0, PAGE_SIZE, 0xFF, PAGE_SIZE); + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, + *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: " + "page_index %lu, err %d\n", + *cur_page, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) + return err; + + (*cur_page)++; + } + } + + page = ssdfs_page_array_grab_page(&pebi->cache, *cur_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %lu\n", + *cur_page); + return -ENOMEM; + } + + footer = kmap_local_page(page); + memset(footer, 0xFF, PAGE_SIZE); + ssdfs_memcpy(footer->desc_array, 0, array_bytes, + lf_desc, 0, array_bytes, + array_bytes); + + last_log_time = pebi->current_log.last_log_time; + last_log_cno = pebi->current_log.last_log_cno; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pebi->peb_create_time > last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_volume_state_info_for_commit(fsi, SSDFS_MOUNTED_FS, + cur_segs, + cur_segs_size, + last_log_time, + last_log_cno, + &footer->volume_state); + + if (!err) { + err = ssdfs_prepare_log_footer_for_commit(fsi, log_pages, + flags, + last_log_time, + last_log_cno, + footer); + + footer->peb_create_time = cpu_to_le64(pebi->peb_create_time); + } + + if (!err) { + hdr_desc->offset = cpu_to_le32(area_offset + + (pebi->current_log.start_page * fsi->pagesize)); + hdr_desc->size = cpu_to_le32(area_size); + + ssdfs_memcpy(&hdr_desc->check, + 0, sizeof(struct ssdfs_metadata_check), + &footer->volume_state.check, + 0, sizeof(struct ssdfs_metadata_check), + sizeof(struct ssdfs_metadata_check)); + } + + flush_dcache_page(page); + kunmap_local(footer); + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&pebi->cache, *cur_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page dirty: " + "page_index %lu, err %d\n", + *cur_page, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_CRIT("fail to store log footer: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + return err; + } + + pebi->current_log.seg_flags |= SSDFS_LOG_HAS_FOOTER; + + (*cur_page)++; + + return 0; +} + +/* + * ssdfs_extract_src_peb_migration_id() - prepare src PEB's migration_id + * @pebi: pointer on PEB object + * @prev_id: pointer on previous PEB's peb_migration_id [out] + * @cur_id: pointer on current PEB's peb_migration_id [out] + */ +static inline +int ssdfs_extract_src_peb_migration_id(struct ssdfs_peb_info *pebi, + u8 *prev_id, u8 *cur_id) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->src_peb); + BUG_ON(!prev_id || !cur_id); + BUG_ON(!rwsem_is_locked(&pebi->pebc->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
*prev_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	*cur_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+
+	if (pebi != pebi->pebc->src_peb) {
+		SSDFS_ERR("pebi %p != src_peb %p\n",
+			  pebi, pebi->pebc->src_peb);
+		return -ERANGE;
+	}
+
+	err = ssdfs_get_peb_migration_id_checked(pebi);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get migration_id: "
+			  "seg %llu, peb %llu, err %d\n",
+			  pebi->pebc->parent_si->seg_id,
+			  pebi->peb_id,
+			  err);
+		return err;
+	}
+
+	*cur_id = (u8)err;
+
+	*prev_id = ssdfs_define_prev_peb_migration_id(pebi);
+	if (!is_peb_migration_id_valid(*prev_id)) {
+		*prev_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+		SSDFS_ERR("fail to define prev migration_id: "
+			  "seg %llu, peb %llu\n",
+			  pebi->pebc->parent_si->seg_id,
+			  pebi->peb_id);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_extract_dst_peb_migration_id() - prepare dst PEB's migration_id
+ * @pebi: pointer on PEB object
+ * @prev_id: pointer on previous PEB's peb_migration_id [out]
+ * @cur_id: pointer on current PEB's peb_migration_id [out]
+ */
+static inline
+int ssdfs_extract_dst_peb_migration_id(struct ssdfs_peb_info *pebi,
+					u8 *prev_id, u8 *cur_id)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc);
+	BUG_ON(!pebi->pebc->src_peb || !pebi->pebc->dst_peb);
+	BUG_ON(!prev_id || !cur_id);
+	BUG_ON(!rwsem_is_locked(&pebi->pebc->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*prev_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	*cur_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+
+	err = ssdfs_get_peb_migration_id_checked(pebi->pebc->dst_peb);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get migration_id: "
+			  "seg %llu, peb %llu, err %d\n",
+			  pebi->pebc->parent_si->seg_id,
+			  pebi->peb_id,
+			  err);
+		return err;
+	}
+
+	*cur_id = (u8)err;
+
+	err = ssdfs_get_peb_migration_id_checked(pebi->pebc->src_peb);
+	if (unlikely(err < 0)) {
+		SSDFS_ERR("fail to get migration_id: "
+			  "seg %llu, peb %llu, err %d\n",
+			  pebi->pebc->parent_si->seg_id,
+			  pebi->peb_id,
+			  err);
+		return err;
+	}
+
+	*prev_id = (u8)err;
+
+	return 0;
+}
+
+/*
+ * ssdfs_store_peb_migration_id() - store peb_migration_id into header
+ * @pebi: pointer on PEB object
+ * @hdr: pointer on segment header [out]
+ */
+static
+int ssdfs_store_peb_migration_id(struct ssdfs_peb_info *pebi,
+				 struct ssdfs_segment_header *hdr)
+{
+	int items_state;
+	u8 prev_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	u8 cur_id = SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(!rwsem_is_locked(&pebi->pebc->lock));
+
+	SSDFS_DBG("seg %llu, peb %llu\n",
+		  pebi->pebc->parent_si->seg_id, pebi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_state = atomic_read(&pebi->pebc->items_state);
+	switch (items_state) {
+	case SSDFS_PEB1_SRC_CONTAINER:
+	case SSDFS_PEB2_SRC_CONTAINER:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!pebi->pebc->src_peb || pebi->pebc->dst_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_extract_src_peb_migration_id(pebi,
+							 &prev_id,
+							 &cur_id);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract peb_migration_id: "
+				  "seg %llu, peb %llu, err %d\n",
+				  pebi->pebc->parent_si->seg_id,
+				  pebi->peb_id, err);
+			return err;
+		}
+		break;
+
+	case SSDFS_PEB1_DST_CONTAINER:
+	case SSDFS_PEB2_DST_CONTAINER:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(pebi->pebc->src_peb || !pebi->pebc->dst_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (pebi != pebi->pebc->dst_peb) {
+			SSDFS_ERR("pebi %p != dst_peb %p\n",
+				  pebi, pebi->pebc->dst_peb);
+			return -ERANGE;
+		}
+
+		err = ssdfs_get_peb_migration_id_checked(pebi);
+		if (unlikely(err < 0)) {
+			SSDFS_ERR("fail to get migration_id: "
+				  "seg %llu, peb %llu, err %d\n",
+				  pebi->pebc->parent_si->seg_id,
+				  pebi->peb_id,
+				  err);
+			return err;
+		}
+
+		cur_id = (u8)err;
+		break;
+
+	case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER:
+	case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!pebi->pebc->src_peb || !pebi->pebc->dst_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = -ERANGE;
+
+		if (pebi == pebi->pebc->src_peb) {
+			err = ssdfs_extract_src_peb_migration_id(pebi,
+								 &prev_id,
+								 &cur_id);
+		} else if (pebi == pebi->pebc->dst_peb) {
+			err = ssdfs_extract_dst_peb_migration_id(pebi,
+								 &prev_id,
+								 &cur_id);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract peb_migration_id: "
+				  "seg %llu, peb %llu, err %d\n",
+				  pebi->pebc->parent_si->seg_id,
+				  pebi->peb_id, err);
+			return err;
+		}
+		break;
+
+	case SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER:
+	case SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!pebi->pebc->src_peb || !pebi->pebc->dst_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_extract_src_peb_migration_id(pebi,
+							 &prev_id,
+							 &cur_id);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract peb_migration_id: "
+				  "seg %llu, peb %llu, err %d\n",
+				  pebi->pebc->parent_si->seg_id,
+				  pebi->peb_id, err);
+			return err;
+		}
+		break;
+
+	default:
+		SSDFS_WARN("invalid items_state %#x\n",
+			   items_state);
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ERANGE;
+	}
+
+	hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] = prev_id;
+	hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] = cur_id;
+
+	return 0;
+}
+
+/*
+ * ssdfs_peb_store_log_header() - store log's header
+ * @pebi: pointer on PEB object
+ * @desc_array: pointer on descriptors array
+ * @array_size: count of items in array
+ * @write_offset: current write offset in log
+ *
+ * This function tries to store log's header in PEB's page cache.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
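+ *
+ * NOTE: the segment header is placed into the first page of the
+ * current log, which is expected to be already present in the PEB's
+ * page cache.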
+ */ +static +int ssdfs_peb_store_log_header(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *desc_array, + size_t array_size, + u32 write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_header *hdr; + struct page *page; + size_t desc_size = sizeof(struct ssdfs_metadata_descriptor); + size_t array_bytes = desc_size * array_size; + u32 seg_flags; + u32 log_pages; + u16 seg_type; + u64 last_log_time; + u64 last_log_cno; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!desc_array); + BUG_ON(array_size != SSDFS_SEG_HDR_DESC_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(pebi->pebc->parent_si->seg_type > SSDFS_LAST_KNOWN_SEG_TYPE); + BUG_ON(pebi->current_log.seg_flags & ~SSDFS_SEG_HDR_FLAG_MASK); + BUG_ON(write_offset % fsi->pagesize); + BUG_ON((write_offset >> fsi->log_pagesize) > pebi->log_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_type = pebi->pebc->parent_si->seg_type; + log_pages = pebi->log_pages; + seg_flags = pebi->current_log.seg_flags; + + page = ssdfs_page_array_get_page_locked(&pebi->cache, + pebi->current_log.start_page); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get cache page: index %u\n", + pebi->current_log.start_page); + return -ERANGE; + } + + hdr = kmap_local_page(page); + + ssdfs_memcpy(hdr->desc_array, 0, array_bytes, + desc_array, 0, array_bytes, + array_bytes); + + ssdfs_create_volume_header(fsi, &hdr->volume_hdr); + + err = ssdfs_prepare_volume_header_for_commit(fsi, &hdr->volume_hdr); + if (unlikely(err)) + goto finish_segment_header_preparation; + + err = ssdfs_store_peb_migration_id(pebi, hdr); + if (unlikely(err)) + goto finish_segment_header_preparation; + + hdr->peb_create_time = cpu_to_le64(pebi->peb_create_time); + + last_log_time = pebi->current_log.last_log_time; + last_log_cno = pebi->current_log.last_log_cno; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "peb_create_time %llx, last_log_time %llx\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + pebi->peb_create_time, + last_log_time); + + BUG_ON(pebi->peb_create_time > last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_segment_header_for_commit(fsi, + log_pages, + seg_type, + seg_flags, + last_log_time, + last_log_cno, + hdr); + if (unlikely(err)) + goto finish_segment_header_preparation; + +finish_segment_header_preparation: + flush_dcache_page(page); + kunmap_local(hdr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_CRIT("fail to store segment header: " + "seg %llu, peb %llu, current_log.start_page %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_flush_current_log_dirty_pages() - flush log's dirty pages + * @pebi: pointer on PEB object + * @write_offset: current write offset in log + * + * This function tries to flush the current log's dirty pages. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
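+ *
+ * NOTE: dirty pages are looked up in batches of PAGEVEC_SIZE and
+ * written out through fsi->devops->writepages(); successfully written
+ * pages are cleaned and released from the PEB's page cache.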
+ */ +static +int ssdfs_peb_flush_current_log_dirty_pages(struct ssdfs_peb_info *pebi, + u32 write_offset) +{ + struct ssdfs_fs_info *fsi; + loff_t peb_offset; + struct pagevec pvec; + u32 log_bytes, written_bytes; + u32 log_start_off; + unsigned flushed_pages; +#ifdef CONFIG_SSDFS_CHECK_LOGICAL_BLOCK_EMPTYNESS + u32 pages_per_block; +#endif /* CONFIG_SSDFS_CHECK_LOGICAL_BLOCK_EMPTYNESS */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(write_offset == 0); + BUG_ON(write_offset % pebi->pebc->parent_si->fsi->pagesize); + BUG_ON(!pebi->pebc->parent_si->fsi->devops); + BUG_ON(!pebi->pebc->parent_si->fsi->devops->writepages); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + pagevec_init(&pvec); + + peb_offset = (pebi->peb_id * fsi->pages_per_peb) << fsi->log_pagesize; + + log_bytes = write_offset; + log_start_off = pebi->current_log.start_page << fsi->log_pagesize; + written_bytes = 0; + flushed_pages = 0; + + while (written_bytes < log_bytes) { + pgoff_t index, end; + unsigned i; + u32 page_start_off, write_size; + loff_t iter_write_offset; + u32 pagevec_bytes; + pgoff_t written_pages = 0; + + index = pebi->current_log.start_page + flushed_pages; + end = (pgoff_t)pebi->current_log.start_page + pebi->log_pages; + end = min_t(pgoff_t, end, (pgoff_t)(index + PAGEVEC_SIZE)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %lu, end %lu\n", + index, end); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_page_array_lookup_range(&pebi->cache, + &index, end, + SSDFS_DIRTY_PAGE_TAG, + PAGEVEC_SIZE, + &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to find dirty pages: " + "index %lu, end %lu, err %d\n", + index, end, err); + return -ERANGE; + } + + page_start_off = log_start_off + written_bytes; + page_start_off %= PAGE_SIZE; + + pagevec_bytes = (u32)pagevec_count(&pvec) * PAGE_SIZE; + + write_size = min_t(u32, + pagevec_bytes - page_start_off, + log_bytes - written_bytes); + + if ((written_bytes + write_size) > log_bytes) { + pagevec_reinit(&pvec); + SSDFS_ERR("written_bytes %u > log_bytes %u\n", + written_bytes + write_size, + log_bytes); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(write_size % fsi->pagesize); + BUG_ON(written_bytes % fsi->pagesize); + + for (i = 1; i < pagevec_count(&pvec); i++) { + struct page *page1, *page2; + + page1 = pvec.pages[i - 1]; + page2 = pvec.pages[i]; + + if ((page_index(page1) + 1) != page_index(page2)) { + SSDFS_ERR("not contiguous log: " + "page_index1 %lu, page_index2 %lu\n", + page_index(page1), + page_index(page2)); + } + } +#endif /* CONFIG_SSDFS_DEBUG */ + + iter_write_offset = peb_offset + log_start_off; + iter_write_offset += written_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("iter_write_offset %llu, write_size %u, " + "page_start_off %u\n", + iter_write_offset, write_size, page_start_off); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_CHECK_LOGICAL_BLOCK_EMPTYNESS + pages_per_block = fsi->pagesize / PAGE_SIZE; + for (i = 0; i < pagevec_count(&pvec); i += pages_per_block) { + u64 byte_off; + + if (!fsi->devops->can_write_page) { + SSDFS_DBG("can_write_page is not supported\n"); + break; + } + + byte_off = iter_write_offset; + byte_off += i * PAGE_SIZE; + + err = 
fsi->devops->can_write_page(fsi->sb, byte_off, + true); + if (err) { + pagevec_reinit(&pvec); + ssdfs_fs_error(fsi->sb, + __FILE__, __func__, __LINE__, + "offset %llu err %d\n", + byte_off, err); + return err; + } + } +#endif /* CONFIG_SSDFS_CHECK_LOGICAL_BLOCK_EMPTYNESS */ + + err = fsi->devops->writepages(fsi->sb, iter_write_offset, + &pvec, + page_start_off, + write_size); + if (unlikely(err)) { + pagevec_reinit(&pvec); + SSDFS_ERR("fail to flush pagevec: " + "iter_write_offset %llu, write_size %u, " + "err %d\n", + iter_write_offset, write_size, err); + return err; + } + + written_pages = write_size / PAGE_SIZE; + + for (i = 0; i < pagevec_count(&pvec); i++) { + struct page *page = pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (i < written_pages) { + ssdfs_lock_page(page); + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + pvec.pages[i] = NULL; + ssdfs_unlock_page(page); + } else { + ssdfs_lock_page(page); + pvec.pages[i] = NULL; + ssdfs_unlock_page(page); + } + } + + end = index + written_pages - 1; + err = ssdfs_page_array_clear_dirty_range(&pebi->cache, + index, + end); + if (unlikely(err)) { + SSDFS_ERR("fail to clean dirty pages: " + "start %lu, end %lu, err %d\n", + index, end, err); + } + + err = ssdfs_page_array_release_pages(&pebi->cache, + &index, end); + if (unlikely(err)) { + SSDFS_ERR("fail to release pages: " + "seg_id %llu, peb_id %llu, " + "start %lu, end %lu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, index, end, err); + } + + written_bytes += write_size; + flushed_pages += written_pages; + + pagevec_reinit(&pvec); + cond_resched(); + }; + + return 0; +} + +/* + * ssdfs_peb_commit_log_payload() - commit payload of the log + * @pebi: pointer on PEB object + * @hdr_desc: log header's metadata descriptors array + * @log_has_data: does log contain data? 
[out] + * @cur_page: pointer on current page value [in|out] + * @write_offset: pointer on write offset value [in|out] + */ +static +int ssdfs_peb_commit_log_payload(struct ssdfs_peb_info *pebi, + struct ssdfs_metadata_descriptor *hdr_desc, + bool *log_has_data, + pgoff_t *cur_page, u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor *cur_hdr_desc; + int area_type; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!hdr_desc || !cur_page || !write_offset || !log_has_data); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + *log_has_data = false; + + cur_hdr_desc = &hdr_desc[SSDFS_BLK_BMAP_INDEX]; + err = ssdfs_peb_store_block_bmap(pebi, cur_hdr_desc, + cur_page, write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store block bitmap: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + *cur_page, *write_offset, err); + goto finish_commit_payload; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0001-payload: cur_page %lu, write_offset %u\n", + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_hdr_desc = &hdr_desc[SSDFS_OFF_TABLE_INDEX]; + err = ssdfs_peb_store_offsets_table(pebi, cur_hdr_desc, + cur_page, write_offset); + if (unlikely(err)) { + SSDFS_CRIT("fail to store offsets table: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + *cur_page, *write_offset, err); + goto finish_commit_payload; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0002-payload: cur_page %lu, write_offset %u\n", + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + area_type = SSDFS_LOG_BLK_DESC_AREA; + cur_hdr_desc = &hdr_desc[SSDFS_AREA_TYPE2INDEX(area_type)]; + err = ssdfs_peb_store_blk_desc_table(pebi, cur_hdr_desc, + cur_page, write_offset); + if (err == -ENODATA) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block descriptor area is absent: " + "seg %llu, peb %llu, " + "cur_page %lu, write_offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_CRIT("fail to store block descriptors table: " + "seg %llu, peb %llu, cur_page %lu, write_offset %u, " + "err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + *cur_page, *write_offset, err); + goto finish_commit_payload; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0003-payload: cur_page %lu, write_offset %u\n", + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + area_type = SSDFS_LOG_DIFFS_AREA; + cur_hdr_desc = &hdr_desc[SSDFS_AREA_TYPE2INDEX(area_type)]; + err = ssdfs_peb_copy_area_pages_into_cache(pebi, + area_type, + cur_hdr_desc, + cur_page, + write_offset); + if (err == -ENODATA) { + err = 0; + } else if (unlikely(err)) { + SSDFS_CRIT("fail to move the area %d into PEB cache: " + "seg %llu, peb %llu, cur_page %lu, " + "write_offset %u, err %d\n", + area_type, pebi->pebc->parent_si->seg_id, + pebi->peb_id, *cur_page, *write_offset, + err); + goto finish_commit_payload; + } else + *log_has_data = true; + +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("0004-payload: cur_page %lu, write_offset %u\n", + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + area_type = SSDFS_LOG_JOURNAL_AREA; + cur_hdr_desc = &hdr_desc[SSDFS_AREA_TYPE2INDEX(area_type)]; + err = ssdfs_peb_copy_area_pages_into_cache(pebi, + area_type, + cur_hdr_desc, + cur_page, + write_offset); + if (err == -ENODATA) { + err = 0; + } else if (unlikely(err)) { + SSDFS_CRIT("fail to move the area %d into PEB cache: " + "seg %llu, peb %llu, cur_page %lu, " + "write_offset %u, err %d\n", + area_type, pebi->pebc->parent_si->seg_id, + pebi->peb_id, *cur_page, *write_offset, + err); + goto finish_commit_payload; + } else + *log_has_data = true; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0005-payload: cur_page %lu, write_offset %u\n", + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*write_offset % PAGE_SIZE) { + (*cur_page)++; + + *write_offset += PAGE_SIZE - 1; + *write_offset >>= PAGE_SHIFT; + *write_offset <<= PAGE_SHIFT; + } + + area_type = SSDFS_LOG_MAIN_AREA; + cur_hdr_desc = &hdr_desc[SSDFS_AREA_TYPE2INDEX(area_type)]; + err = ssdfs_peb_move_area_pages_into_cache(pebi, + area_type, + cur_hdr_desc, + cur_page, + write_offset); + if (err == -ENODATA) { + err = 0; + } else if (unlikely(err)) { + SSDFS_CRIT("fail to move the area %d into PEB cache: " + "seg %llu, peb %llu, cur_page %lu, " + "write_offset %u, err %d\n", + area_type, pebi->pebc->parent_si->seg_id, + pebi->peb_id, *cur_page, *write_offset, + err); + goto finish_commit_payload; + } else + *log_has_data = true; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("0006-payload: cur_page %lu, write_offset %u\n", + *cur_page, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_commit_payload: + return err; +} + /* * ssdfs_peb_define_next_log_start() - define start of the next log * @pebi: pointer on PEB object From patchwork Sat Feb 25 01:08:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151934 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4773BC6FA8E for ; Sat, 25 Feb 2023 01:17:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229746AbjBYBRk (ORCPT ); Fri, 24 Feb 2023 20:17:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229710AbjBYBQs (ORCPT ); Fri, 24 Feb 2023 20:16:48 -0500 Received: from mail-oi1-x236.google.com (mail-oi1-x236.google.com [IPv6:2607:f8b0:4864:20::236]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9198F15CA6 for ; Fri, 24 Feb 2023 17:16:41 -0800 (PST) Received: by mail-oi1-x236.google.com with SMTP id s41so6004oiw.13 for ; Fri, 24 Feb 2023 17:16:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KVp52myVQBoHKKJ7zEqsDlHvv8dniF0ze345V4RXKyg=; b=pQ0/NK5hld3N2RvzIIHID45NiMcgIFw0i9jHrXxHTVkSlVKphpESHfk9ITzjiTUvg/ mpGLFEAVN33j1SSbmGEiseozwMJP8vw0JH6ijfgFEbpHps1v8UNOynHl0lS3uWF4YpIS +tH2cXRKbGWIDCSsjVgctVQ3+9S3azbEi8l/uDIq1OHtBPU0is6gEx2F+RMzNCPWzulh 
From: Viacheslav Dubeyko 
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko 
Subject: [RFC PATCH 31/76] ssdfs: process update request
Date: Fri, 24 Feb 2023 17:08:42 -0800
Message-Id: <20230225010927.813929-32-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: 
X-Mailing-List: linux-fsdevel@vger.kernel.org

The flush thread can receive several types of update requests:
(1) update block or extent, (2) prepare diff of b-tree node,
(3) prepare diff of user data logical block, (4) commit log now,
(5) start migration now, (6) process invalidated extent,
(7) migrate range, pre-allocated page, or fragment.

The update block or extent request stores updated user data or
metadata under the Copy-On-Write (COW) policy, in compressed or
uncompressed form. The prepare diff request uses a delta-encoding
technique to store updated user data or metadata under the same
COW policy. The commit log now request triggers an immediate log
commit.

The start migration now operation is dedicated to the mapping table
case. It is requested before a mapping table flush in order to check
whether migration needs to be finished or started, because finishing
or starting migration modifies the mapping table, while the mapping
table's flush operation itself must complete without any further
modification of the table.

The process invalidated extent operation is executed after a file
truncate or a b-tree node deletion. Its goal is to finish the
migration operation and to correct the state of the PEB container.
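As an illustration only (this sketch is not part of the patch), the
handling of these request classes inside the flush thread can be
pictured as a switch over the request class. The dispatcher name
ssdfs_dispatch_update_request() is hypothetical; the request class
constants and __ssdfs_peb_update_block() are the ones used by this
series:

static int ssdfs_dispatch_update_request(struct ssdfs_peb_info *pebi,
					 struct ssdfs_segment_request *req)
{
	switch (req->private.class) {
	case SSDFS_PEB_UPDATE_REQ:
	case SSDFS_PEB_PRE_ALLOC_UPDATE_REQ:
		/* store the updated block/extent under the COW policy */
		return __ssdfs_peb_update_block(pebi, req);

	case SSDFS_PEB_DIFF_ON_WRITE_REQ:
		/* store a delta-encoded diff instead of the full state */
		return __ssdfs_peb_update_block(pebi, req);

	case SSDFS_PEB_COLLECT_GARBAGE_REQ:
		/* migrate valid blocks between erase blocks */
		return __ssdfs_peb_update_block(pebi, req);

	default:
		/* commit log, start migration, etc. are handled elsewhere */
		return -EOPNOTSUPP;
	}
}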
The migrate range operation can be received from global GC thread(s)
as a recommendation that the flush thread migrate valid blocks from a
source (exhausted) erase block into a destination (clean or "using")
erase block when a pair of erase blocks is stuck in the migration
process.

Signed-off-by: Viacheslav Dubeyko 
CC: Viacheslav Dubeyko 
CC: Luka Perkov 
CC: Bruno Banelli 
---
 fs/ssdfs/peb_flush_thread.c | 2426 +++++++++++++++++++++++++++++++++++
 1 file changed, 2426 insertions(+)

diff --git a/fs/ssdfs/peb_flush_thread.c b/fs/ssdfs/peb_flush_thread.c
index 2de4bb806678..7e6a8a67e142 100644
--- a/fs/ssdfs/peb_flush_thread.c
+++ b/fs/ssdfs/peb_flush_thread.c
@@ -109,10 +109,2436 @@ void ssdfs_flush_check_memory_leaks(void)
 #endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
 }
 
+/*
+ * struct ssdfs_fragment_source - fragment source descriptor
+ * @page: memory page that contains uncompressed fragment
+ * @start_offset: offset into page to fragment's begin
+ * @data_bytes: size of fragment in bytes
+ * @sequence_id: fragment's sequence number
+ * @fragment_type: fragment type
+ * @fragment_flags: fragment's flags
+ */
+struct ssdfs_fragment_source {
+	struct page *page;
+	u32 start_offset;
+	size_t data_bytes;
+	u8 sequence_id;
+	u8 fragment_type;
+	u8 fragment_flags;
+};
+
+/*
+ * struct ssdfs_fragment_destination - fragment destination descriptor
+ * @area_offset: offset of area from log's beginning
+ * @write_offset: offset of @store pointer from area's begin
+ * @store: pointer for storing fragment
+ * @free_space: available space in bytes for fragment storing [in|out]
+ * @compr_size: size of fragment in bytes after compression [out]
+ * @desc: fragment descriptor [out]
+ */
+struct ssdfs_fragment_destination {
+	u32 area_offset;
+	u32 write_offset;
+	unsigned char *store;
+	size_t free_space;
+	size_t compr_size;
+	struct ssdfs_fragment_desc *desc;
+};
+
+/*
+ * struct ssdfs_byte_stream_descriptor - byte stream descriptor
+ * @pvec: pagevec that contains byte stream
+ * @start_offset: offset in bytes of byte stream in pagevec
+ * @data_bytes: size of uncompressed byte stream
+ * @write_offset: write offset of byte stream in area [out]
+ * @compr_bytes: size of byte stream after compression [out]
+ */
+struct ssdfs_byte_stream_descriptor {
+	struct pagevec *pvec;
+	u32 start_offset;
+	u32 data_bytes;
+	u32 write_offset;
+	u32 compr_bytes;
+};
+
+/*
+ * struct ssdfs_bmap_descriptor - block bitmap flush descriptor
+ * @pebi: pointer on PEB object
+ * @snapshot: block bitmap snapshot
+ * @peb_index: PEB index of bitmap owner
+ * @flags: fragment flags
+ * @type: fragment type
+ * @compression_type: type of compression
+ * @last_free_blk: last logical free block
+ * @metadata_blks: count of physical pages used by metadata
+ * @invalid_blks: count of invalid blocks
+ * @bytes_count: size of the bitmap snapshot in bytes
+ * @frag_id: pointer on fragment counter
+ * @cur_page: pointer on current page value
+ * @write_offset: pointer on write offset value
+ */
+struct ssdfs_bmap_descriptor {
+	struct ssdfs_peb_info *pebi;
+	struct ssdfs_page_vector *snapshot;
+	u16 peb_index;
+	u8 flags;
+	u8 type;
+	u8 compression_type;
+	u32 last_free_blk;
+	u32 metadata_blks;
+	u32 invalid_blks;
+	size_t bytes_count;
+	u8 *frag_id;
+	pgoff_t *cur_page;
+	u32 *write_offset;
+};
+
+/*
+ * struct ssdfs_pagevec_descriptor - pagevec descriptor
+ * @pebi: pointer on PEB object
+ * @page_vec: page vector with data to save
+ * @start_sequence_id: start sequence id
+ * @area_offset: offset of area
+ * @bytes_count: size in bytes of valid data in pagevec
+ * @desc_array: array of
fragment descriptors + * @array_capacity: capacity of fragment descriptors' array + * @compression_type: type of compression + * @compr_size: whole size of all compressed fragments [out] + * @uncompr_size: whole size of all fragments in uncompressed state [out] + * @fragments_count: count of saved fragments + * @cur_page: pointer on current page value + * @write_offset: pointer on write offset value + */ +struct ssdfs_pagevec_descriptor { + struct ssdfs_peb_info *pebi; + struct ssdfs_page_vector *page_vec; + u16 start_sequence_id; + u32 area_offset; + size_t bytes_count; + struct ssdfs_fragment_desc *desc_array; + size_t array_capacity; + u8 compression_type; + u32 compr_size; + u32 uncompr_size; + u16 fragments_count; + pgoff_t *cur_page; + u32 *write_offset; +}; + /****************************************************************************** * FLUSH THREAD FUNCTIONALITY * ******************************************************************************/ +/* + * ssdfs_peb_read_from_offset() - read in buffer from offset + * @pebi: pointer on PEB object + * @off: offset in PEB + * @buf: pointer on buffer + * @buf_size: size of the buffer + * + * This function tries to read from volume into buffer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +#ifdef CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC +static +int ssdfs_peb_read_from_offset(struct ssdfs_peb_info *pebi, + struct ssdfs_phys_offset_descriptor *off, + void *buf, size_t buf_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX]; + u16 log_start_page; + u32 byte_offset; + u16 log_index; + int area_index; + u32 area_offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!off || !buf); + BUG_ON(buf_size == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + log_start_page = le16_to_cpu(off->blk_state.log_start_page); + byte_offset = le32_to_cpu(off->blk_state.byte_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "log_start_page %u, log_area %#x, " + "peb_migration_id %u, byte_offset %u, " + "buf %p, buf_size %zu\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + log_start_page, off->blk_state.log_area, + off->blk_state.peb_migration_id, + byte_offset, buf, buf_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_index = log_start_page / pebi->log_pages; + + if (log_index >= (fsi->pages_per_peb / pebi->log_pages)) { + SSDFS_ERR("invalid log index %u\n", log_index); + return -ERANGE; + } + + area_index = SSDFS_AREA_TYPE2INDEX(off->blk_state.log_area); + + if (area_index >= SSDFS_SEG_HDR_DESC_MAX) { + SSDFS_ERR("invalid area index %#x\n", area_index); + return -ERANGE; + } + + err = ssdfs_peb_read_log_hdr_desc_array(pebi, log_index, desc_array, + SSDFS_SEG_HDR_DESC_MAX); + if (unlikely(err)) { + SSDFS_ERR("fail to read log's header desc array: " + "seg %llu, peb %llu, log_index %u, err %d\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + log_index, err); + return err; + } + + area_offset = le32_to_cpu(desc_array[area_index].offset); + + err = ssdfs_unaligned_read_buffer(fsi, pebi->peb_id, + area_offset + byte_offset, + buf, buf_size); + if (unlikely(err)) { + SSDFS_ERR("fail to read buffer: " + "peb %llu, area_offset %u, byte_offset %u, " + "buf_size %zu, err %d\n", + pebi->peb_id, area_offset, byte_offset, + buf_size, err); + return err; + } + + return 0; +} +#endif /* 
CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC */ + +static inline +bool does_user_data_block_contain_diff(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + u32 mem_pages_per_block; + int page_index; + struct page *page; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_ssdfs_peb_containing_user_data(pebi->pebc)) + return false; + + fsi = pebi->pebc->parent_si->fsi; + mem_pages_per_block = fsi->pagesize / PAGE_SIZE; + page_index = req->result.processed_blks * mem_pages_per_block; + page = req->result.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_diff_page(page); +} + +/* + * __ssdfs_peb_update_block() - update data block + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to update data block in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - try again to update data block. + * %-ENOENT - need migrate base state before storing diff. + */ +static +int __ssdfs_peb_update_block(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_blk2off_table *table; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_phys_offset_descriptor *blk_desc_off; + struct ssdfs_peb_phys_offset data_off = {0}; + struct ssdfs_peb_phys_offset desc_off = {0}; + u16 blk; + u64 logical_offset; + struct ssdfs_block_bmap_range range; + int range_state; + u32 written_bytes; + u16 peb_index; + int migration_state = SSDFS_LBLOCK_UNKNOWN_STATE; + struct ssdfs_offset_position pos = {0}; + u8 migration_id1; + int migration_id2; +#ifdef CONFIG_SSDFS_DEBUG + int i; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + switch (req->private.class) { + case SSDFS_PEB_UPDATE_REQ: + case SSDFS_PEB_PRE_ALLOC_UPDATE_REQ: + case SSDFS_PEB_DIFF_ON_WRITE_REQ: + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + /* expected case */ + break; + default: + BUG(); + break; + } + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, data_bytes %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->result.processed_blks, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebi->pebc->parent_si; + fsi = si->fsi; + table = pebi->pebc->parent_si->blk2off_table; + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + peb_blkbmap = &seg_blkbmap->peb[pebi->pebc->peb_index]; + +#ifdef CONFIG_SSDFS_DEBUG + if (req->extent.logical_offset >= U64_MAX) { + SSDFS_ERR("seg %llu, peb %llu, logical_block %u, " + "logical_offset %llu, " + "processed_blks %d\n", + req->place.start.seg_id, pebi->peb_id, + req->place.start.blk_index, + req->extent.logical_offset, + req->result.processed_blks); + BUG(); + } +#endif /* 
CONFIG_SSDFS_DEBUG */ + + blk = req->place.start.blk_index + req->result.processed_blks; + logical_offset = req->extent.logical_offset + + ((u64)req->result.processed_blks * fsi->pagesize); + logical_offset = div64_u64(logical_offset, fsi->pagesize); + + if (req->private.class == SSDFS_PEB_DIFF_ON_WRITE_REQ) { + u32 pvec_size = pagevec_count(&req->result.pvec); + u32 cur_index = req->result.processed_blks; + + if (cur_index >= pvec_size) { + SSDFS_ERR("processed_blks %u > pagevec_size %u\n", + cur_index, pvec_size); + return -ERANGE; + } + + if (req->result.pvec.pages[cur_index] == NULL) { + req->result.processed_blks++; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block %u hasn't diff\n", + blk); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_update_block; + } + } + + blk_desc_off = ssdfs_blk2off_table_convert(table, blk, + &peb_index, + &migration_state, + &pos); + if (IS_ERR(blk_desc_off) && PTR_ERR(blk_desc_off) == -EAGAIN) { + struct completion *end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + blk_desc_off = ssdfs_blk2off_table_convert(table, blk, + &peb_index, + &migration_state, + &pos); + } + + if (IS_ERR_OR_NULL(blk_desc_off)) { + err = (blk_desc_off == NULL ? -ERANGE : PTR_ERR(blk_desc_off)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + blk, err); + return err; + } + + if (req->private.class == SSDFS_PEB_DIFF_ON_WRITE_REQ) { + migration_id1 = + SSDFS_GET_BLK_DESC_MIGRATION_ID(&pos.blk_desc.buf); + migration_id2 = ssdfs_get_peb_migration_id_checked(pebi); + + if (migration_id1 < U8_MAX && migration_id1 != migration_id2) { + /* + * Base state and diff in different PEBs + */ + + range.start = blk; + range.len = 1; + + ssdfs_requests_queue_add_head(&pebi->pebc->update_rq, + req); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range: (start %u, len %u)\n", + range.start, range.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_peb_containing_user_data(pebi->pebc)) { + ssdfs_account_updated_user_data_pages(fsi, + range.len); + } + + err = ssdfs_peb_migrate_valid_blocks_range(si, + pebi->pebc, + peb_blkbmap, + &range); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate valid blocks: " + "range (start %u, len %u), err %d\n", + range.start, range.len, err); + return err; + } + + return -ENOENT; + } + } + + down_write(&table->translation_lock); + + migration_state = ssdfs_blk2off_table_get_block_migration(table, blk, + peb_index); + switch (migration_state) { + case SSDFS_LBLOCK_UNKNOWN_STATE: + err = -ENOENT; + /* logical block is not migrating */ + break; + + case SSDFS_LBLOCK_UNDER_MIGRATION: + switch (req->private.cmd) { + case SSDFS_MIGRATE_RANGE: + case SSDFS_MIGRATE_FRAGMENT: + err = 0; + /* continue logic */ + break; + + default: + err = ssdfs_blk2off_table_update_block_state(table, + req); + if (unlikely(err)) { + SSDFS_ERR("fail to update block's state: " + "seg %llu, logical_block %u, " + "peb %llu, err %d\n", + req->place.start.seg_id, blk, + pebi->peb_id, err); + } else + err = -EEXIST; + break; + } + break; + + case SSDFS_LBLOCK_UNDER_COMMIT: + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to update block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, blk, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected migration state: " + "seg %llu, logical_block %u, " + "peb %llu, migration_state %#x\n", + req->place.start.seg_id, blk, + pebi->peb_id, 
migration_state); + break; + } + + up_write(&table->translation_lock); + + if (err == -ENOENT) { + /* logical block is not migrating */ + err = 0; + } else if (err == -EEXIST) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("migrating block has been updated in buffer: " + "seg %llu, peb %llu, logical_block %u\n", + req->place.start.seg_id, pebi->peb_id, + blk); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if (unlikely(err)) + return err; + + err = ssdfs_peb_reserve_block_descriptor(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to add block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, blk, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve block descriptor: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, blk, pebi->peb_id, err); + return err; + } + + err = ssdfs_peb_add_block_into_data_area(pebi, req, + blk_desc_off, &pos, + &data_off, + &written_bytes); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to add block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, blk, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add block: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, blk, pebi->peb_id, err); + return err; + } + + range.start = le16_to_cpu(blk_desc_off->page_desc.peb_page); + range.len = (written_bytes + fsi->pagesize - 1) >> fsi->log_pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("written_bytes %u, range (start %u, len %u)\n", + written_bytes, range.start, range.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (req->private.class == SSDFS_PEB_DIFF_ON_WRITE_REQ) + range_state = SSDFS_BLK_VALID; + else if (is_ssdfs_block_full(fsi->pagesize, written_bytes)) + range_state = SSDFS_BLK_VALID; + else + range_state = SSDFS_BLK_PRE_ALLOCATED; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, peb_page %u\n", + blk, range.start); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_segment_blk_bmap_update_range(seg_blkbmap, pebi->pebc, + blk_desc_off->blk_state.peb_migration_id, + range_state, &range); + if (unlikely(err)) { + SSDFS_ERR("fail to update range: " + "seg %llu, peb %llu, " + "range (start %u, len %u), err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, range.start, range.len, + err); + return err; + } + + data_off.peb_page = (u16)range.start; + + if (req->private.class != SSDFS_PEB_DIFF_ON_WRITE_REQ && + !does_user_data_block_contain_diff(pebi, req)) + SSDFS_BLK_DESC_INIT(&pos.blk_desc.buf); + else { +#ifdef CONFIG_SSDFS_DEBUG + migration_id1 = + SSDFS_GET_BLK_DESC_MIGRATION_ID(&pos.blk_desc.buf); + if (migration_id1 >= U8_MAX) { + /* + * continue logic + */ + } else { + migration_id2 = + ssdfs_get_peb_migration_id_checked(pebi); + + if (migration_id1 != migration_id2) { + struct ssdfs_block_descriptor *blk_desc; + + SSDFS_WARN("invalid request: " + "migration_id1 %u, " + "migration_id2 %d\n", + migration_id1, migration_id2); + + blk_desc = &pos.blk_desc.buf; + + SSDFS_ERR("seg_id %llu, peb_id %llu, " + "ino %llu, logical_offset %u, " + "peb_index %u, peb_page %u\n", + req->place.start.seg_id, + pebi->peb_id, + le64_to_cpu(blk_desc->ino), + le32_to_cpu(blk_desc->logical_offset), + le16_to_cpu(blk_desc->peb_index), + le16_to_cpu(blk_desc->peb_page)); + + for (i = 0; i < SSDFS_BLK_STATE_OFF_MAX; i++) { + struct ssdfs_blk_state_offset *state; + + state = 
&blk_desc->state[i]; + + SSDFS_ERR("BLK STATE OFFSET %d: " + "log_start_page %u, " + "log_area %#x, " + "byte_offset %u, " + "peb_migration_id %u\n", + i, + le16_to_cpu(state->log_start_page), + state->log_area, + le32_to_cpu(state->byte_offset), + state->peb_migration_id); + } + + BUG(); + } + } +#endif /* CONFIG_SSDFS_DEBUG */ + } + + err = ssdfs_peb_store_block_descriptor(pebi, req, + &pos.blk_desc.buf, + &data_off, &desc_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, blk, + pebi->peb_id, err); + return err; + } + + err = ssdfs_peb_store_block_descriptor_offset(pebi, + (u32)logical_offset, + blk, + &pos.blk_desc.buf, + &desc_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor offset: " + "err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_block %u, " + "migration_state %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->place.start.blk_index, migration_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_logical_block_migrating(migration_state)) { + err = ssdfs_blk2off_table_set_block_commit(table, blk, + peb_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set block commit: " + "logical_blk %u, peb_index %u, err %d\n", + blk, peb_index, err); + return err; + } + } + + req->result.processed_blks += range.len; + +finish_update_block: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finish update block: " + "ino %llu, seg %llu, peb %llu, logical_block %u, " + "req->result.processed_blks %d\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->place.start.blk_index, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_check_zone_move_request() - check request + * @req: segment request + * + * This method tries to check the state of the request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_check_zone_move_request(struct ssdfs_segment_request *req) +{ + wait_queue_head_t *wq = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("req %p\n", req); +#endif /* CONFIG_SSDFS_DEBUG */ + +check_req_state: + switch (atomic_read(&req->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req_state; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req->result.err; + + if (!err) { + SSDFS_ERR("error code is absent: " + "req %p, err %d\n", + req, err); + err = -ERANGE; + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req->result.state)); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_extract_left_extent() - extract left extent + * @req: I/O request + * @migration: recommended migration extent + * @left_fragment: difference between recommended and requested extents [out] + * + * This function tries to extract difference between recommended + * and requested extents from the left. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.
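+ * %-ENODATA - no extent from the left.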
+ */ +static inline +int ssdfs_extract_left_extent(struct ssdfs_segment_request *req, + struct ssdfs_zone_fragment *migration, + struct ssdfs_zone_fragment *left_fragment) +{ + u64 seg_id; + u32 start_blk; + u32 len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req || !migration || !left_fragment); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, len %u, " + "cmd %#x, type %#x, processed_blks %d\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len, + req->private.cmd, req->private.type, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_id = le64_to_cpu(migration->extent.seg_id); + start_blk = le32_to_cpu(migration->extent.logical_blk); + len = le32_to_cpu(migration->extent.len); + + if (req->extent.ino != migration->ino) { + SSDFS_ERR("invalid input: " + "ino1 %llu != ino2 %llu\n", + req->extent.ino, migration->ino); + return -ERANGE; + } + + if (req->place.start.seg_id != seg_id) { + SSDFS_ERR("invalid input: " + "seg_id1 %llu != seg_id2 %llu\n", + req->place.start.seg_id, seg_id); + return -ERANGE; + } + + if (req->place.start.blk_index < start_blk) { + SSDFS_ERR("invalid input: " + "request (seg_id %llu, logical_blk %u, len %u), " + "migration (seg_id %llu, logical_blk %u, len %u)\n", + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len, + seg_id, start_blk, len); + return -ERANGE; + } + + if (req->place.start.blk_index == start_blk) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no extent from the left: " + "request (seg_id %llu, logical_blk %u, len %u), " + "migration (seg_id %llu, logical_blk %u, len %u)\n", + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len, + seg_id, start_blk, len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + left_fragment->ino = migration->ino; + left_fragment->logical_blk_offset = migration->logical_blk_offset; + left_fragment->extent.seg_id = migration->extent.seg_id; + left_fragment->extent.logical_blk = migration->extent.logical_blk; + left_fragment->extent.len = + cpu_to_le32(req->place.start.blk_index - start_blk); + + return 0; +} + +/* + * ssdfs_extract_right_extent() - extract right extent + * @req: I/O request + * @migration: recommended migration extent + * @right_fragment: difference between recommended and requested extents [out] + * + * This function tries to extract difference between recommended + * and requested extents from the right. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
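+ * %-ENODATA - no extent from the right.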
+ */ +static inline +int ssdfs_extract_right_extent(struct ssdfs_segment_request *req, + struct ssdfs_zone_fragment *migration, + struct ssdfs_zone_fragment *right_fragment) +{ + u64 seg_id; + u32 start_blk; + u32 len; + u32 upper_bound1, upper_bound2; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req || !migration || !right_fragment); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, len %u, " + "cmd %#x, type %#x, processed_blks %d\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len, + req->private.cmd, req->private.type, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_id = le64_to_cpu(migration->extent.seg_id); + start_blk = le32_to_cpu(migration->extent.logical_blk); + len = le32_to_cpu(migration->extent.len); + + if (req->extent.ino != migration->ino) { + SSDFS_ERR("invalid input: " + "ino1 %llu != ino2 %llu\n", + req->extent.ino, migration->ino); + return -ERANGE; + } + + if (req->place.start.seg_id != seg_id) { + SSDFS_ERR("invalid input: " + "seg_id1 %llu != seg_id2 %llu\n", + req->place.start.seg_id, seg_id); + return -ERANGE; + } + + upper_bound1 = req->place.start.blk_index + req->place.len; + upper_bound2 = start_blk + len; + + if (upper_bound1 > upper_bound2) { + SSDFS_ERR("invalid input: " + "request (seg_id %llu, logical_blk %u, len %u), " + "migration (seg_id %llu, logical_blk %u, len %u)\n", + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len, + seg_id, start_blk, len); + return -ERANGE; + } + + if (upper_bound1 == upper_bound2) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no extent from the right: " + "request (seg_id %llu, logical_blk %u, len %u), " + "migration (seg_id %llu, logical_blk %u, len %u)\n", + req->place.start.seg_id, + req->place.start.blk_index, + req->place.len, + seg_id, start_blk, len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + right_fragment->ino = migration->ino; + right_fragment->logical_blk_offset = + migration->logical_blk_offset + (upper_bound1 - start_blk); + right_fragment->extent.seg_id = migration->extent.seg_id; + right_fragment->extent.logical_blk = cpu_to_le32(upper_bound1); + right_fragment->extent.len = cpu_to_le32(upper_bound2 - upper_bound1); + + return 0; +} + +/* + * __ssdfs_zone_issue_move_request() - issue move request + * @pebi: pointer on PEB object + * @fragment: zone fragment + * @req_type: request type + * @req: I/O request + * + * This function tries to issue move request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.
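+ * + * The zone fragment's blocks are copied into the request's pagevec, the + * request is submitted by means of ssdfs_segment_migrate_zone_extent_async(), + * and the new location is reflected in the extents b-tree by means of + * ssdfs_extents_tree_move_extent().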
+ */ +static +int __ssdfs_zone_issue_move_request(struct ssdfs_peb_info *pebi, + struct ssdfs_zone_fragment *fragment, + int req_type, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct inode *inode; + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_btree_search *search; + struct page *page; + struct ssdfs_blk2off_range new_extent; + struct ssdfs_raw_extent old_raw_extent; + struct ssdfs_raw_extent new_raw_extent; + u64 seg_id; + u32 logical_blk; + u32 len; + u64 logical_offset; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !fragment); + + SSDFS_DBG("peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->ino, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + seg_id = le64_to_cpu(fragment->extent.seg_id); + logical_blk = le32_to_cpu(fragment->extent.logical_blk); + len = le32_to_cpu(fragment->extent.len); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(len > PAGEVEC_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_init(req); + ssdfs_get_request(req); + + req->private.flags |= SSDFS_REQ_DONT_FREE_PAGES; + + logical_offset = fragment->logical_blk_offset << fsi->log_pagesize; + ssdfs_request_prepare_logical_extent(fragment->ino, logical_offset, + len, 0, 0, req); + + req->place.start.seg_id = seg_id; + req->place.start.blk_index = logical_blk; + req->place.len = 0; + + req->result.processed_blks = 0; + + for (i = 0; i < len; i++) { + req->place.len++; + + err = ssdfs_peb_copy_page(pebi->pebc, logical_blk + i, req); + if (err == -EAGAIN) { + req->place.len = req->result.processed_blks; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to copy the whole range: " + "seg %llu, logical_blk %u, len %u\n", + pebi->pebc->parent_si->seg_id, + logical_blk + i, req->place.len); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to copy page: " + "seg %llu, logical_blk %u, err %d\n", + pebi->pebc->parent_si->seg_id, + logical_blk + i, err); + return err; + } + } + + for (i = 0; i < req->result.processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebi->pebc, req, i); + + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + page = req->result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_page_writeback(page); + } + + req->result.err = 0; + req->result.processed_blks = 0; + atomic_set(&req->result.state, SSDFS_UNKNOWN_REQ_RESULT); + + err = ssdfs_segment_migrate_zone_extent_async(fsi, + req_type, + req, + &seg_id, + &new_extent); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate zone extent: " + "peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->ino, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); + goto fail_issue_move_request; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_id >= U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + old_raw_extent.seg_id = fragment->extent.seg_id; + old_raw_extent.logical_blk = fragment->extent.logical_blk; + old_raw_extent.len = fragment->extent.len; + + new_raw_extent.seg_id =
cpu_to_le64(seg_id); + new_raw_extent.logical_blk = cpu_to_le32(new_extent.start_lblk); + new_raw_extent.len = cpu_to_le32(new_extent.len); + + page = req->result.pvec.pages[0]; + inode = page->mapping->host; + ii = SSDFS_I(inode); + + etree = SSDFS_EXTREE(ii); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!etree); +#endif /* CONFIG_SSDFS_DEBUG */ + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto fail_issue_move_request; + } + + ssdfs_btree_search_init(search); + err = ssdfs_extents_tree_move_extent(etree, + fragment->logical_blk_offset, + &old_raw_extent, + &new_raw_extent, + search); + ssdfs_btree_search_free(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to move extent: " + "old_extent (seg_id %llu, logical_blk %u, len %u), " + "new_extent (seg_id %llu, logical_blk %u, len %u), " + "err %d\n", + le64_to_cpu(old_raw_extent.seg_id), + le32_to_cpu(old_raw_extent.logical_blk), + le32_to_cpu(old_raw_extent.len), + le64_to_cpu(new_raw_extent.seg_id), + le32_to_cpu(new_raw_extent.logical_blk), + le32_to_cpu(new_raw_extent.len), + err); + goto fail_issue_move_request; + } + + return 0; + +fail_issue_move_request: + ssdfs_request_unlock_and_remove_pages(req); + ssdfs_put_request(req); + + return err; +} + +/* + * ssdfs_zone_issue_async_move_request() - issue async move request + * @pebi: pointer on PEB object + * @fragment: zone fragment + * + * This function tries to issue async move request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_zone_issue_async_move_request(struct ssdfs_peb_info *pebi, + struct ssdfs_zone_fragment *fragment) +{ + struct ssdfs_segment_request *req; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !fragment); + + SSDFS_DBG("peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->ino, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + err = __ssdfs_zone_issue_move_request(pebi, fragment, + SSDFS_REQ_ASYNC, + req); + if (unlikely(err)) { + SSDFS_ERR("fail to issue move request: " + "peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->ino, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); + goto fail_issue_move_request; + } + + return 0; + +fail_issue_move_request: + ssdfs_request_free(req); + return err; +} + +/* + * ssdfs_zone_issue_move_request() - issue move request + * @pebi: pointer on PEB object + * @fragment: zone fragment + * @req: I/O request + * + * This function tries to issue move request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
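+ * + * NOTE: the request is issued with the SSDFS_REQ_ASYNC_NO_FREE type. + * Unlike ssdfs_zone_issue_async_move_request(), @req is allocated and + * owned by the caller.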
+ */ +static +int ssdfs_zone_issue_move_request(struct ssdfs_peb_info *pebi, + struct ssdfs_zone_fragment *fragment, + struct ssdfs_segment_request *req) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !fragment); + + SSDFS_DBG("peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->ino, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_zone_issue_move_request(pebi, fragment, + SSDFS_REQ_ASYNC_NO_FREE, + req); + if (unlikely(err)) { + SSDFS_ERR("fail to issue move request: " + "peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->ino, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); + goto fail_issue_move_request; + } + +fail_issue_move_request: + return err; +} + +/* + * ssdfs_zone_prepare_migration_request() - stimulate migration + * @pebi: pointer on PEB object + * @fragment: zone fragment + * @req: I/O request + * + * This function tries to prepare migration stimulation request + * during moving updated data from exhausted zone into current zone + * for updates. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_zone_prepare_migration_request(struct ssdfs_peb_info *pebi, + struct ssdfs_zone_fragment *fragment, + struct ssdfs_segment_request *req) +{ + struct ssdfs_zone_fragment sub_fragment; + u64 seg_id; + u32 logical_blk; + u32 len; + u32 offset = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !fragment || !req); + + SSDFS_DBG("peb %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + pebi->peb_id, + fragment->logical_blk_offset, + le64_to_cpu(fragment->extent.seg_id), + le32_to_cpu(fragment->extent.logical_blk), + le32_to_cpu(fragment->extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_id = le64_to_cpu(fragment->extent.seg_id); + logical_blk = le32_to_cpu(fragment->extent.logical_blk); + len = le32_to_cpu(fragment->extent.len); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_id != pebi->pebc->parent_si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (len > PAGEVEC_SIZE) { + sub_fragment.ino = fragment->ino; + sub_fragment.logical_blk_offset = + fragment->logical_blk_offset + offset; + sub_fragment.extent.seg_id = fragment->extent.seg_id; + sub_fragment.extent.logical_blk = + cpu_to_le32(logical_blk + offset); + sub_fragment.extent.len = cpu_to_le32(PAGEVEC_SIZE); + + err = ssdfs_zone_issue_async_move_request(pebi, &sub_fragment); + if (unlikely(err)) { + SSDFS_ERR("fail to issue zone async move request: " + "peb %llu, logical_blk_offset %llu, " + "sub_extent (seg_id %llu, " + "logical_blk %u, len %u), err %d\n", + pebi->peb_id, + sub_fragment.logical_blk_offset, + le64_to_cpu(sub_fragment.extent.seg_id), + le32_to_cpu(sub_fragment.extent.logical_blk), + le32_to_cpu(sub_fragment.extent.len), + err); + return err; + } + + offset += PAGEVEC_SIZE; + len -= PAGEVEC_SIZE; + } + + sub_fragment.ino = fragment->ino; + sub_fragment.logical_blk_offset = + fragment->logical_blk_offset + offset; + sub_fragment.extent.seg_id = fragment->extent.seg_id; + sub_fragment.extent.logical_blk = cpu_to_le32(logical_blk + offset); + 
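+	/*
+	 * The tail of the fragment (not more than PAGEVEC_SIZE blocks)
+	 * is processed by means of the caller-provided request.
+	 */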
sub_fragment.extent.len = cpu_to_le32(len); + + err = ssdfs_zone_issue_move_request(pebi, &sub_fragment, req); + if (unlikely(err)) { + SSDFS_ERR("fail to issue zone move request: " + "peb %llu, logical_blk_offset %llu, " + "sub_extent (seg_id %llu, " + "logical_blk %u, len %u), err %d\n", + pebi->peb_id, + sub_fragment.logical_blk_offset, + le64_to_cpu(sub_fragment.extent.seg_id), + le32_to_cpu(sub_fragment.extent.logical_blk), + le32_to_cpu(sub_fragment.extent.len), + err); + return err; + } + + return 0; +} + +/* + * ssdfs_zone_prepare_move_flush_request() - convert update into move request + * @pebi: pointer on PEB object + * @src: source I/O request + * @dst: destination I/O request + * + * This function tries to convert update request into + * move request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_zone_prepare_move_flush_request(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *src, + struct ssdfs_segment_request *dst) +{ + struct ssdfs_fs_info *fsi; + struct inode *inode; + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_btree_search *search; + struct page *page; + struct ssdfs_blk2off_range new_extent; + struct ssdfs_raw_extent old_raw_extent; + struct ssdfs_raw_extent new_raw_extent; + u64 seg_id; + u32 logical_blk; + u32 len; + u64 logical_offset; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !src || !dst); + + SSDFS_DBG("peb %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, cmd %#x, type %#x, " + "processed_blks %d\n", + pebi->peb_id, src->extent.ino, src->extent.logical_offset, + src->extent.data_bytes, src->extent.cno, + src->extent.parent_snapshot, + src->place.start.seg_id, src->place.start.blk_index, + src->private.cmd, src->private.type, + src->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + seg_id = src->place.start.seg_id; + logical_blk = src->place.start.blk_index; + len = src->place.len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(len > PAGEVEC_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = src->result.pvec.pages[0]; + inode = page->mapping->host; + ii = SSDFS_I(inode); + + etree = SSDFS_EXTREE(ii); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!etree); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_init(dst); + ssdfs_get_request(dst); + + dst->private.flags |= SSDFS_REQ_DONT_FREE_PAGES; + + logical_offset = src->extent.logical_offset; + ssdfs_request_prepare_logical_extent(src->extent.ino, + logical_offset, len, + 0, 0, dst); + + dst->place.start.seg_id = seg_id; + dst->place.start.blk_index = logical_blk; + dst->place.len = len; + + dst->result.processed_blks = 0; + + for (i = 0; i < pagevec_count(&src->result.pvec); i++) { + page = src->result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + dst->result.pvec.pages[i] = page; + src->result.pvec.pages[i] = NULL; + } + + pagevec_reinit(&src->result.pvec); + + dst->result.err = 0; + dst->result.processed_blks = 0; + atomic_set(&dst->result.state, SSDFS_UNKNOWN_REQ_RESULT); + + err = ssdfs_segment_migrate_zone_extent_async(fsi, + SSDFS_REQ_ASYNC_NO_FREE, + dst, + &seg_id, + &new_extent); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate zone extent: " + "peb %llu, ino %llu, logical_blk_offset %llu, " + "extent (seg_id %llu, logical_blk %u, len 
%u)\n", + pebi->peb_id, + src->extent.ino, src->extent.logical_offset, + src->place.start.seg_id, + src->place.start.blk_index, + src->place.len); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_id >= U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + old_raw_extent.seg_id = cpu_to_le64(src->place.start.seg_id); + old_raw_extent.logical_blk = cpu_to_le32(src->place.start.blk_index); + old_raw_extent.len = cpu_to_le32(src->place.len); + + new_raw_extent.seg_id = cpu_to_le64(seg_id); + new_raw_extent.logical_blk = cpu_to_le32(new_extent.start_lblk); + new_raw_extent.len = cpu_to_le32(new_extent.len); + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + ssdfs_btree_search_init(search); + err = ssdfs_extents_tree_move_extent(etree, + src->extent.logical_offset, + &old_raw_extent, + &new_raw_extent, + search); + ssdfs_btree_search_free(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to move extent: " + "old_extent (seg_id %llu, logical_blk %u, len %u), " + "new_extent (seg_id %llu, logical_blk %u, len %u), " + "err %d\n", + le64_to_cpu(old_raw_extent.seg_id), + le32_to_cpu(old_raw_extent.logical_blk), + le32_to_cpu(old_raw_extent.len), + le64_to_cpu(new_raw_extent.seg_id), + le32_to_cpu(new_raw_extent.logical_blk), + le32_to_cpu(new_raw_extent.len), + err); + return err; + } + + return 0; +} + +enum { + SSDFS_ZONE_LEFT_EXTENT, + SSDFS_ZONE_MAIN_EXTENT, + SSDFS_ZONE_RIGHT_EXTENT, + SSDFS_ZONE_MIGRATING_EXTENTS +}; + +/* + * ssdfs_zone_move_extent() - move extent (ZNS SSD case) + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to move extent from exhausted zone + * into current zone for updates. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_zone_move_extent(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_invextree_info *invextree; + struct ssdfs_btree_search *search; + struct ssdfs_raw_extent extent; + struct ssdfs_segment_request *queue[SSDFS_ZONE_MIGRATING_EXTENTS] = {0}; + struct ssdfs_zone_fragment migration; + struct ssdfs_zone_fragment left_fragment; + struct ssdfs_zone_fragment *left_fragment_ptr = NULL; + struct ssdfs_zone_fragment right_fragment; + struct ssdfs_zone_fragment *right_fragment_ptr = NULL; + size_t desc_size = sizeof(struct ssdfs_zone_fragment); + u32 rest_bytes; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + + SSDFS_DBG("peb %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, cmd %#x, type %#x, " + "processed_blks %d\n", + pebi->peb_id, req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, req->place.start.blk_index, + req->private.cmd, req->private.type, + req->result.processed_blks); + + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + switch (req->private.class) { + case SSDFS_PEB_UPDATE_REQ: + case SSDFS_PEB_PRE_ALLOC_UPDATE_REQ: + case SSDFS_PEB_DIFF_ON_WRITE_REQ: + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + /* expected case */ + break; + default: + BUG(); + break; + } + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + + memset(&migration, 0xFF, desc_size); + memset(&left_fragment, 0xFF, desc_size); + memset(&right_fragment, 0xFF, desc_size); + + err = ssdfs_recommend_migration_extent(fsi, req, + &migration); + if (err == -ENODATA) { + err = 0; + /* do nothing */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to recommend migration extent: " + "err %d\n", err); + goto finish_zone_move_extent; + } else { + err = ssdfs_extract_left_extent(req, &migration, + &left_fragment); + if (err == -ENODATA) { + err = 0; + SSDFS_DBG("no extent from the left\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract left extent: " + "seg_id %llu, peb_id %llu, " + "logical_block %u, err %d\n", + req->place.start.seg_id, + pebi->peb_id, + req->place.start.blk_index, + err); + goto finish_zone_move_extent; + } else + left_fragment_ptr = &left_fragment; + + err = ssdfs_extract_right_extent(req, &migration, + &right_fragment); + if (err == -ENODATA) { + err = 0; + SSDFS_DBG("no extent from the right\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract right extent: " + "seg_id %llu, peb_id %llu, " + "logical_block %u, err %d\n", + req->place.start.seg_id, + pebi->peb_id, + req->place.start.blk_index, + err); + goto finish_zone_move_extent; + } else + right_fragment_ptr = &right_fragment; + } + + if (left_fragment_ptr) { + queue[SSDFS_ZONE_LEFT_EXTENT] = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(queue[SSDFS_ZONE_LEFT_EXTENT])) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate segment request\n"); + goto free_moving_requests; + } + } + + queue[SSDFS_ZONE_MAIN_EXTENT] = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(queue[SSDFS_ZONE_MAIN_EXTENT])) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate segment request\n"); + goto free_moving_requests; + }
 + + if (right_fragment_ptr) { + queue[SSDFS_ZONE_RIGHT_EXTENT] = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(queue[SSDFS_ZONE_RIGHT_EXTENT])) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate segment request\n"); + goto free_moving_requests; + } + } + + if (left_fragment_ptr) { + err = ssdfs_zone_prepare_migration_request(pebi, + left_fragment_ptr, + queue[SSDFS_ZONE_LEFT_EXTENT]); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare zone migration request: " + "err %d\n", err); + goto free_moving_requests; + } + } + + err = ssdfs_zone_prepare_move_flush_request(pebi, req, + queue[SSDFS_ZONE_MAIN_EXTENT]); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare zone move request: " + "err %d\n", err); + goto free_moving_requests; + } + + if (right_fragment_ptr) { + err = ssdfs_zone_prepare_migration_request(pebi, + right_fragment_ptr, + queue[SSDFS_ZONE_RIGHT_EXTENT]); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare zone migration request: " + "err %d\n", err); + goto free_moving_requests; + } + } + + for (i = 0; i < SSDFS_ZONE_MIGRATING_EXTENTS; i++) { + if (queue[i] == NULL) + continue; + + err = ssdfs_check_zone_move_request(queue[i]); + if (unlikely(err)) { + SSDFS_ERR("flush request failed: " + "index %d, err %d\n", + i, err); + } + + ssdfs_put_request(queue[i]); + ssdfs_request_free(queue[i]); + } + + if (le64_to_cpu(migration.extent.seg_id) == U64_MAX) { + /* no migration extent was recommended */ + return 0; + } + + invextree = fsi->invextree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!invextree); +#endif /* CONFIG_SSDFS_DEBUG */ + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + extent.seg_id = migration.extent.seg_id; + extent.logical_blk = migration.extent.logical_blk; + extent.len = migration.extent.len; + + ssdfs_btree_search_init(search); + err = ssdfs_invextree_add(invextree, &extent, search); + ssdfs_btree_search_free(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to add invalidated extent: " + "seg_id %llu, logical_blk %u, " + "len %u, err %d\n", + le64_to_cpu(extent.seg_id), + le32_to_cpu(extent.logical_blk), + le32_to_cpu(extent.len), + err); + return err; + } + + return 0; + +free_moving_requests: + for (i = 0; i < SSDFS_ZONE_MIGRATING_EXTENTS; i++) { + if (queue[i] == NULL) + continue; + + ssdfs_put_request(queue[i]); + ssdfs_request_free(queue[i]); + } + +finish_zone_move_extent: + return err; +} + +/* + * ssdfs_peb_update_block() - update data block + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to update data block in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - try again to update data block. + * %-ENOENT - need migrate base state before storing diff.
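+ * + * NOTE: if the PEB container is in the SSDFS_SHARED_ZONE_RECEIVES_DATA + * migration phase, then the update is processed by the zone moving logic.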
+ */ +static +int ssdfs_peb_update_block(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, data_bytes %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->result.processed_blks, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&pebi->pebc->migration_phase)) { + case SSDFS_SHARED_ZONE_RECEIVES_DATA: + err = ssdfs_zone_move_extent(pebi, req); + break; + + default: + err = __ssdfs_peb_update_block(pebi, req); + break; + } + + return err; +} + +/* + * __ssdfs_peb_update_extent() - update extent of blocks + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to update extent of blocks in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_peb_update_extent(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + u32 blk; + u32 rest_bytes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + + SSDFS_DBG("peb %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, cmd %#x, type %#x, " + "processed_blks %d\n", + pebi->peb_id, req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, req->place.start.blk_index, + req->private.cmd, req->private.type, + req->result.processed_blks); + + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + switch (req->private.class) { + case SSDFS_PEB_UPDATE_REQ: + case SSDFS_PEB_PRE_ALLOC_UPDATE_REQ: + case SSDFS_PEB_DIFF_ON_WRITE_REQ: + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + /* expected case */ + break; + default: + BUG(); + break; + } + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + + while (rest_bytes > 0) { + blk = req->place.start.blk_index + + req->result.processed_blks; + + err = __ssdfs_peb_update_block(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to update block: " + "seg %llu, logical_block %u, " + "peb %llu\n", + req->place.start.seg_id, blk, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to migrate base state for diff: " + "seg %llu, logical_block %u, " + "peb %llu\n", + req->place.start.seg_id, blk, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to update block: " + "seg %llu, logical_block %u, " + "peb %llu, err %d\n", + req->place.start.seg_id, blk, + pebi->peb_id, err); + return err; + } + + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + }; + + return 0; +} + +/* + * 
ssdfs_peb_update_extent() - update extent of blocks + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to update extent of blocks in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_update_extent(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + + SSDFS_DBG("peb %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, cmd %#x, type %#x, " + "processed_blks %d\n", + pebi->peb_id, req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, req->place.start.blk_index, + req->private.cmd, req->private.type, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&pebi->pebc->migration_phase)) { + case SSDFS_SHARED_ZONE_RECEIVES_DATA: + err = ssdfs_zone_move_extent(pebi, req); + break; + + default: + err = __ssdfs_peb_update_extent(pebi, req); + break; + } + + return err; +} + +/* + * ssdfs_peb_migrate_pre_allocated_block() - migrate pre-allocated block + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to migrate pre-allocated block in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_migrate_pre_allocated_block(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_blk2off_table *table; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_phys_offset_descriptor *blk_desc_off; + struct ssdfs_peb_phys_offset desc_off = {0}; + u16 peb_index; + u16 logical_block; + int processed_blks; + u64 logical_offset; + struct ssdfs_block_bmap_range range; + int range_state; + int migration_state = SSDFS_LBLOCK_UNKNOWN_STATE; + struct ssdfs_offset_position pos = {0}; + u32 len; + int id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + + switch (req->private.class) { + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + /* expected state */ + break; + default: + SSDFS_ERR("unexpected request: " + "req->private.class %#x\n", + req->private.class); + BUG(); + }; + + switch (req->private.cmd) { + case SSDFS_MIGRATE_PRE_ALLOC_PAGE: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected request: " + "req->private.cmd %#x\n", + req->private.cmd); + BUG(); + }; + + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); + BUG_ON(req->extent.data_bytes > pebi->pebc->parent_si->fsi->pagesize); + BUG_ON(req->result.processed_blks > 0); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, data_bytes %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->result.processed_blks, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */
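+ +	/*
+	 * A pre-allocated block has no stored payload yet: the migration
+	 * updates only the block bitmap and the blk2off table (the block
+	 * descriptor offset is stored with SSDFS_LOG_AREA_MAX as area type
+	 * and without byte offset).
+	 */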
 + fsi = pebi->pebc->parent_si->fsi; + si = pebi->pebc->parent_si; + table = pebi->pebc->parent_si->blk2off_table; + seg_blkbmap = &pebi->pebc->parent_si->blk_bmap; + processed_blks = req->result.processed_blks; + logical_block = req->place.start.blk_index + processed_blks; + logical_offset = req->extent.logical_offset + + ((u64)processed_blks * fsi->pagesize); + logical_offset = div64_u64(logical_offset, fsi->pagesize); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, logical_block %u, " + "logical_offset %llu, " + "processed_blks %d\n", + req->place.start.seg_id, pebi->peb_id, + logical_block, logical_offset, + processed_blks); + + if (req->extent.logical_offset >= U64_MAX) { + SSDFS_ERR("seg %llu, peb %llu, logical_block %u, " + "logical_offset %llu, " + "processed_blks %d\n", + req->place.start.seg_id, pebi->peb_id, + logical_block, logical_offset, + processed_blks); + BUG(); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + len = req->extent.data_bytes; + len -= req->result.processed_blks * si->fsi->pagesize; + /* round up to the whole number of logical blocks */ + len = (len + fsi->pagesize - 1) >> fsi->log_pagesize; + + blk_desc_off = ssdfs_blk2off_table_convert(table, + logical_block, + &peb_index, + &migration_state, + &pos); + if (IS_ERR(blk_desc_off) && PTR_ERR(blk_desc_off) == -EAGAIN) { + struct completion *end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + blk_desc_off = ssdfs_blk2off_table_convert(table, + logical_block, + &peb_index, + &migration_state, + &pos); + } + + if (IS_ERR_OR_NULL(blk_desc_off)) { + err = (blk_desc_off == NULL ? -ERANGE : PTR_ERR(blk_desc_off)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_block, err); + return err; + } + + if (migration_state == SSDFS_LBLOCK_UNDER_COMMIT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to add block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, logical_block, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EAGAIN; + } + + range.start = le16_to_cpu(blk_desc_off->page_desc.peb_page); + range.len = len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, peb_page %u\n", + logical_block, range.start); + SSDFS_DBG("range (start %u, len %u)\n", + range.start, range.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + range_state = SSDFS_BLK_PRE_ALLOCATED; + + err = ssdfs_segment_blk_bmap_update_range(seg_blkbmap, pebi->pebc, + blk_desc_off->blk_state.peb_migration_id, + range_state, &range); + if (unlikely(err)) { + SSDFS_ERR("fail to update range: " + "seg %llu, peb %llu, " + "range (start %u, len %u), err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, range.start, range.len, + err); + return err; + } + + id = ssdfs_get_peb_migration_id_checked(pebi); + if (unlikely(id < 0)) { + SSDFS_ERR("invalid peb_migration_id: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, id); + return id; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(id > U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc_off.peb_index = pebi->peb_index; + desc_off.peb_migration_id = (u8)id; + desc_off.peb_page = (u16)range.start; + desc_off.log_area = SSDFS_LOG_AREA_MAX; + desc_off.byte_offset = U32_MAX; + + err = ssdfs_peb_store_block_descriptor_offset(pebi, + (u32)logical_offset, + logical_block, + NULL, + &desc_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor offset: " + "logical_block %u, logical_offset %llu, " + "err %d\n", + logical_block, logical_offset, err); + return err; + } + + req->result.processed_blks += range.len; + return 0; +}
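+ +/*
+ * Update command -> handler mapping:
+ *	SSDFS_UPDATE_BLOCK, SSDFS_UPDATE_PRE_ALLOC_BLOCK,
+ *	SSDFS_USER_DATA_DIFF, SSDFS_MIGRATE_FRAGMENT -> ssdfs_peb_update_block()
+ *	SSDFS_UPDATE_EXTENT, SSDFS_UPDATE_PRE_ALLOC_EXTENT,
+ *	SSDFS_BTREE_NODE_DIFF, SSDFS_MIGRATE_RANGE -> ssdfs_peb_update_extent()
+ *	SSDFS_MIGRATE_PRE_ALLOC_PAGE -> ssdfs_peb_migrate_pre_allocated_block()
+ *	SSDFS_COMMIT_LOG_NOW, SSDFS_START_MIGRATION_NOW,
+ *	SSDFS_EXTENT_WAS_INVALIDATED -> no dedicated handler
+ */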
 + +/* + * ssdfs_process_update_request() - process update request + * @pebi: pointer on PEB object + * @req: request + * + * This function detects the command of the request and + * calls the proper function for request processing. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EAGAIN - unable to update block. + */ +static +int ssdfs_process_update_request(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req); + + SSDFS_DBG("req %p, cmd %#x, type %#x\n", + req, req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (req->private.cmd <= SSDFS_CREATE_CMD_MAX || + req->private.cmd >= SSDFS_COLLECT_GARBAGE_CMD_MAX) { + SSDFS_ERR("unknown update command %d, seg %llu, peb %llu\n", + req->private.cmd, pebi->pebc->parent_si->seg_id, + pebi->peb_id); + req->result.err = -EINVAL; + atomic_set(&req->result.state, SSDFS_REQ_FAILED); + return -EINVAL; + } + + atomic_set(&req->result.state, SSDFS_REQ_STARTED); + + switch (req->private.cmd) { + case SSDFS_UPDATE_BLOCK: + case SSDFS_UPDATE_PRE_ALLOC_BLOCK: + err = ssdfs_peb_update_block(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to update block: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to update block: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_UPDATE_EXTENT: + case SSDFS_UPDATE_PRE_ALLOC_EXTENT: + err = ssdfs_peb_update_extent(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to update block: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to update extent: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_BTREE_NODE_DIFF: + err = ssdfs_peb_update_extent(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to update extent: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to migrate base state for diff: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to update extent: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_USER_DATA_DIFF: + err = ssdfs_peb_update_block(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to update block: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to 
update block: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_COMMIT_LOG_NOW: + case SSDFS_START_MIGRATION_NOW: + case SSDFS_EXTENT_WAS_INVALIDATED: + /* simply continue logic */ + break; + + case SSDFS_MIGRATE_RANGE: + err = ssdfs_peb_update_extent(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to migrate extent: " + "seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to migrate extent: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_MIGRATE_PRE_ALLOC_PAGE: + err = ssdfs_peb_migrate_pre_allocated_block(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to migrate pre-alloc page: " + "seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to migrate pre-alloc page: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_MIGRATE_FRAGMENT: + err = ssdfs_peb_update_block(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to migrate fragment: " + "seg %llu, peb %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to migrate fragment: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + default: + BUG(); + } + + if (unlikely(err)) { + /* request failed */ + atomic_set(&req->result.state, SSDFS_REQ_FAILED); + } else if (is_ssdfs_peb_containing_user_data(pebi->pebc)) { + struct ssdfs_peb_container *pebc = pebi->pebc; + int processed_blks = req->result.processed_blks; + u32 pending = 0; + + switch (req->private.cmd) { + case SSDFS_UPDATE_BLOCK: + case SSDFS_UPDATE_PRE_ALLOC_BLOCK: + case SSDFS_UPDATE_EXTENT: + case SSDFS_UPDATE_PRE_ALLOC_EXTENT: + case SSDFS_BTREE_NODE_DIFF: + case SSDFS_USER_DATA_DIFF: + case SSDFS_MIGRATE_RANGE: + case SSDFS_MIGRATE_PRE_ALLOC_PAGE: + case SSDFS_MIGRATE_FRAGMENT: + spin_lock(&pebc->pending_lock); + pending = pebc->pending_updated_user_data_pages; + if (pending >= processed_blks) { + pebc->pending_updated_user_data_pages -= + processed_blks; + pending = pebc->pending_updated_user_data_pages; + } else { + /* wrong accounting */ + err = -ERANGE; + } + spin_unlock(&pebc->pending_lock); + break; + + default: + /* do nothing */ + break; + } + + if (unlikely(err)) { + SSDFS_ERR("pending %u < processed_blks %d\n", + pending, processed_blks); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, pending %u\n", + pebi->pebc->parent_si->seg_id, + pebi->pebc->peb_index, + pending); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + + return err; +} + /* * ssdfs_peb_has_dirty_pages() - check that PEB 
has dirty pages * @pebi: pointer on PEB object From patchwork Sat Feb 25 01:08:43 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151937
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 32/76] ssdfs: process create request Date: Fri, 24 Feb 2023 17:08:43 -0800 Message-Id: <20230225010927.813929-33-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The flush thread of the current segment can receive several types of create requests: (1) pre-allocate block or extent, (2) create block or extent, (3) migrate zone's user block or extent. The pre-allocate operation reserves one or several logical blocks for an empty file or a b-tree node. Also, if a file can be inline (stored into the inode's space), then its logical block stays in the pre-allocated state too. The create block or extent operation allocates logical block(s) and stores user data or metadata into them. The migrate zone's block (or extent) operation is used if user data in a closed zone receives an update. Such a case requires storing the new state of the user data into the current zone for user data updates and storing the invalidated extent of the closed zone into the invalidated extents b-tree. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_flush_thread.c | 3270 +++++++++++++++++++++++++++++++++++ 1 file changed, 3270 insertions(+) diff --git a/fs/ssdfs/peb_flush_thread.c b/fs/ssdfs/peb_flush_thread.c index 7e6a8a67e142..857270e0cbf0 100644 --- a/fs/ssdfs/peb_flush_thread.c +++ b/fs/ssdfs/peb_flush_thread.c @@ -228,6 +228,3276 @@ struct ssdfs_pagevec_descriptor { * FLUSH THREAD FUNCTIONALITY * ******************************************************************************/ +/* + * ssdfs_request_rest_bytes() - define rest bytes in request + * @pebi: pointer on PEB object + * @req: I/O request + */ +static inline +u32 ssdfs_request_rest_bytes(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi = pebi->pebc->parent_si->fsi; + u32 processed_bytes = req->result.processed_blks * fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("processed_bytes %u, req->extent.data_bytes %u\n", + processed_bytes, req->extent.data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (processed_bytes > req->extent.data_bytes) + return 0; + else + return req->extent.data_bytes - processed_bytes; +} + +/* + * ssdfs_peb_increase_area_payload_size() - increase area size + * @pebi: pointer on PEB object + * @area_type: area type + * @p: byte stream object pointer + */ +static void +ssdfs_peb_increase_area_payload_size(struct ssdfs_peb_info *pebi, + int area_type, + struct ssdfs_byte_stream_descriptor *p) +{ + struct ssdfs_peb_area *area; + struct ssdfs_fragments_chain_header *chain_hdr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !p); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + area = &pebi->current_log.area[area_type]; + + switch (area_type) { + case SSDFS_LOG_BLK_DESC_AREA: + chain_hdr = &area->metadata.area.blk_desc.table.chain_hdr; + break; + + case SSDFS_LOG_DIFFS_AREA: + chain_hdr =
&area->metadata.area.diffs.table.hdr.chain_hdr; + break; + + case SSDFS_LOG_JOURNAL_AREA: + chain_hdr = &area->metadata.area.journal.table.hdr.chain_hdr; + break; + + case SSDFS_LOG_MAIN_AREA: + chain_hdr = &area->metadata.area.main.desc.chain_hdr; + break; + + default: + BUG(); + }; + + le32_add_cpu(&chain_hdr->compr_bytes, p->compr_bytes); + le32_add_cpu(&chain_hdr->uncompr_bytes, (u32)p->data_bytes); +} + +/* + * ssdfs_peb_define_area_offset() - define fragment's offset + * @pebi: pointer on PEB object + * @area_type: area type + * @p: byte stream object ponter + * @off: PEB's physical offset to data [out] + */ +static +int ssdfs_peb_define_area_offset(struct ssdfs_peb_info *pebi, + int area_type, + struct ssdfs_byte_stream_descriptor *p, + struct ssdfs_peb_phys_offset *off) +{ + int id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !p); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + id = ssdfs_get_peb_migration_id_checked(pebi); + if (unlikely(id < 0)) { + SSDFS_ERR("invalid peb_migration_id: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, id); + return id; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(id > U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + off->peb_index = pebi->peb_index; + off->peb_migration_id = (u8)id; + off->log_area = area_type; + off->byte_offset = p->write_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("off->peb_index %u, off->peb_migration_id %u, " + "off->log_area %#x, off->byte_offset %u\n", + pebi->peb_index, id, area_type, p->write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +static inline +void ssdfs_prepare_user_data_options(struct ssdfs_fs_info *fsi, + u8 *compression) +{ + u16 flags; + u8 type; + + flags = fsi->metadata_options.user_data.flags; + type = fsi->metadata_options.user_data.compression; + + *compression = SSDFS_FRAGMENT_UNCOMPR_BLOB; + + if (flags & SSDFS_USER_DATA_MAKE_COMPRESSION) { + switch (type) { + case SSDFS_USER_DATA_NOCOMPR_TYPE: + *compression = SSDFS_FRAGMENT_UNCOMPR_BLOB; + break; + + case SSDFS_USER_DATA_ZLIB_COMPR_TYPE: + *compression = SSDFS_FRAGMENT_ZLIB_BLOB; + break; + + case SSDFS_USER_DATA_LZO_COMPR_TYPE: + *compression = SSDFS_FRAGMENT_LZO_BLOB; + break; + } + } +} + +/* + * ssdfs_peb_store_fragment_in_area() - try to store fragment into area + * @pebi: pointer on PEB object + * @req: I/O request + * @area_type: area type + * @start_offset: start offset of fragment in bytes + * @data_bytes: size of fragment in bytes + * @off: PEB's physical offset to data [out] + * + * This function tries to store fragment into data area (diff updates + * or journal) of the log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to store data block in current log. 
+ */ +static +int ssdfs_peb_store_fragment_in_area(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + int area_type, + u32 start_offset, + u32 data_bytes, + struct ssdfs_peb_phys_offset *off) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_byte_stream_descriptor byte_stream = {0}; + u8 compression_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + u32 metadata_offset; + u32 metadata_space; + u32 estimated_compr_size = data_bytes; + u32 check_bytes; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req || !off); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "processed_blks %d, area_type %#x, " + "start_offset %u, data_bytes %u\n", + req->place.start.seg_id, pebi->peb_id, req->extent.ino, + req->result.processed_blks, area_type, + start_offset, data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + err = ssdfs_peb_define_metadata_space(pebi, area_type, + start_offset, + data_bytes, + &metadata_offset, + &metadata_space); + if (err) { + SSDFS_ERR("fail to define metadata space: err %d\n", + err); + return err; + } + + ssdfs_prepare_user_data_options(fsi, &compression_type); + + switch (compression_type) { + case SSDFS_FRAGMENT_UNCOMPR_BLOB: + estimated_compr_size = data_bytes; + break; + + case SSDFS_FRAGMENT_ZLIB_BLOB: +#if defined(CONFIG_SSDFS_ZLIB) + estimated_compr_size = + ssdfs_peb_estimate_data_fragment_size(data_bytes); +#else + compression_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + estimated_compr_size = data_bytes; + SSDFS_WARN("ZLIB compression is not supported\n"); +#endif + break; + + case SSDFS_FRAGMENT_LZO_BLOB: +#if defined(CONFIG_SSDFS_LZO) + estimated_compr_size = + ssdfs_peb_estimate_data_fragment_size(data_bytes); +#else + compression_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + estimated_compr_size = data_bytes; + SSDFS_WARN("LZO compression is not supported\n"); +#endif + break; + + default: + BUG(); + } + + check_bytes = metadata_space + estimated_compr_size; + + if (!can_area_add_fragment(pebi, area_type, check_bytes)) { + pebi->current_log.free_data_pages = 0; + SSDFS_DBG("log is full\n"); + return -EAGAIN; + } + + if (!has_current_page_free_space(pebi, area_type, check_bytes)) { + err = ssdfs_peb_grow_log_area(pebi, area_type, check_bytes); + if (err == -ENOSPC) { + SSDFS_DBG("log is full\n"); + return -EAGAIN; + } else if (unlikely(err)) { + SSDFS_ERR("fail to grow log area: " + "type %#x, err %d\n", + area_type, err); + return err; + } + } + + byte_stream.pvec = &req->result.pvec; + byte_stream.start_offset = start_offset; + byte_stream.data_bytes = data_bytes; + + err = ssdfs_peb_store_byte_stream(pebi, &byte_stream, area_type, + compression_type, + req->extent.cno, + req->extent.parent_snapshot); + + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add byte stream: " + "start_offset %u, data_bytes %u, area_type %#x, " + "cno %llu, parent_snapshot %llu\n", + byte_stream.start_offset, data_bytes, area_type, + req->extent.cno, req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add byte stream: " + "start_offset %u, data_bytes %u, area_type %#x, " + "cno %llu, parent_snapshot %llu\n", + byte_stream.start_offset, data_bytes, area_type, + req->extent.cno, req->extent.parent_snapshot); + return 
err; + } + + ssdfs_peb_increase_area_payload_size(pebi, area_type, &byte_stream); + + err = ssdfs_peb_define_area_offset(pebi, area_type, &byte_stream, off); + if (unlikely(err)) { + SSDFS_ERR("fail to define area offset: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_store_in_journal_area() - try to store fragment into Journal area + * @pebi: pointer on PEB object + * @req: I/O request + * @start_offset: start offset of fragment in bytes + * @data_bytes: size of fragment in bytes + * @off: PEB's physical offset to data [out] + * + * This function tries to store fragment into Journal area of the log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to store data block in current log. + */ +static inline +int ssdfs_peb_store_in_journal_area(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 start_offset, + u32 data_bytes, + struct ssdfs_peb_phys_offset *off) +{ + int area_type = SSDFS_LOG_JOURNAL_AREA; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req || !off); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "processed_blks %d, start_offset %u, data_bytes %u\n", + req->place.start.seg_id, pebi->peb_id, req->extent.ino, + req->result.processed_blks, start_offset, data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_peb_store_fragment_in_area(pebi, req, area_type, + start_offset, data_bytes, + off); +} + +/* + * ssdfs_peb_store_in_diff_area() - try to store fragment into Diff area + * @pebi: pointer on PEB object + * @req: I/O request + * @start_offset: start offset of fragment in bytes + * @data_bytes: size of fragment in bytes + * @off: PEB's physical offset to data [out] + * + * This function tries to store fragment into Diff Updates area of the log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to store data block in current log. + */ +static inline +int ssdfs_peb_store_in_diff_area(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 start_offset, + u32 data_bytes, + struct ssdfs_peb_phys_offset *off) +{ + int area_type = SSDFS_LOG_DIFFS_AREA; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req || !off); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "processed_blks %d, start_offset %u, data_bytes %u\n", + req->place.start.seg_id, pebi->peb_id, req->extent.ino, + req->result.processed_blks, start_offset, data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_peb_store_fragment_in_area(pebi, req, area_type, + start_offset, data_bytes, + off); +} + +/* + * ssdfs_peb_store_in_main_area() - try to store fragment into Main area + * @pebi: pointer on PEB object + * @req: I/O request + * @start_offset: start offset of fragment in bytes + * @data_bytes: size of fragment in bytes + * @off: PEB's physical offset to data [out] + * + * This function tries to store fragment into Main area of the log. 
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to store data block in current log. + */ +static +int ssdfs_peb_store_in_main_area(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + u32 start_offset, + u32 data_bytes, + struct ssdfs_peb_phys_offset *off) +{ + int area_type = SSDFS_LOG_MAIN_AREA; + struct ssdfs_byte_stream_descriptor byte_stream = {0}; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req || !off); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "processed_blks %d, rest_bytes %u\n", + req->place.start.seg_id, pebi->peb_id, req->extent.ino, + req->result.processed_blks, + data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!can_area_add_fragment(pebi, area_type, data_bytes)) { + pebi->current_log.free_data_pages = 0; + SSDFS_DBG("log is full\n"); + return -EAGAIN; + } + + if (!has_current_page_free_space(pebi, area_type, data_bytes)) { + err = ssdfs_peb_grow_log_area(pebi, area_type, data_bytes); + if (err == -ENOSPC) { + SSDFS_DBG("log is full\n"); + return -EAGAIN; + } else if (unlikely(err)) { + SSDFS_ERR("fail to grow log area: " + "type %#x, err %d\n", + area_type, err); + return err; + } + } + + byte_stream.pvec = &req->result.pvec; + byte_stream.start_offset = start_offset; + byte_stream.data_bytes = data_bytes; + + err = ssdfs_peb_store_byte_stream_in_main_area(pebi, &byte_stream, + req->extent.cno, + req->extent.parent_snapshot); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add byte stream: " + "start_offset %u, data_bytes %u, area_type %#x, " + "cno %llu, parent_snapshot %llu\n", + start_offset, data_bytes, area_type, + req->extent.cno, req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add byte stream: " + "start_offset %u, data_bytes %u, area_type %#x, " + "cno %llu, parent_snapshot %llu\n", + start_offset, data_bytes, area_type, + req->extent.cno, req->extent.parent_snapshot); + return err; + } + + ssdfs_peb_increase_area_payload_size(pebi, area_type, &byte_stream); + + err = ssdfs_peb_define_area_offset(pebi, area_type, &byte_stream, off); + if (unlikely(err)) { + SSDFS_ERR("fail to define area offset: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); + return err; + } + + return 0; +} + +/* + * is_ssdfs_block_full() - check that data size is equal to page size + * @pagesize: page size in bytes + * @data_size: data size in bytes + */ +static inline +bool is_ssdfs_block_full(u32 pagesize, u32 data_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pagesize %u, data_size %u\n", + pagesize, data_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagesize > PAGE_SIZE) + return data_size >= pagesize; + + return data_size >= PAGE_SIZE; +} + +/* + * can_ssdfs_pagevec_be_compressed() - check that pagevec can be compressed + * @start_page: starting page in pagevec + * @page_count: count of pages in the portion + * @bytes_count: bytes number in the portion + * @req: segment request + */ +static +bool can_ssdfs_pagevec_be_compressed(u32 start_page, u32 page_count, + u32 bytes_count, + struct ssdfs_segment_request *req) +{ + struct page *page; + int page_index; + u32 start_offset; + u32 
portion_size; + u32 tested_bytes = 0; + u32 can_compress[2] = {0, 0}; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("start_page %u, page_count %u, " + "bytes_count %u\n", + start_page, page_count, bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < page_count; i++) { + int state; + + page_index = i + start_page; + start_offset = page_index >> PAGE_SHIFT; + portion_size = PAGE_SIZE; + + if (page_index >= pagevec_count(&req->result.pvec)) { + SSDFS_ERR("fail to check page: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&req->result.pvec)); + return false; + } + + page = req->result.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(tested_bytes >= bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + portion_size = min_t(u32, portion_size, + bytes_count - tested_bytes); + + if (ssdfs_can_compress_data(page, portion_size)) + state = 1; + else + state = 0; + + can_compress[state]++; + tested_bytes += portion_size; + } + + return can_compress[true] >= can_compress[false]; +} + +/* + * ssdfs_peb_define_area_type() - define area type + * @pebi: pointer on PEB object + * @bytes_count: bytes number in the portion + * @start_page: starting page in pagevec + * @page_count: count of pages in the portion + * @req: I/O request + * @desc_off: block descriptor offset + * @pos: offset position + * @area_type: type of area [out] + */ +static +int ssdfs_peb_define_area_type(struct ssdfs_peb_info *pebi, + u32 bytes_count, + u32 start_page, u32 page_count, + struct ssdfs_segment_request *req, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + int *area_type) +{ + struct ssdfs_fs_info *fsi; + bool can_be_compressed = false; +#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA + int err; +#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */ + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req || !area_type); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "processed_blks %d, bytes_count %u, " + "start_page %u, page_count %u\n", + req->place.start.seg_id, pebi->peb_id, req->extent.ino, + req->result.processed_blks, + bytes_count, start_page, page_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + *area_type = SSDFS_LOG_AREA_MAX; + + if (req->private.class == SSDFS_PEB_DIFF_ON_WRITE_REQ) { + *area_type = SSDFS_LOG_DIFFS_AREA; + } else if (!is_ssdfs_block_full(fsi->pagesize, bytes_count)) + *area_type = SSDFS_LOG_JOURNAL_AREA; + else { +#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA + if (req->private.class == SSDFS_PEB_UPDATE_REQ) { + err = ssdfs_user_data_prepare_diff(pebi->pebc, + desc_off, + pos, req); + } else + err = -ENOENT; + + if (err == -ENOENT) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to prepare user data diff: " + "seg %llu, peb %llu, ino %llu, " + "processed_blks %d, bytes_count %u, " + "start_page %u, page_count %u\n", + req->place.start.seg_id, + pebi->peb_id, + req->extent.ino, + req->result.processed_blks, + bytes_count, start_page, + page_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + can_be_compressed = + can_ssdfs_pagevec_be_compressed(start_page, + page_count, + bytes_count, + req); + if (can_be_compressed) + *area_type = SSDFS_LOG_DIFFS_AREA; + else + *area_type = SSDFS_LOG_MAIN_AREA; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare user data diff: " + "seg %llu, peb %llu, ino %llu, " + "processed_blks %d, bytes_count 
%u, " + "start_page %u, page_count %u, err %d\n", + req->place.start.seg_id, + pebi->peb_id, + req->extent.ino, + req->result.processed_blks, + bytes_count, start_page, + page_count, err); + return err; + } else + *area_type = SSDFS_LOG_DIFFS_AREA; +#else + can_be_compressed = can_ssdfs_pagevec_be_compressed(start_page, + page_count, + bytes_count, + req); + if (can_be_compressed) + *area_type = SSDFS_LOG_DIFFS_AREA; + else + *area_type = SSDFS_LOG_MAIN_AREA; +#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */ + } + + return 0; +} + +/* + * ssdfs_peb_add_block_into_data_area() - try to add data block into log + * @pebi: pointer on PEB object + * @req: I/O request + * @desc_off: block descriptor offset + * @pos: offset position + * @off: PEB's physical offset to data [out] + * @written_bytes: amount of written bytes [out] + * + * This function tries to add data block into data area (main, diff updates + * or journal) of the log. If attempt to add data or block descriptor + * has failed with %-EAGAIN error then it needs to return request into + * head of the queue. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to store data block in current log. + */ +static +int ssdfs_peb_add_block_into_data_area(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + struct ssdfs_peb_phys_offset *off, + u32 *written_bytes) +{ + struct ssdfs_fs_info *fsi; + int area_type = SSDFS_LOG_AREA_MAX; + u32 rest_bytes; + u32 start_page = 0; + u32 page_count = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req || !off); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *written_bytes = 0; + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "processed_blks %d, rest_bytes %u\n", + req->place.start.seg_id, pebi->peb_id, req->extent.ino, + req->result.processed_blks, + rest_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + if (req->private.class == SSDFS_PEB_DIFF_ON_WRITE_REQ) { + start_page = req->result.processed_blks; + rest_bytes = PAGE_SIZE; + page_count = 1; + } else if (fsi->pagesize < PAGE_SIZE) { + start_page = req->result.processed_blks << fsi->log_pagesize; + start_page >>= PAGE_SHIFT; + rest_bytes = min_t(u32, rest_bytes, PAGE_SIZE); + page_count = rest_bytes + PAGE_SIZE - 1; + page_count >>= PAGE_SHIFT; + } else { + start_page = req->result.processed_blks << fsi->log_pagesize; + start_page >>= PAGE_SHIFT; + rest_bytes = min_t(u32, rest_bytes, fsi->pagesize); + page_count = rest_bytes + PAGE_SIZE - 1; + page_count >>= PAGE_SHIFT; + } + + err = ssdfs_peb_define_area_type(pebi, rest_bytes, + start_page, page_count, + req, desc_off, pos, &area_type); + if (unlikely(err)) { + SSDFS_ERR("fail to define area type: " + "rest_bytes %u, start_page %u, " + "page_count %u, err %d\n", + rest_bytes, start_page, + page_count, err); + return err; + } + + for (i = 0; i < page_count; i++) { + int page_index = i + start_page; + u32 start_offset = page_index << PAGE_SHIFT; + u32 portion_size = PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(*written_bytes >= rest_bytes); +#endif /* CONFIG_SSDFS_DEBUG 
*/
+
+		portion_size = min_t(u32, portion_size,
+				     rest_bytes - *written_bytes);
+
+		switch (area_type) {
+		case SSDFS_LOG_JOURNAL_AREA:
+			err = ssdfs_peb_store_in_journal_area(pebi, req,
+							      start_offset,
+							      portion_size,
+							      off);
+			break;
+
+		case SSDFS_LOG_DIFFS_AREA:
+			err = ssdfs_peb_store_in_diff_area(pebi, req,
+							   start_offset,
+							   portion_size,
+							   off);
+			break;
+
+		case SSDFS_LOG_MAIN_AREA:
+			err = ssdfs_peb_store_in_main_area(pebi, req,
+							   start_offset,
+							   portion_size,
+							   off);
+			break;
+
+		default:
+			BUG();
+		}
+
+		if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to add page into current log: "
+				  "index %d, portion_size %u\n",
+				  page_index, portion_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to add page into current log: "
+				  "index %d, portion_size %u\n",
+				  page_index, portion_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -EAGAIN;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to add page: "
+				  "index %d, portion_size %u, err %d\n",
+				  page_index, portion_size, err);
+			return err;
+		}
+
+		*written_bytes += portion_size;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("written_bytes %u\n", *written_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * need_reserve_free_space() - check necessity to reserve free space
+ * @pebi: pointer on PEB object
+ * @area_type: area type
+ * @fragment_size: size of fragment
+ *
+ * This function checks whether it needs to reserve free space.
+ */
+static
+bool need_reserve_free_space(struct ssdfs_peb_info *pebi,
+			     int area_type,
+			     u32 fragment_size)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_page_array *area_pages;
+	bool is_space_enough, is_page_available;
+	u32 write_offset;
+	pgoff_t page_index;
+	unsigned long pages_count;
+	struct page *page;
+	struct ssdfs_peb_area_metadata *metadata;
+	u32 free_space = 0;
+	u16 flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(fragment_size == 0);
+	BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+	BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+	SSDFS_DBG("area_type %#x, fragment_size %u\n",
+		  area_type, fragment_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebi->pebc->parent_si->fsi;
+
+	switch (area_type) {
+	case SSDFS_LOG_BLK_DESC_AREA:
+		flags = fsi->metadata_options.blk2off_tbl.flags;
+		if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) {
+			/*
+			 * continue logic
+			 */
+		} else
+			return has_current_page_free_space(pebi, area_type,
+							   fragment_size);
+		break;
+
+	default:
+		return has_current_page_free_space(pebi, area_type,
+						   fragment_size);
+	}
+
+	write_offset = pebi->current_log.area[area_type].write_offset;
+	page_index = write_offset / PAGE_SIZE;
+
+	down_read(&pebi->current_log.area[area_type].array.lock);
+	pages_count = pebi->current_log.area[area_type].array.pages_count;
+	up_read(&pebi->current_log.area[area_type].array.lock);
+
+	if (page_index < pages_count)
+		free_space += PAGE_SIZE - (write_offset % PAGE_SIZE);
+
+	metadata = &pebi->current_log.area[area_type].metadata;
+	write_offset = metadata->area.blk_desc.flush_buf.write_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("write_offset %u, free_space %u\n",
+		  write_offset, free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	is_space_enough = (write_offset / 2) < free_space;
+
+	write_offset = pebi->current_log.area[area_type].write_offset;
+
+	page_index = write_offset >> PAGE_SHIFT;
+	area_pages = &pebi->current_log.area[area_type].array;
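
To make the is_space_enough heuristic above concrete: the pending block
descriptors sit uncompressed in the flush buffer, and the check assumes
they will shrink to roughly half their size once compressed, so the area
is deemed to have enough room when the free tail of the current page can
hold that half. A minimal standalone sketch of just this arithmetic
(names and the userspace framing are illustrative, not SSDFS API; the
page-availability part of the kernel check is omitted):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096u

/* true when half of the buffered bytes still fit into the page tail */
static bool sketch_is_space_enough(uint32_t area_write_offset,
				   uint32_t buffered_uncompr_bytes)
{
	/* free bytes left in the page currently being filled */
	uint32_t free_space =
		SKETCH_PAGE_SIZE - (area_write_offset % SKETCH_PAGE_SIZE);

	/* assume a ~2:1 compression ratio for the pending buffer */
	return (buffered_uncompr_bytes / 2) < free_space;
}

int main(void)
{
	/* 1 KiB already written into the page, 3 KiB buffered */
	printf("%d\n", sketch_is_space_enough(1024, 3072));
	return 0;
}
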
+ page = ssdfs_page_array_get_page(area_pages, page_index); + if (IS_ERR_OR_NULL(page)) + is_page_available = false; + else { + is_page_available = true; + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return is_space_enough && is_page_available; +} + +/* + * ssdfs_peb_reserve_block_descriptor() - reserve block descriptor space + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to reserve space for block descriptor in + * block descriptor area. If attempt to add data or block descriptor + * has failed with %-EAGAIN error then it needs to return request into + * head of the queue. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to reserve block descriptor in current log. + */ +static +int ssdfs_peb_reserve_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + int area_type = SSDFS_LOG_BLK_DESC_AREA; + struct ssdfs_peb_area_metadata *metadata; + struct ssdfs_area_block_table *table; + int items_count, capacity; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + u16 flags; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!req); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("ino %llu, logical_offset %llu, processed_blks %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + flags = fsi->metadata_options.blk2off_tbl.flags; + + metadata = &pebi->current_log.area[area_type].metadata; + table = &metadata->area.blk_desc.table; + + items_count = metadata->area.blk_desc.items_count; + capacity = metadata->area.blk_desc.capacity; + + if (items_count > capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %d > capacity %d\n", + items_count, capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %d, capacity %d\n", + items_count, capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) { + if (need_reserve_free_space(pebi, area_type, + blk_desc_size)) { + err = ssdfs_peb_grow_log_area(pebi, area_type, + blk_desc_size); + if (err == -ENOSPC) { + SSDFS_DBG("log is full\n"); + return -EAGAIN; + } else if (unlikely(err)) { + SSDFS_ERR("fail to grow log area: " + "type %#x, err %d\n", + area_type, err); + return err; + } + } + } else { + if (!has_current_page_free_space(pebi, area_type, + blk_desc_size)) { + err = ssdfs_peb_grow_log_area(pebi, area_type, + blk_desc_size); + if (err == -ENOSPC) { + SSDFS_DBG("log is full\n"); + return -EAGAIN; + } else if (unlikely(err)) { + SSDFS_ERR("fail to grow log area: " + "type %#x, err %d\n", + area_type, err); + return err; + } + } + } + + metadata->area.blk_desc.items_count++; + + return 0; +} + +/* + * ssdfs_peb_init_block_descriptor_state() - init block descriptor's state + * @pebi: pointer on PEB object + * @data: data offset inside PEB + * @state: block descriptor's state [out] + * + * This function initializes a state of block descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - corrupted block descriptor. + * %-ERANGE - internal error. 
+ */ +static inline +int ssdfs_peb_init_block_descriptor_state(struct ssdfs_peb_info *pebi, + struct ssdfs_peb_phys_offset *data, + struct ssdfs_blk_state_offset *state) +{ + int id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !data || !state); +#endif /* CONFIG_SSDFS_DEBUG */ + + state->log_start_page = cpu_to_le16(pebi->current_log.start_page); + state->log_area = data->log_area; + state->byte_offset = cpu_to_le32(data->byte_offset); + + id = ssdfs_get_peb_migration_id_checked(pebi); + if (unlikely(id < 0)) { + SSDFS_ERR("invalid peb_migration_id: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, id); + return id; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(id > U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + state->peb_migration_id = (u8)id; + + return 0; +} + +/* + * ssdfs_peb_prepare_block_descriptor() - prepare new state of block descriptor + * @pebi: pointer on PEB object + * @req: I/O request + * @data: data offset inside PEB + * @desc: block descriptor [out] + * + * This function prepares new state of block descriptor. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - corrupted block descriptor. + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_prepare_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_peb_phys_offset *data, + struct ssdfs_block_descriptor *desc) +{ + u64 logical_offset; + u32 pagesize; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req || !desc || !data); + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + + SSDFS_DBG("ino %llu, logical_offset %llu, processed_blks %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + pagesize = pebi->pebc->parent_si->fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(req->result.processed_blks > req->place.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_offset = req->extent.logical_offset + + (req->result.processed_blks * pagesize); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(logical_offset >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_offset /= pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + DEBUG_BLOCK_DESCRIPTOR(pebi->pebc->parent_si->seg_id, + pebi->peb_id, desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + i = 0; + + do { + if (IS_SSDFS_BLK_STATE_OFFSET_INVALID(&desc->state[i])) + break; + else + i++; + } while (i < SSDFS_BLK_STATE_OFF_MAX); + + if (i == 0) { + /* empty block descriptor */ + desc->ino = cpu_to_le64(req->extent.ino); + desc->logical_offset = cpu_to_le32((u32)logical_offset); + desc->peb_index = cpu_to_le16(data->peb_index); + desc->peb_page = cpu_to_le16(data->peb_page); + + err = ssdfs_peb_init_block_descriptor_state(pebi, data, + &desc->state[0]); + if (unlikely(err)) { + SSDFS_ERR("fail to init block descriptor state: " + "err %d\n", err); + return err; + } + } else if (i >= SSDFS_BLK_STATE_OFF_MAX) { + SSDFS_WARN("block descriptor is exhausted: " + "seg_id %llu, peb_id %llu, " + "ino %llu, logical_offset %llu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset); + return -ERANGE; + } else { + if (le64_to_cpu(desc->ino) != req->extent.ino) { + SSDFS_ERR("corrupted block state: " + "ino1 %llu != ino2 %llu\n", + le64_to_cpu(desc->ino), + req->extent.ino); + return -EIO; + } + + if (le32_to_cpu(desc->logical_offset) != logical_offset) { + SSDFS_ERR("corrupted block state: " + "logical_offset1 %u != logical_offset2 %llu\n", + 
le32_to_cpu(desc->logical_offset), + logical_offset); + return -EIO; + } + + if (le16_to_cpu(desc->peb_index) != data->peb_index) { + SSDFS_ERR("corrupted block state: " + "peb_index1 %u != peb_index2 %u\n", + le16_to_cpu(desc->peb_index), + data->peb_index); + return -EIO; + } + + if (le16_to_cpu(desc->peb_page) != data->peb_page) { + SSDFS_ERR("corrupted block state: " + "peb_page1 %u != peb_page2 %u\n", + le16_to_cpu(desc->peb_page), + data->peb_page); + return -EIO; + } + + err = ssdfs_peb_init_block_descriptor_state(pebi, data, + &desc->state[i]); + if (unlikely(err)) { + SSDFS_ERR("fail to init block descriptor state: " + "err %d\n", err); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + DEBUG_BLOCK_DESCRIPTOR(pebi->pebc->parent_si->seg_id, + pebi->peb_id, desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_peb_write_block_descriptor() - write block descriptor into area + * @pebi: pointer on PEB object + * @req: I/O request + * @desc: block descriptor + * @data_off: offset to data in PEB [in] + * @off: block descriptor offset in PEB [out] + * @write_offset: write offset for written block descriptor [out] + * + * This function tries to write block descriptor into dedicated area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-E2BIG - buffer is full. + */ +static +int ssdfs_peb_write_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_block_descriptor *desc, + struct ssdfs_peb_phys_offset *data_off, + struct ssdfs_peb_phys_offset *off, + u32 *write_offset) +{ + struct ssdfs_fs_info *fsi; + int area_type = SSDFS_LOG_BLK_DESC_AREA; + struct ssdfs_peb_area *area; + struct ssdfs_peb_area_metadata *metadata; + struct ssdfs_peb_temp_buffer *buf; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + struct page *page; + pgoff_t page_index; + u32 page_off; + int id; + int items_count, capacity; + u16 flags; + bool is_buffer_full = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req || !desc || !off || !write_offset); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("ino %llu, logical_offset %llu, processed_blks %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + flags = fsi->metadata_options.blk2off_tbl.flags; + + area = &pebi->current_log.area[area_type]; + metadata = &area->metadata; + items_count = metadata->area.blk_desc.items_count; + capacity = metadata->area.blk_desc.capacity; + + if (items_count < 1) { + SSDFS_ERR("block descriptor is not reserved\n"); + return -ERANGE; + } + + *write_offset = ssdfs_peb_correct_area_write_offset(area->write_offset, + blk_desc_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area->write_offset %u, write_offset %u\n", + area->write_offset, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) { + buf = &metadata->area.blk_desc.flush_buf; + +#ifdef CONFIG_SSDFS_DEBUG + if (buf->write_offset % blk_desc_size) { + SSDFS_ERR("invalid write_offset %u\n", + buf->write_offset); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((buf->write_offset + buf->granularity) > buf->size) { + SSDFS_ERR("buffer is full: " + "write_offset %u, granularity %zu, " + "size %zu\n", + buf->write_offset, + buf->granularity, + buf->size); + + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!buf->ptr); + + if (buf->granularity != 
blk_desc_size) { + SSDFS_ERR("invalid granularity: " + "granularity %zu, item_size %zu\n", + buf->granularity, blk_desc_size); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_memcpy(buf->ptr, buf->write_offset, buf->size, + desc, 0, blk_desc_size, + blk_desc_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "write_offset %u, blk_desc_size %zu, " + "err %d\n", + buf->write_offset, blk_desc_size, err); + return err; + } + + buf->write_offset += blk_desc_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("buf->write_offset %u, buf->size %zu\n", + buf->write_offset, buf->size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (buf->write_offset == buf->size) { + err = ssdfs_peb_realloc_write_buffer(buf); + if (err == -E2BIG) { + err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("buffer is full: " + "write_offset %u, size %zu\n", + buf->write_offset, + buf->size); +#endif /* CONFIG_SSDFS_DEBUG */ + + is_buffer_full = true; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reallocate buffer: " + "err %d\n", err); + return err; + } + } + } else { + page_index = *write_offset / PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area->write_offset %u, blk_desc_size %zu, " + "write_offset %u, page_index %lu\n", + area->write_offset, blk_desc_size, + *write_offset, page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(&area->array, page_index); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get page %lu for area %#x\n", + page_index, area_type); + return -ERANGE; + } + + page_off = *write_offset % PAGE_SIZE; + + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + desc, 0, blk_desc_size, + blk_desc_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: " + "page_off %u, blk_desc_size %zu, err %d\n", + page_off, blk_desc_size, err); + goto finish_copy; + } + + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&area->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + page_index, err); + } + +finish_copy: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) + return err; + } + + id = ssdfs_get_peb_migration_id_checked(pebi); + if (unlikely(id < 0)) { + SSDFS_ERR("invalid peb_migration_id: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, id); + return id; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(id > U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* Prepare block descriptor's offset in PEB */ + off->peb_index = pebi->peb_index; + off->peb_migration_id = (u8)id; + off->peb_page = data_off->peb_page; + off->log_area = area_type; + off->byte_offset = *write_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_id %llu, " + "peb_index %u, peb_page %u, " + "log_area %#x, byte_offset %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + le16_to_cpu(off->peb_index), + le16_to_cpu(off->peb_page), + off->log_area, + le32_to_cpu(off->byte_offset)); +#endif /* CONFIG_SSDFS_DEBUG */ + + area->write_offset = *write_offset + blk_desc_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area->write_offset %u, write_offset %u\n", + area->write_offset, *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_buffer_full) + return -E2BIG; + + return 0; +} + +/* + * ssdfs_peb_compress_blk_descs_fragment() - compress block descriptor fragment + * @pebi: pointer on PEB object + * @uncompr_size: size 
of uncompressed fragment + * @compr_size: size of compressed fragment [out] + * + * This function tries to compress block descriptor fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to get fragment descriptor. + */ +static +int ssdfs_peb_compress_blk_descs_fragment(struct ssdfs_peb_info *pebi, + size_t uncompr_size, + size_t *compr_size) +{ + struct ssdfs_fs_info *fsi; + int area_type = SSDFS_LOG_BLK_DESC_AREA; + struct ssdfs_peb_area *area; + struct ssdfs_peb_temp_buffer *buf; + struct page *page; + pgoff_t page_index; + unsigned char *kaddr; + u32 page_off; + u8 compr_type; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !compr_size); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + area = &pebi->current_log.area[area_type]; + buf = &area->metadata.area.blk_desc.flush_buf; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, " + "compressed_offset %u, write_offset %u, " + "uncompr_size %zu\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + area->compressed_offset, + area->write_offset, + uncompr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (uncompr_size > buf->size) { + SSDFS_ERR("uncompr_size %zu > buf->size %zu\n", + uncompr_size, buf->size); + return -ERANGE; + } + + switch (fsi->metadata_options.blk2off_tbl.compression) { + case SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE: + compr_type = SSDFS_COMPR_NONE; + break; + case SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE: + compr_type = SSDFS_COMPR_ZLIB; + break; + case SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE: + compr_type = SSDFS_COMPR_LZO; + break; + default: + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!buf->ptr); + + SSDFS_DBG("BUF DUMP: size %zu\n", + buf->size); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + buf->ptr, + buf->size); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = area->compressed_offset / PAGE_SIZE; + + page = ssdfs_page_array_grab_page(&area->array, page_index); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to get page %lu for area %#x\n", + page_index, area_type); + return -ERANGE; + } + + page_off = area->compressed_offset % PAGE_SIZE; + *compr_size = PAGE_SIZE - page_off; + + kaddr = kmap_local_page(page); + err = ssdfs_compress(compr_type, + buf->ptr, (u8 *)kaddr + page_off, + &uncompr_size, compr_size); + flush_dcache_page(page); + kunmap_local(kaddr); + + if (err == -E2BIG) { + void *compr_buf = NULL; + u32 copy_len; + + compr_buf = ssdfs_flush_kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!compr_buf) { + SSDFS_ERR("fail to allocate buffer\n"); + return -ENOMEM; + } + + *compr_size = PAGE_SIZE; + err = ssdfs_compress(compr_type, + buf->ptr, compr_buf, + &uncompr_size, compr_size); + if (err) { + SSDFS_ERR("fail to compress fragment: " + "data_bytes %zu, free_space %zu, " + "err %d\n", + uncompr_size, *compr_size, err); + goto free_compr_buf; + } + + copy_len = PAGE_SIZE - page_off; + + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + compr_buf, 0, PAGE_SIZE, + copy_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto free_compr_buf; + } + + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&area->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + page_index, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) + goto free_compr_buf; + + page_index++; + + page = ssdfs_page_array_grab_page(&area->array, page_index); 
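
For readers following this -E2BIG branch: the fragment was compressed
into a bounce buffer because its compressed image did not fit into the
tail of the current page, so it is copied out in two pieces, copy_len
bytes into that tail and the remainder into the start of the freshly
grabbed next page. A hedged standalone sketch of the split-copy
arithmetic (all names are illustrative, not SSDFS API):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* copy a compressed fragment from a bounce buffer across a page boundary */
static void sketch_copy_across_pages(uint8_t *first_page, uint32_t page_off,
				     uint8_t *next_page,
				     const uint8_t *bounce_buf,
				     uint32_t compr_size)
{
	/* bytes that still fit into the tail of the first page */
	uint32_t copy_len = SKETCH_PAGE_SIZE - page_off;

	memcpy(first_page + page_off, bounce_buf, copy_len);
	/* the remainder lands at the beginning of the next page */
	memcpy(next_page, bounce_buf + copy_len, compr_size - copy_len);
}

int main(void)
{
	static uint8_t p1[SKETCH_PAGE_SIZE], p2[SKETCH_PAGE_SIZE];
	static uint8_t buf[SKETCH_PAGE_SIZE];

	memset(buf, 0xAB, sizeof(buf));
	/* 100 bytes of tail left in p1, 500-byte compressed fragment */
	sketch_copy_across_pages(p1, SKETCH_PAGE_SIZE - 100, p2, buf, 500);
	printf("%u %u\n", p1[SKETCH_PAGE_SIZE - 1], p2[0]);
	return 0;
}
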
+ if (IS_ERR_OR_NULL(page)) { + err = -ERANGE; + SSDFS_ERR("fail to get page %lu for area %#x\n", + page_index, area_type); + goto free_compr_buf; + } + + err = ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + compr_buf, copy_len, PAGE_SIZE, + *compr_size - copy_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto free_compr_buf; + } + + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&area->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + page_index, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) + goto free_compr_buf; + +free_compr_buf: + ssdfs_flush_kfree(compr_buf); + + if (unlikely(err)) + return err; + } else if (err) { + SSDFS_ERR("fail to compress fragment: " + "data_bytes %zu, free_space %zu, err %d\n", + uncompr_size, *compr_size, err); + return err; + } else { + SetPageUptodate(page); + + err = ssdfs_page_array_set_page_dirty(&area->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: " + "err %d\n", + page_index, err); + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (unlikely(err)) + return err; + } + + memset(buf->ptr, 0, buf->size); + buf->write_offset = 0; + + return 0; +} + +/* + * ssdfs_peb_store_compressed_block_descriptor() - store block descriptor + * @pebi: pointer on PEB object + * @req: I/O request + * @blk_desc: block descriptor + * @data_off: offset to data in PEB [in] + * @desc_off: offset to block descriptor in PEB [out] + * + * This function tries to compress and to store block descriptor + * into dedicated area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to get fragment descriptor. 
+ */ +static +int ssdfs_peb_store_compressed_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_block_descriptor *blk_desc, + struct ssdfs_peb_phys_offset *data_off, + struct ssdfs_peb_phys_offset *desc_off) +{ + struct ssdfs_fs_info *fsi; + int area_type = SSDFS_LOG_BLK_DESC_AREA; + struct ssdfs_peb_area *area; + struct ssdfs_peb_temp_buffer *buf; + struct ssdfs_fragments_chain_header *chain_hdr; + struct ssdfs_fragment_desc *meta_desc; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + u32 write_offset = 0; + u32 old_offset; + u16 bytes_count; + u16 fragments_count; + size_t compr_size = 0; + u8 fragment_type = SSDFS_DATA_BLK_DESC; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req || !blk_desc || !data_off || !desc_off); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "logical_offset %llu, processed_blks %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + area = &pebi->current_log.area[area_type]; + buf = &area->metadata.area.blk_desc.flush_buf; + chain_hdr = &area->metadata.area.blk_desc.table.chain_hdr; + fragments_count = le16_to_cpu(chain_hdr->fragments_count); + + switch (fsi->metadata_options.blk2off_tbl.compression) { + case SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE: + fragment_type = SSDFS_DATA_BLK_DESC; + break; + case SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE: + fragment_type = SSDFS_DATA_BLK_DESC_ZLIB; + break; + case SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE: + fragment_type = SSDFS_DATA_BLK_DESC_LZO; + break; + default: + BUG(); + } + + if (fragments_count == 0) { + meta_desc = ssdfs_peb_get_area_free_frag_desc(pebi, area_type); + if (IS_ERR(meta_desc)) { + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + (int)PTR_ERR(meta_desc)); + return PTR_ERR(meta_desc); + } else if (!meta_desc) { + err = -ERANGE; + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + err); + return err; + } + + meta_desc->magic = SSDFS_FRAGMENT_DESC_MAGIC; + meta_desc->type = fragment_type; + meta_desc->flags = SSDFS_FRAGMENT_HAS_CSUM; + meta_desc->offset = cpu_to_le32(area->compressed_offset); + meta_desc->checksum = 0; + } else { + meta_desc = ssdfs_peb_get_area_cur_frag_desc(pebi, area_type); + if (IS_ERR(meta_desc)) { + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + (int)PTR_ERR(meta_desc)); + return PTR_ERR(meta_desc); + } else if (!meta_desc) { + err = -ERANGE; + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + err); + return err; + } + } + + old_offset = le32_to_cpu(meta_desc->offset); + bytes_count = le16_to_cpu(meta_desc->uncompr_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_offset %u, bytes_count %u\n", + old_offset, bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_prepare_block_descriptor(pebi, req, data_off, + blk_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare block descriptor: " + "ino %llu, logical_offset %llu, " + "processed_blks %d, err %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks, err); + return err; + } + + err = ssdfs_peb_write_block_descriptor(pebi, req, blk_desc, + data_off, desc_off, + &write_offset); + if (err == -E2BIG) { + /* + * continue logic + */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to write block descriptor: " + "ino %llu, logical_offset %llu, " + 
"processed_blks %d, err %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks, err); + return err; + } + + bytes_count += blk_desc_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("bytes_count %u\n", + bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -E2BIG) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!buf->ptr); + + if (buf->write_offset != buf->size) { + SSDFS_ERR("invalid request: " + "buf->write_offset %u, buf->size %zu\n", + buf->write_offset, buf->size); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (bytes_count > buf->size) { + SSDFS_ERR("invalid size: " + "bytes_count %u > buf->size %zu\n", + bytes_count, buf->size); + return -ERANGE; + } + + meta_desc->checksum = ssdfs_crc32_le(buf->ptr, bytes_count); + + if (le32_to_cpu(meta_desc->checksum) == 0) { + SSDFS_WARN("checksum is invalid: " + "seg %llu, peb %llu, ino %llu, " + "logical_offset %llu, processed_blks %d, " + "bytes_count %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset, + req->result.processed_blks, + bytes_count); + return -ERANGE; + } + + err = ssdfs_peb_compress_blk_descs_fragment(pebi, + bytes_count, + &compr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to compress blk desc fragment: " + "err %d\n", err); + return err; + } + + meta_desc->offset = cpu_to_le32(area->compressed_offset); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(compr_size > U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + meta_desc->compr_size = cpu_to_le16((u16)compr_size); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(bytes_count > U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + meta_desc->uncompr_size = cpu_to_le16((u16)bytes_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %u, compr_size %u, " + "uncompr_size %u, checksum %#x\n", + le32_to_cpu(meta_desc->offset), + le16_to_cpu(meta_desc->compr_size), + le16_to_cpu(meta_desc->uncompr_size), + le32_to_cpu(meta_desc->checksum)); +#endif /* CONFIG_SSDFS_DEBUG */ + + area->compressed_offset += compr_size; + le32_add_cpu(&chain_hdr->compr_bytes, compr_size); + + if (fragments_count == SSDFS_NEXT_BLK_TABLE_INDEX) { + err = ssdfs_peb_store_area_block_table(pebi, area_type, + SSDFS_MULTIPLE_HDR_CHAIN); + if (unlikely(err)) { + SSDFS_ERR("fail to store area's block table: " + "area %#x, err %d\n", + area_type, err); + return err; + } + + err = ssdfs_peb_allocate_area_block_table(pebi, + area_type); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log is full, " + "unable to add next fragments chain: " + "area %#x\n", + area_type); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add next fragments chain: " + "area %#x\n", + area_type); + return err; + } + } + + meta_desc = ssdfs_peb_get_area_free_frag_desc(pebi, area_type); + if (IS_ERR(meta_desc)) { + SSDFS_ERR("fail to get vacant fragment descriptor: " + "err %d\n", + (int)PTR_ERR(meta_desc)); + return PTR_ERR(meta_desc); + } else if (!meta_desc) { + SSDFS_ERR("fail to get fragment descriptor: " + "area_type %#x\n", + area_type); + return -ERANGE; + } + + meta_desc->offset = cpu_to_le32(area->compressed_offset); + meta_desc->compr_size = cpu_to_le16(0); + meta_desc->uncompr_size = cpu_to_le16(0); + meta_desc->checksum = 0; + + if (area->metadata.sequence_id == U8_MAX) + area->metadata.sequence_id = 0; + + meta_desc->sequence_id = area->metadata.sequence_id++; + + meta_desc->magic = SSDFS_FRAGMENT_DESC_MAGIC; + meta_desc->type = fragment_type; + meta_desc->flags = SSDFS_FRAGMENT_HAS_CSUM; + +#ifdef 
CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_offset %u, write_offset %u, bytes_count %u\n", + old_offset, area->compressed_offset, bytes_count); + SSDFS_DBG("fragments_count %u, fragment (offset %u, " + "compr_size %u, sequence_id %u, type %#x)\n", + le16_to_cpu(chain_hdr->fragments_count), + le32_to_cpu(meta_desc->offset), + le16_to_cpu(meta_desc->compr_size), + meta_desc->sequence_id, + meta_desc->type); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + BUG_ON(bytes_count >= U16_MAX); + + meta_desc->uncompr_size = cpu_to_le16((u16)bytes_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_offset %u, write_offset %u, bytes_count %u\n", + old_offset, write_offset, bytes_count); + SSDFS_DBG("fragments_count %u, fragment (offset %u, " + "uncompr_size %u, sequence_id %u, type %#x)\n", + le16_to_cpu(chain_hdr->fragments_count), + le32_to_cpu(meta_desc->offset), + le16_to_cpu(meta_desc->uncompr_size), + meta_desc->sequence_id, + meta_desc->type); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + le32_add_cpu(&chain_hdr->uncompr_bytes, (u32)blk_desc_size); + + return 0; +} + +/* + * __ssdfs_peb_store_block_descriptor() - store block descriptor into area + * @pebi: pointer on PEB object + * @req: I/O request + * @blk_desc: block descriptor + * @data_off: offset to data in PEB [in] + * @desc_off: offset to block descriptor in PEB [out] + * + * This function tries to store block descriptor into dedicated area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to get fragment descriptor. + */ +static +int __ssdfs_peb_store_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_block_descriptor *blk_desc, + struct ssdfs_peb_phys_offset *data_off, + struct ssdfs_peb_phys_offset *desc_off) +{ + int area_type = SSDFS_LOG_BLK_DESC_AREA; + struct ssdfs_peb_area *area; + struct ssdfs_fragments_chain_header *chain_hdr; + struct ssdfs_fragment_desc *meta_desc; + u32 write_offset, old_offset; + u32 old_page_index, new_page_index; + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + u16 bytes_count; + u16 fragments_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req || !blk_desc || !data_off || !desc_off); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "logical_offset %llu, processed_blks %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + area = &pebi->current_log.area[area_type]; + chain_hdr = &area->metadata.area.blk_desc.table.chain_hdr; + fragments_count = le16_to_cpu(chain_hdr->fragments_count); + + if (fragments_count == 0) { + meta_desc = ssdfs_peb_get_area_free_frag_desc(pebi, area_type); + if (IS_ERR(meta_desc)) { + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + (int)PTR_ERR(meta_desc)); + return PTR_ERR(meta_desc); + } else if (!meta_desc) { + err = -ERANGE; + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + err); + return err; + } + + meta_desc->magic = SSDFS_FRAGMENT_DESC_MAGIC; + meta_desc->type = SSDFS_DATA_BLK_DESC; + meta_desc->flags = 0; + meta_desc->offset = cpu_to_le32(area->write_offset); + meta_desc->checksum = 0; + } else { + meta_desc = ssdfs_peb_get_area_cur_frag_desc(pebi, area_type); + if (IS_ERR(meta_desc)) { + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + (int)PTR_ERR(meta_desc)); + return PTR_ERR(meta_desc); + } else if 
(!meta_desc) { + err = -ERANGE; + SSDFS_ERR("fail to get current fragment descriptor: " + "err %d\n", + err); + return err; + } + } + + old_offset = le32_to_cpu(meta_desc->offset); + old_page_index = old_offset / PAGE_SIZE; + new_page_index = area->write_offset / PAGE_SIZE; + bytes_count = le16_to_cpu(meta_desc->compr_size); + + if (old_page_index != new_page_index && + fragments_count == SSDFS_NEXT_BLK_TABLE_INDEX) { + err = ssdfs_peb_store_area_block_table(pebi, area_type, + SSDFS_MULTIPLE_HDR_CHAIN); + if (unlikely(err)) { + SSDFS_ERR("fail to store area's block table: " + "area %#x, err %d\n", + area_type, err); + return err; + } + + err = ssdfs_peb_allocate_area_block_table(pebi, + area_type); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log is full, " + "unable to add next fragments chain: " + "area %#x\n", + area_type); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add next fragments chain: " + "area %#x\n", + area_type); + return err; + } + } + + err = ssdfs_peb_prepare_block_descriptor(pebi, req, data_off, + blk_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare block descriptor: " + "ino %llu, logical_offset %llu, " + "processed_blks %d, err %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks, err); + return err; + } + + err = ssdfs_peb_write_block_descriptor(pebi, req, blk_desc, + data_off, desc_off, + &write_offset); + if (unlikely(err)) { + SSDFS_ERR("fail to write block descriptor: " + "ino %llu, logical_offset %llu, " + "processed_blks %d, err %d\n", + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks, err); + return err; + } + + new_page_index = write_offset / PAGE_SIZE; + + if (old_page_index == new_page_index) { + bytes_count += blk_desc_size; + + BUG_ON(bytes_count >= U16_MAX); + + meta_desc->compr_size = cpu_to_le16((u16)bytes_count); + meta_desc->uncompr_size = cpu_to_le16((u16)bytes_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_offset %u, write_offset %u, bytes_count %u\n", + old_offset, write_offset, bytes_count); + SSDFS_DBG("fragments_count %u, fragment (offset %u, " + "compr_size %u, sequence_id %u, type %#x)\n", + le16_to_cpu(chain_hdr->fragments_count), + le32_to_cpu(meta_desc->offset), + le16_to_cpu(meta_desc->compr_size), + meta_desc->sequence_id, + meta_desc->type); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + meta_desc = ssdfs_peb_get_area_free_frag_desc(pebi, area_type); + if (IS_ERR(meta_desc)) { + SSDFS_ERR("fail to get vacant fragment descriptor: " + "err %d\n", + (int)PTR_ERR(meta_desc)); + return PTR_ERR(meta_desc); + } else if (!meta_desc) { + SSDFS_ERR("fail to get fragment descriptor: " + "area_type %#x\n", + area_type); + return -ERANGE; + } + + meta_desc->offset = cpu_to_le32(write_offset); + meta_desc->compr_size = cpu_to_le16(blk_desc_size); + meta_desc->uncompr_size = cpu_to_le16(blk_desc_size); + meta_desc->checksum = 0; + + if (area->metadata.sequence_id == U8_MAX) + area->metadata.sequence_id = 0; + + meta_desc->sequence_id = area->metadata.sequence_id++; + + meta_desc->magic = SSDFS_FRAGMENT_DESC_MAGIC; + meta_desc->type = SSDFS_DATA_BLK_DESC; + meta_desc->flags = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_offset %u, write_offset %u, bytes_count %u\n", + old_offset, write_offset, bytes_count); + SSDFS_DBG("fragments_count %u, fragment (offset %u, " + "compr_size %u, sequence_id %u, type %#x)\n", + le16_to_cpu(chain_hdr->fragments_count), + le32_to_cpu(meta_desc->offset), + 
le16_to_cpu(meta_desc->compr_size), + meta_desc->sequence_id, + meta_desc->type); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + le32_add_cpu(&chain_hdr->compr_bytes, (u32)blk_desc_size); + le32_add_cpu(&chain_hdr->uncompr_bytes, (u32)blk_desc_size); + + return 0; +} + +/* + * ssdfs_peb_store_block_descriptor() - store block descriptor into area + * @pebi: pointer on PEB object + * @req: I/O request + * @blk_desc: block descriptor + * @data_off: offset to data in PEB [in] + * @desc_off: offset to block descriptor in PEB [out] + * + * This function tries to store block descriptor into dedicated area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to get fragment descriptor. + */ +static +int ssdfs_peb_store_block_descriptor(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req, + struct ssdfs_block_descriptor *blk_desc, + struct ssdfs_peb_phys_offset *data_off, + struct ssdfs_peb_phys_offset *desc_off) +{ + struct ssdfs_fs_info *fsi; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req || !blk_desc || !data_off || !desc_off); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, ino %llu, " + "logical_offset %llu, processed_blks %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, req->extent.logical_offset, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + flags = fsi->metadata_options.blk2off_tbl.flags; + + if (flags & SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION) { + err = ssdfs_peb_store_compressed_block_descriptor(pebi, req, + blk_desc, + data_off, + desc_off); + } else { + err = __ssdfs_peb_store_block_descriptor(pebi, req, + blk_desc, + data_off, + desc_off); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor: " + "seg %llu, peb %llu, ino %llu, " + "logical_offset %llu, processed_blks %d, " + "err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + req->extent.ino, + req->extent.logical_offset, + req->result.processed_blks, + err); + } + + return err; +} + +/* + * ssdfs_peb_store_block_descriptor_offset() - store offset in blk2off table + * @pebi: pointer on PEB object + * @logical_offset: offset in pages from file's begin + * @logical_blk: segment's logical block + * @blk_desc: block descriptor + * @off: pointer on block descriptor offset + */ +static +int ssdfs_peb_store_block_descriptor_offset(struct ssdfs_peb_info *pebi, + u32 logical_offset, + u16 logical_blk, + struct ssdfs_block_descriptor *blk_desc, + struct ssdfs_peb_phys_offset *off) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_phys_offset_descriptor blk_desc_off; + struct ssdfs_blk2off_table *table; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !pebi->pebc->parent_si); + BUG_ON(!pebi->pebc->parent_si->fsi); + BUG_ON(!off); + BUG_ON(logical_blk == U16_MAX); + + SSDFS_DBG("seg %llu, peb %llu, logical_offset %u, " + "logical_blk %u, area_type %#x," + "peb_index %u, peb_page %u, byte_offset %u\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, logical_offset, logical_blk, + off->log_area, off->peb_index, + off->peb_page, off->byte_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + + blk_desc_off.page_desc.logical_offset = cpu_to_le32(logical_offset); + blk_desc_off.page_desc.logical_blk = cpu_to_le16(logical_blk); + blk_desc_off.page_desc.peb_page = cpu_to_le16(off->peb_page); + + blk_desc_off.blk_state.log_start_page = + 
cpu_to_le16(pebi->current_log.start_page); + blk_desc_off.blk_state.log_area = off->log_area; + blk_desc_off.blk_state.peb_migration_id = off->peb_migration_id; + blk_desc_off.blk_state.byte_offset = cpu_to_le32(off->byte_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PHYS OFFSET: logical_offset %u, logical_blk %u, " + "peb_page %u, log_start_page %u, " + "log_area %u, peb_migration_id %u, " + "byte_offset %u\n", + logical_offset, logical_blk, + off->peb_page, pebi->current_log.start_page, + off->log_area, off->peb_migration_id, + off->byte_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + table = pebi->pebc->parent_si->blk2off_table; + + err = ssdfs_blk2off_table_change_offset(table, logical_blk, + off->peb_index, + blk_desc, + &blk_desc_off); + if (err == -EAGAIN) { + struct completion *end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_blk2off_table_change_offset(table, logical_blk, + off->peb_index, + blk_desc, + &blk_desc_off); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change offset: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + return 0; +} + +/* + * __ssdfs_peb_create_block() - create data block + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to create data block in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_peb_create_block(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_phys_offset_descriptor *blk_desc_off = NULL; + struct ssdfs_block_descriptor blk_desc = {0}; + struct ssdfs_peb_phys_offset data_off = {0}; + struct ssdfs_peb_phys_offset desc_off = {0}; + struct ssdfs_offset_position pos = {0}; + u16 logical_block; + int processed_blks; + u64 logical_offset; + struct ssdfs_block_bmap_range range; + u32 rest_bytes, written_bytes; + u32 len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.len >= U16_MAX); + BUG_ON(req->result.processed_blks > req->place.len); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, data_bytes %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->result.processed_blks, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + si = pebi->pebc->parent_si; + processed_blks = req->result.processed_blks; + logical_block = req->place.start.blk_index + processed_blks; + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + logical_offset = req->extent.logical_offset + + ((u64)processed_blks * fsi->pagesize); + logical_offset /= fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, logical_block %u, " + "logical_offset %llu, " + "processed_blks %d, rest_size %u\n", + req->place.start.seg_id, pebi->peb_id, + logical_block, logical_offset, + processed_blks, rest_bytes); + + 
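+	/*
+	 * logical_offset has been converted from bytes into page units
+	 * above; the blk2off table stores it as __le32, so it has to
+	 * fit into u32 here.
+	 */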
BUG_ON(logical_offset >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_peb_reserve_block_descriptor(pebi, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to add block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, logical_block, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve block descriptor: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, logical_block, + pebi->peb_id, err); + return err; + } + + err = ssdfs_peb_add_block_into_data_area(pebi, req, + blk_desc_off, &pos, + &data_off, + &written_bytes); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to add block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, logical_block, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add block: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, logical_block, + pebi->peb_id, err); + return err; + } + + len = (written_bytes + fsi->pagesize - 1) >> fsi->log_pagesize; + + if (!is_ssdfs_block_full(fsi->pagesize, written_bytes)) { + err = ssdfs_segment_blk_bmap_pre_allocate(&si->blk_bmap, + pebi->pebc, + &len, + &range); + } else { + err = ssdfs_segment_blk_bmap_allocate(&si->blk_bmap, + pebi->pebc, + &len, + &range); + } + + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (unlikely(err || (len != range.len))) { + SSDFS_ERR("fail to allocate range: " + "seg %llu, peb %llu, " + "range (start %u, len %u), err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + range.start, range.len, err); + return err; + } + + data_off.peb_page = (u16)range.start; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, peb_page %u\n", + logical_block, range.start); +#endif /* CONFIG_SSDFS_DEBUG */ + + SSDFS_BLK_DESC_INIT(&blk_desc); + + err = ssdfs_peb_store_block_descriptor(pebi, req, + &blk_desc, &data_off, + &desc_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, logical_block, + pebi->peb_id, err); + return err; + } + + err = ssdfs_peb_store_block_descriptor_offset(pebi, (u32)logical_offset, + logical_block, + &blk_desc, &desc_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor offset: " + "err %d\n", + err); + return err; + } + + req->result.processed_blks += range.len; + return 0; +} + +/* + * ssdfs_peb_create_block() - create data block + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to create data block in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
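+ *
+ * Besides %-ERANGE, the body below propagates %-EAGAIN and %-ENOSPC.
+ * A hypothetical caller would handle them along these lines:
+ *
+ *	err = ssdfs_peb_create_block(pebi, req);
+ *	if (err == -EAGAIN)
+ *		retry the request after the current log is committed;
+ *	else if (err == -ENOSPC)
+ *		treat the PEB's block bitmap as having no free space;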
+ */ +static +int ssdfs_peb_create_block(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + switch (req->private.class) { + case SSDFS_PEB_CREATE_DATA_REQ: + case SSDFS_PEB_CREATE_LNODE_REQ: + case SSDFS_PEB_CREATE_HNODE_REQ: + case SSDFS_PEB_CREATE_IDXNODE_REQ: + /* expected state */ + break; + default: + BUG(); + }; + BUG_ON(req->private.cmd != SSDFS_CREATE_BLOCK); + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); + BUG_ON(req->extent.data_bytes > pebi->pebc->parent_si->fsi->pagesize); + BUG_ON(req->result.processed_blks > 0); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "logical_block %u, data_bytes %u, cno %llu, " + "parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_peb_create_block(pebi, req); + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to create block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, + req->place.start.blk_index, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create block: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, + req->place.start.blk_index, + pebi->peb_id, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_create_extent() - create extent + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to create extent of data blocks in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
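+ *
+ * The extent is processed block by block: __ssdfs_peb_create_block()
+ * is called in a loop until ssdfs_request_rest_bytes() reports no
+ * remaining bytes. Progress is tracked in req->result.processed_blks,
+ * so a request that fails with %-EAGAIN can be resumed later from the
+ * first unprocessed block.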
+ */ +static +int ssdfs_peb_create_extent(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + u32 rest_bytes; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + switch (req->private.class) { + case SSDFS_PEB_CREATE_DATA_REQ: + case SSDFS_PEB_CREATE_LNODE_REQ: + case SSDFS_PEB_CREATE_HNODE_REQ: + case SSDFS_PEB_CREATE_IDXNODE_REQ: + /* expected state */ + break; + default: + BUG(); + }; + BUG_ON(req->private.cmd != SSDFS_CREATE_EXTENT); + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); + + SSDFS_DBG("peb %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu" + "seg %llu, logical_block %u, cmd %#x, type %#x, " + "processed_blks %d\n", + pebi->peb_id, req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, req->place.start.blk_index, + req->private.cmd, req->private.type, + req->result.processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + + while (rest_bytes > 0) { + u32 logical_block = req->place.start.blk_index + + req->result.processed_blks; + + err = __ssdfs_peb_create_block(pebi, req); + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to create block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, logical_block, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create block: " + "seg %llu, logical_block %u, " + "peb %llu, err %d\n", + req->place.start.seg_id, logical_block, + pebi->peb_id, err); + return err; + } + + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + }; + + return 0; +} + +/* + * __ssdfs_peb_pre_allocate_extent() - pre-allocate extent + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to pre-allocate an extent of blocks in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
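+ *
+ * Pre-allocation reserves logical blocks in the block bitmap and
+ * records each of them in the blk2off table with a NULL block
+ * descriptor, log_area == SSDFS_LOG_AREA_MAX and
+ * byte_offset == U32_MAX, i.e. no payload is written into the log
+ * at this point.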
+ */ +static +int __ssdfs_peb_pre_allocate_extent(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_peb_phys_offset desc_off = {0}; + u16 logical_block; + int processed_blks; + u64 logical_offset; + struct ssdfs_block_bmap_range range; + u32 rest_bytes; + u32 len; + u8 id; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->extent.data_bytes < + (req->result.processed_blks * + pebi->pebc->parent_si->fsi->pagesize)); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.len >= U16_MAX); + BUG_ON(req->result.processed_blks > req->place.len); + WARN_ON(pagevec_count(&req->result.pvec) != 0); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, data_bytes %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->result.processed_blks, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + si = pebi->pebc->parent_si; + processed_blks = req->result.processed_blks; + logical_block = req->place.start.blk_index + processed_blks; + rest_bytes = ssdfs_request_rest_bytes(pebi, req); + logical_offset = req->extent.logical_offset + + ((u64)processed_blks * fsi->pagesize); + logical_offset /= fsi->pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb %llu, logical_block %u, " + "logical_offset %llu, " + "processed_blks %d, rest_size %u\n", + req->place.start.seg_id, pebi->peb_id, + logical_block, logical_offset, + processed_blks, rest_bytes); + + if (req->extent.logical_offset >= U64_MAX) { + SSDFS_ERR("seg %llu, peb %llu, logical_block %u, " + "logical_offset %llu, " + "processed_blks %d, rest_size %u\n", + req->place.start.seg_id, pebi->peb_id, + logical_block, logical_offset, + processed_blks, rest_bytes); + BUG(); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + len = req->extent.data_bytes; + len -= req->result.processed_blks * si->fsi->pagesize; + len >>= fsi->log_pagesize; + + err = ssdfs_segment_blk_bmap_pre_allocate(&si->blk_bmap, + pebi->pebc, + &len, + &range); + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (unlikely(err || (len != range.len))) { + SSDFS_ERR("fail to allocate range: " + "seg %llu, peb %llu, " + "range (start %u, len %u), err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, + range.start, range.len, err); + return err; + } + + id = ssdfs_get_peb_migration_id_checked(pebi); + if (unlikely(id < 0)) { + SSDFS_ERR("invalid peb_migration_id: " + "seg %llu, peb_id %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, id); + return id; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(id > U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < range.len; i++) { + desc_off.peb_index = pebi->peb_index; + desc_off.peb_migration_id = id; + desc_off.peb_page = (u16)(range.start + i); + desc_off.log_area = SSDFS_LOG_AREA_MAX; + desc_off.byte_offset = U32_MAX; + + logical_block += i; + logical_offset += i; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_blk %u, peb_page %u\n", + logical_block, range.start + i); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = 
ssdfs_peb_store_block_descriptor_offset(pebi, + (u32)logical_offset, + logical_block, + NULL, + &desc_off); + if (unlikely(err)) { + SSDFS_ERR("fail to store block descriptor offset: " + "logical_block %u, logical_offset %llu, " + "err %d\n", + logical_block, logical_offset, err); + return err; + } + } + + req->result.processed_blks += range.len; + return 0; +} + +/* + * ssdfs_peb_pre_allocate_block() - pre-allocate block + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to pre-allocate a block in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_peb_pre_allocate_block(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + + switch (req->private.class) { + case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ: + case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ: + /* expected state */ + break; + default: + SSDFS_ERR("unexpected request: " + "req->private.class %#x\n", + req->private.class); + BUG(); + }; + + switch (req->private.cmd) { + case SSDFS_CREATE_BLOCK: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected request: " + "req->private.cmd %#x\n", + req->private.cmd); + BUG(); + }; + + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); + BUG_ON(req->extent.data_bytes > pebi->pebc->parent_si->fsi->pagesize); + BUG_ON(req->result.processed_blks > 0); + + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "logical_block %u, data_bytes %u, cno %llu, " + "parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, pebi->peb_id, + req->extent.logical_offset, req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_peb_pre_allocate_extent(pebi, req); + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to pre-allocate block: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, + req->place.start.blk_index, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-allocate block: " + "seg %llu, logical_block %u, peb %llu, err %d\n", + req->place.start.seg_id, + req->place.start.blk_index, + pebi->peb_id, err); + return err; + } + + return 0; +} + +/* + * ssdfs_peb_pre_allocate_extent() - pre-allocate extent + * @pebi: pointer on PEB object + * @req: I/O request + * + * This function tries to pre-allocate an extent of blocks in PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
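+ *
+ * Unlike ssdfs_peb_pre_allocate_block(), this wrapper expects the
+ * SSDFS_CREATE_EXTENT command and an extent of at least one whole
+ * page; both wrappers share __ssdfs_peb_pre_allocate_extent() for
+ * the actual work.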
+ */ +static +int ssdfs_peb_pre_allocate_extent(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !req); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + + SSDFS_DBG("peb %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu, " + "seg %llu, logical_block %u, cmd %#x, type %#x, " + "processed_blks %d\n", + pebi->peb_id, req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->place.start.seg_id, req->place.start.blk_index, + req->private.cmd, req->private.type, + req->result.processed_blks); + + BUG_ON(req->place.start.seg_id != pebi->pebc->parent_si->seg_id); + BUG_ON(req->place.start.blk_index >= + pebi->pebc->parent_si->fsi->pages_per_seg); + + switch (req->private.class) { + case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ: + case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ: + /* expected state */ + break; + default: + SSDFS_ERR("unexpected request: " + "req->private.class %#x\n", + req->private.class); + BUG(); + }; + + switch (req->private.cmd) { + case SSDFS_CREATE_EXTENT: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected request: " + "req->private.cmd %#x\n", + req->private.cmd); + BUG(); + }; + + BUG_ON(req->private.type >= SSDFS_REQ_TYPE_MAX); + BUG_ON(atomic_read(&req->private.refs_count) == 0); + BUG_ON((req->extent.data_bytes / + pebi->pebc->parent_si->fsi->pagesize) < 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_peb_pre_allocate_extent(pebi, req); + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to pre-allocate extent: " + "seg %llu, logical_block %u, peb %llu\n", + req->place.start.seg_id, + req->place.start.blk_index, + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-allocate extent: " + "seg %llu, logical_block %u, peb %llu, " + "ino %llu, logical_offset %llu, err %d\n", + req->place.start.seg_id, + req->place.start.blk_index, + pebi->peb_id, req->extent.ino, + req->extent.logical_offset, err); + return err; + } + + return 0; +} + +/* + * ssdfs_process_create_request() - process create request + * @pebi: pointer on PEB object + * @req: request + * + * This function detects the command of the request and + * calls the proper function to process it. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input.
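+ *
+ * Dispatch performed by the switch below:
+ *
+ *	SSDFS_CREATE_BLOCK + SSDFS_PEB_CREATE_* class
+ *		-> ssdfs_peb_create_block()
+ *	SSDFS_CREATE_BLOCK + SSDFS_PEB_PRE_ALLOCATE_* class
+ *		-> ssdfs_peb_pre_allocate_block()
+ *	SSDFS_CREATE_EXTENT + SSDFS_PEB_CREATE_* class
+ *		-> ssdfs_peb_create_extent()
+ *	SSDFS_CREATE_EXTENT + SSDFS_PEB_PRE_ALLOCATE_* class
+ *		-> ssdfs_peb_pre_allocate_extent()
+ *	SSDFS_MIGRATE_ZONE_USER_BLOCK / _EXTENT
+ *		-> ssdfs_peb_create_block() / ssdfs_peb_create_extent()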
+ */ +static +int ssdfs_process_create_request(struct ssdfs_peb_info *pebi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_segment_info *si; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !req); + + SSDFS_DBG("req %p, cmd %#x, type %#x\n", + req, req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (req->private.cmd <= SSDFS_READ_CMD_MAX || + req->private.cmd >= SSDFS_CREATE_CMD_MAX) { + SSDFS_ERR("unknown create command %d, seg %llu, peb %llu\n", + req->private.cmd, + pebi->pebc->parent_si->seg_id, + pebi->peb_id); + req->result.err = -EINVAL; + atomic_set(&req->result.state, SSDFS_REQ_FAILED); + return -EINVAL; + } + + atomic_set(&req->result.state, SSDFS_REQ_STARTED); + + switch (req->private.cmd) { + case SSDFS_CREATE_BLOCK: + switch (req->private.class) { + case SSDFS_PEB_CREATE_DATA_REQ: + case SSDFS_PEB_CREATE_LNODE_REQ: + case SSDFS_PEB_CREATE_HNODE_REQ: + case SSDFS_PEB_CREATE_IDXNODE_REQ: + err = ssdfs_peb_create_block(pebi, req); + break; + + case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ: + case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ: + err = ssdfs_peb_pre_allocate_block(pebi, req); + break; + + default: + BUG(); + } + + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to create block: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to create block: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_CREATE_EXTENT: + switch (req->private.class) { + case SSDFS_PEB_CREATE_DATA_REQ: + case SSDFS_PEB_CREATE_LNODE_REQ: + case SSDFS_PEB_CREATE_HNODE_REQ: + case SSDFS_PEB_CREATE_IDXNODE_REQ: + err = ssdfs_peb_create_extent(pebi, req); + break; + + case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ: + case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ: + case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ: + err = ssdfs_peb_pre_allocate_extent(pebi, req); + break; + + default: + BUG(); + } + + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to create extent: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to create extent: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_MIGRATE_ZONE_USER_BLOCK: + switch (req->private.class) { + case SSDFS_ZONE_USER_DATA_MIGRATE_REQ: + err = ssdfs_peb_create_block(pebi, req); + break; + + default: + BUG(); + } + + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to migrate block: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + 
ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to migrate block: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + case SSDFS_MIGRATE_ZONE_USER_EXTENT: + switch (req->private.class) { + case SSDFS_ZONE_USER_DATA_MIGRATE_REQ: + err = ssdfs_peb_create_extent(pebi, req); + break; + + default: + BUG(); + } + + if (err == -ENOSPC) { + SSDFS_DBG("block bitmap hasn't free space\n"); + return err; + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try again to migrate extent: " + "seg %llu, peb %llu\n", + req->place.start.seg_id, pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + ssdfs_fs_error(pebi->pebc->parent_si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to migrate extent: " + "seg %llu, peb %llu, err %d\n", + pebi->pebc->parent_si->seg_id, + pebi->peb_id, err); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + } + break; + + default: + BUG(); + } + + if (unlikely(err)) { + /* request failed */ + atomic_set(&req->result.state, SSDFS_REQ_FAILED); + } else if ((req->private.class == SSDFS_PEB_CREATE_DATA_REQ || + req->private.class == SSDFS_ZONE_USER_DATA_MIGRATE_REQ) && + is_ssdfs_peb_containing_user_data(pebi->pebc)) { + int processed_blks = req->result.processed_blks; + u32 pending = 0; + + si = pebi->pebc->parent_si; + + spin_lock(&si->pending_lock); + pending = si->pending_new_user_data_pages; + if (si->pending_new_user_data_pages >= processed_blks) { + si->pending_new_user_data_pages -= processed_blks; + pending = si->pending_new_user_data_pages; + } else { + /* wrong accounting */ + err = -ERANGE; + } + spin_unlock(&si->pending_lock); + + if (unlikely(err)) { + SSDFS_ERR("pending %u < processed_blks %d\n", + pending, processed_blks); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, pending %u, processed_blks %d\n", + si->seg_id, pending, processed_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + + return err; +} + /* * ssdfs_peb_read_from_offset() - read in buffer from offset * @pebi: pointer on PEB object
From patchwork Sat Feb 25 01:08:44 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151938 From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 33/76] ssdfs: create log logic Date: Fri, 24 Feb 2023 17:08:44 -0800 Message-Id: <20230225010927.813929-34-slava@dubeyko.com> In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This patch contains logic of log creation after the log commit operation or during PEB container object initialization. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_flush_thread.c | 2710 ++++++++++++++++++++++++++++++++++++ 1 file changed, 2710 insertions(+) diff --git a/fs/ssdfs/peb_flush_thread.c b/fs/ssdfs/peb_flush_thread.c index 857270e0cbf0..4007cb6ff32d 100644 --- a/fs/ssdfs/peb_flush_thread.c +++ b/fs/ssdfs/peb_flush_thread.c @@ -228,6 +228,2716 @@ struct ssdfs_pagevec_descriptor { * FLUSH THREAD FUNCTIONALITY * ******************************************************************************/ +/* + * __ssdfs_peb_estimate_blk_bmap_bytes() - estimate block bitmap's bytes + * @bits_count: bits count in bitmap + * @is_migrating: is PEB migrating?
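+ *
+ * Rough estimate performed below:
+ *
+ *	bytes = bmap_hdr
+ *	      + N * (frag_hdr + frag_desc + BLK_BMAP_BYTES(bits_count))
+ *
+ * where N is 2 for a migrating PEB (source and destination bitmaps
+ * both travel in the log) and 1 otherwise; only the block bitmap
+ * header is never doubled.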
+ */ +static inline +int __ssdfs_peb_estimate_blk_bmap_bytes(u32 bits_count, bool is_migrating) +{ + size_t blk_bmap_hdr_size = sizeof(struct ssdfs_block_bitmap_header); + size_t blk_bmap_frag_hdr_size = sizeof(struct ssdfs_block_bitmap_fragment); + size_t frag_desc_size = sizeof(struct ssdfs_fragment_desc); + size_t blk_bmap_bytes; + int reserved_bytes = 0; + + blk_bmap_bytes = BLK_BMAP_BYTES(bits_count); + + reserved_bytes += blk_bmap_hdr_size; + + if (is_migrating) { + reserved_bytes += 2 * blk_bmap_frag_hdr_size; + reserved_bytes += 2 * frag_desc_size; + reserved_bytes += 2 * blk_bmap_bytes; + } else { + reserved_bytes += blk_bmap_frag_hdr_size; + reserved_bytes += frag_desc_size; + reserved_bytes += blk_bmap_bytes; + } + + return reserved_bytes; +} + +/* + * ssdfs_peb_estimate_blk_bmap_bytes() - estimate block bitmap's bytes + * @pages_per_peb: number of pages in one PEB + * @is_migrating: is PEB migrating? + * @prev_log_bmap_bytes: bytes count in block bitmap of previous log + */ +static inline +int ssdfs_peb_estimate_blk_bmap_bytes(u32 pages_per_peb, bool is_migrating, + u32 prev_log_bmap_bytes) +{ + int reserved_bytes = 0; + + reserved_bytes = __ssdfs_peb_estimate_blk_bmap_bytes(pages_per_peb, + is_migrating); + + if (prev_log_bmap_bytes < S32_MAX) { + reserved_bytes = min_t(int, reserved_bytes, + (int)(prev_log_bmap_bytes * 2)); + } + + return reserved_bytes; +} + +/* + * __ssdfs_peb_estimate_blk2off_bytes() - estimate blk2off table's bytes + * @items_number: number of allocated logical blocks + * @pebs_per_seg: number of PEBs in one segment + */ +static inline +int __ssdfs_peb_estimate_blk2off_bytes(u32 items_number, u32 pebs_per_seg) +{ + size_t blk2off_tbl_hdr_size = sizeof(struct ssdfs_blk2off_table_header); + size_t pot_tbl_hdr_size = sizeof(struct ssdfs_phys_offset_table_header); + size_t phys_off_desc_size = sizeof(struct ssdfs_phys_offset_descriptor); + int reserved_bytes = 0; + + reserved_bytes += blk2off_tbl_hdr_size; + reserved_bytes += pot_tbl_hdr_size; + reserved_bytes += (phys_off_desc_size * items_number) * pebs_per_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_number %u, pebs_per_seg %u, " + "reserved_bytes %d\n", + items_number, pebs_per_seg, reserved_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return reserved_bytes; +} + +/* + * ssdfs_peb_estimate_blk2off_bytes() - estimate blk2off table's bytes + * @log_pages: number of pages in the full log + * @pebs_per_seg: number of PEBs in one segment + * @log_start_page: start page of the log + * @pages_per_peb: number of pages per PEB + */ +static inline +int ssdfs_peb_estimate_blk2off_bytes(u16 log_pages, u32 pebs_per_seg, + u16 log_start_page, u32 pages_per_peb) +{ + u32 items_number; + + items_number = min_t(u32, log_pages - (log_start_page % log_pages), + pages_per_peb - log_start_page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_number %u, log_pages %u, " + "pages_per_peb %u, log_start_page %u\n", + items_number, log_pages, + pages_per_peb, log_start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_peb_estimate_blk2off_bytes(items_number, pebs_per_seg); +} + +/* + * __ssdfs_peb_estimate_blk_desc_tbl_bytes() - estimate block desc table's bytes + * @items_number: number of allocated logical blocks + */ +static inline +int __ssdfs_peb_estimate_blk_desc_tbl_bytes(u32 items_number) +{ + size_t blk_desc_tbl_hdr_size = sizeof(struct ssdfs_area_block_table); + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + int reserved_bytes = 0; + + reserved_bytes += blk_desc_tbl_hdr_size; + 
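+	/* one block descriptor per allocated logical block */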
reserved_bytes += blk_desc_size * items_number; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_number %u, reserved_bytes %d\n", + items_number, reserved_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return reserved_bytes; +} + +/* + * ssdfs_peb_estimate_blk_desc_tbl_bytes() - estimate block desc table's bytes + * @log_pages: number of pages in the full log + * @log_start_page: start page of the log + * @pages_per_peb: number of pages per PEB + */ +static inline +int ssdfs_peb_estimate_blk_desc_tbl_bytes(u16 log_pages, + u16 log_start_page, + u32 pages_per_peb) +{ + u32 items_number; + int reserved_bytes = 0; + + items_number = min_t(u32, + log_pages - (log_start_page % log_pages), + pages_per_peb - log_start_page); + + reserved_bytes = __ssdfs_peb_estimate_blk_desc_tbl_bytes(items_number); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_pages %u, log_start_page %u, " + "pages_per_peb %u, items_number %u, " + "reserved_bytes %d\n", + log_pages, log_start_page, + pages_per_peb, items_number, + reserved_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return reserved_bytes; +} + +/* + * ssdfs_peb_estimate_reserved_metapages() - estimate reserved metapages in log + * @page_size: size of page in bytes + * @pages_per_peb: number of pages in one PEB + * @log_pages: number of pages in the full log + * @pebs_per_seg: number of PEBs in one segment + * @is_migrating: is PEB migrating? + */ +u16 ssdfs_peb_estimate_reserved_metapages(u32 page_size, u32 pages_per_peb, + u16 log_pages, u32 pebs_per_seg, + bool is_migrating) +{ + size_t seg_hdr_size = sizeof(struct ssdfs_segment_header); + size_t lf_hdr_size = sizeof(struct ssdfs_log_footer); + u32 reserved_bytes = 0; + u32 reserved_pages = 0; + + /* segment header */ + reserved_bytes += seg_hdr_size; + + /* block bitmap */ + reserved_bytes += ssdfs_peb_estimate_blk_bmap_bytes(pages_per_peb, + is_migrating, + U32_MAX); + + /* blk2off table */ + reserved_bytes += ssdfs_peb_estimate_blk2off_bytes(log_pages, + pebs_per_seg, + 0, pages_per_peb); + + /* block descriptor table */ + reserved_bytes += ssdfs_peb_estimate_blk_desc_tbl_bytes(log_pages, 0, + pages_per_peb); + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + + /* log footer header */ + reserved_bytes += lf_hdr_size; + + /* block bitmap */ + reserved_bytes += ssdfs_peb_estimate_blk_bmap_bytes(pages_per_peb, + is_migrating, + U32_MAX); + + /* blk2off table */ + reserved_bytes += ssdfs_peb_estimate_blk2off_bytes(log_pages, + pebs_per_seg, + 0, pages_per_peb); + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + + reserved_pages = reserved_bytes / page_size; + + BUG_ON(reserved_pages >= U16_MAX); + + return reserved_pages; +} + +/* + * ssdfs_peb_blk_bmap_reserved_bytes() - calculate block bitmap's reserved bytes + * @pebi: pointer on PEB object + */ +static inline +int ssdfs_peb_blk_bmap_reserved_bytes(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u32 pages_per_peb = fsi->pages_per_peb; + bool is_migrating = false; + u32 prev_log_bmap_bytes; + + switch (atomic_read(&pebc->migration_state)) { + case SSDFS_PEB_MIGRATION_PREPARATION: + case SSDFS_PEB_RELATION_PREPARATION: + case SSDFS_PEB_UNDER_MIGRATION: + is_migrating = true; + break; + + default: + is_migrating = false; + break; + } + + prev_log_bmap_bytes = pebi->current_log.prev_log_bmap_bytes; + + return 
ssdfs_peb_estimate_blk_bmap_bytes(pages_per_peb, is_migrating, + prev_log_bmap_bytes); +} + +/* + * ssdfs_peb_blk2off_reserved_bytes() - calculate blk2off table's reserved bytes + * @pebi: pointer on PEB object + */ +static inline +int ssdfs_peb_blk2off_reserved_bytes(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u32 pebs_per_seg = fsi->pebs_per_seg; + u16 log_pages = pebi->log_pages; + u32 pages_per_peb = fsi->pages_per_peb; + u16 log_start_page = pebi->current_log.start_page; + + return ssdfs_peb_estimate_blk2off_bytes(log_pages, pebs_per_seg, + log_start_page, pages_per_peb); +} + +/* + * ssdfs_peb_blk_desc_tbl_reserved_bytes() - calculate block desc reserved bytes + * @pebi: pointer on PEB object + */ +static inline +int ssdfs_peb_blk_desc_tbl_reserved_bytes(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u16 log_pages = pebi->log_pages; + u32 pages_per_peb = fsi->pages_per_peb; + u16 log_start_page = pebi->current_log.start_page; + + return ssdfs_peb_estimate_blk_desc_tbl_bytes(log_pages, + log_start_page, + pages_per_peb); +} + +/* + * ssdfs_peb_log_footer_reserved_bytes() - calculate log footer's reserved bytes + * @pebi: pointer on PEB object + */ +static inline +u32 ssdfs_peb_log_footer_reserved_bytes(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + size_t lf_hdr_size = sizeof(struct ssdfs_log_footer); + u32 page_size = fsi->pagesize; + u32 reserved_bytes = 0; + + /* log footer header */ + reserved_bytes = lf_hdr_size; + + /* block bitmap */ + reserved_bytes += atomic_read(&pebi->reserved_bytes.blk_bmap); + + /* blk2off table */ + reserved_bytes += atomic_read(&pebi->reserved_bytes.blk2off_tbl); + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("block_bitmap %d, blk2off_table %d, " + "reserved_bytes %u\n", + atomic_read(&pebi->reserved_bytes.blk_bmap), + atomic_read(&pebi->reserved_bytes.blk2off_tbl), + reserved_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return reserved_bytes; +} + +/* + * ssdfs_peb_log_footer_metapages() - calculate log footer's metadata pages + * @pebi: pointer on PEB object + */ +static inline +u32 ssdfs_peb_log_footer_metapages(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u32 page_size = fsi->pagesize; + u32 reserved_pages = 0; + + reserved_pages = ssdfs_peb_log_footer_reserved_bytes(pebi) / page_size; + + BUG_ON(reserved_pages >= U16_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_pages %u\n", reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + return reserved_pages; +} + +/* + * ssdfs_peb_define_reserved_metapages() - calculate reserved metadata pages + * @pebi: pointer on PEB object + */ +static +u16 ssdfs_peb_define_reserved_metapages(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u32 reserved_bytes = 0; + u32 reserved_pages = 0; + size_t seg_hdr_size = sizeof(struct ssdfs_segment_header); + u32 page_size = fsi->pagesize; + u32 offset; + 
u32 blk_desc_reserved; + + /* segment header */ + reserved_bytes += seg_hdr_size; + + /* block bitmap */ + atomic_set(&pebi->reserved_bytes.blk_bmap, + ssdfs_peb_blk_bmap_reserved_bytes(pebi)); + reserved_bytes += atomic_read(&pebi->reserved_bytes.blk_bmap); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebi->reserved_bytes.blk_bmap %d\n", + atomic_read(&pebi->reserved_bytes.blk_bmap)); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* blk2off table */ + atomic_set(&pebi->reserved_bytes.blk2off_tbl, + ssdfs_peb_blk2off_reserved_bytes(pebi)); + reserved_bytes += atomic_read(&pebi->reserved_bytes.blk2off_tbl); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebi->reserved_bytes.blk2off_tbl %d\n", + atomic_read(&pebi->reserved_bytes.blk2off_tbl)); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* block descriptor table */ + offset = reserved_bytes; + blk_desc_reserved = ssdfs_peb_blk_desc_tbl_reserved_bytes(pebi); + atomic_set(&pebi->reserved_bytes.blk_desc_tbl, blk_desc_reserved); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebi->reserved_bytes.blk_desc_tbl %d\n", + atomic_read(&pebi->reserved_bytes.blk_desc_tbl)); +#endif /* CONFIG_SSDFS_DEBUG */ + + reserved_bytes += atomic_read(&pebi->reserved_bytes.blk_desc_tbl); + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_bytes %u, offset %u\n", + reserved_bytes, offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + reserved_bytes += ssdfs_peb_log_footer_reserved_bytes(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_bytes %u\n", reserved_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + reserved_pages = reserved_bytes / page_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_pages %u\n", reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + BUG_ON(reserved_pages >= U16_MAX); + + return reserved_pages; +} + +/* + * ssdfs_peb_reserve_blk_desc_space() - reserve space for block descriptors + * @pebi: pointer on PEB object + * @metadata: pointer on area's metadata + * + * This function tries to reserve space for block descriptors. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate the memory. 
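+ *
+ * The capacity of the reserved space is derived from the reserved
+ * bytes as (reserved - area block table header) / descriptor size;
+ * the first memory page of the block descriptor area is grabbed and
+ * zeroed so that the area block table can be stored there later.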
+ */ +static +int ssdfs_peb_reserve_blk_desc_space(struct ssdfs_peb_info *pebi, + struct ssdfs_peb_area_metadata *metadata) +{ + struct ssdfs_page_array *area_pages; + size_t blk_desc_tbl_hdr_size = sizeof(struct ssdfs_area_block_table); + size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor); + size_t count; + int buf_size; + struct page *page; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb %llu, current_log.start_page %u\n", + pebi->peb_id, pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + buf_size = atomic_read(&pebi->reserved_bytes.blk_desc_tbl); + + if (buf_size <= blk_desc_tbl_hdr_size) { + SSDFS_ERR("invalid reserved_size %d\n", + atomic_read(&pebi->reserved_bytes.blk_desc_tbl)); + return -ERANGE; + } + + buf_size -= blk_desc_tbl_hdr_size; + + if (buf_size < blk_desc_size) { + SSDFS_ERR("invalid reserved_size %d\n", + buf_size); + return -ERANGE; + } + + count = buf_size / blk_desc_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("buf_size %d, blk_desc_size %zu, count %zu\n", + buf_size, blk_desc_size, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + area_pages = &pebi->current_log.area[SSDFS_LOG_BLK_DESC_AREA].array; + + page = ssdfs_page_array_grab_page(area_pages, 0); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to add page into area space\n"); + return -ENOMEM; + } + + ssdfs_memzero_page(page, 0, PAGE_SIZE, PAGE_SIZE); + + ssdfs_set_page_private(page, 0); + ssdfs_put_page(page); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + metadata->area.blk_desc.items_count = 0; + metadata->area.blk_desc.capacity = count; + + return 0; +} + +/* + * ssdfs_peb_estimate_min_partial_log_pages() - estimate min partial log size + * @pebi: pointer on PEB object + */ +u16 ssdfs_peb_estimate_min_partial_log_pages(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_peb_container *pebc = pebi->pebc; + struct ssdfs_segment_info *si = pebc->parent_si; + struct ssdfs_fs_info *fsi = si->fsi; + u32 reserved_bytes = 0; + u32 reserved_pages = 0; + size_t pl_hdr_size = sizeof(struct ssdfs_partial_log_header); + u32 page_size = fsi->pagesize; + size_t lf_hdr_size = sizeof(struct ssdfs_log_footer); + + /* partial log header */ + reserved_bytes += pl_hdr_size; + + /* block bitmap */ + reserved_bytes += ssdfs_peb_blk_bmap_reserved_bytes(pebi); + + /* blk2off table */ + reserved_bytes += ssdfs_peb_blk2off_reserved_bytes(pebi); + + /* block descriptor table */ + reserved_bytes += ssdfs_peb_blk_desc_tbl_reserved_bytes(pebi); + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + + /* log footer header */ + reserved_bytes += lf_hdr_size; + + /* block bitmap */ + reserved_bytes += ssdfs_peb_blk_bmap_reserved_bytes(pebi); + + /* blk2off table */ + reserved_bytes += ssdfs_peb_blk2off_reserved_bytes(pebi); + + reserved_bytes += page_size - 1; + reserved_bytes /= page_size; + reserved_bytes *= page_size; + + reserved_pages = reserved_bytes / page_size; + + BUG_ON(reserved_pages >= U16_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_pages %u, reserved_bytes %u, " + "blk_bmap_reserved_bytes %d, " + "blk2off_reserved_bytes %d, " + "blk_desc_tbl_reserved_bytes %d\n", + reserved_pages, reserved_bytes, + ssdfs_peb_blk_bmap_reserved_bytes(pebi), + ssdfs_peb_blk2off_reserved_bytes(pebi), + ssdfs_peb_blk_desc_tbl_reserved_bytes(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return reserved_pages; +} + +enum { + SSDFS_START_FULL_LOG, + 
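+	/* not enough free pages for a full log: begin a partial sequence */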
SSDFS_START_PARTIAL_LOG, + SSDFS_CONTINUE_PARTIAL_LOG, + SSDFS_FINISH_PARTIAL_LOG, + SSDFS_FINISH_FULL_LOG +}; + +/* + * is_log_partial() - should the next log be partial? + * @pebi: pointer on PEB object + */ +static inline +int is_log_partial(struct ssdfs_peb_info *pebi) +{ + u16 log_pages; + u16 free_data_pages; + u16 reserved_pages; + u16 min_partial_log_pages; + int sequence_id; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = pebi->log_pages; + free_data_pages = pebi->current_log.free_data_pages; + reserved_pages = pebi->current_log.reserved_pages; + sequence_id = atomic_read(&pebi->current_log.sequence_id); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_pages %u, free_data_pages %u, " + "reserved_pages %u, sequence_id %d\n", + log_pages, free_data_pages, + reserved_pages, sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_data_pages == 0) { + if (sequence_id > 0) + return SSDFS_FINISH_PARTIAL_LOG; + else + return SSDFS_FINISH_FULL_LOG; + } + + if (free_data_pages >= log_pages) + return SSDFS_START_FULL_LOG; + + min_partial_log_pages = ssdfs_peb_estimate_min_partial_log_pages(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("min_partial_log_pages %u, reserved_pages %u\n", + min_partial_log_pages, reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (reserved_pages == 0) { + if (free_data_pages <= min_partial_log_pages) { + if (sequence_id > 0) + return SSDFS_FINISH_PARTIAL_LOG; + else + return SSDFS_FINISH_FULL_LOG; + } + } else { + u32 available_pages = free_data_pages + reserved_pages; + + if (available_pages <= min_partial_log_pages) { + if (sequence_id > 0) + return SSDFS_FINISH_PARTIAL_LOG; + else + return SSDFS_FINISH_FULL_LOG; + } else if (free_data_pages < min_partial_log_pages) { + /* + * Next partial log cannot be created + */ + if (sequence_id > 0) + return SSDFS_FINISH_PARTIAL_LOG; + else + return SSDFS_FINISH_FULL_LOG; + } + } + + if (sequence_id == 0) + return SSDFS_START_PARTIAL_LOG; + + return SSDFS_CONTINUE_PARTIAL_LOG; +} + +/* + * ssdfs_peb_create_log() - create new log + * @pebi: pointer on PEB object + * + * This function tries to create new log in page cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - PEB is full. + * %-EIO - area contain dirty (not committed) pages. + * %-EAGAIN - current log is not initialized. 
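+ *
+ * The creation path, in short: lock the current log, check that
+ * start_page still fits into the PEB, pick a strategy through
+ * is_log_partial(), reserve metadata pages accordingly, reinitialize
+ * every log area, and switch the log state to SSDFS_LOG_CREATED.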
+ */ +static +int ssdfs_peb_create_log(struct ssdfs_peb_info *pebi) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_log *log; + struct ssdfs_metadata_options *options; + int log_state; + int log_strategy; + u32 pages_per_peb; + u32 log_footer_pages; + int compr_type; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = pebi->pebc->parent_si; + log_state = atomic_read(&pebi->current_log.state); + + switch (log_state) { + case SSDFS_LOG_UNKNOWN: + case SSDFS_LOG_PREPARED: + SSDFS_ERR("peb %llu current log is not initialized\n", + pebi->peb_id); + return -ERANGE; + + case SSDFS_LOG_INITIALIZED: + case SSDFS_LOG_COMMITTED: + /* do function's work */ + break; + + case SSDFS_LOG_CREATED: + SSDFS_WARN("peb %llu current log is not initialized\n", + pebi->peb_id); + return -ERANGE; + + default: + BUG(); + }; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb %llu, current_log.start_page %u\n", + si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#else + SSDFS_DBG("seg %llu, peb %llu, current_log.start_page %u\n", + si->seg_id, pebi->peb_id, + pebi->current_log.start_page); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_peb_current_log_lock(pebi); + + log = &pebi->current_log; + pages_per_peb = min_t(u32, si->fsi->leb_pages_capacity, + si->fsi->peb_pages_capacity); + + /* + * Start page of the next log should be defined during commit. + * It needs to check this value here only. + */ + + if (log->start_page >= pages_per_peb) { + SSDFS_ERR("current_log.start_page %u >= pages_per_peb %u\n", + log->start_page, pages_per_peb); + err = -ENOSPC; + goto finish_log_create; + } + + log_strategy = is_log_partial(pebi); + + switch (log_strategy) { + case SSDFS_START_FULL_LOG: + if ((log->start_page + log->free_data_pages) % pebi->log_pages) { + SSDFS_WARN("unexpected state: " + "log->start_page %u, " + "log->free_data_pages %u, " + "pebi->log_pages %u\n", + log->start_page, + log->free_data_pages, + pebi->log_pages); + } + + log->reserved_pages = ssdfs_peb_define_reserved_metapages(pebi); + break; + + case SSDFS_START_PARTIAL_LOG: + log->reserved_pages = ssdfs_peb_define_reserved_metapages(pebi); + break; + + case SSDFS_CONTINUE_PARTIAL_LOG: + log->reserved_pages = ssdfs_peb_define_reserved_metapages(pebi); + log_footer_pages = ssdfs_peb_log_footer_metapages(pebi); + log->reserved_pages -= log_footer_pages; + break; + + case SSDFS_FINISH_PARTIAL_LOG: + case SSDFS_FINISH_FULL_LOG: + if (log->free_data_pages == 0) { + err = -ENOSPC; + SSDFS_ERR("seg %llu, peb %llu, " + "start_page %u, free_data_pages %u\n", + si->seg_id, pebi->peb_id, + log->start_page, log->free_data_pages); + goto finish_log_create; + } else { + log->reserved_pages = + ssdfs_peb_define_reserved_metapages(pebi); + log_footer_pages = + ssdfs_peb_log_footer_metapages(pebi); + /* + * The reserved pages imply presence of header + * and footer. However, it needs to add the page + * for data itself. If header's page is able + * to keep the data too then footer will be in + * the log. Otherwise, footer will be absent. 
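+ * Returning the footer pages to free_data_pages below is what
+ * keeps at least one page available for payload in this closing
+ * log.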
+ */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_footer_pages %u, log->reserved_pages %u\n", + log_footer_pages, log->reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + log->free_data_pages += log_footer_pages; + } + break; + + default: + err = -ERANGE; + SSDFS_CRIT("unexpected log strategy %#x\n", + log_strategy); + goto finish_log_create; + } + + if (log->free_data_pages < log->reserved_pages) { + err = -ENOSPC; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log->free_data_pages %u < log->reserved_pages %u\n", + log->free_data_pages, log->reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_log_create; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("log_strategy %#x, free_data_pages %u, reserved_pages %u\n", + log_strategy, log->free_data_pages, log->reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_segment_blk_bmap_reserve_metapages(&si->blk_bmap, + pebi->pebc, + log->reserved_pages); + if (err == -ENOSPC) { + /* + * The goal of reservation is to decrease the number of + * free logical blocks because some PEB's space is used + * for the metadata. Such decreasing prevents from + * allocation of logical blocks out of physically + * available space in the PEB. However, if no space + * for reservation but there are some physical pages + * for logs creation then the operation of reservation + * can be simply ignored. Because, current log's + * metadata structure manages the real available + * space in the PEB. + */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve metadata pages: " + "count %u, err %d\n", + log->reserved_pages, err); + goto finish_log_create; + } + + log->free_data_pages -= log->reserved_pages; + pebi->current_log.seg_flags = 0; + + for (i = 0; i < SSDFS_LOG_AREA_MAX; i++) { + struct ssdfs_peb_area *area; + struct ssdfs_page_array *area_pages; + struct ssdfs_peb_area_metadata *metadata; + struct ssdfs_fragments_chain_header *chain_hdr; + size_t metadata_size = sizeof(struct ssdfs_peb_area_metadata); + size_t blk_table_size = sizeof(struct ssdfs_area_block_table); + size_t desc_size = sizeof(struct ssdfs_fragment_desc); + + area = &pebi->current_log.area[i]; + area_pages = &area->array; + + if (atomic_read(&area_pages->state) == SSDFS_PAGE_ARRAY_DIRTY) { + /* + * It needs to repeat the commit. 
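+ * The dirty pages belong to a commit that has not reached the
+ * volume yet, so they are kept instead of being released.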
+ */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB %llu is dirty on log creation\n", + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + err = ssdfs_page_array_release_all_pages(area_pages); + if (unlikely(err)) { + ssdfs_fs_error(si->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to release pages: " + "PEB %llu\n", + pebi->peb_id); + err = -EIO; + goto finish_log_create; + } + } + + metadata = &area->metadata; + + switch (i) { + case SSDFS_LOG_BLK_DESC_AREA: + memset(&metadata->area.blk_desc.table, + 0, sizeof(struct ssdfs_area_block_table)); + chain_hdr = &metadata->area.blk_desc.table.chain_hdr; + chain_hdr->desc_size = cpu_to_le16(desc_size); + chain_hdr->magic = SSDFS_CHAIN_HDR_MAGIC; + + options = &si->fsi->metadata_options; + compr_type = options->blk2off_tbl.compression; + + switch (compr_type) { + case SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE: + chain_hdr->type = SSDFS_BLK_DESC_CHAIN_HDR; + break; + case SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE: + chain_hdr->type = SSDFS_BLK_DESC_ZLIB_CHAIN_HDR; + break; + case SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE: + chain_hdr->type = SSDFS_BLK_DESC_LZO_CHAIN_HDR; + break; + default: + BUG(); + } + + area->has_metadata = true; + area->write_offset = blk_table_size; + area->compressed_offset = blk_table_size; + metadata->area.blk_desc.capacity = 0; + metadata->area.blk_desc.items_count = 0; + metadata->reserved_offset = 0; + metadata->sequence_id = 0; + + err = ssdfs_peb_reserve_blk_desc_space(pebi, metadata); + if (unlikely(err)) { + SSDFS_ERR("fail to reserve blk desc space: " + "err %d\n", err); + goto finish_log_create; + } + break; + + case SSDFS_LOG_DIFFS_AREA: + memset(metadata, 0, metadata_size); + chain_hdr = &metadata->area.diffs.table.hdr.chain_hdr; + chain_hdr->desc_size = cpu_to_le16(desc_size); + chain_hdr->magic = SSDFS_CHAIN_HDR_MAGIC; + chain_hdr->type = SSDFS_BLK_STATE_CHAIN_HDR; + area->has_metadata = false; + area->write_offset = 0; + area->metadata.reserved_offset = 0; + break; + + case SSDFS_LOG_JOURNAL_AREA: + memset(metadata, 0, metadata_size); + chain_hdr = &metadata->area.journal.table.hdr.chain_hdr; + chain_hdr->desc_size = cpu_to_le16(desc_size); + chain_hdr->magic = SSDFS_CHAIN_HDR_MAGIC; + chain_hdr->type = SSDFS_BLK_STATE_CHAIN_HDR; + area->has_metadata = false; + area->write_offset = 0; + area->metadata.reserved_offset = 0; + break; + + case SSDFS_LOG_MAIN_AREA: + memset(metadata, 0, metadata_size); + area->has_metadata = false; + area->write_offset = 0; + area->metadata.reserved_offset = 0; + break; + + default: + BUG(); + }; + } + + ssdfs_peb_set_current_log_state(pebi, SSDFS_LOG_CREATED); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("log created: " + "seg %llu, peb %llu, " + "current_log.start_page %u, free_data_pages %u\n", + si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + log->free_data_pages); +#else + SSDFS_DBG("log created: " + "seg %llu, peb %llu, " + "current_log.start_page %u, free_data_pages %u\n", + si->seg_id, pebi->peb_id, + pebi->current_log.start_page, + log->free_data_pages); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + +finish_log_create: + ssdfs_peb_current_log_unlock(pebi); + return err; +} + +/* + * ssdfs_peb_grow_log_area() - grow log's area + * @pebi: pointer on PEB object + * @area_type: area type + * @fragment_size: size of fragment + * + * This function tries to add memory page into log's area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - log is full. + * %-ENOMEM - fail to allocate memory. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_peb_grow_log_area(struct ssdfs_peb_info *pebi, int area_type, + u32 fragment_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_page_array *area_pages; + u32 write_offset; + pgoff_t index_start, index_end; + struct page *page; + u16 metadata_pages = 0; + u16 free_data_pages; + u16 reserved_pages; + int phys_pages = 0; + int log_strategy; + u32 min_log_pages; + u32 footer_pages; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("peb %llu, current_log.free_data_pages %u, " + "area_type %#x, area.write_offset %u, " + "fragment_size %u\n", + pebi->peb_id, + pebi->current_log.free_data_pages, + area_type, + pebi->current_log.area[area_type].write_offset, + fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + si = pebi->pebc->parent_si; + area_pages = &pebi->current_log.area[area_type].array; + + write_offset = pebi->current_log.area[area_type].write_offset; + + BUG_ON(fragment_size > (2 * PAGE_SIZE)); + + index_start = (((write_offset >> fsi->log_pagesize) << + fsi->log_pagesize) >> PAGE_SHIFT); + + if (fsi->pagesize > PAGE_SIZE) { + index_end = write_offset + fragment_size + fsi->pagesize - 1; + index_end >>= fsi->log_pagesize; + index_end <<= fsi->log_pagesize; + index_end >>= PAGE_SHIFT; + } else { + index_end = write_offset + fragment_size + PAGE_SIZE - 1; + index_end >>= PAGE_SHIFT; + } + + do { + page = ssdfs_page_array_get_page(area_pages, index_start); + if (IS_ERR_OR_NULL(page)) + break; + else { + index_start++; + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } while (index_start < index_end); + + if (index_start >= index_end) { + SSDFS_DBG("log doesn't need to grow\n"); + return 0; + } + + phys_pages = index_end - index_start; + + if (fsi->pagesize > PAGE_SIZE) { + phys_pages >>= fsi->log_pagesize - PAGE_SHIFT; + if (phys_pages == 0) + phys_pages = 1; + } else if (fsi->pagesize < PAGE_SIZE) + phys_pages <<= PAGE_SHIFT - fsi->log_pagesize; + + log_strategy = is_log_partial(pebi); + free_data_pages = pebi->current_log.free_data_pages; + reserved_pages = pebi->current_log.reserved_pages; + min_log_pages = ssdfs_peb_estimate_min_partial_log_pages(pebi); + footer_pages = ssdfs_peb_log_footer_metapages(pebi); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("min_log_pages %u, footer_pages %u, " + "log_strategy %#x, free_data_pages %u, " + "reserved_pages %u\n", + min_log_pages, footer_pages, + log_strategy, free_data_pages, + reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (phys_pages <= free_data_pages) { + /* + * Continue logic. 
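+ * The requested pages fit into the log's remaining
+ * free_data_pages, so no strategy-specific correction
+ * of the reserved pages is necessary.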
+ */ + } else if (phys_pages <= (free_data_pages + footer_pages) && + reserved_pages >= min_log_pages) { + switch (log_strategy) { + case SSDFS_START_FULL_LOG: + case SSDFS_FINISH_FULL_LOG: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("new_page_count %u > free_data_pages %u\n", + phys_pages, + pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + + case SSDFS_START_PARTIAL_LOG: + pebi->current_log.free_data_pages += footer_pages; + pebi->current_log.reserved_pages -= footer_pages; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("use footer page for data: " + "free_data_pages %u, reserved_pages %u\n", + pebi->current_log.free_data_pages, + pebi->current_log.reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_CONTINUE_PARTIAL_LOG: + case SSDFS_FINISH_PARTIAL_LOG: + /* no free space available */ + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("new_page_count %u > free_data_pages %u\n", + phys_pages, + pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("new_page_count %u > free_data_pages %u\n", + phys_pages, + pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + for (; index_start < index_end; index_start++) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %lu, current_log.free_data_pages %u\n", + index_start, pebi->current_log.free_data_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_grab_page(area_pages, index_start); + if (IS_ERR_OR_NULL(page)) { + SSDFS_ERR("fail to add page %lu into area %#x space\n", + index_start, area_type); + return -ENOMEM; + } + + ssdfs_memzero_page(page, 0, PAGE_SIZE, PAGE_SIZE); + + ssdfs_set_page_private(page, 0); + ssdfs_put_page(page); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + pebi->current_log.free_data_pages -= phys_pages; + + if (area_type == SSDFS_LOG_BLK_DESC_AREA) + metadata_pages = phys_pages; + + if (metadata_pages > 0) { + err = ssdfs_segment_blk_bmap_reserve_metapages(&si->blk_bmap, + pebi->pebc, + metadata_pages); + if (err == -ENOSPC) { + /* + * The goal of reservation is to decrease the number of + * free logical blocks because some PEB's space is used + * for the metadata. Such decreasing prevents from + * allocation of logical blocks out of physically + * available space in the PEB. However, if no space + * for reservation but there are some physical pages + * for logs creation then the operation of reservation + * can be simply ignored. Because, current log's + * metadata structure manages the real available + * space in the PEB. + */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve metadata pages: " + "count %u, err %d\n", + metadata_pages, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_peb_store_fragment() - store fragment into page cache + * @from: fragment source descriptor + * @to: fragment destination descriptor [in|out] + * + * This function tries to store fragment into log. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EAGAIN - fail to store fragment into available space. 
+ */ +static +int ssdfs_peb_store_fragment(struct ssdfs_fragment_source *from, + struct ssdfs_fragment_destination *to) +{ + int compr_type; + unsigned char *src; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!from || !to); + BUG_ON(!from->page || !to->store || !to->desc); + BUG_ON((from->start_offset + from->data_bytes) > PAGE_SIZE); + BUG_ON(from->fragment_type <= SSDFS_UNKNOWN_FRAGMENT_TYPE || + from->fragment_type >= SSDFS_FRAGMENT_DESC_MAX_TYPE); + BUG_ON(from->fragment_flags & ~SSDFS_FRAGMENT_DESC_FLAGS_MASK); + BUG_ON(to->free_space > PAGE_SIZE); + + SSDFS_DBG("page %p, start_offset %u, data_bytes %zu, " + "sequence_id %u, fragment_type %#x, fragment_flags %#x, " + "write_offset %u, store %p, free_space %zu\n", + from->page, from->start_offset, from->data_bytes, + from->sequence_id, from->fragment_type, + from->fragment_flags, + to->write_offset, to->store, to->free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (from->data_bytes == 0) { + SSDFS_WARN("from->data_bytes == 0\n"); + return 0; + } + + if (to->free_space == 0) { + SSDFS_WARN("to->free_space is not enough\n"); + return -EAGAIN; + } + + switch (from->fragment_type) { + case SSDFS_FRAGMENT_UNCOMPR_BLOB: + compr_type = SSDFS_COMPR_NONE; + break; + case SSDFS_FRAGMENT_ZLIB_BLOB: + compr_type = SSDFS_COMPR_ZLIB; + break; + case SSDFS_FRAGMENT_LZO_BLOB: + compr_type = SSDFS_COMPR_LZO; + break; + default: + BUG(); + }; + + if (!ssdfs_can_compress_data(from->page, from->data_bytes)) { + compr_type = SSDFS_COMPR_NONE; + from->fragment_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + } + + to->compr_size = to->free_space; + + src = kmap_local_page(from->page); + src += from->start_offset; + to->desc->checksum = ssdfs_crc32_le(src, from->data_bytes); + err = ssdfs_compress(compr_type, src, to->store, + &from->data_bytes, &to->compr_size); + kunmap_local(src); + + if (err == -E2BIG || err == -EOPNOTSUPP) { + BUG_ON(from->data_bytes > PAGE_SIZE); + BUG_ON(from->data_bytes > to->free_space); + + from->fragment_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + + src = kmap_local_page(from->page); + err = ssdfs_memcpy(to->store, 0, to->free_space, + src, from->start_offset, PAGE_SIZE, + from->data_bytes); + kunmap_local(src); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + to->compr_size = from->data_bytes; + } else if (err) { + SSDFS_ERR("fail to compress fragment: " + "data_bytes %zu, free_space %zu, err %d\n", + from->data_bytes, to->free_space, err); + return err; + } + + BUG_ON(to->area_offset > to->write_offset); + to->desc->offset = cpu_to_le32(to->write_offset - to->area_offset); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(to->compr_size > U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + to->desc->compr_size = cpu_to_le16((u16)to->compr_size); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(from->data_bytes > U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + to->desc->uncompr_size = cpu_to_le16((u16)from->data_bytes); + +#ifdef CONFIG_SSDFS_DEBUG + WARN_ON(from->sequence_id >= U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + to->desc->sequence_id = from->sequence_id; + to->desc->magic = SSDFS_FRAGMENT_DESC_MAGIC; + to->desc->type = from->fragment_type; + to->desc->flags = from->fragment_flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %u, compr_size %u, " + "uncompr_size %u, checksum %#x\n", + to->desc->offset, + to->desc->compr_size, + to->desc->uncompr_size, + le32_to_cpu(to->desc->checksum)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_define_stream_fragments_count() - calculate fragments count + * 
@start_offset: offset of byte stream in bytes
+ * @data_bytes: size of stream in bytes
+ *
+ * This function calculates the number of fragments in a byte stream.
+ * The byte stream can be a part of one memory page or it can be
+ * distributed between several memory pages. One fragment can't be
+ * greater than a memory page (PAGE_SIZE) in bytes. So, the logic of
+ * this function calculates how many parts the stream is divided into
+ * between memory pages.
+ */
+static inline
+u16 ssdfs_define_stream_fragments_count(u32 start_offset,
+ u32 data_bytes)
+{
+ u16 count = 0;
+ u32 partial_offset;
+ u32 front_part;
+
+ if (data_bytes == 0)
+ return 0;
+
+ partial_offset = start_offset % PAGE_SIZE;
+ front_part = PAGE_SIZE - partial_offset;
+ front_part = min_t(u32, front_part, data_bytes);
+
+ if (front_part < data_bytes) {
+ count++;
+ data_bytes -= front_part;
+ }
+
+ count += (data_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ return count;
+}
+
+/*
+ * ssdfs_peb_store_data_block_fragment() - store data block's fragment
+ * @pebi: pointer on PEB object
+ * @from: fragment source descriptor
+ * @write_offset: write offset
+ * @type: area type
+ * @desc: pointer on fragment descriptor
+ *
+ * This function tries to store data block's fragment into log.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - fail to get memory page.
+ * %-EAGAIN - unable to store data fragment.
+ */
+static
+int ssdfs_peb_store_data_block_fragment(struct ssdfs_peb_info *pebi,
+ struct ssdfs_fragment_source *from,
+ u32 write_offset,
+ int type,
+ struct ssdfs_fragment_desc *desc)
+{
+ struct ssdfs_fragment_destination to;
+ struct page *page;
+ pgoff_t page_index;
+ u32 offset;
+ u32 written_bytes = 0;
+ int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi || !from);
+ BUG_ON(type >= SSDFS_LOG_AREA_MAX);
+ BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+ SSDFS_DBG("from->page %p, from->start_offset %u, "
+ "from->data_bytes %zu, from->sequence_id %u, "
+ "write_offset %u, type %#x\n",
+ from->page, from->start_offset, from->data_bytes,
+ from->sequence_id, write_offset, type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ to.area_offset = 0;
+ to.write_offset = write_offset;
+
+ to.store = ssdfs_flush_kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!to.store) {
+ SSDFS_ERR("fail to allocate buffer for fragment\n");
+ return -ENOMEM;
+ }
+
+ to.free_space = PAGE_SIZE;
+ to.compr_size = 0;
+ to.desc = desc;
+
+ err = ssdfs_peb_store_fragment(from, &to);
+ if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("unable to store data fragment: "
+ "write_offset %u, dst_free_space %zu\n",
+ write_offset, to.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+ goto free_compr_buffer;
+ } else if (unlikely(err)) {
+ SSDFS_ERR("fail to store fragment: "
+ "sequence_id %u, write_offset %u, err %d\n",
+ from->sequence_id, write_offset, err);
+ goto free_compr_buffer;
+ }
+
+ BUG_ON(to.compr_size == 0);
+
+ do {
+ struct ssdfs_page_array *area_pages;
+ u32 size;
+
+ page_index = to.write_offset + written_bytes;
+ page_index >>= PAGE_SHIFT;
+
+ area_pages = &pebi->current_log.area[type].array;
+ page = ssdfs_page_array_get_page_locked(area_pages,
+ page_index);
+ if (IS_ERR_OR_NULL(page)) {
+ err = page == NULL ? -ERANGE : PTR_ERR(page);
+
+ if (err == -ENOENT) {
+ err = ssdfs_peb_grow_log_area(pebi, type,
+ from->data_bytes);
+ if (err == -ENOSPC) {
+ err = -EAGAIN;
+ SSDFS_DBG("log is full\n");
+ goto free_compr_buffer;
+ } else if (unlikely(err)) {
+ SSDFS_ERR("fail to grow log area: "
+ "type %#x, err %d\n",
+ type, err);
+ goto free_compr_buffer;
+ }
+ } else {
+ SSDFS_ERR("fail to get page: "
+ "index %lu for area %#x\n",
+ page_index, type);
+ goto free_compr_buffer;
+ }
+
+ /* try to get page again */
+ page = ssdfs_page_array_get_page_locked(area_pages,
+ page_index);
+ if (IS_ERR_OR_NULL(page)) {
+ err = page == NULL ? -ERANGE : PTR_ERR(page);
+ SSDFS_ERR("fail to get page: "
+ "index %lu for area %#x\n",
+ page_index, type);
+ goto free_compr_buffer;
+ }
+ }
+
+ offset = to.write_offset + written_bytes;
+ offset %= PAGE_SIZE;
+ size = PAGE_SIZE - offset;
+ size = min_t(u32, size, to.compr_size - written_bytes);
+
+ err = ssdfs_memcpy_to_page(page,
+ offset, PAGE_SIZE,
+ to.store,
+ written_bytes, to.free_space,
+ size);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to copy: err %d\n", err);
+ goto finish_copy;
+ }
+
+ SetPageUptodate(page);
+ err = ssdfs_page_array_set_page_dirty(area_pages,
+ page_index);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to set page %lu dirty: "
+ "err %d\n",
+ page_index, err);
+ }
+
+finish_copy:
+ ssdfs_unlock_page(page);
+ ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("page %p, count %d\n",
+ page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (err)
+ goto free_compr_buffer;
+
+ written_bytes += size;
+ } while (written_bytes < to.compr_size);
+
+free_compr_buffer:
+ ssdfs_flush_kfree(to.store);
+
+ return err;
+}
+
+/*
+ * ssdfs_peb_store_block_state_desc() - store block state descriptor
+ * @pebi: pointer on PEB object
+ * @write_offset: write offset
+ * @type: area type
+ * @desc: pointer on block state descriptor
+ * @array: fragment descriptors array
+ * @array_size: number of items in array
+ *
+ * This function tries to store block state descriptor into log.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - fail to get memory page.
+ */
+static
+int ssdfs_peb_store_block_state_desc(struct ssdfs_peb_info *pebi,
+ u32 write_offset,
+ int type,
+ struct ssdfs_block_state_descriptor *desc,
+ struct ssdfs_fragment_desc *array,
+ u32 array_size)
+{
+ struct ssdfs_page_array *area_pages;
+ struct page *page;
+ pgoff_t page_index;
+ unsigned char *kaddr;
+ u32 page_off;
+ size_t desc_size = sizeof(struct ssdfs_block_state_descriptor);
+ size_t table_size = sizeof(struct ssdfs_fragment_desc) * array_size;
+ int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi);
+ BUG_ON(!desc || !array);
+ BUG_ON(array_size == 0);
+ BUG_ON(type >= SSDFS_LOG_AREA_MAX);
+ BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+ SSDFS_DBG("write_offset %u, type %#x, desc %p, "
+ "array %p, array_size %u\n",
+ write_offset, type, desc, array, array_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ page_index = write_offset / PAGE_SIZE;
+ area_pages = &pebi->current_log.area[type].array;
+
+ page = ssdfs_page_array_get_page_locked(area_pages, page_index);
+ if (IS_ERR_OR_NULL(page)) {
+ err = page == NULL ? -ERANGE : PTR_ERR(page);
+ SSDFS_ERR("fail to get page %lu for area %#x\n",
+ page_index, type);
+ return err;
+ }
+
+ page_off = write_offset % PAGE_SIZE;
+
+ kaddr = kmap_local_page(page);
+
+ err = ssdfs_memcpy(kaddr, page_off, PAGE_SIZE,
+ desc, 0, desc_size,
+ desc_size);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to copy: err %d\n", err);
+ goto fail_copy;
+ }
+
+ err = ssdfs_memcpy(kaddr, page_off + desc_size, PAGE_SIZE,
+ array, 0, table_size,
+ table_size);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to copy: err %d\n", err);
+ goto fail_copy;
+ }
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("write_offset %u, page_off %u, "
+ "desc_size %zu, table_size %zu\n",
+ write_offset, page_off, desc_size, table_size);
+ SSDFS_DBG("BLOCK STATE DESC AREA DUMP:\n");
+ print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+ kaddr, PAGE_SIZE);
+ SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+fail_copy:
+ flush_dcache_page(page);
+ kunmap_local(kaddr);
+
+ if (unlikely(err))
+ goto finish_copy;
+
+ SetPageUptodate(page);
+
+ err = ssdfs_page_array_set_page_dirty(area_pages,
+ page_index);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to set page %lu dirty: "
+ "err %d\n",
+ page_index, err);
+ }
+
+finish_copy:
+ ssdfs_unlock_page(page);
+ ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("page %p, count %d\n",
+ page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ return err;
+}
+
+/*
+ * ssdfs_peb_store_byte_stream_in_main_area() - store byte stream into main area
+ * @pebi: pointer on PEB object
+ * @stream: byte stream descriptor
+ * @cno: checkpoint
+ * @parent_snapshot: parent snapshot number
+ *
+ * This function tries to store a data block of some size
+ * from the pagevec into the main area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
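+ *
+ * Note: unlike ssdfs_peb_store_byte_stream(), the main area keeps
+ * its fragments uncompressed and no block state descriptor is
+ * stored in front of them.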
+ */ +static +int ssdfs_peb_store_byte_stream_in_main_area(struct ssdfs_peb_info *pebi, + struct ssdfs_byte_stream_descriptor *stream, + u64 cno, + u64 parent_snapshot) +{ + struct ssdfs_peb_area *area; + int area_type = SSDFS_LOG_MAIN_AREA; + struct ssdfs_fragment_desc cur_desc = {0}; + int start_page, page_index; + u16 fragments; + u32 written_bytes = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc || !stream); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(!stream->pvec); + BUG_ON(pagevec_count(stream->pvec) == 0); + BUG_ON((pagevec_count(stream->pvec) * PAGE_SIZE) < + (stream->start_offset + stream->data_bytes)); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("seg %llu, peb %llu, " + "write_offset %u, " + "stream->start_offset %u, stream->data_bytes %u\n", + pebi->pebc->parent_si->seg_id, pebi->peb_id, + pebi->current_log.area[area_type].write_offset, + stream->start_offset, stream->data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + area = &pebi->current_log.area[area_type]; + + fragments = ssdfs_define_stream_fragments_count(stream->start_offset, + stream->data_bytes); + if (fragments == 0) { + SSDFS_ERR("invalid fragments count %u\n", fragments); + return -ERANGE; + } + + start_page = stream->start_offset >> PAGE_SHIFT; + + if ((start_page + fragments) > pagevec_count(stream->pvec)) { + SSDFS_ERR("start_page %d + fragments %u > pagevec_count %u\n", + start_page, fragments, pagevec_count(stream->pvec)); + err = -ERANGE; + goto finish_store_byte_stream; + } + + stream->write_offset = area->write_offset; + + for (page_index = 0; page_index < fragments; page_index++) { + int i = start_page + page_index; + struct ssdfs_fragment_source from; + u32 write_offset; + + if (written_bytes >= stream->data_bytes) { + SSDFS_ERR("written_bytes %u >= data_bytes %u\n", + written_bytes, stream->data_bytes); + err = -ERANGE; + goto finish_store_byte_stream; + } + + from.page = stream->pvec->pages[i]; + from.start_offset = (stream->start_offset + written_bytes) % + PAGE_SIZE; + from.data_bytes = min_t(u32, PAGE_SIZE, + stream->data_bytes - written_bytes); + from.sequence_id = page_index; + + from.fragment_type = SSDFS_FRAGMENT_UNCOMPR_BLOB; + from.fragment_flags = 0; + +try_get_next_page: + write_offset = area->write_offset; + err = ssdfs_peb_store_data_block_fragment(pebi, &from, + write_offset, + area_type, + &cur_desc); + + if (err == -EAGAIN) { + u32 page_off = write_offset % PAGE_SIZE; + u32 rest = PAGE_SIZE - page_off; + + if (page_off == 0) + goto finish_store_byte_stream; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to get next page: " + "write_offset %u, free_space %u\n", + write_offset, rest); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.area[area_type].write_offset += rest; + goto try_get_next_page; + } + + if (err) { + SSDFS_ERR("fail to store fragment: " + "sequence_id %u, write_offset %u, err %d\n", + from.sequence_id, + area->write_offset, + err); + goto finish_store_byte_stream; + } + + written_bytes += from.data_bytes; + area->write_offset += le16_to_cpu(cur_desc.compr_size); + } + + stream->compr_bytes = area->write_offset; + +finish_store_byte_stream: + if (err) + area->write_offset = 0; + + return err; +} + +static +int ssdfs_peb_define_metadata_space(struct ssdfs_peb_info *pebi, + int area_type, + u32 start_offset, + u32 data_bytes, + u32 *metadata_offset, + u32 *metadata_space) +{ + struct ssdfs_peb_area *area; + u16 fragments; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi); + BUG_ON(area_type >= 
SSDFS_LOG_AREA_MAX);
+ BUG_ON(!metadata_offset || !metadata_space);
+
+ SSDFS_DBG("seg %llu, peb %llu, "
+ "area_type %#x, write_offset %u, "
+ "start_offset %u, data_bytes %u\n",
+ pebi->pebc->parent_si->seg_id, pebi->peb_id,
+ area_type,
+ pebi->current_log.area[area_type].write_offset,
+ start_offset, data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ area = &pebi->current_log.area[area_type];
+
+ *metadata_offset = area->write_offset;
+ *metadata_space = sizeof(struct ssdfs_block_state_descriptor);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("metadata_offset %u, metadata_space %u\n",
+ *metadata_offset, *metadata_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ fragments = ssdfs_define_stream_fragments_count(start_offset,
+ data_bytes);
+ if (fragments == 0) {
+ SSDFS_ERR("invalid fragments count %u\n", fragments);
+ return -ERANGE;
+ }
+
+ *metadata_space += fragments * sizeof(struct ssdfs_fragment_desc);
+ *metadata_offset = ssdfs_peb_correct_area_write_offset(*metadata_offset,
+ *metadata_space);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("fragments %u, metadata_offset %u, metadata_space %u\n",
+ fragments, *metadata_offset, *metadata_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ return 0;
+}
+
+/*
+ * ssdfs_peb_store_byte_stream() - store byte stream into log
+ * @pebi: pointer on PEB object
+ * @stream: byte stream descriptor
+ * @area_type: area type
+ * @fragment_type: fragment type
+ * @cno: checkpoint
+ * @parent_snapshot: parent snapshot number
+ *
+ * This function tries to store a data block of some size
+ * from the pagevec into the log.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ */
+static
+int ssdfs_peb_store_byte_stream(struct ssdfs_peb_info *pebi,
+ struct ssdfs_byte_stream_descriptor *stream,
+ int area_type,
+ int fragment_type,
+ u64 cno,
+ u64 parent_snapshot)
+{
+ struct ssdfs_block_state_descriptor state_desc;
+ struct ssdfs_fragment_desc cur_desc = {0};
+ struct ssdfs_peb_area *area;
+ struct ssdfs_fragment_desc *array = NULL;
+ u16 fragments;
+ int start_page, page_index;
+ u32 metadata_offset;
+ u32 metadata_space;
+ u32 written_bytes = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+ void *kaddr;
+#endif /* CONFIG_SSDFS_DEBUG */
+ int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi || !pebi->pebc || !stream);
+ BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+ BUG_ON(!stream->pvec);
+ BUG_ON(pagevec_count(stream->pvec) == 0);
+ BUG_ON((pagevec_count(stream->pvec) * PAGE_SIZE) <
+ (stream->start_offset + stream->data_bytes));
+ BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+ BUG_ON(fragment_type <= SSDFS_UNKNOWN_FRAGMENT_TYPE ||
+ fragment_type >= SSDFS_FRAGMENT_DESC_MAX_TYPE);
+ BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+ SSDFS_DBG("seg %llu, peb %llu, "
+ "area_type %#x, fragment_type %#x, write_offset %u, "
+ "stream->start_offset %u, stream->data_bytes %u\n",
+ pebi->pebc->parent_si->seg_id, pebi->peb_id,
+ area_type, fragment_type,
+ pebi->current_log.area[area_type].write_offset,
+ stream->start_offset, stream->data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ area = &pebi->current_log.area[area_type];
+
+ fragments = ssdfs_define_stream_fragments_count(stream->start_offset,
+ stream->data_bytes);
+ if (fragments == 0) {
+ SSDFS_ERR("invalid fragments count %u\n", fragments);
+ return -ERANGE;
+ } else if (fragments > 1) {
+ array = ssdfs_flush_kcalloc(fragments,
+ sizeof(struct ssdfs_fragment_desc),
+ GFP_KERNEL);
+ if (!array) {
+ SSDFS_ERR("fail to allocate fragment
desc array: " + "fragments %u\n", + fragments); + return -ENOMEM; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragments %u, start_offset %u, data_bytes %u\n", + fragments, stream->start_offset, stream->data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_page = stream->start_offset >> PAGE_SHIFT; + + if ((start_page + fragments) > pagevec_count(stream->pvec)) { + SSDFS_ERR("start_page %d + fragments %u > pagevec_count %u\n", + start_page, fragments, pagevec_count(stream->pvec)); + err = -ERANGE; + goto free_array; + } + + err = ssdfs_peb_define_metadata_space(pebi, area_type, + stream->start_offset, + stream->data_bytes, + &metadata_offset, + &metadata_space); + if (err) { + SSDFS_ERR("fail to define metadata space: err %d\n", + err); + goto free_array; + } + + stream->write_offset = area->write_offset = metadata_offset; + area->write_offset += metadata_space; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("write_offset %u\n", area->write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (page_index = 0; page_index < fragments; page_index++) { + int i = start_page + page_index; + struct ssdfs_fragment_source from; + u32 write_offset; + + if (written_bytes >= stream->data_bytes) { + SSDFS_ERR("written_bytes %u >= data_bytes %u\n", + written_bytes, stream->data_bytes); + err = -ERANGE; + goto free_array; + } + +#ifdef CONFIG_SSDFS_DEBUG + kaddr = kmap_local_page(stream->pvec->pages[i]); + SSDFS_DBG("PAGE DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + from.page = stream->pvec->pages[i]; + from.start_offset = (stream->start_offset + written_bytes) % + PAGE_SIZE; + from.data_bytes = min_t(u32, PAGE_SIZE, + stream->data_bytes - written_bytes); + from.sequence_id = page_index; + from.fragment_type = fragment_type; + from.fragment_flags = SSDFS_FRAGMENT_HAS_CSUM; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("from.start_offset %u, from.data_bytes %zu, " + "page_index %d\n", + from.start_offset, from.data_bytes, + page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + +try_get_next_page: + write_offset = area->write_offset; + err = ssdfs_peb_store_data_block_fragment(pebi, &from, + write_offset, + area_type, + &cur_desc); + + if (err == -EAGAIN) { + u32 page_off = write_offset % PAGE_SIZE; + u32 rest = PAGE_SIZE - page_off; + + if (page_off == 0) + goto free_array; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to get next page: " + "write_offset %u, free_space %u\n", + write_offset, rest); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebi->current_log.area[area_type].write_offset += rest; + goto try_get_next_page; + } + + if (err) { + SSDFS_ERR("fail to store fragment: " + "sequence_id %u, write_offset %u, err %d\n", + from.sequence_id, + area->write_offset, + err); + goto free_array; + } + + if (array) { + ssdfs_memcpy(&array[page_index], + 0, sizeof(struct ssdfs_fragment_desc), + &cur_desc, + 0, sizeof(struct ssdfs_fragment_desc), + sizeof(struct ssdfs_fragment_desc)); + } else if (page_index > 0) + BUG(); + + written_bytes += from.data_bytes; + area->write_offset += le16_to_cpu(cur_desc.compr_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("written_bytes %u, write_offset %u\n", + written_bytes, area->write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + stream->compr_bytes = + area->write_offset - (metadata_offset + metadata_space); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("write_offset %u, metadata_offset %u, metadata_space %u\n", + area->write_offset, metadata_offset, 
metadata_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ state_desc.cno = cpu_to_le64(cno);
+ state_desc.parent_snapshot = cpu_to_le64(parent_snapshot);
+
+ state_desc.chain_hdr.compr_bytes = cpu_to_le32(stream->compr_bytes);
+ state_desc.chain_hdr.uncompr_bytes = cpu_to_le32(written_bytes);
+ state_desc.chain_hdr.fragments_count = cpu_to_le16(fragments);
+ state_desc.chain_hdr.desc_size =
+ cpu_to_le16(sizeof(struct ssdfs_fragment_desc));
+ state_desc.chain_hdr.magic = SSDFS_CHAIN_HDR_MAGIC;
+ state_desc.chain_hdr.type = SSDFS_BLK_STATE_CHAIN_HDR;
+ state_desc.chain_hdr.flags = 0;
+
+ if (array) {
+ err = ssdfs_peb_store_block_state_desc(pebi, metadata_offset,
+ area_type, &state_desc,
+ array, fragments);
+ } else {
+ err = ssdfs_peb_store_block_state_desc(pebi, metadata_offset,
+ area_type, &state_desc,
+ &cur_desc, 1);
+ }
+
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to store block state descriptor: "
+ "write_offset %u, area_type %#x, err %d\n",
+ metadata_offset, area_type, err);
+ goto free_array;
+ }
+
+free_array:
+ if (array)
+ ssdfs_flush_kfree(array);
+
+ if (err)
+ area->write_offset = metadata_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("seg %llu, peb %llu, "
+ "area_type %#x, fragment_type %#x, write_offset %u, "
+ "stream->start_offset %u, stream->data_bytes %u\n",
+ pebi->pebc->parent_si->seg_id, pebi->peb_id,
+ area_type, fragment_type,
+ pebi->current_log.area[area_type].write_offset,
+ stream->start_offset, stream->data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ return err;
+}
+
+/*
+ * ssdfs_area_free_space() - calculate area's free space
+ * @pebi: pointer on PEB object
+ * @area_type: area type
+ */
+static
+u32 ssdfs_area_free_space(struct ssdfs_peb_info *pebi, int area_type)
+{
+ struct ssdfs_fs_info *fsi;
+ u32 write_offset;
+ u32 page_index;
+ unsigned long pages_count;
+ u32 free_space = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi || !pebi->pebc);
+ BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+ BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+ BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+
+ SSDFS_DBG("area_type %#x\n", area_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ fsi = pebi->pebc->parent_si->fsi;
+ write_offset = pebi->current_log.area[area_type].write_offset;
+ page_index = write_offset / PAGE_SIZE;
+
+ down_read(&pebi->current_log.area[area_type].array.lock);
+ pages_count = pebi->current_log.area[area_type].array.pages_count;
+ up_read(&pebi->current_log.area[area_type].array.lock);
+
+ if (page_index < pages_count)
+ free_space += PAGE_SIZE - (write_offset % PAGE_SIZE);
+
+ free_space += pebi->current_log.free_data_pages * fsi->pagesize;
+
+ /*
+ * Reserved pages could be used for the segment header
+ * and the log footer. However, the partial log header is
+ * a special combination of the segment header and
+ * the log footer. Usually, the latest log has to be ended
+ * by the log footer. However, a partial log header alone
+ * can be used if the space reserved for the log footer
+ * needs to be given to user data.
+ */
+ free_space += (pebi->current_log.reserved_pages - 1) * fsi->pagesize;
+
+ return free_space;
+}
+
+/*
+ * can_area_add_fragment() - can we store the fragment into the area?
+ * @pebi: pointer on PEB object
+ * @area_type: area type
+ * @fragment_size: size of fragment
+ *
+ * This function checks whether the fragment can be added into
+ * the free space of the requested area.
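+ * The check relies on ssdfs_area_free_space(), which accounts
+ * not only for the room left in the current memory page but
+ * also for the pages that can still be added to the log
+ * (free_data_pages) and for all but one of the reserved
+ * metadata pages.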
+ */ +static +bool can_area_add_fragment(struct ssdfs_peb_info *pebi, int area_type, + u32 fragment_size) +{ + u32 write_offset; + u32 free_space; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(fragment_size == 0); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("area_type %#x, fragment_size %u\n", + area_type, fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + write_offset = pebi->current_log.area[area_type].write_offset; + free_space = ssdfs_area_free_space(pebi, area_type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("write_offset %u, free_space %u\n", + write_offset, free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + return fragment_size <= free_space; +} + +/* + * has_current_page_free_space() - check current area's memory page + * @pebi: pointer on PEB object + * @area_type: area type + * @fragment_size: size of fragment + * + * This function checks that we can add fragment into + * free space of current memory page. + */ +static +bool has_current_page_free_space(struct ssdfs_peb_info *pebi, + int area_type, + u32 fragment_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_page_array *area_pages; + bool is_space_enough, is_page_available; + u32 write_offset; + pgoff_t page_index; + unsigned long pages_count; + struct page *page; + u32 free_space = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebi || !pebi->pebc); + BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi); + BUG_ON(fragment_size == 0); + BUG_ON(area_type >= SSDFS_LOG_AREA_MAX); + BUG_ON(!is_ssdfs_peb_current_log_locked(pebi)); + + SSDFS_DBG("area_type %#x, fragment_size %u\n", + area_type, fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebi->pebc->parent_si->fsi; + write_offset = pebi->current_log.area[area_type].write_offset; + page_index = write_offset / PAGE_SIZE; + + down_read(&pebi->current_log.area[area_type].array.lock); + pages_count = pebi->current_log.area[area_type].array.pages_count; + up_read(&pebi->current_log.area[area_type].array.lock); + + if (page_index < pages_count) + free_space += PAGE_SIZE - (write_offset % PAGE_SIZE); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("write_offset %u, free_space %u\n", + write_offset, free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + is_space_enough = fragment_size <= free_space; + + page_index = write_offset >> PAGE_SHIFT; + area_pages = &pebi->current_log.area[area_type].array; + page = ssdfs_page_array_get_page(area_pages, page_index); + if (IS_ERR_OR_NULL(page)) + is_page_available = false; + else { + is_page_available = true; + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return is_space_enough && is_page_available; +} + +/* + * ssdfs_peb_get_area_free_frag_desc() - get free fragment descriptor + * @pebi: pointer on PEB object + * @area_type: area type + * + * This function tries to get next vacant fragment descriptor + * from block table. + * + * RETURN: + * [success] - pointer on vacant fragment descriptor. + * [failure] - NULL (block table is full). 
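+ *
+ * Note: the slot at SSDFS_NEXT_BLK_TABLE_INDEX is reserved for
+ * chaining the next block table, so it is never handed out as a
+ * vacant descriptor here.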
+ */
+static
+struct ssdfs_fragment_desc *
+ssdfs_peb_get_area_free_frag_desc(struct ssdfs_peb_info *pebi, int area_type)
+{
+ struct ssdfs_area_block_table *table;
+ u16 vacant_item;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi);
+ BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ switch (area_type) {
+ case SSDFS_LOG_MAIN_AREA:
+ case SSDFS_LOG_DIFFS_AREA:
+ case SSDFS_LOG_JOURNAL_AREA:
+ /* these areas don't have an area block table */
+ SSDFS_DBG("area block table isn't created for this area\n");
+ return ERR_PTR(-ERANGE);
+
+ case SSDFS_LOG_BLK_DESC_AREA:
+ /* store area block table */
+ break;
+
+ default:
+ BUG();
+ };
+
+ table = &pebi->current_log.area[area_type].metadata.area.blk_desc.table;
+ vacant_item = le16_to_cpu(table->chain_hdr.fragments_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("area_type %#x, vacant_item %u\n",
+ area_type, vacant_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ BUG_ON(vacant_item > SSDFS_NEXT_BLK_TABLE_INDEX);
+ if (vacant_item == SSDFS_NEXT_BLK_TABLE_INDEX) {
+ SSDFS_DBG("block table is full\n");
+ return NULL;
+ }
+
+ le16_add_cpu(&table->chain_hdr.fragments_count, 1);
+ return &table->blk[vacant_item];
+}
+
+/*
+ * ssdfs_peb_get_area_cur_frag_desc() - get current fragment descriptor
+ * @pebi: pointer on PEB object
+ * @area_type: area type
+ *
+ * This function tries to get current fragment descriptor
+ * from block table.
+ *
+ * RETURN:
+ * [success] - pointer on current fragment descriptor.
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+struct ssdfs_fragment_desc *
+ssdfs_peb_get_area_cur_frag_desc(struct ssdfs_peb_info *pebi, int area_type)
+{
+ struct ssdfs_area_block_table *table;
+ u16 fragments_count;
+ u16 cur_item = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi);
+ BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ switch (area_type) {
+ case SSDFS_LOG_MAIN_AREA:
+ case SSDFS_LOG_DIFFS_AREA:
+ case SSDFS_LOG_JOURNAL_AREA:
+ /* these areas don't have an area block table */
+ SSDFS_DBG("area block table isn't created for this area\n");
+ return ERR_PTR(-ERANGE);
+
+ case SSDFS_LOG_BLK_DESC_AREA:
+ /* store area block table */
+ break;
+
+ default:
+ BUG();
+ };
+
+ table = &pebi->current_log.area[area_type].metadata.area.blk_desc.table;
+ fragments_count = le16_to_cpu(table->chain_hdr.fragments_count);
+
+ if (fragments_count > 0)
+ cur_item = fragments_count - 1;
+ else
+ cur_item = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("area_type %#x, cur_item %u\n",
+ area_type, cur_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ BUG_ON(cur_item >= SSDFS_NEXT_BLK_TABLE_INDEX);
+
+ return &table->blk[cur_item];
+}
+
+/*
+ * ssdfs_peb_store_area_block_table() - store block table
+ * @pebi: pointer on PEB object
+ * @area_type: area type
+ * @flags: area block table header's flags
+ *
+ * This function tries to store the block table into the area's
+ * address space at the reserved offset.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_peb_store_area_block_table(struct ssdfs_peb_info *pebi,
+ int area_type, u16 flags)
+{
+ struct ssdfs_fs_info *fsi;
+ struct ssdfs_peb_area *area;
+ struct ssdfs_area_block_table *table;
+ struct ssdfs_fragment_desc *last_desc;
+ u16 fragments;
+ u32 reserved_offset, new_offset;
+ size_t blk_table_size = sizeof(struct ssdfs_area_block_table);
+ u16 hdr_flags;
+ struct page *page;
+ pgoff_t page_index;
+ u32 page_off;
+ bool is_compressed = false;
+ int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi);
+ BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+ BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ switch (area_type) {
+ case SSDFS_LOG_MAIN_AREA:
+ case SSDFS_LOG_DIFFS_AREA:
+ case SSDFS_LOG_JOURNAL_AREA:
+ /* these areas don't have an area block table */
+ SSDFS_DBG("area block table isn't created for this area\n");
+ return 0;
+
+ case SSDFS_LOG_BLK_DESC_AREA:
+ /* store area block table */
+ break;
+
+ default:
+ BUG();
+ };
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("reserved_offset %u, area_type %#x\n",
+ pebi->current_log.area[area_type].metadata.reserved_offset,
+ area_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ fsi = pebi->pebc->parent_si->fsi;
+ is_compressed = fsi->metadata_options.blk2off_tbl.flags &
+ SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION;
+
+ area = &pebi->current_log.area[area_type];
+ table = &area->metadata.area.blk_desc.table;
+
+ fragments = le16_to_cpu(table->chain_hdr.fragments_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("table->chain_hdr.fragments_count %u\n",
+ fragments);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (fragments < SSDFS_NEXT_BLK_TABLE_INDEX) {
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(flags & SSDFS_MULTIPLE_HDR_CHAIN);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ if (fragments > 0)
+ last_desc = &table->blk[fragments - 1];
+ else
+ last_desc = &table->blk[0];
+
+ last_desc->magic = SSDFS_FRAGMENT_DESC_MAGIC;
+
+ switch (fsi->metadata_options.blk2off_tbl.compression) {
+ case SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE:
+ last_desc->type = SSDFS_DATA_BLK_DESC;
+ break;
+ case SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE:
+ last_desc->type = SSDFS_DATA_BLK_DESC_ZLIB;
+ break;
+ case SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE:
+ last_desc->type = SSDFS_DATA_BLK_DESC_LZO;
+ break;
+ default:
+ BUG();
+ }
+
+ last_desc->flags = 0;
+ } else if (flags & SSDFS_MULTIPLE_HDR_CHAIN) {
+ u32 write_offset = 0;
+
+ BUG_ON(fragments > SSDFS_NEXT_BLK_TABLE_INDEX);
+
+ last_desc = &table->blk[SSDFS_NEXT_BLK_TABLE_INDEX];
+
+ if (is_compressed) {
+ write_offset = area->compressed_offset;
+ new_offset =
+ ssdfs_peb_correct_area_write_offset(write_offset,
+ blk_table_size);
+ area->compressed_offset = new_offset;
+ } else {
+ write_offset = area->write_offset;
+ new_offset =
+ ssdfs_peb_correct_area_write_offset(write_offset,
+ blk_table_size);
+ area->write_offset = new_offset;
+ }
+
+ last_desc->offset = cpu_to_le32(new_offset);
+
+ last_desc->compr_size = cpu_to_le16(blk_table_size);
+ last_desc->uncompr_size = cpu_to_le16(blk_table_size);
+ last_desc->checksum = 0;
+
+ if (area->metadata.sequence_id == U8_MAX)
+ area->metadata.sequence_id = 0;
+
+ last_desc->sequence_id = area->metadata.sequence_id++;
+
+ last_desc->magic = SSDFS_FRAGMENT_DESC_MAGIC;
+ last_desc->type = SSDFS_NEXT_TABLE_DESC;
+ last_desc->flags = 0;
+ }
+
+ hdr_flags = le16_to_cpu(table->chain_hdr.flags);
+ hdr_flags |= flags;
+ table->chain_hdr.flags = cpu_to_le16(hdr_flags);
+
+ reserved_offset = area->metadata.reserved_offset;
+ page_index = reserved_offset / PAGE_SIZE;
+ page = ssdfs_page_array_get_page_locked(&area->array, page_index);
+ if (IS_ERR_OR_NULL(page)) {
+ SSDFS_ERR("fail to get page %lu for area %#x\n",
+ page_index, area_type);
+ return -ERANGE;
+ }
+
+ page_off = reserved_offset % PAGE_SIZE;
+
+ err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE,
+ table, 0, blk_table_size,
+ blk_table_size);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to copy: err %d\n", err);
+ goto finish_copy;
+ }
+
+ SetPageUptodate(page);
+
+ err = ssdfs_page_array_set_page_dirty(&area->array, page_index);
+ if (unlikely(err)) {
+ SSDFS_ERR("fail to set page %lu dirty: "
+ "err %d\n",
+ page_index, err);
+ }
+
+finish_copy:
+ ssdfs_unlock_page(page);
+ ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("page %p, count %d\n",
+ page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ return err;
+}
+
+/*
+ * ssdfs_peb_allocate_area_block_table() - reserve block table
+ * @pebi: pointer on PEB object
+ * @area_type: area type
+ *
+ * This function tries to prepare new in-core block table.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EAGAIN - log is full.
+ */
+static
+int ssdfs_peb_allocate_area_block_table(struct ssdfs_peb_info *pebi,
+ int area_type)
+{
+ struct ssdfs_fs_info *fsi;
+ struct ssdfs_peb_area *area;
+ u16 fragments;
+ struct ssdfs_area_block_table *table;
+ struct ssdfs_fragment_desc *last_desc;
+ size_t blk_table_size = sizeof(struct ssdfs_area_block_table);
+ u32 write_offset = 0;
+ bool is_compressed = false;
+ int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+ BUG_ON(!pebi);
+ BUG_ON(area_type >= SSDFS_LOG_AREA_MAX);
+ BUG_ON(!is_ssdfs_peb_current_log_locked(pebi));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ switch (area_type) {
+ case SSDFS_LOG_MAIN_AREA:
+ case SSDFS_LOG_DIFFS_AREA:
+ case SSDFS_LOG_JOURNAL_AREA:
+ /* these areas don't have an area block table */
+ SSDFS_DBG("area block table isn't created for this area\n");
+ return 0;
+
+ case SSDFS_LOG_BLK_DESC_AREA:
+ /* store area block table */
+ break;
+
+ default:
+ BUG();
+ };
+
+#ifdef CONFIG_SSDFS_DEBUG
+ SSDFS_DBG("write_offset %u, area_type %#x\n",
+ pebi->current_log.area[area_type].write_offset,
+ area_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+ fsi = pebi->pebc->parent_si->fsi;
+ is_compressed = fsi->metadata_options.blk2off_tbl.flags &
+ SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION;
+
+ area = &pebi->current_log.area[area_type];
+ table = &area->metadata.area.blk_desc.table;
+ fragments = le16_to_cpu(table->chain_hdr.fragments_count);
+
+ BUG_ON(fragments > SSDFS_NEXT_BLK_TABLE_INDEX);
+
+ if (fragments < SSDFS_NEXT_BLK_TABLE_INDEX) {
+ SSDFS_ERR("invalid fragments count %u\n", fragments);
+ return -ERANGE;
+ }
+
+ last_desc = &table->blk[SSDFS_NEXT_BLK_TABLE_INDEX];
+
+ if (is_compressed)
+ write_offset = area->compressed_offset;
+ else
+ write_offset = area->write_offset;
+
+ if (le32_to_cpu(last_desc->offset) != write_offset) {
+ SSDFS_ERR("last_desc->offset %u != write_offset %u\n",
+ le32_to_cpu(last_desc->offset), write_offset);
+ return -ERANGE;
+ }
+
+ if (!has_current_page_free_space(pebi, area_type, blk_table_size)) {
+ err = ssdfs_peb_grow_log_area(pebi, area_type, blk_table_size);
+ if (err == -ENOSPC) {
+ SSDFS_DBG("log is full\n");
+ return -EAGAIN;
+ } else if (unlikely(err)) {
+ SSDFS_ERR("fail to grow log area: "
+ "type %#x, err %d\n",
+ area_type, err);
+ return err;
+ }
+ }
+
+ table->chain_hdr.compr_bytes = 0;
+ table->chain_hdr.uncompr_bytes = 0;
+ table->chain_hdr.fragments_count = 0;
+ table->chain_hdr.desc_size =
+
cpu_to_le16(sizeof(struct ssdfs_fragment_desc)); + table->chain_hdr.magic = SSDFS_CHAIN_HDR_MAGIC; + table->chain_hdr.flags = 0; + + memset(table->blk, 0, + sizeof(struct ssdfs_fragment_desc) * SSDFS_BLK_TABLE_MAX); + + area->metadata.reserved_offset = write_offset; + + if (is_compressed) + area->compressed_offset += blk_table_size; + else + area->write_offset += blk_table_size; + + return 0; +} + +/* try to estimate fragment size in the log */ +static inline +u32 ssdfs_peb_estimate_data_fragment_size(u32 uncompr_bytes) +{ + u32 estimated_compr_size; + + /* + * There are several alternatives: + * (1) overestimate size; + * (2) underestimate size; + * (3) try to predict possible size by means of some formula. + * + * Currently, try to estimate size as 65% from uncompressed state + * for compression case. + */ + + estimated_compr_size = (uncompr_bytes * 65) / 100; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("uncompr_bytes %u, estimated_compr_size %u\n", + uncompr_bytes, estimated_compr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + return estimated_compr_size; +} + /* * ssdfs_request_rest_bytes() - define rest bytes in request * @pebi: pointer on PEB object From patchwork Sat Feb 25 01:08:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151939 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A29AFC7EE2F for ; Sat, 25 Feb 2023 01:17:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229769AbjBYBRq (ORCPT ); Fri, 24 Feb 2023 20:17:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229699AbjBYBQx (ORCPT ); Fri, 24 Feb 2023 20:16:53 -0500 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BCF70126F8 for ; Fri, 24 Feb 2023 17:16:48 -0800 (PST) Received: by mail-oi1-x22b.google.com with SMTP id y184so797407oiy.8 for ; Fri, 24 Feb 2023 17:16:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Aq/FyLfyYq730/9HJ+6sbYZbrF8x53Q1FERLrwNxmK4=; b=nC5/DDmOIJ13VlUHya/tDngrE8ZAC+iwjfkskWVfzsxY2KO6f6AN+gOiXNpmvAzrF9 H/Yl9R8tSREb+TADPrNlka1L0bqbKm0tBL+4G2VExSNP03/QqE10zj7PrZKSvxdwfTuX FpjEObLhK3DgDLAd0skkHTsYNgOf99d51Q9vwZNj4+eGaLCAV+hpuLP9EuSvTkUB3egV L1B8dH1PYgdg/y+eARZ/C7AHTcMZbY+RKgtsZdxo21ITVKpjPWyI+X1+ztNThk7HDNNV o0L0+kvmKGYnbH7U+AifQnwXxGgE5NwkQwXh9YknvHHoArpK1knacFI3Ncwx6WVP11Bp 2NkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Aq/FyLfyYq730/9HJ+6sbYZbrF8x53Q1FERLrwNxmK4=; b=PliPe9wrMZ5v1KAJ2U2j/uDvTJFxICHdgFx+5Cfkd1huny+r6Cn7kNKaewAj/bAS6o UX+jK6JwfDwpMzIedKLdCIm8PiPnB5m7EHLtr0LU+dTLIaFoLaIifsebQvK2uBfBSXSo O4mh3yHWEQ3xXlSmrzGoyHl3gr/J8qbs5MjMYyYbrKj/GnGIgm4jO39swfHGypss9FFG ZMsKKpnD8oO8yTwQhg4eZkx6CWZJo8hcN4dvTEH5szXfhY26SKNpMxvyMiiw6Y4p13ZY 
v8rEOfVG/wVPvrf3InkJ9txtmw+pochFkTUQSBxh9UnBG1XuCC/0cDSxU/lMc/OWOnPy bETQ==
X-Gm-Message-State: AO0yUKUI9/AgTg126bQ+hNHcFvA3P+VPZi4khZkZGWjsI2b1UgJXVsf+ 6pCzlQttsWp9qAWbzZSGmDPuZIeCGXi70lsG
X-Google-Smtp-Source: AK7set8CD7AauSgDKwiLHJSP18vVRtB8t3hPn2tuT93To9Ir7ov2vewtTyhMiNmI8+2JvCXb2PYyYw==
X-Received: by 2002:aca:1308:0:b0:384:ec1:cc6c with SMTP id e8-20020aca1308000000b003840ec1cc6cmr574930oii.54.1677287807114; Fri, 24 Feb 2023 17:16:47 -0800 (PST)
Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.16.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:16:46 -0800 (PST)
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 34/76] ssdfs: auxiliary GC threads logic
Date: Fri, 24 Feb 2023 17:08:45 -0800
Message-Id: <20230225010927.813929-35-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

SSDFS implements a migration scheme. The migration scheme is a fundamental technique of GC overhead management. The key responsibility of the migration scheme is to guarantee the presence of data in the same segment for any update operations. Generally speaking, the migration scheme’s model is implemented on the basis of associating an exhausted "Physical" Erase Block (PEB) with a clean one. The goal of such an association of two PEBs is to implement the gradual migration of data by means of the update operations in the initial (exhausted) PEB. As a result, the old, exhausted PEB becomes invalidated after complete data migration and it is possible to apply the erase operation to convert it into the clean state. Moreover, the destination PEB of the association replaces the initial PEB for some index in the segment and, finally, it becomes the only PEB for this position. Namely this technique implements the concept of a logical extent with the goal of decreasing the write amplification issue and managing the GC overhead, because the logical extent concept excludes the necessity to update the metadata that tracks the position of user data on the file system’s volume. Generally speaking, the migration scheme is capable of decreasing the GC activity significantly by excluding the necessity to update metadata and by means of self-migration of data between PEBs that is triggered by regular update operations. Generally speaking, SSDFS doesn't need the classical model of garbage collection that is used in NILFS2 or F2FS. However, SSDFS has several global GC threads (for the dirty, pre-dirty, used, and using segment states) and a segment bitmap.
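To make the PEB association more concrete, here is a minimal sketch (hypothetical type and field names, for illustration only; the driver keeps the real state inside its PEB container and migration objects):

    struct peb_migration_pair {
        u64 src_peb_id;   /* exhausted PEB that data migrates from */
        u64 dst_peb_id;   /* clean PEB that absorbs update operations */
        u32 valid_blks;   /* valid blocks still residing in the source */
    };

Every regular update rewrites a logical block into the destination PEB and decrements valid_blks; once it reaches zero, the source PEB can be erased and the destination PEB takes over its position in the segment.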
The main responsibility of global GC threads is: (1) find segment in a particular state, (2) check that segment object is constructed and initialized by file system driver logic, (3) check the necessity to stimulate or finish the migration (if segment is under update operations or has update operations recently, then migration stimulation is not necessary), (4) define valid blocks that require migration, (5) add recommended migration request to PEB update queue, (6) destroy in-core segment object if no migration is necessary and no create/update requests have been received by segment object recently. Global GC threads are used to recommend migration stimulation for particular PEBs and to destroy in-core segment objects that have no requests for processing. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_gc_thread.c | 2953 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 2953 insertions(+) create mode 100644 fs/ssdfs/peb_gc_thread.c diff --git a/fs/ssdfs/peb_gc_thread.c b/fs/ssdfs/peb_gc_thread.c new file mode 100644 index 000000000000..918da1888196 --- /dev/null +++ b/fs/ssdfs/peb_gc_thread.c @@ -0,0 +1,2953 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb_gc_thread.c - GC thread functionality. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "offset_translation_table.h" +#include "compression.h" +#include "page_vector.h" +#include "block_bitmap.h" +#include "page_array.h" +#include "peb.h" +#include "peb_container.h" +#include "peb_mapping_table.h" +#include "segment_bitmap.h" +#include "segment.h" +#include "segment_tree.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_gc_page_leaks; +atomic64_t ssdfs_gc_memory_leaks; +atomic64_t ssdfs_gc_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_gc_cache_leaks_increment(void *kaddr) + * void ssdfs_gc_cache_leaks_decrement(void *kaddr) + * void *ssdfs_gc_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_gc_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_gc_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_gc_kfree(void *kaddr) + * struct page *ssdfs_gc_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_gc_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_gc_free_page(struct page *page) + * void ssdfs_gc_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(gc) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(gc) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_gc_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_gc_page_leaks, 0); + atomic64_set(&ssdfs_gc_memory_leaks, 0); + atomic64_set(&ssdfs_gc_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_gc_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_gc_page_leaks) != 0) { + SSDFS_ERR("GC: " + "memory 
leaks include %lld pages\n", + atomic64_read(&ssdfs_gc_page_leaks)); + } + + if (atomic64_read(&ssdfs_gc_memory_leaks) != 0) { + SSDFS_ERR("GC: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_gc_memory_leaks)); + } + + if (atomic64_read(&ssdfs_gc_cache_leaks) != 0) { + SSDFS_ERR("GC: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_gc_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * GC THREAD FUNCTIONALITY * + ******************************************************************************/ + +static +struct ssdfs_thread_descriptor thread_desc[SSDFS_GC_THREAD_TYPE_MAX] = { + {.threadfn = ssdfs_using_seg_gc_thread_func, + .fmt = "ssdfs-gc-using-seg",}, + {.threadfn = ssdfs_used_seg_gc_thread_func, + .fmt = "ssdfs-gc-used-seg",}, + {.threadfn = ssdfs_pre_dirty_seg_gc_thread_func, + .fmt = "ssdfs-gc-pre-dirty-seg",}, + {.threadfn = ssdfs_dirty_seg_gc_thread_func, + .fmt = "ssdfs-gc-dirty-seg",}, +}; + +/* + * __ssdfs_peb_define_extent() - define extent for request + * @fsi: pointer on shared file system object + * @pebi: pointer on PEB object + * @desc_off: physical offset descriptor + * @desc_array: array of metadata descriptors + * @pos: position offset + * @req: request + * + * This function tries to define extent for request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to extract the whole range. + */ +static +int __ssdfs_peb_define_extent(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_info *pebi, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_metadata_descriptor *desc_array, + struct ssdfs_offset_position *pos, + struct ssdfs_segment_request *req) +{ + struct ssdfs_block_descriptor *blk_desc = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebi || !desc_off || !req); + BUG_ON(!desc_array); + + SSDFS_DBG("peb %llu, " + "class %#x, cmd %#x, type %#x\n", + pebi->peb_id, + req->private.class, req->private.cmd, req->private.type); + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, data_bytes %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, + pebi->peb_id, + req->extent.logical_offset, + req->result.processed_blks, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_blk_desc_buffer_init(pebi->pebc, req, desc_off, pos, + desc_array, + SSDFS_SEG_HDR_DESC_MAX); + if (unlikely(err)) { + SSDFS_ERR("fail to init blk desc buffer: err %d\n", + err); + goto finish_define_extent; + } + + blk_desc = &pos->blk_desc.buf; + + if (req->extent.ino >= U64_MAX) { + req->extent.ino = le64_to_cpu(blk_desc->ino); + req->extent.logical_offset = + le32_to_cpu(blk_desc->logical_offset); + req->extent.logical_offset *= fsi->pagesize; + } else if (req->extent.ino != le64_to_cpu(blk_desc->ino)) { + err = -EAGAIN; + req->place.len = req->result.processed_blks; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("OFFSET DESCRIPTOR: " + "logical_offset %u, logical_blk %u, " + "peb_page %u, log_start_page %u, " + "log_area %u, peb_migration_id %u, " + "byte_offset %u\n", + le32_to_cpu(desc_off->page_desc.logical_offset), + le16_to_cpu(desc_off->page_desc.logical_blk), + le16_to_cpu(desc_off->page_desc.peb_page), + 
le16_to_cpu(desc_off->blk_state.log_start_page), + desc_off->blk_state.log_area, + desc_off->blk_state.peb_migration_id, + le32_to_cpu(desc_off->blk_state.byte_offset)); + SSDFS_DBG("BLOCK DESCRIPTOR: " + "ino %llu, logical_offset %u, " + "peb_index %u, peb_page %u, " + "log_start_page %u, " + "log_area %u, peb_migration_id %u, " + "byte_offset %u\n", + le64_to_cpu(blk_desc->ino), + le32_to_cpu(blk_desc->logical_offset), + le16_to_cpu(blk_desc->peb_index), + le16_to_cpu(blk_desc->peb_page), + le16_to_cpu(blk_desc->state[0].log_start_page), + blk_desc->state[0].log_area, + blk_desc->state[0].peb_migration_id, + le32_to_cpu(blk_desc->state[0].byte_offset)); + SSDFS_DBG("ino %llu, seg %llu, peb %llu, logical_offset %llu, " + "processed_blks %d, logical_block %u, " + "data_bytes %u, blks %u, " + "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n", + req->extent.ino, req->place.start.seg_id, + pebi->peb_id, + req->extent.logical_offset, + req->result.processed_blks, + req->place.start.blk_index, + req->place.len, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot, + req->private.cmd, req->private.type); + SSDFS_DBG("ino1 %llu != ino2 %llu, peb %llu\n", + req->extent.ino, + le64_to_cpu(blk_desc->ino), + pebi->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_define_extent; + } + + req->extent.data_bytes += fsi->pagesize; + +finish_define_extent: + return err; +} + +/* + * __ssdfs_peb_copy_page() - copy page from PEB into buffer + * @pebc: pointer on PEB container + * @desc_off: physical offset descriptor + * @pos: position offset + * @req: request + * + * This function tries to copy PEB's page into the buffer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to extract the whole range.
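+ *
+ * The flow below: take pebc->lock for read, use the source PEB,
+ * make sure the request's pagevec has free space, define the extent,
+ * allocate a memory page for the request, and read the block state
+ * into that page.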
+ */ +static +int __ssdfs_peb_copy_page(struct ssdfs_peb_container *pebc, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_offset_position *pos, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi = NULL; + struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX]; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!desc_off || !pos || !req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + + if (!pebi) { + err = -ERANGE; + SSDFS_ERR("invalid source peb: " + "src_peb %p, dst_peb %p\n", + pebc->src_peb, pebc->dst_peb); + goto finish_copy_page; + } + + if (pagevec_space(&req->result.pvec) == 0) { + err = -EAGAIN; + SSDFS_DBG("request's pagevec is full\n"); + goto finish_copy_page; + } + + err = __ssdfs_peb_define_extent(fsi, pebi, desc_off, + desc_array, pos, req); + if (err == -EAGAIN) { + SSDFS_DBG("unable to add block of another inode\n"); + goto finish_copy_page; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define extent: " + "seg %llu, peb_index %u, peb %llu, err %d\n", + pebc->parent_si->seg_id, pebc->peb_index, + pebi->peb_id, err); + goto finish_copy_page; + } + + err = ssdfs_request_add_allocated_page_locked(req); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory page: " + "err %d\n", err); + goto finish_copy_page; + } + + err = ssdfs_peb_read_block_state(pebc, req, desc_off, pos, + desc_array, + SSDFS_SEG_HDR_DESC_MAX); + if (unlikely(err)) { + SSDFS_ERR("fail to read block state: err %d\n", + err); + goto finish_copy_page; + } + +finish_copy_page: + up_read(&pebc->lock); + + return err; +} + +/* + * ssdfs_peb_define_extent() - define extent for request + * @pebc: pointer on PEB container + * @desc_off: physical offset descriptor + * @req: request + * + * This function tries to define extent for request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to extract the whole range. 
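+ *
+ * NOTE: this helper is compiled only under
+ * CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC.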
+ */ +#ifdef CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC +static +int ssdfs_peb_define_extent(struct ssdfs_peb_container *pebc, + struct ssdfs_phys_offset_descriptor *desc_off, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_info *pebi = NULL; + struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX]; + struct ssdfs_block_descriptor blk_desc = {0}; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!desc_off || !req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, req->private.type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + down_read(&pebc->lock); + + pebi = pebc->src_peb; + + if (!pebi) { + err = -ERANGE; + SSDFS_ERR("invalid source peb: " + "src_peb %p, dst_peb %p\n", + pebc->src_peb, pebc->dst_peb); + goto finish_define_extent; + } + + err = __ssdfs_peb_define_extent(fsi, pebi, desc_off, + desc_array, &blk_desc, req); + if (err == -EAGAIN) { + SSDFS_DBG("unable to add block of another inode\n"); + goto finish_define_extent; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define extent: " + "seg %llu, peb_index %u, peb %llu, err %d\n", + pebc->parent_si->seg_id, pebc->peb_index, + pebi->peb_id, err); + goto finish_define_extent; + } + +finish_define_extent: + up_read(&pebc->lock); + + return err; +} +#endif /* CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC */ + +/* + * ssdfs_peb_copy_pre_alloc_page() - copy pre-alloc page into buffer + * @pebc: pointer on PEB container + * @logical_blk: logical block + * @req: request + * + * This function tries to copy PEB's page into the buffer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - pre-allocated block hasn't content. + * %-EAGAIN - unable to extract the whole range. + */ +int ssdfs_peb_copy_pre_alloc_page(struct ssdfs_peb_container *pebc, + u32 logical_blk, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_blk2off_table *table; + struct ssdfs_phys_offset_descriptor *desc_off = NULL; + struct ssdfs_offset_position pos = {0}; + u16 peb_index; + bool has_data = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x, " + "logical_blk %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, req->private.type, + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + if (logical_blk >= U32_MAX) { + SSDFS_ERR("invalid logical_blk %u\n", + logical_blk); + return -EINVAL; + } + + table = pebc->parent_si->blk2off_table; + + desc_off = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, + &pos); + if (IS_ERR(desc_off) && PTR_ERR(desc_off) == -EAGAIN) { + struct completion *end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + desc_off = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, + &pos); + } + + if (IS_ERR_OR_NULL(desc_off)) { + err = (desc_off == NULL ? 
-ERANGE : PTR_ERR(desc_off)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + has_data = (desc_off->blk_state.log_area < SSDFS_LOG_AREA_MAX) && + (le32_to_cpu(desc_off->blk_state.byte_offset) < U32_MAX); + + if (has_data) { + ssdfs_peb_read_request_cno(pebc); + + err = __ssdfs_peb_copy_page(pebc, desc_off, &pos, req); + if (err == -EAGAIN) { + SSDFS_DBG("unable to add block of another inode\n"); + goto finish_copy_page; + } else if (unlikely(err)) { + SSDFS_ERR("fail to copy page: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto finish_copy_page; + } + + err = ssdfs_blk2off_table_set_block_migration(table, + logical_blk, + peb_index, + req); + if (unlikely(err)) { + SSDFS_ERR("fail to set migration state: " + "logical_blk %u, peb_index %u, err %d\n", + logical_blk, peb_index, err); + goto finish_copy_page; + } + +finish_copy_page: + ssdfs_peb_finish_read_request_cno(pebc); + } else { + if (req->extent.logical_offset >= U64_MAX) + req->extent.logical_offset = 0; + + req->extent.data_bytes += fsi->pagesize; + + err = -ENODATA; + req->result.processed_blks = 1; + req->result.err = err; + } + + return err; +} + +/* + * ssdfs_peb_copy_page() - copy valid page from PEB into buffer + * @pebc: pointer on PEB container + * @logical_blk: logical block + * @req: request + * + * This function tries to copy PEB's page into the buffer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to extract the whole range. + */ +int ssdfs_peb_copy_page(struct ssdfs_peb_container *pebc, + u32 logical_blk, + struct ssdfs_segment_request *req) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_blk2off_table *table; + struct ssdfs_phys_offset_descriptor *desc_off = NULL; + struct ssdfs_offset_position pos = {0}; + u16 peb_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x, " + "logical_blk %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, req->private.type, + logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + if (logical_blk >= U32_MAX) { + SSDFS_ERR("invalid logical_blk %u\n", + logical_blk); + return -EINVAL; + } + + table = pebc->parent_si->blk2off_table; + + desc_off = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, + &pos); + if (IS_ERR(desc_off) && PTR_ERR(desc_off) == -EAGAIN) { + struct completion *end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + desc_off = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, + &pos); + } + + if (IS_ERR_OR_NULL(desc_off)) { + err = (desc_off == NULL ? 
-ERANGE : PTR_ERR(desc_off)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + ssdfs_peb_read_request_cno(pebc); + + err = __ssdfs_peb_copy_page(pebc, desc_off, &pos, req); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to copy the whole range: " + "logical_blk %u, peb_index %u\n", + logical_blk, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_copy_page; + } else if (unlikely(err)) { + SSDFS_ERR("fail to copy page: " + "logical_blk %u, peb_index %u, err %d\n", + logical_blk, peb_index, err); + goto finish_copy_page; + } + + err = ssdfs_blk2off_table_set_block_migration(table, + logical_blk, + peb_index, + req); + if (unlikely(err)) { + SSDFS_ERR("fail to set migration state: " + "logical_blk %u, peb_index %u, err %d\n", + logical_blk, peb_index, err); + goto finish_copy_page; + } + +finish_copy_page: + ssdfs_peb_finish_read_request_cno(pebc); + + return err; +} + +/* + * ssdfs_peb_copy_pages_range() - copy pages' range into buffer + * @pebc: pointer on PEB container + * @range: range of logical blocks + * @req: request + * + * This function tries to copy PEB's page into the buffer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to extract the whole range. + */ +int ssdfs_peb_copy_pages_range(struct ssdfs_peb_container *pebc, + struct ssdfs_block_bmap_range *range, + struct ssdfs_segment_request *req) +{ + u32 logical_blk; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!range || !req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "class %#x, cmd %#x, type %#x, " + "range->start %u, range->len %u\n", + pebc->parent_si->seg_id, pebc->peb_index, + req->private.class, req->private.cmd, req->private.type, + range->start, range->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range->len == 0) { + SSDFS_WARN("empty pages range request\n"); + return 0; + } + + req->extent.ino = U64_MAX; + req->extent.logical_offset = U64_MAX; + req->extent.data_bytes = 0; + + req->place.start.seg_id = pebc->parent_si->seg_id; + req->place.start.blk_index = range->start; + req->place.len = 0; + + req->result.processed_blks = 0; + + for (i = 0; i < range->len; i++) { + logical_blk = range->start + i; + req->place.len++; + + err = ssdfs_peb_copy_page(pebc, logical_blk, req); + if (err == -EAGAIN) { + req->place.len = req->result.processed_blks; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to copy the whole range: " + "seg %llu, logical_blk %u, len %u\n", + pebc->parent_si->seg_id, + logical_blk, req->place.len); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to copy page: " + "seg %llu, logical_blk %u, err %d\n", + pebc->parent_si->seg_id, + logical_blk, err); + return err; + } + } + + return 0; +} + +/* TODO: add condition of presence of items for processing */ +#define GC_THREAD_WAKE_CONDITION(pebi) \ + (kthread_should_stop()) + +/* + * ssdfs_peb_gc_thread_func() - main fuction of GC thread + * @data: pointer on data object + * + * This function is main fuction of GC thread. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. 
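+ *
+ * NOTE: the garbage collection logic itself is not implemented yet;
+ * the thread simply sleeps on its wait queue until
+ * kthread_should_stop() (see the TODO below).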
+ */ +int ssdfs_peb_gc_thread_func(void *data) +{ + struct ssdfs_peb_container *pebc = data; + wait_queue_head_t *wait_queue; + +#ifdef CONFIG_SSDFS_DEBUG + if (!pebc) { + SSDFS_ERR("pointer on PEB container is NULL\n"); + return -EINVAL; + } + + SSDFS_DBG("GC thread: seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + wait_queue = &pebc->parent_si->wait_queue[SSDFS_PEB_GC_THREAD]; + +repeat: + if (kthread_should_stop()) { + complete_all(&pebc->thread[SSDFS_PEB_GC_THREAD].full_stop); + return 0; + } + + /* TODO: collect garbage */ + SSDFS_DBG("TODO: implement %s\n", __func__); + goto sleep_gc_thread; + /*return -ENOSYS;*/ + +sleep_gc_thread: + wait_event_interruptible(*wait_queue, GC_THREAD_WAKE_CONDITION(pebi)); + goto repeat; +} + +/* + * ssdfs_gc_find_next_seg_id() - find next victim segment ID + * @fsi: pointer on shared file system object + * @start_seg_id: starting segment ID + * @max_seg_id: upper bound value for the search + * @seg_type: type of segment + * @type_mask: segment types' mask + * @seg_id: found segment ID [out] + * + * This function tries to find the next victim + * segement ID for the requested type. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - no segment for requested state was found. + */ +static +int ssdfs_gc_find_next_seg_id(struct ssdfs_fs_info *fsi, + u64 start_seg_id, u64 max_seg_id, + int seg_type, int type_mask, + u64 *seg_id) +{ + struct ssdfs_segment_bmap *segbmap; + struct completion *init_end; + int res; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->segbmap || !seg_id); + + SSDFS_DBG("fsi %p, start_seg_id %llu, max_seg_id %llu, " + "seg_type %#x, type_mask %#x\n", + fsi, start_seg_id, max_seg_id, + seg_type, type_mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = fsi->segbmap; + *seg_id = U64_MAX; + +try_to_find_victim: + res = ssdfs_segbmap_find(segbmap, + start_seg_id, max_seg_id, + seg_type, type_mask, + seg_id, &init_end); + if (res >= 0) { +check_segment_state: + switch (res) { + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + /* do nothing */ + break; + + default: + if (res != seg_type) { + if (*seg_id >= max_seg_id) { + res = -ENODATA; + goto finish_search_segments; + } else { + start_seg_id = *seg_id + 1; + *seg_id = U64_MAX; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("res %#x != seg_type %#x\n", + res, seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_to_find_victim; + } + } + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found segment: " + "seg_id %llu, state %#x\n", + *seg_id, res); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (res == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + return err; + } + + res = ssdfs_segbmap_find(segbmap, + start_seg_id, max_seg_id, + seg_type, type_mask, + seg_id, &init_end); + if (res >= 0) + goto check_segment_state; + else if (res == -ENODATA) + goto finish_search_segments; + else if (res == -EAGAIN) { + res = -ENODATA; + goto finish_search_segments; + } else + goto fail_find_segment; + } else if (res == -ENODATA) { +finish_search_segments: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no more victim segments: " + "start_seg_id %llu, max_seg_id %llu\n", + start_seg_id, max_seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return res; + } else { +fail_find_segment: + 
SSDFS_ERR("fail to find segment number: " + "start_seg_id %llu, max_seg_id %llu, " + "err %d\n", + start_seg_id, max_seg_id, res); + return res; + } + + return 0; +} + +/* + * ssdfs_gc_convert_leb2peb() - convert LEB ID into PEB ID + * @fsi: pointer on shared file system object + * @leb_id: LEB ID number + * @pebr: pointer on PEBs association container [out] + * + * This method tries to convert LEB ID into PEB ID. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - can't convert LEB to PEB. + * %-ERANGE - internal error. + */ +static +int ssdfs_gc_convert_leb2peb(struct ssdfs_fs_info *fsi, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct completion *init_end; +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_maptbl_peb_descriptor *ptr; +#endif /* CONFIG_SSDFS_DEBUG */ + u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebr); + + SSDFS_DBG("fsi %p, leb_id %llu, pebr %p\n", + fsi, leb_id, pebr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, + peb_type, pebr, + &init_end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, + peb_type, pebr, + &init_end); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu\n", leb_id); + + ptr = &pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX]; + SSDFS_DBG("MAIN: peb_id %llu, shared_peb_index %u, " + "erase_cycles %u, type %#x, state %#x, " + "flags %#x\n", + ptr->peb_id, ptr->shared_peb_index, + ptr->erase_cycles, ptr->type, + ptr->state, ptr->flags); + ptr = &pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX]; + SSDFS_DBG("RELATION: peb_id %llu, shared_peb_index %u, " + "erase_cycles %u, type %#x, state %#x, " + "flags %#x\n", + ptr->peb_id, ptr->shared_peb_index, + ptr->erase_cycles, ptr->type, + ptr->state, ptr->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * should_ssdfs_segment_be_destroyed() - check necessity to destroy a segment + * @si: pointer on segment object + * + * This method tries to check the necessity to destroy + * a segment object. 
+ */ +static +bool should_ssdfs_segment_be_destroyed(struct ssdfs_segment_info *si) +{ + struct ssdfs_peb_container *pebc; + struct ssdfs_peb_info *pebi; + u64 peb_id; + bool is_rq_empty; + bool is_fq_empty; + bool peb_has_dirty_pages = false; + bool is_blk_bmap_dirty = false; + bool dont_touch = false; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + + SSDFS_DBG("seg_id %llu, refs_count %d\n", + si->seg_id, + atomic_read(&si->refs_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&si->refs_count) > 0) + return false; + + dont_touch = should_gc_doesnt_touch_segment(si); + if (dont_touch) + return false; + + for (i = 0; i < si->pebs_count; i++) { + pebc = &si->peb_array[i]; + + is_rq_empty = is_ssdfs_requests_queue_empty(READ_RQ_PTR(pebc)); + is_fq_empty = !have_flush_requests(pebc); + + is_blk_bmap_dirty = + is_ssdfs_segment_blk_bmap_dirty(&si->blk_bmap, i); + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) + return false; + + ssdfs_peb_current_log_lock(pebi); + peb_has_dirty_pages = ssdfs_peb_has_dirty_pages(pebi); + peb_id = pebi->peb_id; + ssdfs_peb_current_log_unlock(pebi); + ssdfs_unlock_current_peb(pebc); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_id %llu, refs_count %d, " + "peb_has_dirty_pages %#x, " + "not empty: (read %#x, flush %#x), " + "dont_touch %#x, is_blk_bmap_dirty %#x\n", + si->seg_id, peb_id, + atomic_read(&si->refs_count), + peb_has_dirty_pages, + !is_rq_empty, !is_fq_empty, + dont_touch, is_blk_bmap_dirty); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_rq_empty || !is_fq_empty || + peb_has_dirty_pages || is_blk_bmap_dirty) + return false; + } + + return true; +} + +/* + * should_gc_work() - check that GC should fulfill some activity + * @fsi: pointer on shared file system object + * @type: thread type + */ +static inline +bool should_gc_work(struct ssdfs_fs_info *fsi, int type) +{ + return atomic_read(&fsi->gc_should_act[type]) > 0; +} + +#define GLOBAL_GC_THREAD_WAKE_CONDITION(fsi, type) \ + (kthread_should_stop() || should_gc_work(fsi, type)) +#define GLOBAL_GC_FAILED_THREAD_WAKE_CONDITION() \ + (kthread_should_stop()) + +#define SSDFS_GC_LOW_BOUND_THRESHOLD (50) +#define SSDFS_GC_UPPER_BOUND_THRESHOLD (1000) +#define SSDFS_GC_DISTANCE_THRESHOLD (5) +#define SSDFS_GC_DEFAULT_SEARCH_STEP (100) +#define SSDFS_GC_DIRTY_SEG_SEARCH_STEP (1000) +#define SSDFS_GC_DIRTY_SEG_DEFAULT_OPS (50) + +/* + * GC possible states + */ +enum { + SSDFS_UNDEFINED_GC_STATE, + SSDFS_COLLECT_GARBAGE_NOW, + SSDFS_WAIT_IDLE_STATE, + SSDFS_STOP_GC_ACTIVITY_NOW, + SSDFS_GC_STATE_MAX +}; + +/* + * struct ssdfs_io_load_stats - I/O load estimation + * @measurements: number of executed measurements + * @reqs_count: number of I/O requests for every measurement + */ +struct ssdfs_io_load_stats { + u32 measurements; +#define SSDFS_MEASUREMENTS_MAX (10) + s64 reqs_count[SSDFS_MEASUREMENTS_MAX]; +}; + +/* + * is_time_collect_garbage() - check that it's good time for GC activity + * @fsi: pointer on shared file system object + * @io_stats: I/O load estimation [in|out] + * + * This method tries to estimate the I/O load with + * the goal to define the good time for GC activity. 
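+ *
+ * The heuristic below samples the number of flush requests up to
+ * SSDFS_MEASUREMENTS_MAX times: a load below the low bound starts
+ * GC immediately, a load above the upper bound stops GC activity,
+ * and otherwise the trend of the measurements decides whether to
+ * collect garbage now, wait in the idle state, or stop.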
+ */ +static +int is_time_collect_garbage(struct ssdfs_fs_info *fsi, + struct ssdfs_io_load_stats *io_stats) +{ + int state; + s64 reqs_count; + s64 average_diff; + s64 cur_diff; + u64 distance; + u32 i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !io_stats); + + SSDFS_DBG("fsi %p, io_stats %p, measurements %u\n", + fsi, io_stats, io_stats->measurements); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (io_stats->measurements > SSDFS_MEASUREMENTS_MAX) { + SSDFS_ERR("invalid count: " + "measurements %u\n", + io_stats->measurements); + return SSDFS_UNDEFINED_GC_STATE; + } + + reqs_count = atomic64_read(&fsi->flush_reqs); + + if (reqs_count < 0) { + SSDFS_WARN("unexpected reqs_count %lld\n", + reqs_count); + } + + if (io_stats->measurements < SSDFS_MEASUREMENTS_MAX) { + io_stats->reqs_count[io_stats->measurements] = reqs_count; + io_stats->measurements++; + } + + state = atomic_read(&fsi->global_fs_state); + switch (state) { + case SSDFS_METADATA_GOING_FLUSHING: + case SSDFS_METADATA_UNDER_FLUSH: + /* + * Thread that is trying to flush metadata + * waits the end of user data flush requests. + * So, GC should not add any requests, + * otherwise, the metadata flush could + * never happened. + */ + SSDFS_DBG("don't add request before metadata flush\n"); + return SSDFS_WAIT_IDLE_STATE; + + default: + /* continue logic */ + break; + } + + if (reqs_count <= SSDFS_GC_LOW_BOUND_THRESHOLD) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reqs_count %lld\n", reqs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return SSDFS_COLLECT_GARBAGE_NOW; + } + + if (reqs_count >= SSDFS_GC_UPPER_BOUND_THRESHOLD) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reqs_count %lld\n", reqs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return SSDFS_STOP_GC_ACTIVITY_NOW; + } + + if (io_stats->measurements < 3) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("measurement %u, reqs_count %lld\n", + io_stats->measurements, + reqs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return SSDFS_WAIT_IDLE_STATE; + } + + average_diff = 0; + + for (i = 1; i < io_stats->measurements; i++) { + cur_diff = io_stats->reqs_count[i] - + io_stats->reqs_count[i - 1]; + average_diff += cur_diff; + } + + if (average_diff < 0) { + /* + * I/O load is decreasing. + */ + cur_diff = io_stats->reqs_count[io_stats->measurements - 1]; + distance = div_u64((u64)cur_diff, abs(average_diff)); + + if (distance < SSDFS_GC_DISTANCE_THRESHOLD) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("I/O load is decreasing: " + "average_diff %lld : " + "Start GC activity.\n", + average_diff); +#endif /* CONFIG_SSDFS_DEBUG */ + return SSDFS_COLLECT_GARBAGE_NOW; + } + } else { + /* + * I/O load is increasing. + */ + if (io_stats->measurements >= SSDFS_MEASUREMENTS_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("I/O load is increasing: " + "average_diff %lld : " + "Stop GC activity.\n", + average_diff); +#endif /* CONFIG_SSDFS_DEBUG */ + return SSDFS_STOP_GC_ACTIVITY_NOW; + } + } + + return SSDFS_WAIT_IDLE_STATE; +} + +#define SSDFS_SEG2REQ_PAIR_CAPACITY (10) + +/* + * struct ssdfs_seg2req_pair_array - segment/request pairs array + * @items_count: items count in the array + * @pairs: pairs array + */ +struct ssdfs_seg2req_pair_array { + u32 items_count; + struct ssdfs_seg2req_pair pairs[SSDFS_SEG2REQ_PAIR_CAPACITY]; +}; + +/* + * is_seg2req_pair_array_exhausted() - is seg2req pairs array exhausted? 
+ * @array: pairs array + */ +static inline +bool is_seg2req_pair_array_exhausted(struct ssdfs_seg2req_pair_array *array) +{ + bool is_exhausted; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); +#endif /* CONFIG_SSDFS_DEBUG */ + + is_exhausted = array->items_count >= SSDFS_SEG2REQ_PAIR_CAPACITY; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("is_exhausted %#x\n", is_exhausted); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_exhausted; +} + +/* + * ssdfs_gc_check_request() - check request + * @req: segment request + * + * This method tries to check the state of request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_gc_check_request(struct ssdfs_segment_request *req) +{ + wait_queue_head_t *wq = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("req %p\n", req); +#endif /* CONFIG_SSDFS_DEBUG */ + +check_req_state: + switch (atomic_read(&req->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req->result.err; + + if (!err) { + SSDFS_ERR("error code is absent: " + "req %p, err %d\n", + req, err); + err = -ERANGE; + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req->result.state)); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_gc_wait_commit_logs_end() - wait commit logs ending + * @fsi: pointer on shared file system object + * @array: seg2req pairs array + * + * This method is waiting the end of commit logs operation. 
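+ *
+ * For every segment/request pair the method waits until the commit
+ * log request has been executed, frees the request, puts the segment
+ * object, and destroys the segment object if nobody uses it anymore.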
+ */ +static +void ssdfs_gc_wait_commit_logs_end(struct ssdfs_fs_info *fsi, + struct ssdfs_seg2req_pair_array *array) +{ + u32 items_count; + int refs_count; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array); + + SSDFS_DBG("items_count %u\n", array->items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_count = min_t(u32, array->items_count, + SSDFS_SEG2REQ_PAIR_CAPACITY); + + for (i = 0; i < items_count; i++) { + struct ssdfs_seg2req_pair *pair; + + pair = &array->pairs[i]; + + if (pair->req != NULL) { + err = ssdfs_gc_check_request(pair->req); + if (unlikely(err)) { + SSDFS_ERR("flush request failed: " + "err %d\n", err); + } + + refs_count = + atomic_read(&pair->req->private.refs_count); + if (refs_count != 0) { + SSDFS_WARN("unexpected refs_count %d\n", + refs_count); + } + + ssdfs_request_free(pair->req); + } else { + SSDFS_ERR("request is NULL: " + "item_index %d\n", i); + } + + if (pair->si != NULL) { + struct ssdfs_segment_info *si = pair->si; + + ssdfs_segment_put_object(si); + + if (should_ssdfs_segment_be_destroyed(si)) { + err = ssdfs_segment_tree_remove(fsi, si); + if (unlikely(err)) { + SSDFS_WARN("fail to remove segment: " + "seg %llu, err %d\n", + si->seg_id, err); + } else { + err = ssdfs_segment_destroy_object(si); + if (err) { + SSDFS_WARN("fail to destroy: " + "seg %llu, err %d\n", + si->seg_id, err); + } + } + } + } else { + SSDFS_ERR("segment is NULL: " + "item_index %d\n", i); + } + } + + memset(array, 0, sizeof(struct ssdfs_seg2req_pair_array)); +} + +/* + * ssdfs_gc_stimulate_migration() - stimulate migration + * @si: pointer on segment object + * @pebc: pointer on PEB container object + * @array: seg2req pairs array + * + * This method tries to stimulate the PEB's migration. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_gc_stimulate_migration(struct ssdfs_segment_info *si, + struct ssdfs_peb_container *pebc, + struct ssdfs_seg2req_pair_array *array) +{ + struct ssdfs_peb_info *pebi; + struct ssdfs_seg2req_pair *pair; + u32 index; + int count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !pebc || !array); + + SSDFS_DBG("seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (have_flush_requests(pebc)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Do nothing: request queue is not empty: " + "seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + if (is_seg2req_pair_array_exhausted(array)) { + SSDFS_ERR("seg2req pair array is exhausted\n"); + return -ERANGE; + } + + index = array->items_count; + pair = &array->pairs[index]; + + if (pair->req || pair->si) { + SSDFS_ERR("invalid pair state: index %u\n", + index); + return -ERANGE; + } + + if (!is_peb_under_migration(pebc)) { + SSDFS_ERR("invalid PEB state: " + "seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); + return -ERANGE; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? -ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + /* + * The ssdfs_get_current_peb_locked() defines + * migration phase. It should be set properly + * before the ssdfs_peb_prepare_range_migration() + * call. 
+ */ + + ssdfs_unlock_current_peb(pebc); + + mutex_lock(&pebc->migration_lock); + + for (count = 0; count < 2; count++) { + int err1, err2; + + err1 = ssdfs_peb_prepare_range_migration(pebc, 1, + SSDFS_BLK_PRE_ALLOCATED); + if (err1 && err1 != -ENODATA) { + err = err1; + break; + } + + err2 = ssdfs_peb_prepare_range_migration(pebc, 1, + SSDFS_BLK_VALID); + if (err2 && err2 != -ENODATA) { + err = err2; + break; + } + + if (err1 == -ENODATA && err2 == -ENODATA) { + err = 0; + break; + } + } + + if (unlikely(err)) { + SSDFS_ERR("fail to prepare range migration: " + "err %d\n", err); + } else if (count == 0) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no data for migration: " + "seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + mutex_unlock(&pebc->migration_lock); + + if (unlikely(err)) + return err; + + pair->req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(pair->req)) { + err = (pair->req == NULL ? -ENOMEM : PTR_ERR(pair->req)); + SSDFS_ERR("fail to allocate request: err %d\n", + err); + return err; + } + + ssdfs_request_init(pair->req); + ssdfs_get_request(pair->req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC_NO_FREE, + pebc->peb_index, pair->req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "err %d\n", err); + ssdfs_put_request(pair->req); + ssdfs_request_free(pair->req); + pair->req = NULL; + return err; + } + + pair->si = si; + array->items_count++; + + return 0; +} + +/* + * ssdfs_gc_finish_migration() - finish migration + * @si: pointer on segment object + * @pebc: pointer on PEB container object + * @array: seg2req pairs array + * + * This method tries to finish the PEB's migration. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_gc_finish_migration(struct ssdfs_segment_info *si, + struct ssdfs_peb_container *pebc, + struct ssdfs_seg2req_pair_array *array) +{ + struct ssdfs_seg2req_pair *pair; + struct ssdfs_peb_info *pebi; + u32 index; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !pebc || !array); + + SSDFS_DBG("seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (have_flush_requests(pebc)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Do nothing: request queue is not empty: " + "seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + if (is_seg2req_pair_array_exhausted(array)) { + SSDFS_ERR("seg2req pair array is exhausted\n"); + return -ERANGE; + } + + index = array->items_count; + pair = &array->pairs[index]; + + if (pair->req || pair->si) { + SSDFS_ERR("invalid pair state: index %u\n", + index); + return -ERANGE; + } + + if (!is_peb_under_migration(pebc)) { + SSDFS_ERR("invalid PEB state: " + "seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); + return -ERANGE; + } + + err = ssdfs_peb_finish_migration(pebc); + if (unlikely(err)) { + SSDFS_ERR("fail to finish migration: " + "seg %llu, peb_index %u, " + "err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + pebi = ssdfs_get_current_peb_locked(pebc); + if (IS_ERR_OR_NULL(pebi)) { + err = pebi == NULL ? 
-ERANGE : PTR_ERR(pebi); + SSDFS_ERR("fail to get PEB object: " + "seg %llu, peb_index %u, err %d\n", + pebc->parent_si->seg_id, + pebc->peb_index, err); + return err; + } + + if (is_ssdfs_maptbl_going_to_be_destroyed(si->fsi->maptbl)) { + SSDFS_WARN("seg %llu, peb_index %u\n", + si->seg_id, pebc->peb_index); + } + + err = ssdfs_peb_container_change_state(pebc); + if (unlikely(err)) { + ssdfs_unlock_current_peb(pebc); + SSDFS_ERR("fail to change peb state: " + "err %d\n", err); + return err; + } + + ssdfs_unlock_current_peb(pebc); + + pair->req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(pair->req)) { + err = (pair->req == NULL ? -ENOMEM : PTR_ERR(pair->req)); + SSDFS_ERR("fail to allocate request: err %d\n", + err); + return err; + } + + ssdfs_request_init(pair->req); + ssdfs_get_request(pair->req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC_NO_FREE, + pebc->peb_index, pair->req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "err %d\n", err); + ssdfs_put_request(pair->req); + ssdfs_request_free(pair->req); + pair->req = NULL; + return err; + } + + pair->si = si; + array->items_count++; + + return 0; +} + +static inline +int ssdfs_mark_segment_under_gc_activity(struct ssdfs_segment_info *si) +{ + int activity_type; + + activity_type = atomic_cmpxchg(&si->activity_type, + SSDFS_SEG_OBJECT_REGULAR_ACTIVITY, + SSDFS_SEG_UNDER_GC_ACTIVITY); + if (activity_type < SSDFS_SEG_OBJECT_REGULAR_ACTIVITY || + activity_type >= SSDFS_SEG_UNDER_GC_ACTIVITY) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu is busy under activity %#x\n", + si->seg_id, activity_type); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EBUSY; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu is under GC activity\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +static inline +int ssdfs_revert_segment_to_regular_activity(struct ssdfs_segment_info *si) +{ + int activity_type; + + activity_type = atomic_cmpxchg(&si->activity_type, + SSDFS_SEG_UNDER_GC_ACTIVITY, + SSDFS_SEG_OBJECT_REGULAR_ACTIVITY); + if (activity_type != SSDFS_SEG_UNDER_GC_ACTIVITY) { + SSDFS_WARN("segment %llu is under activity %#x\n", + si->seg_id, activity_type); + return -EFAULT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu has been reverted from GC activity\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_generic_seg_gc_thread_func() - generic function of GC thread + * @fsi: pointer on shared file system object + * @thread_type: thread type + * @seg_state: type of segment + * @seg_state_mask: segment types' mask + * + * This function is the key logic of GC thread. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
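+ *
+ * The logic below: find a victim segment of the requested state in
+ * the segment bitmap, convert its LEBs into PEBs through the mapping
+ * table, pick PEBs that are under migration, estimate the I/O load
+ * to choose a good moment, then stimulate or finish the migration
+ * and commit the log; segment objects without pending requests are
+ * destroyed afterwards.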
+ */ +static +int ssdfs_generic_seg_gc_thread_func(struct ssdfs_fs_info *fsi, + int thread_type, + int seg_state, int seg_state_mask) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *pebd; + struct ssdfs_io_load_stats io_stats; + size_t io_stats_size = sizeof(struct ssdfs_io_load_stats); + wait_queue_head_t *wq; + struct ssdfs_segment_blk_bmap *seg_blkbmap; + struct ssdfs_peb_blk_bmap *peb_blkbmap; + struct ssdfs_seg2req_pair_array reqs_array; + u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + int seg_type = SSDFS_UNKNOWN_SEG_TYPE; + u64 seg_id = 0; + u64 max_seg_id; + u64 seg_id_step = SSDFS_GC_DEFAULT_SEARCH_STEP; + u64 nsegs; + u64 cur_leb_id; + u32 lebs_per_segment; + int gc_strategy; + int used_pages; + u32 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("GC thread: thread_type %#x, " + "seg_state %#x, seg_state_mask %#x\n", + thread_type, seg_state, seg_state_mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + wq = &fsi->gc_wait_queue[thread_type]; + lebs_per_segment = fsi->pebs_per_seg; + memset(&reqs_array, 0, sizeof(struct ssdfs_seg2req_pair_array)); + +repeat: + if (kthread_should_stop()) { + complete_all(&fsi->gc_thread[thread_type].full_stop); + return err; + } else if (unlikely(err)) + goto sleep_failed_gc_thread; + + mutex_lock(&fsi->resize_mutex); + nsegs = fsi->nsegs; + mutex_unlock(&fsi->resize_mutex); + + if (seg_id >= nsegs) + seg_id = 0; + + while (seg_id < nsegs) { + peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + seg_type = SSDFS_UNKNOWN_SEG_TYPE; + + max_seg_id = seg_id + seg_id_step; + max_seg_id = min_t(u64, max_seg_id, nsegs); + + err = ssdfs_gc_find_next_seg_id(fsi, seg_id, max_seg_id, + seg_state, seg_state_mask, + &seg_id); + if (err == -ENODATA) { + err = 0; + + if (max_seg_id >= nsegs) { + seg_id = 0; + SSDFS_DBG("GC hasn't found any victim\n"); + goto finish_seg_processing; + } + + seg_id = max_seg_id; + + wait_event_interruptible_timeout(*wq, + kthread_should_stop(), HZ); + + if (kthread_should_stop()) + goto finish_seg_processing; + else + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find segment: " + "seg_id %llu, nsegs %llu, err %d\n", + seg_id, nsegs, err); + goto sleep_failed_gc_thread; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found segment: " + "seg_id %llu, seg_state %#x\n", + seg_id, seg_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (kthread_should_stop()) + goto finish_seg_processing; + + i = 0; + + for (; i < lebs_per_segment; i++) { + cur_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + seg_id, + i); + if (cur_leb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unexpected leb_id: " + "seg_id %llu, peb_index %u\n", + seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + if (kthread_should_stop()) + goto finish_seg_processing; + + err = ssdfs_gc_convert_leb2peb(fsi, cur_leb_id, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + cur_leb_id, peb_type, err); + goto sleep_failed_gc_thread; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + /* PEB is under migration */ + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: +#ifdef 
CONFIG_SSDFS_DEBUG + SSDFS_DBG("SRC PEB %llu is dirty\n", + pebd->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu is not migrating\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + /* stimulate migration */ + break; + + default: + continue; + } + + if (kthread_should_stop()) + goto finish_seg_processing; + + peb_type = pebd->type; + seg_type = PEB2SEG_TYPE(peb_type); + + goto try_to_find_seg_object; + } + + if (i >= lebs_per_segment) { + /* segment hasn't valid blocks for migration */ + goto check_next_segment; + } + +try_to_find_seg_object: + si = ssdfs_segment_tree_find(fsi, seg_id); + if (IS_ERR_OR_NULL(si)) { + err = PTR_ERR(si); + + if (err == -ENODATA) { + /* + * It needs to create the segment. + */ + err = 0; + goto try_create_seg_object; + } else if (err == 0) { + err = -ERANGE; + SSDFS_ERR("seg tree returns NULL\n"); + goto finish_seg_processing; + } else { + SSDFS_ERR("fail to find segment: " + "seg %llu, err %d\n", + seg_id, err); + goto sleep_failed_gc_thread; + } + } else if (should_ssdfs_segment_be_destroyed(si)) { + /* + * Segment hasn't requests in the queues. + * But it is under migration. + * Try to collect the garbage. + */ + ssdfs_segment_get_object(si); + goto try_collect_garbage; + } else + goto check_next_segment; + +try_create_seg_object: + si = ssdfs_grab_segment(fsi, seg_type, seg_id, U64_MAX); + if (unlikely(IS_ERR_OR_NULL(si))) { + err = PTR_ERR(si); + SSDFS_ERR("fail to grab segment object: " + "seg %llu, err %d\n", + seg_id, err); + goto sleep_failed_gc_thread; + } + +try_collect_garbage: + for (; i < lebs_per_segment; i++) { + cur_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + seg_id, + i); + if (cur_leb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unexpected leb_id: " + "seg_id %llu, peb_index %u\n", + seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + if (kthread_should_stop()) { + ssdfs_segment_put_object(si); + goto finish_seg_processing; + } + + err = ssdfs_gc_convert_leb2peb(fsi, cur_leb_id, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, peb_type %#x, err %d\n", + cur_leb_id, peb_type, err); + ssdfs_segment_put_object(si); + goto sleep_failed_gc_thread; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + /* PEB is under migration */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu is not migrating\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + /* stimulate migration */ + break; + + default: + continue; + } + + memset(&io_stats, 0, io_stats_size); + gc_strategy = SSDFS_UNDEFINED_GC_STATE; + + do { + gc_strategy = is_time_collect_garbage(fsi, + &io_stats); + + switch (gc_strategy) { + case SSDFS_COLLECT_GARBAGE_NOW: + goto collect_garbage_now; + + case SSDFS_STOP_GC_ACTIVITY_NOW: + 
ssdfs_segment_put_object(si); + goto finish_seg_processing; + + case SSDFS_WAIT_IDLE_STATE: + wait_event_interruptible_timeout(*wq, + kthread_should_stop(), + HZ); + + if (kthread_should_stop()) { + ssdfs_segment_put_object(si); + goto finish_seg_processing; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected strategy %#x\n", + gc_strategy); + ssdfs_segment_put_object(si); + goto finish_seg_processing; + } + } while (gc_strategy == SSDFS_WAIT_IDLE_STATE); + +collect_garbage_now: + if (kthread_should_stop()) { + ssdfs_segment_put_object(si); + goto finish_seg_processing; + } + + pebc = &si->peb_array[i]; + + seg_blkbmap = &si->blk_bmap; + peb_blkbmap = &seg_blkbmap->peb[pebc->peb_index]; + + if (is_seg2req_pair_array_exhausted(&reqs_array)) + ssdfs_gc_wait_commit_logs_end(fsi, &reqs_array); + + used_pages = + ssdfs_src_blk_bmap_get_used_pages(peb_blkbmap); + if (used_pages < 0) { + err = used_pages; + SSDFS_ERR("fail to get used pages: err %d\n", + err); + ssdfs_segment_put_object(si); + goto sleep_failed_gc_thread; + } + + err = ssdfs_mark_segment_under_gc_activity(si); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu is busy\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_next_segment; + } + + if (used_pages == 0) { + SSDFS_WARN("needs to finish migration: " + "seg %llu, leb_id %llu, " + "used_pages %d\n", + seg_id, cur_leb_id, used_pages); + } else if (used_pages <= SSDFS_GC_FINISH_MIGRATION) { + ssdfs_segment_get_object(si); + + err = ssdfs_gc_finish_migration(si, pebc, + &reqs_array); + if (unlikely(err)) { + SSDFS_ERR("fail to finish migration: " + "seg %llu, leb_id %llu, " + "err %d\n", + seg_id, cur_leb_id, err); + err = 0; + ssdfs_segment_put_object(si); + } + } else { + ssdfs_segment_get_object(si); + + err = ssdfs_gc_stimulate_migration(si, pebc, + &reqs_array); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no data for migration: " + "seg %llu, leb_id %llu, " + "err %d\n", + seg_id, cur_leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + ssdfs_segment_put_object(si); + } else if (unlikely(err)) { + SSDFS_ERR("fail to stimulate migration: " + "seg %llu, leb_id %llu, " + "err %d\n", + seg_id, cur_leb_id, err); + err = 0; + ssdfs_segment_put_object(si); + } + } + } + + ssdfs_segment_put_object(si); + + if (is_seg2req_pair_array_exhausted(&reqs_array)) + ssdfs_gc_wait_commit_logs_end(fsi, &reqs_array); + + err = ssdfs_revert_segment_to_regular_activity(si); + if (unlikely(err)) { + SSDFS_ERR("segment %llu is under unexpected activity\n", + si->seg_id); + goto sleep_failed_gc_thread; + } + + if (should_ssdfs_segment_be_destroyed(si)) { + err = ssdfs_segment_tree_remove(fsi, si); + if (unlikely(err)) { + SSDFS_WARN("fail to remove segment: " + "seg %llu, err %d\n", + si->seg_id, err); + } else { + err = ssdfs_segment_destroy_object(si); + if (err) { + SSDFS_WARN("fail to destroy: " + "seg %llu, err %d\n", + si->seg_id, err); + } + } + } + +check_next_segment: + seg_id++; + + atomic_dec(&fsi->gc_should_act[thread_type]); + + if (kthread_should_stop()) + goto finish_seg_processing; + + if (atomic_read(&fsi->gc_should_act[thread_type]) > 0) { + wait_event_interruptible_timeout(*wq, + kthread_should_stop(), + HZ); + } else + goto finish_seg_processing; + + if (kthread_should_stop()) + goto finish_seg_processing; + } + +finish_seg_processing: + atomic_set(&fsi->gc_should_act[thread_type], 0); + + ssdfs_gc_wait_commit_logs_end(fsi, &reqs_array); + + wait_event_interruptible(*wq, + GLOBAL_GC_THREAD_WAKE_CONDITION(fsi, 
thread_type)); + goto repeat; + +sleep_failed_gc_thread: + atomic_set(&fsi->gc_should_act[thread_type], 0); + + ssdfs_gc_wait_commit_logs_end(fsi, &reqs_array); + + wait_event_interruptible(*wq, + GLOBAL_GC_FAILED_THREAD_WAKE_CONDITION()); + goto repeat; +} + +/* + * should_continue_processing() - should continue processing? + */ +static inline +bool should_continue_processing(int mandatory_ops) +{ + if (kthread_should_stop()) { + if (mandatory_ops > 0) + return true; + else + return false; + } else + return true; +} + +/* + * __ssdfs_dirty_seg_gc_thread_func() - GC thread's function for dirty segments + * @fsi: pointer on shared file system object + * @thread_type: thread type + * @seg_state: type of segment + * @seg_state_mask: segment types' mask + * + * This function is the logic of GC thread for dirty segments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_dirty_seg_gc_thread_func(struct ssdfs_fs_info *fsi, + int thread_type, + int seg_state, int seg_state_mask) +{ + struct ssdfs_segment_info *si; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *pebd; + struct ssdfs_segment_bmap *segbmap; + struct completion *end = NULL; + wait_queue_head_t *wq; + u64 seg_id = 0; + u64 max_seg_id; + u64 nsegs; + u64 cur_leb_id; + u32 lebs_per_segment; + int mandatory_ops = SSDFS_GC_DIRTY_SEG_DEFAULT_OPS; + u32 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("GC thread: thread_type %#x, " + "seg_state %#x, seg_state_mask %#x\n", + thread_type, seg_state, seg_state_mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = fsi->segbmap; + wq = &fsi->gc_wait_queue[thread_type]; + lebs_per_segment = fsi->pebs_per_seg; + +repeat: + if (kthread_should_stop()) { + complete_all(&fsi->gc_thread[thread_type].full_stop); + return err; + } else if (unlikely(err)) + goto sleep_failed_gc_thread; + + mutex_lock(&fsi->resize_mutex); + nsegs = fsi->nsegs; + mutex_unlock(&fsi->resize_mutex); + + if (seg_id >= nsegs) + seg_id = 0; + + while (seg_id < nsegs) { + max_seg_id = nsegs; + + err = ssdfs_gc_find_next_seg_id(fsi, seg_id, max_seg_id, + seg_state, seg_state_mask, + &seg_id); + if (err == -ENODATA) { + err = 0; + seg_id = 0; + SSDFS_DBG("GC hasn't found any victim\n"); + goto finish_seg_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find segment: " + "seg_id %llu, nsegs %llu, err %d\n", + seg_id, nsegs, err); + goto sleep_failed_gc_thread; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found segment: " + "seg_id %llu, seg_state %#x\n", + seg_id, seg_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!should_continue_processing(mandatory_ops)) + goto finish_seg_processing; + + i = 0; + + for (; i < lebs_per_segment; i++) { + cur_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + seg_id, + i); + if (cur_leb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unexpected leb_id: " + "seg_id %llu, peb_index %u\n", + seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + err = ssdfs_gc_convert_leb2peb(fsi, cur_leb_id, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB doesn't mapped: leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + cur_leb_id, err); + goto sleep_failed_gc_thread; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + /* PEB is dirty */ + 
break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu is not dirty: " + "pebd->state %u\n", + cur_leb_id, pebd->state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_next_segment; + } + + if (!should_continue_processing(mandatory_ops)) + goto finish_seg_processing; + + goto try_to_find_seg_object; + } + +try_to_find_seg_object: + si = ssdfs_segment_tree_find(fsi, seg_id); + if (IS_ERR_OR_NULL(si)) { + err = PTR_ERR(si); + + if (err == -ENODATA) { + err = 0; + goto try_set_pre_erase_state; + } else if (err == 0) { + err = -ERANGE; + SSDFS_ERR("seg tree returns NULL\n"); + goto finish_seg_processing; + } else { + SSDFS_ERR("fail to find segment: " + "seg %llu, err %d\n", + seg_id, err); + goto sleep_failed_gc_thread; + } + } else if (should_ssdfs_segment_be_destroyed(si)) { + err = ssdfs_segment_tree_remove(fsi, si); + if (unlikely(err)) { + SSDFS_WARN("fail to remove segment: " + "seg %llu, err %d\n", + si->seg_id, err); + } else { + err = ssdfs_segment_destroy_object(si); + if (err) { + SSDFS_WARN("fail to destroy: " + "seg %llu, err %d\n", + si->seg_id, err); + } + } + } else + goto check_next_segment; + +try_set_pre_erase_state: + for (; i < lebs_per_segment; i++) { + cur_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + seg_id, + i); + if (cur_leb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unexpected leb_id: " + "seg_id %llu, peb_index %u\n", + seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + err = ssdfs_gc_convert_leb2peb(fsi, cur_leb_id, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB doesn't mapped: leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + cur_leb_id, err); + goto sleep_failed_gc_thread; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + /* PEB is dirty */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu is not dirty: " + "pebd->state %u\n", + cur_leb_id, pebd->state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_next_segment; + } + + err = ssdfs_maptbl_prepare_pre_erase_state(fsi, + cur_leb_id, + pebd->type, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto sleep_failed_gc_thread; + } + + err = ssdfs_maptbl_prepare_pre_erase_state(fsi, + cur_leb_id, + pebd->type, + &end); + } + + if (err == -EBUSY) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to prepare pre-erase state: " + "leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_seg_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare pre-erase state: " + "leb_id %llu, err %d\n", + cur_leb_id, err); + goto sleep_failed_gc_thread; + } + } + + err = ssdfs_segbmap_change_state(segbmap, seg_id, + SSDFS_SEG_CLEAN, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + goto sleep_failed_gc_thread; + } + + err = ssdfs_segbmap_change_state(segbmap, seg_id, + SSDFS_SEG_CLEAN, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change segment state: " + "seg %llu, state %#x, err %d\n", + seg_id, SSDFS_SEG_CLEAN, err); + goto sleep_failed_gc_thread; + } + +check_next_segment: + mandatory_ops--; + seg_id++; + + if (!should_continue_processing(mandatory_ops)) + goto finish_seg_processing; + } + 
+finish_seg_processing: + atomic_set(&fsi->gc_should_act[thread_type], 0); + + wait_event_interruptible(*wq, + GLOBAL_GC_THREAD_WAKE_CONDITION(fsi, thread_type)); + goto repeat; + +sleep_failed_gc_thread: + atomic_set(&fsi->gc_should_act[thread_type], 0); + + wait_event_interruptible(*wq, + GLOBAL_GC_FAILED_THREAD_WAKE_CONDITION()); + goto repeat; +} + +/* + * ssdfs_collect_dirty_segments_now() - collect dirty segments now + * @fsi: pointer on shared file system object + * + * This function tries to collect the dirty segments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_collect_dirty_segments_now(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_segment_info *si; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *pebd; + struct ssdfs_segment_bmap *segbmap; + struct completion *end = NULL; + int seg_state = SSDFS_SEG_DIRTY; + int seg_state_mask = SSDFS_SEG_DIRTY_STATE_FLAG; + u64 seg_id = 0; + u64 max_seg_id; + u64 nsegs; + u64 cur_leb_id; + u32 lebs_per_segment; + u32 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = fsi->segbmap; + lebs_per_segment = fsi->pebs_per_seg; + + mutex_lock(&fsi->resize_mutex); + nsegs = fsi->nsegs; + mutex_unlock(&fsi->resize_mutex); + + while (seg_id < nsegs) { + max_seg_id = nsegs; + + err = ssdfs_gc_find_next_seg_id(fsi, seg_id, max_seg_id, + seg_state, seg_state_mask, + &seg_id); + if (err == -ENODATA) { + SSDFS_DBG("GC hasn't found any victim\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find segment: " + "seg_id %llu, nsegs %llu, err %d\n", + seg_id, nsegs, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found segment: " + "seg_id %llu, seg_state %#x\n", + seg_id, seg_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + i = 0; + + for (; i < lebs_per_segment; i++) { + cur_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + seg_id, + i); + if (cur_leb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unexpected leb_id: " + "seg_id %llu, peb_index %u\n", + seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + err = ssdfs_gc_convert_leb2peb(fsi, cur_leb_id, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + cur_leb_id, err); + return err; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + /* PEB is dirty */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu is not dirty: " + "pebd->state %u\n", + cur_leb_id, pebd->state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_next_segment; + } + + goto try_to_find_seg_object; + } + +try_to_find_seg_object: + si = ssdfs_segment_tree_find(fsi, seg_id); + if (IS_ERR_OR_NULL(si)) { + err = PTR_ERR(si); + + if (err == -ENODATA) { + err = 0; + goto try_set_pre_erase_state; + } else if (err == 0) { + err = -ERANGE; + SSDFS_ERR("seg tree returns NULL\n"); + return err; + } else { + SSDFS_ERR("fail to find segment: " + "seg %llu, err %d\n", + seg_id, err); + return err; + } + } else if (should_ssdfs_segment_be_destroyed(si)) { + err = ssdfs_segment_tree_remove(fsi, si); + if (unlikely(err)) { + SSDFS_WARN("fail to remove segment: " + "seg %llu, err %d\n", + si->seg_id, err); + } else { + err = 
ssdfs_segment_destroy_object(si); + if (err) { + SSDFS_WARN("fail to destroy: " + "seg %llu, err %d\n", + si->seg_id, err); + } + } + } else + goto check_next_segment; + +try_set_pre_erase_state: + for (; i < lebs_per_segment; i++) { + cur_leb_id = ssdfs_get_leb_id_for_peb_index(fsi, + seg_id, + i); + if (cur_leb_id >= U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unexpected leb_id: " + "seg_id %llu, peb_index %u\n", + seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + err = ssdfs_gc_convert_leb2peb(fsi, cur_leb_id, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB is not mapped: leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + cur_leb_id, err); + return err; + } + + pebd = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + + switch (pebd->state) { + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + /* PEB is dirty */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("LEB %llu is not dirty: " + "pebd->state %u\n", + cur_leb_id, pebd->state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_next_segment; + } + + err = ssdfs_maptbl_prepare_pre_erase_state(fsi, + cur_leb_id, + pebd->type, + &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_maptbl_prepare_pre_erase_state(fsi, + cur_leb_id, + pebd->type, + &end); + } + + if (err == -EBUSY) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to prepare pre-erase state: " + "leb_id %llu\n", + cur_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto check_next_segment; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare pre-erase state: " + "leb_id %llu, err %d\n", + cur_leb_id, err); + return err; + } + } + + err = ssdfs_segbmap_change_state(segbmap, seg_id, + SSDFS_SEG_CLEAN, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_segbmap_change_state(segbmap, seg_id, + SSDFS_SEG_CLEAN, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change segment state: " + "seg %llu, state %#x, err %d\n", + seg_id, SSDFS_SEG_CLEAN, err); + return err; + } + +check_next_segment: + seg_id++; + } + + return 0; +} + +int ssdfs_using_seg_gc_thread_func(void *data) +{ + struct ssdfs_fs_info *fsi = data; + +#ifdef CONFIG_SSDFS_DEBUG + if (!fsi) { + SSDFS_ERR("invalid shared FS object\n"); + return -EINVAL; + } + + SSDFS_DBG("GC thread: using segments\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_generic_seg_gc_thread_func(fsi, + SSDFS_SEG_USING_GC_THREAD, + SSDFS_SEG_DATA_USING, + SSDFS_SEG_DATA_USING_STATE_FLAG | + SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG | + SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG | + SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG); +} + +int ssdfs_used_seg_gc_thread_func(void *data) +{ + struct ssdfs_fs_info *fsi = data; + +#ifdef CONFIG_SSDFS_DEBUG + if (!fsi) { + SSDFS_ERR("invalid shared FS object\n"); + return -EINVAL; + } + + SSDFS_DBG("GC thread: used segments\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_generic_seg_gc_thread_func(fsi, + SSDFS_SEG_USED_GC_THREAD, + SSDFS_SEG_USED, + SSDFS_SEG_USED_STATE_FLAG); +} + +int ssdfs_pre_dirty_seg_gc_thread_func(void *data) +{ + struct ssdfs_fs_info *fsi = data; + +#ifdef CONFIG_SSDFS_DEBUG + if (!fsi) { + SSDFS_ERR("invalid shared FS object\n"); + return -EINVAL; + } + 
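+	/*
+	 * Like the other GC entry points, this is a thin kthread
+	 * wrapper that binds a thread type to the segment states
+	 * the shared GC loop should scan.
+	 */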
+ SSDFS_DBG("GC thread: pre-dirty segments\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_generic_seg_gc_thread_func(fsi, + SSDFS_SEG_PRE_DIRTY_GC_THREAD, + SSDFS_SEG_PRE_DIRTY, + SSDFS_SEG_PRE_DIRTY_STATE_FLAG); +} + +int ssdfs_dirty_seg_gc_thread_func(void *data) +{ + struct ssdfs_fs_info *fsi = data; + +#ifdef CONFIG_SSDFS_DEBUG + if (!fsi) { + SSDFS_ERR("invalid shared FS object\n"); + return -EINVAL; + } + + SSDFS_DBG("GC thread: dirty segments\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_dirty_seg_gc_thread_func(fsi, + SSDFS_SEG_DIRTY_GC_THREAD, + SSDFS_SEG_DIRTY, + SSDFS_SEG_DIRTY_STATE_FLAG); +} + +/* + * ssdfs_start_gc_thread() - start GC thread + * @fsi: pointer on shared file system object + * @type: thread type + * + * This function tries to start GC thread of @type. + * + * RETURN: + * [success] - GC thread has been started. + * [failure] - error code: + * + * %-EINVAL - invalid input. + */ +int ssdfs_start_gc_thread(struct ssdfs_fs_info *fsi, int type) +{ + ssdfs_threadfn threadfn; + const char *fmt; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + if (type >= SSDFS_GC_THREAD_TYPE_MAX) { + SSDFS_ERR("invalid GC thread type %d\n", type); + return -EINVAL; + } + + SSDFS_DBG("thread_type %d\n", type); +#endif /* CONFIG_SSDFS_DEBUG */ + + threadfn = thread_desc[type].threadfn; + fmt = thread_desc[type].fmt; + + fsi->gc_thread[type].task = kthread_create(threadfn, fsi, fmt); + if (IS_ERR_OR_NULL(fsi->gc_thread[type].task)) { + err = PTR_ERR(fsi->gc_thread[type].task); + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + if (err == 0) + err = -ERANGE; + SSDFS_ERR("fail to start GC thread: " + "thread_type %d, err %d\n", + type, err); + } + + return err; + } + + init_waitqueue_entry(&fsi->gc_thread[type].wait, + fsi->gc_thread[type].task); + add_wait_queue(&fsi->gc_wait_queue[type], + &fsi->gc_thread[type].wait); + init_completion(&fsi->gc_thread[type].full_stop); + + wake_up_process(fsi->gc_thread[type].task); + + return 0; +} + +/* + * ssdfs_stop_gc_thread() - stop GC thread + * @fsi: pointer on shared file system object + * @type: thread type + * + * This function tries to stop GC thread of @type. + * + * RETURN: + * [success] - GC thread has been stopped. + * [failure] - error code: + * + * %-EINVAL - invalid input. + */ +int ssdfs_stop_gc_thread(struct ssdfs_fs_info *fsi, int type) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + if (type >= SSDFS_GC_THREAD_TYPE_MAX) { + SSDFS_ERR("invalid GC thread type %d\n", type); + return -EINVAL; + } + + SSDFS_DBG("type %#x, task %p\n", + type, fsi->gc_thread[type].task); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fsi->gc_thread[type].task) + return 0; + + err = kthread_stop(fsi->gc_thread[type].task); + if (err == -EINTR) { + /* + * Ignore this error. + * The wake_up_process() was never called. 
+		 */
+		return 0;
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread function had some issue: err %d\n",
+			   err);
+		return err;
+	}
+
+	finish_wait(&fsi->gc_wait_queue[type],
+		    &fsi->gc_thread[type].wait);
+
+	fsi->gc_thread[type].task = NULL;
+
+	err = SSDFS_WAIT_COMPLETION(&fsi->gc_thread[type].full_stop);
+	if (unlikely(err)) {
+		SSDFS_ERR("stop thread failed: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}

From patchwork Sat Feb 25 01:08:46 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151942
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 35/76] ssdfs: introduce segment object
Date: Fri, 24 Feb 2023 17:08:46 -0800
Message-Id: <20230225010927.813929-36-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The segment is the basic unit for allocating, managing, and freeing the
file system volume's space. It is a fixed-size portion of the volume
that can contain one or several Logical Erase Blocks (LEBs). Initially,
a segment is an empty container in the clean state. File system logic
can find a clean segment by searching the segment bitmap. The LEBs of a
clean segment need to be mapped into "Physical" Erase Blocks (PEBs) by
means of the PEB mapping table. Technically speaking, not every LEB can
be mapped into a PEB if the mapping table hasn't any clean PEBs. A
segment can be imagined as a container that includes an array of PEB
containers.

The segment object implements the logic of allocating logical blocks
and preparing create and update requests. The current segment has a
create queue that is used, for example, to add new data into a file. A
PEB container has an update queue that is used for adding update
requests. The flush thread is woken up after every operation of adding
a request into a queue. Finally, the flush thread executes
create/update requests and commits logs with compressed and compacted
user data or metadata.
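To make the segment-to-LEB relationship concrete, here is a minimal
userspace sketch (illustrative only, not SSDFS code: the toy_* names
and the linear layout are assumptions) of how a segment number and a
PEB index can resolve to a LEB identifier when every segment holds a
fixed number of erase blocks, in the spirit of
ssdfs_get_leb_id_for_peb_index():

#include <stdint.h>
#include <stdio.h>

/* Illustrative volume geometry; real values come from the superblock. */
struct toy_volume {
	uint64_t nsegs;		/* segments on the volume */
	uint32_t pebs_per_seg;	/* erase blocks per segment */
};

/* LEB ID for (segment, index) under a linear layout, UINT64_MAX on error. */
static uint64_t toy_leb_id(const struct toy_volume *vol,
			   uint64_t seg_id, uint32_t peb_index)
{
	if (seg_id >= vol->nsegs || peb_index >= vol->pebs_per_seg)
		return UINT64_MAX;
	return seg_id * vol->pebs_per_seg + peb_index;
}

int main(void)
{
	struct toy_volume vol = { .nsegs = 1024, .pebs_per_seg = 4 };

	/* Segment 10 spans LEBs 40..43 in this toy layout. */
	for (uint32_t i = 0; i < vol.pebs_per_seg; i++)
		printf("seg 10, peb_index %u -> leb %llu\n", i,
		       (unsigned long long)toy_leb_id(&vol, 10, i));
	return 0;
}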
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/segment.c      | 1315 +++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/segment.h      |  957 ++++++++++++++++++++++++++++
 fs/ssdfs/segment_tree.c |  748 ++++++++++++++++++++++
 fs/ssdfs/segment_tree.h |   66 ++
 4 files changed, 3086 insertions(+)
 create mode 100644 fs/ssdfs/segment.c
 create mode 100644 fs/ssdfs/segment.h
 create mode 100644 fs/ssdfs/segment_tree.c
 create mode 100644 fs/ssdfs/segment_tree.h

diff --git a/fs/ssdfs/segment.c b/fs/ssdfs/segment.c
new file mode 100644
index 000000000000..6f23c16fe800
--- /dev/null
+++ b/fs/ssdfs/segment.c
@@ -0,0 +1,1315 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment.c - segment concept related functionality.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_vector.h"
+#include "block_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "current_segment.h"
+#include "segment_tree.h"
+#include "peb_mapping_table.h"
+
+#include
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_seg_obj_page_leaks;
+atomic64_t ssdfs_seg_obj_memory_leaks;
+atomic64_t ssdfs_seg_obj_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_seg_obj_cache_leaks_increment(void *kaddr)
+ * void ssdfs_seg_obj_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_seg_obj_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_seg_obj_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_seg_obj_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_seg_obj_kfree(void *kaddr)
+ * struct page *ssdfs_seg_obj_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_seg_obj_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_seg_obj_free_page(struct page *page)
+ * void ssdfs_seg_obj_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(seg_obj)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(seg_obj)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_seg_obj_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_seg_obj_page_leaks, 0);
+	atomic64_set(&ssdfs_seg_obj_memory_leaks, 0);
+	atomic64_set(&ssdfs_seg_obj_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_seg_obj_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_seg_obj_page_leaks) != 0) {
+		SSDFS_ERR("SEGMENT: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_seg_obj_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_seg_obj_memory_leaks) != 0) {
+		SSDFS_ERR("SEGMENT: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_seg_obj_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_seg_obj_cache_leaks) != 0) {
+		SSDFS_ERR("SEGMENT: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_seg_obj_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static struct kmem_cache *ssdfs_seg_obj_cachep;
+
+static void ssdfs_init_seg_object_once(void *obj)
+{
+	struct ssdfs_segment_info *seg_obj = obj;
+
+	atomic_set(&seg_obj->refs_count, 0);
+}
+
+void ssdfs_shrink_seg_obj_cache(void)
+{
+	if (ssdfs_seg_obj_cachep)
+		kmem_cache_shrink(ssdfs_seg_obj_cachep);
+}
+
+void ssdfs_zero_seg_obj_cache_ptr(void)
+{
+	ssdfs_seg_obj_cachep = NULL;
+}
+
+void ssdfs_destroy_seg_obj_cache(void)
+{
+	if (ssdfs_seg_obj_cachep)
+		kmem_cache_destroy(ssdfs_seg_obj_cachep);
+}
+
+int ssdfs_init_seg_obj_cache(void)
+{
+	ssdfs_seg_obj_cachep = kmem_cache_create("ssdfs_seg_obj_cache",
+					sizeof(struct ssdfs_segment_info), 0,
+					SLAB_RECLAIM_ACCOUNT |
+					SLAB_MEM_SPREAD |
+					SLAB_ACCOUNT,
+					ssdfs_init_seg_object_once);
+	if (!ssdfs_seg_obj_cachep) {
+		SSDFS_ERR("unable to create segment objects cache\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/******************************************************************************
+ *                      SEGMENT OBJECT FUNCTIONALITY                          *
******************************************************************************/ + +/* + * ssdfs_segment_allocate_object() - allocate segment object + * @seg_id: segment number + * + * This function tries to allocate segment object. + * + * RETURN: + * [success] - pointer on allocated segment object + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + */ +struct ssdfs_segment_info *ssdfs_segment_allocate_object(u64 seg_id) +{ + struct ssdfs_segment_info *ptr; + + ptr = kmem_cache_alloc(ssdfs_seg_obj_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for segment %llu\n", + seg_id); + return ERR_PTR(-ENOMEM); + } + + ssdfs_seg_obj_cache_leaks_increment(ptr); + + memset(ptr, 0, sizeof(struct ssdfs_segment_info)); + atomic_set(&ptr->obj_state, SSDFS_SEG_OBJECT_UNDER_CREATION); + atomic_set(&ptr->activity_type, SSDFS_SEG_OBJECT_NO_ACTIVITY); + ptr->seg_id = seg_id; + atomic_set(&ptr->refs_count, 0); + init_waitqueue_head(&ptr->object_queue); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment object %p, seg_id %llu\n", + ptr, seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ptr; +} + +/* + * ssdfs_segment_free_object() - free segment object + * @si: pointer on segment object + * + * This function tries to free segment object. + */ +void ssdfs_segment_free_object(struct ssdfs_segment_info *si) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment object %p\n", si); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!si) + return; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu\n", si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&si->obj_state)) { + case SSDFS_SEG_OBJECT_UNDER_CREATION: + case SSDFS_SEG_OBJECT_CREATED: + case SSDFS_CURRENT_SEG_OBJECT: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected segment object's state %#x\n", + atomic_read(&si->obj_state)); + break; + } + + switch (atomic_read(&si->activity_type)) { + case SSDFS_SEG_OBJECT_NO_ACTIVITY: + case SSDFS_SEG_OBJECT_REGULAR_ACTIVITY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected segment object's activity %#x\n", + atomic_read(&si->activity_type)); + break; + } + + ssdfs_seg_obj_cache_leaks_decrement(si); + kmem_cache_free(ssdfs_seg_obj_cachep, si); +} + +/* + * ssdfs_segment_destroy_object() - destroy segment object + * @si: pointer on segment object + * + * This function tries to destroy segment object. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EBUSY - segment object is referenced yet. + * %-EIO - I/O error. 
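+ *
+ * NOTE: the destruction path waits up to SSDFS_DEFAULT_TIMEOUT for
+ * @si->refs_count to drop to zero before giving up with %-EBUSY.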
+ */ +int ssdfs_segment_destroy_object(struct ssdfs_segment_info *si) +{ + int refs_count; + int err = 0; + + if (!si) + return 0; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, seg_state %#x, log_pages %u, " + "create_threads %u\n", + si->seg_id, atomic_read(&si->seg_state), + si->log_pages, si->create_threads); + SSDFS_ERR("obj_state %#x\n", + atomic_read(&si->obj_state)); +#else + SSDFS_DBG("seg %llu, seg_state %#x, log_pages %u, " + "create_threads %u\n", + si->seg_id, atomic_read(&si->seg_state), + si->log_pages, si->create_threads); + SSDFS_DBG("obj_state %#x\n", + atomic_read(&si->obj_state)); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (atomic_read(&si->obj_state)) { + case SSDFS_SEG_OBJECT_UNDER_CREATION: + case SSDFS_SEG_OBJECT_CREATED: + case SSDFS_CURRENT_SEG_OBJECT: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected segment object's state %#x\n", + atomic_read(&si->obj_state)); + break; + } + + refs_count = atomic_read(&si->refs_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("si %p, seg %llu, refs_count %d\n", + si, si->seg_id, refs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (refs_count != 0) { + wait_queue_head_t *wq = &si->object_queue; + + err = wait_event_killable_timeout(*wq, + atomic_read(&si->refs_count) <= 0, + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) { + WARN_ON(err < 0); + } else + err = 0; + + if (atomic_read(&si->refs_count) != 0) { + SSDFS_WARN("unable to destroy object of segment %llu: " + "refs_count %d\n", + si->seg_id, refs_count); + return -EBUSY; + } + } + + ssdfs_sysfs_delete_seg_group(si); + + if (si->peb_array) { + struct ssdfs_peb_container *pebc; + int i; + + for (i = 0; i < si->pebs_count; i++) { + pebc = &si->peb_array[i]; + ssdfs_peb_container_destroy(pebc); + } + + ssdfs_seg_obj_kfree(si->peb_array); + } + + ssdfs_segment_blk_bmap_destroy(&si->blk_bmap); + + if (si->blk2off_table) + ssdfs_blk2off_table_destroy(si->blk2off_table); + + if (!is_ssdfs_requests_queue_empty(&si->create_rq)) { + SSDFS_WARN("create queue is not empty\n"); + ssdfs_requests_queue_remove_all(&si->create_rq, -ENOSPC); + } + + ssdfs_segment_free_object(si); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_segment_create_object() - create segment object + * @fsi: pointer on shared file system object + * @seg: segment number + * @seg_state: segment state + * @seg_type: segment type + * @log_pages: count of pages in log + * @create_threads: number of flush PEB's threads for new page requests + * @si: pointer on segment object [in|out] + * + * This function tries to create segment object for @seg + * identification number. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. 
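+ *
+ * NOTE: on failure, the half-constructed object is torn down through
+ * the destroy_seg_obj label and the object state is set to
+ * SSDFS_SEG_OBJECT_FAILURE.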
+ */ +int ssdfs_segment_create_object(struct ssdfs_fs_info *fsi, + u64 seg, + int seg_state, + u16 seg_type, + u16 log_pages, + u8 create_threads, + struct ssdfs_segment_info *si) +{ + int state = SSDFS_BLK2OFF_OBJECT_CREATED; + struct ssdfs_migration_destination *destination; + int refs_count = fsi->pebs_per_seg; + int destination_pebs = 0; + int init_flag, init_state; + u32 logical_blk_capacity; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !si); + + if (seg_state >= SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid segment state %#x\n", seg_state); + return -EINVAL; + } + + if (seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) { + SSDFS_ERR("invalid segment type %#x\n", seg_type); + return -EINVAL; + } + + if (create_threads > fsi->pebs_per_seg || + fsi->pebs_per_seg % create_threads) { + SSDFS_ERR("invalid create threads count %u\n", + create_threads); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, seg %llu, seg_state %#x, log_pages %u, " + "create_threads %u\n", + fsi, seg, seg_state, log_pages, create_threads); +#else + SSDFS_DBG("fsi %p, seg %llu, seg_state %#x, log_pages %u, " + "create_threads %u\n", + fsi, seg, seg_state, log_pages, create_threads); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (seg >= fsi->nsegs) { + SSDFS_ERR("requested seg %llu >= nsegs %llu\n", + seg, fsi->nsegs); + return -EINVAL; + } + + switch (atomic_read(&si->obj_state)) { + case SSDFS_SEG_OBJECT_UNDER_CREATION: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid segment object's state %#x\n", + atomic_read(&si->obj_state)); + ssdfs_segment_free_object(si); + return -EINVAL; + } + + si->seg_id = seg; + si->seg_type = seg_type; + si->log_pages = log_pages; + si->create_threads = create_threads; + si->fsi = fsi; + atomic_set(&si->seg_state, seg_state); + ssdfs_requests_queue_init(&si->create_rq); + + spin_lock_init(&si->protection.cno_lock); + si->protection.create_cno = ssdfs_current_cno(fsi->sb); + si->protection.last_request_cno = si->protection.create_cno; + si->protection.reqs_count = 0; + si->protection.protected_range = 0; + si->protection.future_request_cno = si->protection.create_cno; + + spin_lock_init(&si->pending_lock); + si->pending_new_user_data_pages = 0; + si->invalidated_user_data_pages = 0; + + si->pebs_count = fsi->pebs_per_seg; + si->peb_array = ssdfs_seg_obj_kcalloc(si->pebs_count, + sizeof(struct ssdfs_peb_container), + GFP_KERNEL); + if (!si->peb_array) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate memory for peb array\n"); + goto destroy_seg_obj; + } + + atomic_set(&si->migration.migrating_pebs, 0); + spin_lock_init(&si->migration.lock); + + destination = &si->migration.array[SSDFS_LAST_DESTINATION]; + destination->state = SSDFS_EMPTY_DESTINATION; + destination->destination_pebs = 0; + destination->shared_peb_index = -1; + + destination = &si->migration.array[SSDFS_CREATING_DESTINATION]; + destination->state = SSDFS_EMPTY_DESTINATION; + destination->destination_pebs = 0; + destination->shared_peb_index = -1; + + for (i = 0; i < SSDFS_PEB_THREAD_TYPE_MAX; i++) + init_waitqueue_head(&si->wait_queue[i]); + + if (seg_state == SSDFS_SEG_CLEAN) { + state = SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT; + init_flag = SSDFS_BLK_BMAP_CREATE; + init_state = SSDFS_BLK_FREE; + } else { + init_flag = SSDFS_BLK_BMAP_INIT; + init_state = SSDFS_BLK_STATE_MAX; + } + + logical_blk_capacity = fsi->leb_pages_capacity * fsi->pebs_per_seg; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create segment block bitmap: seg 
%llu\n", seg); +#else + SSDFS_DBG("create segment block bitmap: seg %llu\n", seg); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_segment_blk_bmap_create(si, init_flag, init_state); + if (unlikely(err)) { + SSDFS_ERR("fail to create segment block bitmap: " + "err %d\n", err); + goto destroy_seg_obj; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create blk2off table: seg %llu\n", seg); +#else + SSDFS_DBG("create blk2off table: seg %llu\n", seg); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + si->blk2off_table = ssdfs_blk2off_table_create(fsi, + logical_blk_capacity, + SSDFS_SEG_OFF_TABLE, + state); + if (!si->blk2off_table) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate memory for translation table\n"); + goto destroy_seg_obj; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("create PEB containers: seg %llu\n", seg); +#else + SSDFS_DBG("create PEB containers: seg %llu\n", seg); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + for (i = 0; i < si->pebs_count; i++) { + err = ssdfs_peb_container_create(fsi, seg, i, + SEG2PEB_TYPE(seg_type), + log_pages, si); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto destroy_seg_obj; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create PEB container: " + "seg %llu, peb index %d, err %d\n", + seg, i, err); + goto destroy_seg_obj; + } + } + + for (i = 0; i < si->pebs_count; i++) { + int cur_refs = atomic_read(&si->peb_array[i].dst_peb_refs); + int items_state = atomic_read(&si->peb_array[i].items_state); + + switch (items_state) { + case SSDFS_PEB1_DST_CONTAINER: + case SSDFS_PEB1_SRC_PEB2_DST_CONTAINER: + case SSDFS_PEB2_DST_CONTAINER: + case SSDFS_PEB2_SRC_PEB1_DST_CONTAINER: + destination_pebs++; + break; + + default: + /* do nothing */ + break; + } + + if (cur_refs == 0) + continue; + + if (cur_refs < refs_count) + refs_count = cur_refs; + } + + destination = &si->migration.array[SSDFS_LAST_DESTINATION]; + spin_lock(&si->migration.lock); + destination->shared_peb_index = refs_count; + destination->destination_pebs = destination_pebs; + destination->state = SSDFS_VALID_DESTINATION; + spin_unlock(&si->migration.lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("get free pages: seg %llu\n", seg); +#else + SSDFS_DBG("get free pages: seg %llu\n", seg); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + /* + * The goal of this cycle is to finish segment object + * initialization. The segment object should have + * valid value of free blocks number. + * The ssdfs_peb_get_free_pages() method waits the + * ending of PEB object complete initialization. 
+	 */
+	for (i = 0; i < si->pebs_count; i++) {
+		int peb_free_pages;
+		struct ssdfs_peb_container *pebc = &si->peb_array[i];
+
+		if (is_peb_container_empty(pebc)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("segment %llu hasn't PEB %d\n",
+				  seg, i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		}
+
+		peb_free_pages = ssdfs_peb_get_free_pages(pebc);
+		if (unlikely(peb_free_pages < 0)) {
+			err = peb_free_pages;
+			SSDFS_ERR("fail to calculate PEB's free pages: "
+				  "seg %llu, peb index %d, err %d\n",
+				  seg, i, err);
+			goto destroy_seg_obj;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create sysfs group: seg %llu\n", seg);
+#else
+	SSDFS_DBG("create sysfs group: seg %llu\n", seg);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_sysfs_create_seg_group(si);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create segment's sysfs group: "
+			  "seg %llu, err %d\n",
+			  seg, err);
+		goto destroy_seg_obj;
+	}
+
+	atomic_set(&si->obj_state, SSDFS_SEG_OBJECT_CREATED);
+	atomic_set(&si->activity_type, SSDFS_SEG_OBJECT_REGULAR_ACTIVITY);
+	wake_up_all(&si->object_queue);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("segment %llu has been created\n",
+		  seg);
+#else
+	SSDFS_DBG("segment %llu has been created\n",
+		  seg);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+destroy_seg_obj:
+	atomic_set(&si->obj_state, SSDFS_SEG_OBJECT_FAILURE);
+	wake_up_all(&si->object_queue);
+	ssdfs_segment_destroy_object(si);
+	return err;
+}
+
+/*
+ * ssdfs_segment_get_object() - increment segment's reference counter
+ * @si: pointer on segment object
+ */
+void ssdfs_segment_get_object(struct ssdfs_segment_info *si)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+
+	SSDFS_DBG("seg_id %llu, refs_count %d\n",
+		  si->seg_id, atomic_read(&si->refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	WARN_ON(atomic_inc_return(&si->refs_count) <= 0);
+}
+
+/*
+ * ssdfs_segment_put_object() - decrement segment's reference counter
+ * @si: pointer on segment object
+ */
+void ssdfs_segment_put_object(struct ssdfs_segment_info *si)
+{
+	if (!si)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, refs_count %d\n",
+		  si->seg_id, atomic_read(&si->refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	WARN_ON(atomic_dec_return(&si->refs_count) < 0);
+
+	if (atomic_read(&si->refs_count) <= 0)
+		wake_up_all(&si->object_queue);
+}
+
+/*
+ * ssdfs_segment_detect_search_range() - detect search range
+ * @fsi: pointer on shared file system object
+ * @start_seg: starting ID for segment search [in|out]
+ * @end_seg: ending ID for segment search [out]
+ *
+ * This method tries to detect the search range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ERANGE     - internal error.
+ * %-ENOENT     - unable to find valid range for search.
+ */ +int ssdfs_segment_detect_search_range(struct ssdfs_fs_info *fsi, + u64 *start_seg, u64 *end_seg) +{ + struct completion *init_end; + u64 start_leb; + u64 end_leb; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !start_seg || !end_seg); + + SSDFS_DBG("fsi %p, start_search_id %llu\n", + fsi, *start_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*start_seg >= fsi->nsegs) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_seg %llu >= nsegs %llu\n", + *start_seg, fsi->nsegs); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + start_leb = ssdfs_get_leb_id_for_peb_index(fsi, *start_seg, 0); + if (start_leb >= U64_MAX) { + SSDFS_ERR("invalid leb_id for seg_id %llu\n", + *start_seg); + return -ERANGE; + } + + err = ssdfs_maptbl_recommend_search_range(fsi, &start_leb, + &end_leb, &init_end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("maptbl init failed: " + "err %d\n", err); + goto finish_seg_id_correction; + } + + start_leb = ssdfs_get_leb_id_for_peb_index(fsi, *start_seg, 0); + if (start_leb >= U64_MAX) { + SSDFS_ERR("invalid leb_id for seg_id %llu\n", + *start_seg); + return -ERANGE; + } + + err = ssdfs_maptbl_recommend_search_range(fsi, &start_leb, + &end_leb, &init_end); + } + + if (err == -ENOENT) { + *start_seg = U64_MAX; + *end_seg = U64_MAX; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find search range: leb_id %llu\n", + start_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_seg_id_correction; + } else if (unlikely(err)) { + *start_seg = U64_MAX; + *end_seg = U64_MAX; + SSDFS_ERR("fail to find search range: " + "leb_id %llu, err %d\n", + start_leb, err); + goto finish_seg_id_correction; + } + + *start_seg = SSDFS_LEB2SEG(fsi, start_leb); + *end_seg = SSDFS_LEB2SEG(fsi, end_leb); + +finish_seg_id_correction: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_seg %llu, end_seg %llu, err %d\n", + *start_seg, *end_seg, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_find_new_segment() - find a new segment + * @fsi: pointer on shared file system object + * @seg_type: segment type + * @start_search_id: starting ID for segment search + * @seg_id: found segment ID [out] + * @seg_state: found segment state [out] + * + * This method tries to find a new segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - unable to find a new segment. 
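+ *
+ * NOTE: the mapping table recommends a LEB range first and the range
+ * is converted back into segment IDs for the segment bitmap search.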
+ */ +static +int __ssdfs_find_new_segment(struct ssdfs_fs_info *fsi, int seg_type, + u64 start_search_id, u64 *seg_id, + int *seg_state) +{ + int new_state; + u64 start_seg = start_search_id; + u64 end_seg = U64_MAX; + struct completion *init_end; + int res; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !seg_id || !seg_state); + + SSDFS_DBG("fsi %p, seg_type %#x, start_search_id %llu\n", + fsi, seg_type, start_search_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + *seg_id = U64_MAX; + *seg_state = SSDFS_SEG_STATE_MAX; + + switch (seg_type) { + case SSDFS_USER_DATA_SEG_TYPE: + new_state = SSDFS_SEG_DATA_USING; + break; + + case SSDFS_LEAF_NODE_SEG_TYPE: + new_state = SSDFS_SEG_LEAF_NODE_USING; + break; + + case SSDFS_HYBRID_NODE_SEG_TYPE: + new_state = SSDFS_SEG_HYBRID_NODE_USING; + break; + + case SSDFS_INDEX_NODE_SEG_TYPE: + new_state = SSDFS_SEG_INDEX_NODE_USING; + break; + + default: + BUG(); + }; + + err = ssdfs_segment_detect_search_range(fsi, + &start_seg, + &end_seg); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find fragment for search: " + "start_seg %llu, end_seg %llu\n", + start_seg, end_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define a search range: " + "start_search_id %llu, err %d\n", + start_search_id, err); + goto finish_search; + } + + res = ssdfs_segbmap_find_and_set(fsi->segbmap, + start_seg, end_seg, + SSDFS_SEG_CLEAN, + SEG_TYPE2MASK(seg_type), + new_state, + seg_id, &init_end); + if (res >= 0) { + /* Define segment state */ + *seg_state = res; + } else if (res == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + goto finish_search; + } + + res = ssdfs_segbmap_find_and_set(fsi->segbmap, + start_seg, end_seg, + SSDFS_SEG_CLEAN, + SEG_TYPE2MASK(seg_type), + new_state, + seg_id, &init_end); + if (res >= 0) { + /* Define segment state */ + *seg_state = res; + } else if (res == -ENODATA) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find segment in range: " + "start_seg %llu, end_seg %llu\n", + start_seg, end_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else { + err = res; + SSDFS_ERR("fail to find segment in range: " + "start_seg %llu, end_seg %llu, err %d\n", + start_seg, end_seg, res); + goto finish_search; + } + } else if (res == -ENODATA) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find segment in range: " + "start_seg %llu, end_seg %llu\n", + start_seg, end_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else { + SSDFS_ERR("fail to find segment in range: " + "start_seg %llu, end_seg %llu, err %d\n", + start_seg, end_seg, res); + goto finish_search; + } + +finish_search: + if (err == -ENOENT) + *seg_id = end_seg; + + return err; +} + +/* + * ssdfs_find_new_segment() - find a new segment + * @fsi: pointer on shared file system object + * @seg_type: segment type + * @start_search_id: starting ID for segment search + * @seg_id: found segment ID [out] + * @seg_state: found segment state [out] + * + * This method tries to find a new segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - unable to find a new segment. 
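+ *
+ * NOTE: the search is performed in two passes: from @start_search_id
+ * to the end of the volume, then from segment 0 back up to
+ * @start_search_id.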
+ */ +static +int ssdfs_find_new_segment(struct ssdfs_fs_info *fsi, int seg_type, + u64 start_search_id, u64 *seg_id, + int *seg_state) +{ + u64 cur_id = start_search_id; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !seg_id || !seg_state); + + SSDFS_DBG("fsi %p, seg_type %#x, start_search_id %llu\n", + fsi, seg_type, start_search_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (cur_id < fsi->nsegs) { + err = __ssdfs_find_new_segment(fsi, seg_type, cur_id, + seg_id, seg_state); + if (err == -ENOENT) { + err = 0; + cur_id = *seg_id; + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find a new segment: " + "cur_id %llu, err %d\n", + cur_id, err); + return err; + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found seg_id %llu\n", *seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + } + + cur_id = 0; + + while (cur_id < start_search_id) { + err = __ssdfs_find_new_segment(fsi, seg_type, cur_id, + seg_id, seg_state); + if (err == -ENOENT) { + err = 0; + cur_id = *seg_id; + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find a new segment: " + "cur_id %llu, err %d\n", + cur_id, err); + return err; + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found seg_id %llu\n", *seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no free space for a new segment\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENOSPC; +} + +/* + * __ssdfs_create_new_segment() - create new segment and add into the tree + * @fsi: pointer on shared file system object + * @seg_id: segment number + * @seg_state: segment state + * @seg_type: segment type + * @log_pages: count of pages in log + * @create_threads: number of flush PEB's threads for new page requests + * + * This function tries to create segment object for @seg + * identification number. + * + * RETURN: + * [success] - pointer on created segment object + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. 
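+ *
+ * NOTE: if the segment already exists in the tree (%-EEXIST), the
+ * function waits for the concurrent creator and returns the existing
+ * object instead.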
+ */ +struct ssdfs_segment_info * +__ssdfs_create_new_segment(struct ssdfs_fs_info *fsi, + u64 seg_id, int seg_state, + u16 seg_type, u16 log_pages, + u8 create_threads) +{ + struct ssdfs_segment_info *si; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + if (seg_state >= SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid segment state %#x\n", seg_state); + return ERR_PTR(-EINVAL); + } + + if (seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) { + SSDFS_ERR("invalid segment type %#x\n", seg_type); + return ERR_PTR(-EINVAL); + } + + if (create_threads > fsi->pebs_per_seg || + fsi->pebs_per_seg % create_threads) { + SSDFS_ERR("invalid create threads count %u\n", + create_threads); + return ERR_PTR(-EINVAL); + } + + SSDFS_DBG("fsi %p, seg %llu, seg_state %#x, log_pages %u, " + "create_threads %u\n", + fsi, seg_id, seg_state, log_pages, create_threads); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = ssdfs_segment_allocate_object(seg_id); + if (IS_ERR_OR_NULL(si)) { + SSDFS_ERR("fail to allocate segment: " + "seg %llu, err %ld\n", + seg_id, PTR_ERR(si)); + return si; + } + + err = ssdfs_segment_tree_add(fsi, si); + if (err == -EEXIST) { + wait_queue_head_t *wq = &si->object_queue; + + ssdfs_segment_free_object(si); + + si = ssdfs_segment_tree_find(fsi, seg_id); + if (IS_ERR_OR_NULL(si)) { + SSDFS_ERR("fail to find segment: " + "seg %llu, err %d\n", + seg_id, err); + return ERR_PTR(err); + } + + ssdfs_segment_get_object(si); + + err = wait_event_killable_timeout(*wq, + is_ssdfs_segment_created(si), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) { + WARN_ON(err < 0); + } else + err = 0; + + switch (atomic_read(&si->obj_state)) { + case SSDFS_SEG_OBJECT_CREATED: + case SSDFS_CURRENT_SEG_OBJECT: + /* do nothing */ + break; + + default: + ssdfs_segment_put_object(si); + SSDFS_ERR("fail to create segment: " + "seg %llu\n", + seg_id); + return ERR_PTR(-ERANGE); + } + + return si; + } else if (unlikely(err)) { + ssdfs_segment_free_object(si); + SSDFS_ERR("fail to add segment into tree: " + "seg %llu, err %d\n", + seg_id, err); + return ERR_PTR(err); + } else { + err = ssdfs_segment_create_object(fsi, + seg_id, + seg_state, + seg_type, + log_pages, + create_threads, + si); + if (err == -EINTR) { + /* + * Ignore this error. + */ + return ERR_PTR(err); + } else if (unlikely(err)) { + SSDFS_ERR("fail to create segment: " + "seg %llu, err %d\n", + seg_id, err); + return ERR_PTR(err); + } + } + + ssdfs_segment_get_object(si); + return si; +} + +/* + * ssdfs_grab_segment() - get or create segment object + * @fsi: pointer on shared file system object + * @seg_type: type of segment + * @seg_id: segment number + * @start_search_id: starting ID for segment search + * + * This method tries to get or to create segment object of + * @seg_type. If @seg_id is U64_MAX then it needs to find + * segment that will be in "clean" or "using" state. + * The @start_search_id is defining the range for search. + * If this value is equal to U64_MAX then it is ignored. + * The found segment number should be used for segment object + * creation and adding into the segment tree. Otherwise, + * if @seg_id contains valid segment number, the method should try + * to find segment object in the segments tree. If the segment + * object is not found then segment state will be detected via + * segment bitmap, segment object will be created and to be added + * into the segment tree. Finally, reference counter of segment + * object will be incremented. + * + * RETURN: + * [success] - pointer on segment object. 
+ * [failure] - error code: + * + * %-ERANGE - internal error. + */ +struct ssdfs_segment_info * +ssdfs_grab_segment(struct ssdfs_fs_info *fsi, int seg_type, u64 seg_id, + u64 start_search_id) +{ + struct ssdfs_segment_info *si; + int seg_state = SSDFS_SEG_STATE_MAX; + struct completion *init_end; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(seg_type != SSDFS_LEAF_NODE_SEG_TYPE && + seg_type != SSDFS_HYBRID_NODE_SEG_TYPE && + seg_type != SSDFS_INDEX_NODE_SEG_TYPE && + seg_type != SSDFS_USER_DATA_SEG_TYPE); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, seg_type %#x, " + "seg_id %llu, start_search_id %llu\n", + fsi, seg_type, seg_id, start_search_id); +#else + SSDFS_DBG("fsi %p, seg_type %#x, " + "seg_id %llu, start_search_id %llu\n", + fsi, seg_type, seg_id, start_search_id); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (seg_id == U64_MAX) { + err = ssdfs_find_new_segment(fsi, seg_type, + start_search_id, + &seg_id, &seg_state); + if (err == -ENOSPC) { + SSDFS_DBG("no free space for a new segment\n"); + return ERR_PTR(err); + } else if (unlikely(err)) { + SSDFS_ERR("fail to find a new segment: " + "start_search_id %llu, " + "seg_type %#x, err %d\n", + start_search_id, seg_type, err); + return ERR_PTR(err); + } + } + + si = ssdfs_segment_tree_find(fsi, seg_id); + if (IS_ERR_OR_NULL(si)) { + err = PTR_ERR(si); + + if (err == -ENODATA) { + u16 log_pages; + u8 create_threads; + + if (seg_state != SSDFS_SEG_STATE_MAX) + goto create_segment_object; + + seg_state = ssdfs_segbmap_get_state(fsi->segbmap, + seg_id, &init_end); + if (seg_state == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + return ERR_PTR(err); + } + + seg_state = + ssdfs_segbmap_get_state(fsi->segbmap, + seg_id, + &init_end); + if (seg_state < 0) + goto fail_define_seg_state; + } else if (seg_state < 0) { +fail_define_seg_state: + SSDFS_ERR("fail to define segment state: " + "seg %llu\n", + seg_id); + return ERR_PTR(seg_state); + } + + switch (seg_state) { + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("seg %llu has unexpected state %#x\n", + seg_id, seg_state); + return ERR_PTR(err); + }; + +create_segment_object: + switch (seg_type) { + case SSDFS_USER_DATA_SEG_TYPE: + log_pages = + fsi->segs_tree->user_data_log_pages; + break; + + case SSDFS_LEAF_NODE_SEG_TYPE: + log_pages = + fsi->segs_tree->lnodes_seg_log_pages; + break; + + case SSDFS_HYBRID_NODE_SEG_TYPE: + log_pages = + fsi->segs_tree->hnodes_seg_log_pages; + break; + + case SSDFS_INDEX_NODE_SEG_TYPE: + log_pages = + fsi->segs_tree->inodes_seg_log_pages; + break; + + default: + log_pages = + fsi->segs_tree->default_log_pages; + break; + }; + + create_threads = fsi->create_threads_per_seg; + si = __ssdfs_create_new_segment(fsi, + seg_id, + seg_state, + seg_type, + log_pages, + create_threads); + if (IS_ERR_OR_NULL(si)) { + err = (si == NULL ? -ENOMEM : PTR_ERR(si)); + if (err == -EINTR) { + /* + * Ignore this error. 
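+				 * Creation was interrupted (for
+				 * instance, by a pending signal), so
+				 * it is not reported as a failure.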
+ */ + } else { + SSDFS_ERR("fail to add new segment: " + "seg %llu, err %d\n", + seg_id, err); + } + } + + return si; + } else if (err == 0) { + SSDFS_ERR("segment tree returns NULL\n"); + return ERR_PTR(-ERANGE); + } else { + SSDFS_ERR("segment tree fail to find segment: " + "seg %llu, err %d\n", + seg_id, err); + return ERR_PTR(err); + } + } + + ssdfs_segment_get_object(si); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return si; +} diff --git a/fs/ssdfs/segment.h b/fs/ssdfs/segment.h new file mode 100644 index 000000000000..cf11f5e5b04f --- /dev/null +++ b/fs/ssdfs/segment.h @@ -0,0 +1,957 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/segment.h - segment concept declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_SEGMENT_H +#define _SSDFS_SEGMENT_H + +#include "peb.h" +#include "segment_block_bitmap.h" + +/* Available indexes for destination */ +enum { + SSDFS_LAST_DESTINATION, + SSDFS_CREATING_DESTINATION, + SSDFS_DESTINATION_MAX +}; + +/* Possible states of destination descriptor */ +enum { + SSDFS_EMPTY_DESTINATION, + SSDFS_DESTINATION_UNDER_CREATION, + SSDFS_VALID_DESTINATION, + SSDFS_OBSOLETE_DESTINATION, + SSDFS_DESTINATION_STATE_MAX +}; + +/* + * struct ssdfs_migration_destination - destination descriptor + * @state: descriptor's state + * @destination_pebs: count of destination PEBs for migration + * @shared_peb_index: shared index of destination PEB for migration + */ +struct ssdfs_migration_destination { + int state; + int destination_pebs; + int shared_peb_index; +}; + +/* + * struct ssdfs_segment_migration_info - migration info + * @migrating_pebs: count of migrating PEBs + * @lock: migration data lock + * @array: destination descriptors + */ +struct ssdfs_segment_migration_info { + atomic_t migrating_pebs; + + spinlock_t lock; + struct ssdfs_migration_destination array[SSDFS_DESTINATION_MAX]; +}; + +/* + * struct ssdfs_segment_info - segment object description + * @seg_id: segment identification number + * @log_pages: count of pages in full partial log + * @create_threads: number of flush PEB's threads for new page requests + * @seg_type: segment type + * @protection: segment's protection window + * @seg_state: current state of segment + * @obj_state: segment object's state + * @activity_type: type of activity with segment object + * @peb_array: array of PEB's descriptors + * @pebs_count: count of items in PEBS array + * @migration: migration info + * @refs_count: counter of references on segment object + * @object_queue: wait queue for segment creation/destruction + * @create_rq: new page requests queue + * @pending_lock: lock of pending pages' counter + * @pending_new_user_data_pages: counter of pending new user data pages + * @invalidated_user_data_pages: counter of invalidated user data pages + * @wait_queue: array of PEBs' wait queues + * @blk_bmap: segment's block bitmap + * @blk2off_table: offset translation table + * @fsi: pointer on shared file system object + * @seg_kobj: /sys/fs/ssdfs//segments/ kernel object + * 
@seg_kobj_unregister: completion state for kernel object
+ * @pebs_kobj: /sys/fs///segments//pebs kernel object
+ * @pebs_kobj_unregister: completion state for pebs kernel object
+ */
+struct ssdfs_segment_info {
+	/* Static data */
+	u64 seg_id;
+	u16 log_pages;
+	u8 create_threads;
+	u16 seg_type;
+
+	/* Checkpoints set */
+	struct ssdfs_protection_window protection;
+
+	/* Mutable data */
+	atomic_t seg_state;
+	atomic_t obj_state;
+	atomic_t activity_type;
+
+	/* Segment's PEB's containers array */
+	struct ssdfs_peb_container *peb_array;
+	u16 pebs_count;
+
+	/* Migration info */
+	struct ssdfs_segment_migration_info migration;
+
+	/* Reference counter */
+	atomic_t refs_count;
+	wait_queue_head_t object_queue;
+
+	/*
+	 * New pages processing:
+	 * requests queue, wait queue
+	 */
+	struct ssdfs_requests_queue create_rq;
+
+	spinlock_t pending_lock;
+	u32 pending_new_user_data_pages;
+	u32 invalidated_user_data_pages;
+
+	/* Threads' wait queues */
+	wait_queue_head_t wait_queue[SSDFS_PEB_THREAD_TYPE_MAX];
+
+	struct ssdfs_segment_blk_bmap blk_bmap;
+	struct ssdfs_blk2off_table *blk2off_table;
+	struct ssdfs_fs_info *fsi;
+
+	/* /sys/fs/ssdfs//segments/ */
+	struct kobject *seg_kobj;
+	struct kobject seg_kobj_buf;
+	struct completion seg_kobj_unregister;
+
+	/* /sys/fs///segments//pebs */
+	struct kobject pebs_kobj;
+	struct completion pebs_kobj_unregister;
+};
+
+/* Segment object states */
+enum {
+	SSDFS_SEG_OBJECT_UNKNOWN_STATE,
+	SSDFS_SEG_OBJECT_UNDER_CREATION,
+	SSDFS_SEG_OBJECT_CREATED,
+	SSDFS_CURRENT_SEG_OBJECT,
+	SSDFS_SEG_OBJECT_FAILURE,
+	SSDFS_SEG_OBJECT_STATE_MAX
+};
+
+/* Segment object's activity type */
+enum {
+	SSDFS_SEG_OBJECT_NO_ACTIVITY,
+	SSDFS_SEG_OBJECT_REGULAR_ACTIVITY,
+	SSDFS_SEG_UNDER_GC_ACTIVITY,
+	SSDFS_SEG_UNDER_INVALIDATION,
+	SSDFS_SEG_OBJECT_ACTIVITY_TYPE_MAX
+};
+
+/*
+ * Inline functions
+ */
+
+/*
+ * is_ssdfs_segment_created() - check that segment object is created
+ *
+ * This function returns TRUE both when the segment object has been
+ * created successfully and when the creation has failed.
+ * It is the responsibility of the caller to check that the
+ * segment object has been created successfully.
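+ * Callers that wait on object_queue typically re-check obj_state
+ * after the wait, as __ssdfs_create_new_segment() does.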
+ */ +static inline +bool is_ssdfs_segment_created(struct ssdfs_segment_info *si) +{ + bool is_created = false; + + switch (atomic_read(&si->obj_state)) { + case SSDFS_SEG_OBJECT_CREATED: + case SSDFS_CURRENT_SEG_OBJECT: + case SSDFS_SEG_OBJECT_FAILURE: + is_created = true; + break; + + default: + is_created = false; + break; + } + + return is_created; +} + +/* + * CUR_SEG_TYPE() - convert request class into current segment type + */ +static inline +int CUR_SEG_TYPE(int req_class) +{ + int cur_seg_type = SSDFS_CUR_SEGS_COUNT; + + switch (req_class) { + case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ: + case SSDFS_PEB_CREATE_DATA_REQ: + cur_seg_type = SSDFS_CUR_DATA_SEG; + break; + + case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ: + case SSDFS_PEB_CREATE_LNODE_REQ: + cur_seg_type = SSDFS_CUR_LNODE_SEG; + break; + + case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ: + case SSDFS_PEB_CREATE_HNODE_REQ: + cur_seg_type = SSDFS_CUR_HNODE_SEG; + break; + + case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ: + case SSDFS_PEB_CREATE_IDXNODE_REQ: + cur_seg_type = SSDFS_CUR_IDXNODE_SEG; + break; + + case SSDFS_ZONE_USER_DATA_MIGRATE_REQ: + cur_seg_type = SSDFS_CUR_DATA_UPDATE_SEG; + break; + + default: + BUG(); + } + + return cur_seg_type; +} + +/* + * SEG_TYPE() - convert request class into segment type + */ +static inline +int SEG_TYPE(int req_class) +{ + int seg_type = SSDFS_LAST_KNOWN_SEG_TYPE; + + switch (req_class) { + case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ: + case SSDFS_PEB_CREATE_DATA_REQ: + seg_type = SSDFS_USER_DATA_SEG_TYPE; + break; + + case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ: + case SSDFS_PEB_CREATE_LNODE_REQ: + seg_type = SSDFS_LEAF_NODE_SEG_TYPE; + break; + + case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ: + case SSDFS_PEB_CREATE_HNODE_REQ: + seg_type = SSDFS_HYBRID_NODE_SEG_TYPE; + break; + + case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ: + case SSDFS_PEB_CREATE_IDXNODE_REQ: + seg_type = SSDFS_INDEX_NODE_SEG_TYPE; + break; + + default: + BUG(); + } + + return seg_type; +} + +/* + * SEG_TYPE_TO_USING_STATE() - convert segment type to segment using state + * @seg_type: segment type + */ +static inline +int SEG_TYPE_TO_USING_STATE(u16 seg_type) +{ + switch (seg_type) { + case SSDFS_USER_DATA_SEG_TYPE: + return SSDFS_SEG_DATA_USING; + + case SSDFS_LEAF_NODE_SEG_TYPE: + return SSDFS_SEG_LEAF_NODE_USING; + + case SSDFS_HYBRID_NODE_SEG_TYPE: + return SSDFS_SEG_HYBRID_NODE_USING; + + case SSDFS_INDEX_NODE_SEG_TYPE: + return SSDFS_SEG_INDEX_NODE_USING; + } + + return SSDFS_SEG_STATE_MAX; +} + +/* + * SEG_TYPE2MASK() - convert segment type into search mask + */ +static inline +int SEG_TYPE2MASK(int seg_type) +{ + int mask; + + switch (seg_type) { + case SSDFS_USER_DATA_SEG_TYPE: + mask = SSDFS_SEG_DATA_USING_STATE_FLAG; + break; + + case SSDFS_LEAF_NODE_SEG_TYPE: + mask = SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG; + break; + + case SSDFS_HYBRID_NODE_SEG_TYPE: + mask = SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG; + break; + + case SSDFS_INDEX_NODE_SEG_TYPE: + mask = SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG; + break; + + default: + BUG(); + }; + + return mask; +} + +static inline +void ssdfs_account_user_data_flush_request(struct ssdfs_segment_info *si) +{ + u64 flush_requests = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) { + spin_lock(&si->fsi->volume_state_lock); + si->fsi->flushing_user_data_requests++; + flush_requests = si->fsi->flushing_user_data_requests; + spin_unlock(&si->fsi->volume_state_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, flush_requests %llu\n", + 
si->seg_id, flush_requests); +#endif /* CONFIG_SSDFS_DEBUG */ + } +} + +static inline +void ssdfs_forget_user_data_flush_request(struct ssdfs_segment_info *si) +{ + u64 flush_requests = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) { + spin_lock(&si->fsi->volume_state_lock); + flush_requests = si->fsi->flushing_user_data_requests; + if (flush_requests > 0) { + si->fsi->flushing_user_data_requests--; + flush_requests = si->fsi->flushing_user_data_requests; + } else + err = -ERANGE; + spin_unlock(&si->fsi->volume_state_lock); + + if (unlikely(err)) + SSDFS_WARN("fail to decrement\n"); + + if (flush_requests == 0) + wake_up_all(&si->fsi->finish_user_data_flush_wq); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, flush_requests %llu\n", + si->seg_id, flush_requests); +#endif /* CONFIG_SSDFS_DEBUG */ + } +} + +static inline +bool is_user_data_pages_invalidated(struct ssdfs_segment_info *si) +{ + u64 invalidated = 0; + + if (si->seg_type != SSDFS_USER_DATA_SEG_TYPE) + return false; + + spin_lock(&si->pending_lock); + invalidated = si->invalidated_user_data_pages; + spin_unlock(&si->pending_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, invalidated %llu\n", + si->seg_id, invalidated); +#endif /* CONFIG_SSDFS_DEBUG */ + + return invalidated > 0; +} + +static inline +void ssdfs_account_invalidated_user_data_pages(struct ssdfs_segment_info *si, + u32 count) +{ + u64 invalidated = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + + SSDFS_DBG("si %p, count %u\n", + si, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) { + spin_lock(&si->pending_lock); + si->invalidated_user_data_pages += count; + invalidated = si->invalidated_user_data_pages; + spin_unlock(&si->pending_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, invalidated %llu\n", + si->seg_id, invalidated); +#endif /* CONFIG_SSDFS_DEBUG */ + } +} + +static inline +void ssdfs_forget_invalidated_user_data_pages(struct ssdfs_segment_info *si) +{ + u64 invalidated = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) { + spin_lock(&si->pending_lock); + invalidated = si->invalidated_user_data_pages; + si->invalidated_user_data_pages = 0; + spin_unlock(&si->pending_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, invalidated %llu\n", + si->seg_id, invalidated); +#endif /* CONFIG_SSDFS_DEBUG */ + } +} + +static inline +void ssdfs_protection_account_request(struct ssdfs_protection_window *ptr, + u64 current_cno) +{ +#ifdef CONFIG_SSDFS_DEBUG + u64 create_cno; + u64 last_request_cno; + u32 reqs_count; + u64 protected_range; + u64 future_request_cno; +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&ptr->cno_lock); + + if (ptr->reqs_count == 0) { + ptr->reqs_count = 1; + ptr->last_request_cno = current_cno; + } else + ptr->reqs_count++; + +#ifdef CONFIG_SSDFS_DEBUG + create_cno = ptr->create_cno; + last_request_cno = ptr->last_request_cno; + reqs_count = ptr->reqs_count; + protected_range = ptr->protected_range; + future_request_cno = ptr->future_request_cno; +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&ptr->cno_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("create_cno %llu, " + "last_request_cno %llu, reqs_count %u, " + "protected_range %llu, future_request_cno %llu\n", + create_cno, + last_request_cno, reqs_count, + protected_range, future_request_cno); +#endif /* CONFIG_SSDFS_DEBUG */ 
+} + +static inline +void ssdfs_protection_forget_request(struct ssdfs_protection_window *ptr, + u64 current_cno) +{ + u64 create_cno; + u64 last_request_cno; + u32 reqs_count; + u64 protected_range; + u64 future_request_cno; + int err = 0; + + spin_lock(&ptr->cno_lock); + + if (ptr->reqs_count == 0) { + err = -ERANGE; + goto finish_process_request; + } else if (ptr->reqs_count == 1) { + ptr->reqs_count--; + + if (ptr->last_request_cno >= current_cno) { + err = -ERANGE; + goto finish_process_request; + } else { + u64 diff = current_cno - ptr->last_request_cno; + u64 last_range = ptr->protected_range; + ptr->protected_range = max_t(u64, last_range, diff); + ptr->last_request_cno = current_cno; + ptr->future_request_cno = + current_cno + ptr->protected_range; + } + } else + ptr->reqs_count--; + +finish_process_request: + create_cno = ptr->create_cno; + last_request_cno = ptr->last_request_cno; + reqs_count = ptr->reqs_count; + protected_range = ptr->protected_range; + future_request_cno = ptr->future_request_cno; + + spin_unlock(&ptr->cno_lock); + + if (unlikely(err)) { + SSDFS_WARN("create_cno %llu, " + "last_request_cno %llu, reqs_count %u, " + "protected_range %llu, future_request_cno %llu\n", + create_cno, + last_request_cno, reqs_count, + protected_range, future_request_cno); + return; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("create_cno %llu, " + "last_request_cno %llu, reqs_count %u, " + "protected_range %llu, future_request_cno %llu\n", + create_cno, + last_request_cno, reqs_count, + protected_range, future_request_cno); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +static inline +void ssdfs_segment_create_request_cno(struct ssdfs_segment_info *si) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu\n", si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_protection_account_request(&si->protection, + ssdfs_current_cno(si->fsi->sb)); +} + +static inline +void ssdfs_segment_finish_request_cno(struct ssdfs_segment_info *si) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu\n", si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_protection_forget_request(&si->protection, + ssdfs_current_cno(si->fsi->sb)); +} + +static inline +bool should_gc_doesnt_touch_segment(struct ssdfs_segment_info *si) +{ +#ifdef CONFIG_SSDFS_DEBUG + u64 create_cno; + u64 last_request_cno; + u32 reqs_count; + u64 protected_range; + u64 future_request_cno; +#endif /* CONFIG_SSDFS_DEBUG */ + u64 cur_cno; + bool dont_touch = false; + + spin_lock(&si->protection.cno_lock); + if (si->protection.reqs_count > 0) { + /* segment is under processing */ + dont_touch = true; + } else { + cur_cno = ssdfs_current_cno(si->fsi->sb); + if (cur_cno <= si->protection.future_request_cno) { + /* segment is under protection window yet */ + dont_touch = true; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + create_cno = si->protection.create_cno; + last_request_cno = si->protection.last_request_cno; + reqs_count = si->protection.reqs_count; + protected_range = si->protection.protected_range; + future_request_cno = si->protection.future_request_cno; +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&si->protection.cno_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, create_cno %llu, " + "last_request_cno %llu, reqs_count %u, " + "protected_range %llu, future_request_cno %llu, " + "dont_touch %#x\n", + si->seg_id, create_cno, + last_request_cno, reqs_count, + protected_range, future_request_cno, + dont_touch); +#endif /* CONFIG_SSDFS_DEBUG */ + + return dont_touch; +} + +static inline +void ssdfs_peb_read_request_cno(struct 
ssdfs_peb_container *pebc) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_protection_account_request(&pebc->cache_protection, + ssdfs_current_cno(pebc->parent_si->fsi->sb)); +} + +static inline +void ssdfs_peb_finish_read_request_cno(struct ssdfs_peb_container *pebc) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_protection_forget_request(&pebc->cache_protection, + ssdfs_current_cno(pebc->parent_si->fsi->sb)); +} + +static inline +bool is_it_time_free_peb_cache_memory(struct ssdfs_peb_container *pebc) +{ +#ifdef CONFIG_SSDFS_DEBUG + u64 create_cno; + u64 last_request_cno; + u32 reqs_count; + u64 protected_range; + u64 future_request_cno; +#endif /* CONFIG_SSDFS_DEBUG */ + u64 cur_cno; + bool dont_touch = false; + + spin_lock(&pebc->cache_protection.cno_lock); + if (pebc->cache_protection.reqs_count > 0) { + /* PEB has read requests */ + dont_touch = true; + } else { + cur_cno = ssdfs_current_cno(pebc->parent_si->fsi->sb); + if (cur_cno <= pebc->cache_protection.future_request_cno) { + /* PEB is under protection window yet */ + dont_touch = true; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + create_cno = pebc->cache_protection.create_cno; + last_request_cno = pebc->cache_protection.last_request_cno; + reqs_count = pebc->cache_protection.reqs_count; + protected_range = pebc->cache_protection.protected_range; + future_request_cno = pebc->cache_protection.future_request_cno; +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&pebc->cache_protection.cno_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, create_cno %llu, " + "last_request_cno %llu, reqs_count %u, " + "protected_range %llu, future_request_cno %llu, " + "dont_touch %#x\n", + pebc->parent_si->seg_id, + pebc->peb_index, + create_cno, + last_request_cno, reqs_count, + protected_range, future_request_cno, + dont_touch); +#endif /* CONFIG_SSDFS_DEBUG */ + + return !dont_touch; +} + +/* + * Segment object's API + */ +struct ssdfs_segment_info *ssdfs_segment_allocate_object(u64 seg_id); +void ssdfs_segment_free_object(struct ssdfs_segment_info *si); +int ssdfs_segment_create_object(struct ssdfs_fs_info *fsi, + u64 seg, + int seg_state, + u16 seg_type, + u16 log_pages, + u8 create_threads, + struct ssdfs_segment_info *si); +int ssdfs_segment_destroy_object(struct ssdfs_segment_info *si); +void ssdfs_segment_get_object(struct ssdfs_segment_info *si); +void ssdfs_segment_put_object(struct ssdfs_segment_info *si); + +struct ssdfs_segment_info * +ssdfs_grab_segment(struct ssdfs_fs_info *fsi, int seg_type, u64 seg_id, + u64 start_search_id); + +int ssdfs_segment_read_block_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_read_block_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); + +int ssdfs_segment_pre_alloc_data_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_data_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_leaf_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int 
ssdfs_segment_pre_alloc_leaf_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_hybrid_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_hybrid_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_index_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_index_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); + +int ssdfs_segment_add_data_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_data_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_migrate_zone_block_sync(struct ssdfs_fs_info *fsi, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_migrate_zone_block_async(struct ssdfs_fs_info *fsi, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_leaf_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_leaf_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_hybrid_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_hybrid_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_index_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_index_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); + +int ssdfs_segment_pre_alloc_data_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_data_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_leaf_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_leaf_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_hybrid_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_hybrid_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int 
ssdfs_segment_pre_alloc_index_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_pre_alloc_index_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); + +int ssdfs_segment_add_data_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_data_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_migrate_zone_extent_sync(struct ssdfs_fs_info *fsi, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_migrate_zone_extent_async(struct ssdfs_fs_info *fsi, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_leaf_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_leaf_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_hybrid_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_hybrid_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_index_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); +int ssdfs_segment_add_index_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent); + +int ssdfs_segment_update_block_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_block_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_extent_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_extent_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_pre_alloc_block_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_pre_alloc_block_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_pre_alloc_extent_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_update_pre_alloc_extent_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); + +int ssdfs_segment_node_diff_on_write_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_node_diff_on_write_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); +int ssdfs_segment_data_diff_on_write_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_data_diff_on_write_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); + +int ssdfs_segment_prepare_migration_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request 
*req); +int ssdfs_segment_prepare_migration_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); +int ssdfs_segment_commit_log_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_commit_log_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req); +int ssdfs_segment_commit_log_sync2(struct ssdfs_segment_info *si, + u16 peb_index, + struct ssdfs_segment_request *req); +int ssdfs_segment_commit_log_async2(struct ssdfs_segment_info *si, + int req_type, u16 peb_index, + struct ssdfs_segment_request *req); + +int ssdfs_segment_invalidate_logical_block(struct ssdfs_segment_info *si, + u32 blk_offset); +int ssdfs_segment_invalidate_logical_extent(struct ssdfs_segment_info *si, + u32 start_off, u32 blks_count); + +int ssdfs_segment_migrate_range_async(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_migrate_pre_alloc_page_async(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); +int ssdfs_segment_migrate_fragment_async(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req); + +/* + * Internal segment object's API + */ +struct ssdfs_segment_info * +__ssdfs_create_new_segment(struct ssdfs_fs_info *fsi, + u64 seg_id, int seg_state, + u16 seg_type, u16 log_pages, + u8 create_threads); +int ssdfs_segment_change_state(struct ssdfs_segment_info *si); +int ssdfs_segment_detect_search_range(struct ssdfs_fs_info *fsi, + u64 *start_seg, u64 *end_seg); + +#endif /* _SSDFS_SEGMENT_H */ diff --git a/fs/ssdfs/segment_tree.c b/fs/ssdfs/segment_tree.c new file mode 100644 index 000000000000..2cb3ae2c5c9c --- /dev/null +++ b/fs/ssdfs/segment_tree.c @@ -0,0 +1,748 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/segment_tree.c - segment tree implementation. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "extents_tree.h" +#include "segment_tree.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_seg_tree_page_leaks; +atomic64_t ssdfs_seg_tree_memory_leaks; +atomic64_t ssdfs_seg_tree_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_seg_tree_cache_leaks_increment(void *kaddr) + * void ssdfs_seg_tree_cache_leaks_decrement(void *kaddr) + * void *ssdfs_seg_tree_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_seg_tree_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_seg_tree_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_seg_tree_kfree(void *kaddr) + * struct page *ssdfs_seg_tree_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_seg_tree_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_seg_tree_free_page(struct page *page) + * void ssdfs_seg_tree_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(seg_tree) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(seg_tree) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_seg_tree_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_seg_tree_page_leaks, 0); + atomic64_set(&ssdfs_seg_tree_memory_leaks, 0); + atomic64_set(&ssdfs_seg_tree_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_seg_tree_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_seg_tree_page_leaks) != 0) { + SSDFS_ERR("SEGMENT TREE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_seg_tree_page_leaks)); + } + + if (atomic64_read(&ssdfs_seg_tree_memory_leaks) != 0) { + SSDFS_ERR("SEGMENT TREE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_seg_tree_memory_leaks)); + } + + if (atomic64_read(&ssdfs_seg_tree_cache_leaks) != 0) { + SSDFS_ERR("SEGMENT TREE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_seg_tree_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * SEGMENTS TREE FUNCTIONALITY * + ******************************************************************************/ + +static +void ssdfs_segment_tree_invalidate_folio(struct folio *folio, size_t offset, + size_t length) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do nothing: offset %zu, length %zu\n", + offset, length); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_segment_tree_release_folio() - Release fs-specific metadata on a folio. + * @folio: The folio which the kernel is trying to free. + * @gfp: Memory allocation flags (and I/O mode). + * + * The address_space is trying to release any data attached to a folio + * (presumably at folio->private). + * + * This will also be called if the private_2 flag is set on a page, + * indicating that the folio has other metadata associated with it. 
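+ *
+ * NOTE: segment tree pages keep raw pointers to segment objects,
+ * so SSDFS never lets these pages be released and the method
+ * below always returns %false.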
+ * + * The @gfp argument specifies whether I/O may be performed to release + * this page (__GFP_IO), and whether the call may block + * (__GFP_RECLAIM & __GFP_FS). + * + * Return: %true if the release was successful, otherwise %false. + */ +static +bool ssdfs_segment_tree_release_folio(struct folio *folio, gfp_t gfp) +{ + return false; +} + +static +bool ssdfs_segment_tree_noop_dirty_folio(struct address_space *mapping, + struct folio *folio) +{ + return true; +} + +const struct address_space_operations ssdfs_segment_tree_aops = { + .invalidate_folio = ssdfs_segment_tree_invalidate_folio, + .release_folio = ssdfs_segment_tree_release_folio, + .dirty_folio = ssdfs_segment_tree_noop_dirty_folio, +}; + +/* + * ssdfs_segment_tree_mapping_init() - segment tree's mapping init + */ +static inline +void ssdfs_segment_tree_mapping_init(struct address_space *mapping, + struct inode *inode) +{ + address_space_init_once(mapping); + mapping->a_ops = &ssdfs_segment_tree_aops; + mapping->host = inode; + mapping->flags = 0; + atomic_set(&mapping->i_mmap_writable, 0); + mapping_set_gfp_mask(mapping, GFP_KERNEL | __GFP_ZERO); + mapping->private_data = NULL; + mapping->writeback_index = 0; + inode->i_mapping = mapping; +} + +static const struct inode_operations def_segment_tree_ino_iops; +static const struct file_operations def_segment_tree_ino_fops; +static const struct address_space_operations def_segment_tree_ino_aops; + +/* + * ssdfs_create_segment_tree_inode() - create segments tree's inode + * @fsi: pointer on shared file system object + */ +static +int ssdfs_create_segment_tree_inode(struct ssdfs_fs_info *fsi) +{ + struct inode *inode; + struct ssdfs_inode_info *ii; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = iget_locked(fsi->sb, SSDFS_SEG_TREE_INO); + if (unlikely(!inode)) { + err = -ENOMEM; + SSDFS_ERR("unable to allocate segment tree inode: err %d\n", + err); + return err; + } + + BUG_ON(!(inode->i_state & I_NEW)); + + inode->i_mode = S_IFREG; + mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL); + + inode->i_op = &def_segment_tree_ino_iops; + inode->i_fop = &def_segment_tree_ino_fops; + inode->i_mapping->a_ops = &def_segment_tree_ino_aops; + + ii = SSDFS_I(inode); + ii->birthtime = current_time(inode); + ii->parent_ino = U64_MAX; + + down_write(&ii->lock); + err = ssdfs_extents_tree_create(fsi, ii); + up_write(&ii->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to create the extents tree: " + "err %d\n", err); + unlock_new_inode(inode); + iput(inode); + return -ERANGE; + } + + unlock_new_inode(inode); + + fsi->segs_tree_inode = inode; + + return 0; +} + +/* + * ssdfs_segment_tree_create() - create segments tree + * @fsi: pointer on shared file system object + */ +int ssdfs_segment_tree_create(struct ssdfs_fs_info *fsi) +{ + size_t dentries_desc_size = + sizeof(struct ssdfs_dentries_btree_descriptor); + size_t extents_desc_size = + sizeof(struct ssdfs_extents_btree_descriptor); + size_t xattr_desc_size = + sizeof(struct ssdfs_xattr_btree_descriptor); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi->segs_tree = + ssdfs_seg_tree_kzalloc(sizeof(struct ssdfs_segment_tree), + GFP_KERNEL); + if (!fsi->segs_tree) { + SSDFS_ERR("fail to allocate segment tree's root object\n"); + return -ENOMEM; + } + + ssdfs_memcpy(&fsi->segs_tree->dentries_btree, + 0, dentries_desc_size, + 
&fsi->vh->dentries_btree, + 0, dentries_desc_size, + dentries_desc_size); + ssdfs_memcpy(&fsi->segs_tree->extents_btree, 0, extents_desc_size, + &fsi->vh->extents_btree, 0, extents_desc_size, + extents_desc_size); + ssdfs_memcpy(&fsi->segs_tree->xattr_btree, 0, xattr_desc_size, + &fsi->vh->xattr_btree, 0, xattr_desc_size, + xattr_desc_size); + + err = ssdfs_create_segment_tree_inode(fsi); + if (unlikely(err)) { + SSDFS_ERR("fail to create segment tree's inode: " + "err %d\n", + err); + goto free_memory; + } + + fsi->segs_tree->lnodes_seg_log_pages = + le16_to_cpu(fsi->vh->lnodes_seg_log_pages); + fsi->segs_tree->hnodes_seg_log_pages = + le16_to_cpu(fsi->vh->hnodes_seg_log_pages); + fsi->segs_tree->inodes_seg_log_pages = + le16_to_cpu(fsi->vh->inodes_seg_log_pages); + fsi->segs_tree->user_data_log_pages = + le16_to_cpu(fsi->vh->user_data_log_pages); + fsi->segs_tree->default_log_pages = SSDFS_LOG_PAGES_DEFAULT; + + ssdfs_segment_tree_mapping_init(&fsi->segs_tree->pages, + fsi->segs_tree_inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("DONE: create segment tree\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + +free_memory: + ssdfs_seg_tree_kfree(fsi->segs_tree); + fsi->segs_tree = NULL; + + return err; +} + +/* + * ssdfs_segment_tree_destroy_objects_in_page() - destroy objects in page + * @fsi: pointer on shared file system object + * @page: pointer on memory page + */ +static +void ssdfs_segment_tree_destroy_objects_in_page(struct ssdfs_fs_info *fsi, + struct page *page) +{ + struct ssdfs_segment_info **kaddr; + size_t ptr_size = sizeof(struct ssdfs_segment_info *); + size_t ptrs_per_page = PAGE_SIZE / ptr_size; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page || !fsi || !fsi->segs_tree); + + SSDFS_DBG("page %p\n", page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + + kaddr = (struct ssdfs_segment_info **)kmap_local_page(page); + + for (i = 0; i < ptrs_per_page; i++) { + struct ssdfs_segment_info *si = *(kaddr + i); + + if (si) { + wait_queue_head_t *wq = &si->object_queue; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("si %p, seg_id %llu\n", si, si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&si->refs_count) > 0) { + ssdfs_unlock_page(page); + + err = wait_event_killable_timeout(*wq, + atomic_read(&si->refs_count) <= 0, + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + ssdfs_lock_page(page); + } + + err = ssdfs_segment_destroy_object(si); + if (err) { + SSDFS_WARN("fail to destroy segment object: " + "seg %llu, err %d\n", + si->seg_id, err); + } + } + + } + + kunmap_local(kaddr); + + __ssdfs_clear_dirty_page(page); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); + SSDFS_DBG("page_index %ld, flags %#lx\n", + page->index, page->flags); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_segment_tree_destroy_objects_in_array() - destroy objects in array + * @fsi: pointer on shared file system object + * @array: pointer on array of pages + * @pages_count: count of pages in array + */ +static +void ssdfs_segment_tree_destroy_objects_in_array(struct ssdfs_fs_info *fsi, + struct page **array, + size_t pages_count) +{ + struct page *page; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array || !fsi); + + SSDFS_DBG("array %p, pages_count %zu\n", + array, pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < pages_count; i++) { + page = array[i]; + + if (!page) { + SSDFS_WARN("page pointer is NULL: " 
+ "index %d\n", + i); + continue; + } + + ssdfs_segment_tree_destroy_objects_in_page(fsi, page); + } +} + +#define SSDFS_MEM_PAGE_ARRAY_SIZE (16) + +/* + * ssdfs_segment_tree_destroy_segment_objects() - destroy all segment objects + * @fsi: pointer on shared file system object + */ +static +void ssdfs_segment_tree_destroy_segment_objects(struct ssdfs_fs_info *fsi) +{ + pgoff_t start = 0; + pgoff_t end = -1; + size_t pages_count = 0; + struct page *array[SSDFS_MEM_PAGE_ARRAY_SIZE] = {0}; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->segs_tree); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + do { + pages_count = find_get_pages_range_tag(&fsi->segs_tree->pages, + &start, end, + PAGECACHE_TAG_DIRTY, + SSDFS_MEM_PAGE_ARRAY_SIZE, + &array[0]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %lu, pages_count %zu\n", + start, pages_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pages_count != 0) { + ssdfs_segment_tree_destroy_objects_in_array(fsi, + &array[0], + pages_count); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!array[pages_count - 1]); +#endif /* CONFIG_SSDFS_DEBUG */ + + start = page_index(array[pages_count - 1]) + 1; + } + } while (pages_count != 0); +} + +/* + * ssdfs_segment_tree_destroy() - destroy segments tree + * @fsi: pointer on shared file system object + */ +void ssdfs_segment_tree_destroy(struct ssdfs_fs_info *fsi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->segs_tree); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode_lock(fsi->segs_tree_inode); + + ssdfs_segment_tree_destroy_segment_objects(fsi); + + if (fsi->segs_tree->pages.nrpages != 0) + truncate_inode_pages(&fsi->segs_tree->pages, 0); + + inode_unlock(fsi->segs_tree_inode); + + iput(fsi->segs_tree_inode); + ssdfs_seg_tree_kfree(fsi->segs_tree); + fsi->segs_tree = NULL; +} + +/* + * ssdfs_segment_tree_add() - add segment object into the tree + * @fsi: pointer on shared file system object + * @si: pointer on segment object + * + * This method tries to add the valid pointer on segment + * object into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + * %-EEXIST - segment has been added already. 
+ */ +int ssdfs_segment_tree_add(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_info *si) +{ + pgoff_t page_index; + u32 object_index; + struct page *page; + struct ssdfs_segment_info **kaddr, *object; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->segs_tree || !si); + + SSDFS_DBG("fsi %p, si %p, seg %llu\n", + fsi, si, si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = div_u64_rem(si->seg_id, SSDFS_SEG_OBJ_PTR_PER_PAGE, + &object_index); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %lu, object_index %u\n", + page_index, object_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode_lock(fsi->segs_tree_inode); + + page = grab_cache_page(&fsi->segs_tree->pages, page_index); + if (!page) { + err = -ENOMEM; + SSDFS_ERR("fail to grab page: page_index %lu\n", + page_index); + goto finish_add_segment; + } + + ssdfs_account_locked_page(page); + + kaddr = (struct ssdfs_segment_info **)kmap_local_page(page); + object = *(kaddr + object_index); + if (object) { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("object exists for segment %llu\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else + *(kaddr + object_index) = si; + kunmap_local(kaddr); + + SetPageUptodate(page); + if (!PageDirty(page)) + ssdfs_set_page_dirty(page); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); + SSDFS_DBG("page_index %ld, flags %#lx\n", + page->index, page->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_add_segment: + inode_unlock(fsi->segs_tree_inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_segment_tree_remove() - remove segment object from the tree + * @fsi: pointer on shared file system object + * @si: pointer on segment object + * + * This method tries to remove the valid pointer on segment + * object from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - segment tree hasn't object for @si. 
+ */ +int ssdfs_segment_tree_remove(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_info *si) +{ + pgoff_t page_index; + u32 object_index; + struct page *page; + struct ssdfs_segment_info **kaddr, *object; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->segs_tree || !si); + + SSDFS_DBG("fsi %p, si %p, seg %llu\n", + fsi, si, si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = div_u64_rem(si->seg_id, SSDFS_SEG_OBJ_PTR_PER_PAGE, + &object_index); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %lu, object_index %u\n", + page_index, object_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode_lock(fsi->segs_tree_inode); + + page = find_lock_page(&fsi->segs_tree->pages, page_index); + if (!page) { + err = -ENODATA; + SSDFS_ERR("failed to remove segment object: " + "seg %llu\n", + si->seg_id); + goto finish_remove_segment; + } + + ssdfs_account_locked_page(page); + kaddr = (struct ssdfs_segment_info **)kmap_local_page(page); + object = *(kaddr + object_index); + if (!object) { + err = -ENODATA; + SSDFS_WARN("object ptr is NULL: " + "seg %llu\n", + si->seg_id); + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(object != si); +#endif /* CONFIG_SSDFS_DEBUG */ + *(kaddr + object_index) = NULL; + } + kunmap_local(kaddr); + + SetPageUptodate(page); + if (!PageDirty(page)) + ssdfs_set_page_dirty(page); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* + * Prevent from error of creation + * the same segment in another thread. + */ + ssdfs_sysfs_delete_seg_group(si); + +finish_remove_segment: + inode_unlock(fsi->segs_tree_inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_segment_tree_find() - find segment object in the tree + * @fsi: pointer on shared file system object + * @seg_id: segment number + * + * This method tries to find the valid pointer on segment + * object for @seg_id. + * + * RETURN: + * [success] - pointer on found segment object + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENODATA - segment tree hasn't object for @seg_id. 
+ */ +struct ssdfs_segment_info * +ssdfs_segment_tree_find(struct ssdfs_fs_info *fsi, u64 seg_id) +{ + pgoff_t page_index; + u32 object_index; + struct page *page; + struct ssdfs_segment_info **kaddr, *object; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->segs_tree); + + if (seg_id >= fsi->nsegs) { + SSDFS_ERR("seg_id %llu >= fsi->nsegs %llu\n", + seg_id, fsi->nsegs); + return ERR_PTR(-EINVAL); + } + + SSDFS_DBG("fsi %p, seg_id %llu\n", + fsi, seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = div_u64_rem(seg_id, SSDFS_SEG_OBJ_PTR_PER_PAGE, + &object_index); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %lu, object_index %u\n", + page_index, object_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode_lock_shared(fsi->segs_tree_inode); + + page = find_lock_page(&fsi->segs_tree->pages, page_index); + if (!page) { + object = ERR_PTR(-ENODATA); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find segment object: " + "seg %llu\n", + seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_find_segment; + } + + ssdfs_account_locked_page(page); + kaddr = (struct ssdfs_segment_info **)kmap_local_page(page); + object = *(kaddr + object_index); + if (!object) { + object = ERR_PTR(-ENODATA); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find segment object: " + "seg %llu\n", + seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_find_segment: + inode_unlock_shared(fsi->segs_tree_inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return object; +} diff --git a/fs/ssdfs/segment_tree.h b/fs/ssdfs/segment_tree.h new file mode 100644 index 000000000000..9d76fa784e7c --- /dev/null +++ b/fs/ssdfs/segment_tree.h @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/segment_tree.h - segment tree declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SEGMENT_TREE_H
+#define _SSDFS_SEGMENT_TREE_H
+
+/*
+ * struct ssdfs_segment_tree - tree of segment objects
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @default_log_pages: default full log size (pages count)
+ * @dentries_btree: dentries b-tree descriptor
+ * @extents_btree: extents b-tree descriptor
+ * @xattr_btree: xattrs b-tree descriptor
+ * @pages: pages of segment tree
+ */
+struct ssdfs_segment_tree {
+	u16 lnodes_seg_log_pages;
+	u16 hnodes_seg_log_pages;
+	u16 inodes_seg_log_pages;
+	u16 user_data_log_pages;
+	u16 default_log_pages;
+
+	struct ssdfs_dentries_btree_descriptor dentries_btree;
+	struct ssdfs_extents_btree_descriptor extents_btree;
+	struct ssdfs_xattr_btree_descriptor xattr_btree;
+
+	struct address_space pages;
+};
+
+#define SSDFS_SEG_OBJ_PTR_PER_PAGE \
+	(PAGE_SIZE / sizeof(struct ssdfs_segment_info *))
+
+/*
+ * Segments' tree API
+ */
+int ssdfs_segment_tree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_segment_tree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_segment_tree_add(struct ssdfs_fs_info *fsi,
+			   struct ssdfs_segment_info *si);
+int ssdfs_segment_tree_remove(struct ssdfs_fs_info *fsi,
+			      struct ssdfs_segment_info *si);
+struct ssdfs_segment_info *
+ssdfs_segment_tree_find(struct ssdfs_fs_info *fsi, u64 seg_id);
+
+#endif /* _SSDFS_SEGMENT_TREE_H */

From patchwork Sat Feb 25 01:08:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151940
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 36/76] ssdfs: segment object's add data/metadata operations
Date: Fri, 24 Feb 2023 17:08:47 -0800
Message-Id: <20230225010927.813929-37-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

The segment object implements the API for adding logical blocks into user
files or metadata structures. It means that if a file or a metadata
structure (for example, a b-tree) needs to grow, then the file system
logic has to add/allocate a new block or extent. The add/allocate logical
block operation requires several steps (sketched in code right after this
list):

(1) Reserve logical block(s) by means of decrementing/checking the
    counter of free logical blocks for the whole volume;
(2) Allocate logical block ID(s) in the offset translation table of the
    segment object;
(3) Add the create request into the flush thread's queue;
(4) The flush thread processes the create request by means of compressing
    the user data or metadata and compacting several compressed logical
    blocks into one or several memory pages;
(5) The flush thread executes the commit operation by means of preparing
    the log (header + payload + footer) and storing into the offset
    translation table the association of the logical block ID with the
    particular offset inside the log's payload.
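The condensed sketch below is an illustration, not part of the patch: it
compresses steps (1)-(3) and the hand-off to the flush thread into one
function, using helpers that appear later in this patch
(ssdfs_segment_blk_bmap_reserve_block(), ssdfs_blk2off_table_allocate_block(),
ssdfs_requests_queue_add_tail()). Error handling, waiting for blk2off table
initialization, and current segment management are omitted, and the
SSDFS_PEB_FLUSH_THREAD wait queue index is assumed here:

static int sketch_segment_add_block(struct ssdfs_segment_info *si,
				    struct ssdfs_segment_request *req)
{
	u16 blk;
	int err;

	/* (1) reserve a logical block in the segment's block bitmap */
	err = ssdfs_segment_blk_bmap_reserve_block(&si->blk_bmap);
	if (err)
		return err;

	/* (2) allocate a logical block ID in the offset translation table */
	err = ssdfs_blk2off_table_allocate_block(si->blk2off_table, &blk);
	if (err)
		return err;

	/* (3) queue the create request for the flush thread... */
	ssdfs_requests_queue_add_tail(&si->create_rq, req);

	/*
	 * ...which compresses and compacts the payload (4), then commits
	 * the log and stores the blk -> log offset association (5).
	 */
	wake_up_all(&si->wait_queue[SSDFS_PEB_FLUSH_THREAD]);
	return 0;
}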
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/segment.c | 2426 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2426 insertions(+) diff --git a/fs/ssdfs/segment.c b/fs/ssdfs/segment.c index 6f23c16fe800..9496b18aa1f3 100644 --- a/fs/ssdfs/segment.c +++ b/fs/ssdfs/segment.c @@ -1313,3 +1313,2429 @@ ssdfs_grab_segment(struct ssdfs_fs_info *fsi, int seg_type, u64 seg_id, return si; } + +/* + * __ssdfs_segment_read_block() - read segment's block + * @si: segment info + * @req: segment request [in|out] + */ +static +int __ssdfs_segment_read_block(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ + struct ssdfs_blk2off_table *table; + struct ssdfs_phys_offset_descriptor *po_desc; + struct ssdfs_peb_container *pebc; + struct ssdfs_requests_queue *rq; + wait_queue_head_t *wait; + u16 peb_index = U16_MAX; + u16 logical_blk; + struct ssdfs_offset_position pos = {0}; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + table = si->blk2off_table; + logical_blk = req->place.start.blk_index; + + po_desc = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, &pos); + if (IS_ERR(po_desc) && PTR_ERR(po_desc) == -EAGAIN) { + struct completion *end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + po_desc = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, + &pos); + } + + if (IS_ERR_OR_NULL(po_desc)) { + err = (po_desc == NULL ? 
-ERANGE : PTR_ERR(po_desc)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + if (peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= si->pebs_count %u\n", + peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[peb_index]; + + ssdfs_peb_read_request_cno(pebc); + + rq = &pebc->read_rq; + ssdfs_requests_queue_add_tail(rq, req); + + wait = &si->wait_queue[SSDFS_PEB_READ_THREAD]; + wake_up_all(wait); + + return 0; +} + +/* + * ssdfs_segment_read_block_sync() - read segment's block synchronously + * @si: segment info + * @req: segment request [in|out] + */ +int ssdfs_segment_read_block_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGE, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_read_block(si, req); +} + +/* + * ssdfs_segment_read_block_async() - read segment's block asynchronously + * @req_type: request type + * @si: segment info + * @req: segment request [in|out] + */ +int ssdfs_segment_read_block_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGE, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_read_block(si, req); +} + +/* + * ssdfs_segment_get_used_data_pages() - get segment's used data pages count + * @si: segment object + * + * This function tries to get segment's used data pages count. + * + * RETURN: + * [success] + * [failure] - error code. 
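+ *
+ * NOTE: the count is accumulated over all PEB containers of the
+ * segment by calling ssdfs_peb_get_used_data_pages() for each one.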
+ */ +int ssdfs_segment_get_used_data_pages(struct ssdfs_segment_info *si) +{ + int used_pages = 0; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + + SSDFS_DBG("seg %llu\n", si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_peb_container *pebc = &si->peb_array[i]; + + err = ssdfs_peb_get_used_data_pages(pebc); + if (err < 0) { + SSDFS_ERR("fail to get used data pages count: " + "seg %llu, peb index %d, err %d\n", + si->seg_id, i, err); + return err; + } else + used_pages += err; + } + + return used_pages; +} + +/* + * ssdfs_segment_change_state() - change segment state + * @si: pointer on segment object + */ +int ssdfs_segment_change_state(struct ssdfs_segment_info *si) +{ + struct ssdfs_segment_bmap *segbmap; + struct ssdfs_blk2off_table *blk2off_tbl; + u32 pages_per_seg; + u16 used_logical_blks; + int free_pages, invalid_pages; + bool need_change_state = false; + int seg_state, old_seg_state; + int new_seg_state = SSDFS_SEG_STATE_MAX; + u64 seg_id; + struct completion *init_end; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_id = si->seg_id; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("si %p, seg_id %llu\n", + si, seg_id); +#else + SSDFS_DBG("si %p, seg_id %llu\n", + si, seg_id); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + blk2off_tbl = si->blk2off_table; + segbmap = si->fsi->segbmap; + + err = ssdfs_blk2off_table_get_used_logical_blks(blk2off_tbl, + &used_logical_blks); + if (err == -EAGAIN) { + init_end = &blk2off_tbl->partial_init_end; + + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_blk2off_table_get_used_logical_blks(blk2off_tbl, + &used_logical_blks); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to get used logical blocks count: " + "err %d\n", + err); + return err; + } else if (used_logical_blks == U16_MAX) { + SSDFS_ERR("invalid used logical blocks count\n"); + return -ERANGE; + } + + pages_per_seg = si->fsi->pages_per_seg; + seg_state = atomic_read(&si->seg_state); + free_pages = ssdfs_segment_blk_bmap_get_free_pages(&si->blk_bmap); + invalid_pages = ssdfs_segment_blk_bmap_get_invalid_pages(&si->blk_bmap); + + if (free_pages > pages_per_seg) { + SSDFS_ERR("free_pages %d > pages_per_seg %u\n", + free_pages, pages_per_seg); + return -ERANGE; + } + + switch (seg_state) { + case SSDFS_SEG_CLEAN: + if (free_pages == pages_per_seg) { + /* + * Do nothing. 
+ */ + } else if (free_pages > 0) { + need_change_state = true; + + if (invalid_pages > 0) { + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } else { + new_seg_state = + SEG_TYPE_TO_USING_STATE(si->seg_type); + if (new_seg_state < 0 || + new_seg_state == SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid seg_type %#x\n", + si->seg_type); + return -ERANGE; + } + } + } else { + need_change_state = true; + + if (invalid_pages == 0) + new_seg_state = SSDFS_SEG_USED; + else if (used_logical_blks == 0) + new_seg_state = SSDFS_SEG_DIRTY; + else + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } + break; + + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + if (free_pages == pages_per_seg) { + if (invalid_pages == 0 && used_logical_blks == 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_CLEAN; + } else { + SSDFS_ERR("free_pages %d == pages_per_seg %u\n", + free_pages, pages_per_seg); + return -ERANGE; + } + } else if (free_pages > 0) { + if (invalid_pages > 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } + } else { + need_change_state = true; + + if (invalid_pages == 0) + new_seg_state = SSDFS_SEG_USED; + else if (used_logical_blks == 0) + new_seg_state = SSDFS_SEG_DIRTY; + else + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } + break; + + case SSDFS_SEG_USED: + if (free_pages == pages_per_seg) { + if (invalid_pages == 0 && used_logical_blks == 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_CLEAN; + } else { + SSDFS_ERR("free_pages %d == pages_per_seg %u\n", + free_pages, pages_per_seg); + return -ERANGE; + } + } else if (invalid_pages > 0) { + need_change_state = true; + + if (used_logical_blks > 0) { + /* pre-dirty state */ + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } else if (free_pages > 0) { + /* pre-dirty state */ + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } else { + /* dirty state */ + new_seg_state = SSDFS_SEG_DIRTY; + } + } else if (free_pages > 0) { + need_change_state = true; + new_seg_state = SEG_TYPE_TO_USING_STATE(si->seg_type); + if (new_seg_state < 0 || + new_seg_state == SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid seg_type %#x\n", + si->seg_type); + return -ERANGE; + } + } + break; + + case SSDFS_SEG_PRE_DIRTY: + if (free_pages == pages_per_seg) { + if (invalid_pages == 0 && used_logical_blks == 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_CLEAN; + } else { + SSDFS_ERR("free_pages %d == pages_per_seg %u\n", + free_pages, pages_per_seg); + return -ERANGE; + } + } else if (invalid_pages > 0) { + if (used_logical_blks == 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_DIRTY; + } + } else if (free_pages > 0) { + need_change_state = true; + new_seg_state = SEG_TYPE_TO_USING_STATE(si->seg_type); + if (new_seg_state < 0 || + new_seg_state == SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid seg_type %#x\n", + si->seg_type); + return -ERANGE; + } + } else if (free_pages == 0 && invalid_pages == 0) { + if (used_logical_blks == 0) { + SSDFS_ERR("invalid state: " + "invalid_pages %d, " + "free_pages %d, " + "used_logical_blks %u\n", + invalid_pages, + free_pages, + used_logical_blks); + return -ERANGE; + } else { + need_change_state = true; + new_seg_state = SSDFS_SEG_USED; + } + } + break; + + case SSDFS_SEG_DIRTY: + if (free_pages == pages_per_seg) { + if (invalid_pages == 0 && used_logical_blks == 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_CLEAN; + } else { + SSDFS_ERR("free_pages %d == pages_per_seg %u\n", + free_pages, pages_per_seg); + 
return -ERANGE; + } + } else if (invalid_pages > 0) { + if (used_logical_blks > 0 || free_pages > 0) { + need_change_state = true; + new_seg_state = SSDFS_SEG_PRE_DIRTY; + } + } else if (free_pages > 0) { + need_change_state = true; + new_seg_state = SEG_TYPE_TO_USING_STATE(si->seg_type); + if (new_seg_state < 0 || + new_seg_state == SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid seg_type %#x\n", + si->seg_type); + return -ERANGE; + } + } else if (free_pages == 0 && invalid_pages == 0) { + if (used_logical_blks == 0) { + SSDFS_ERR("invalid state: " + "invalid_pages %d, " + "free_pages %d, " + "used_logical_blks %u\n", + invalid_pages, + free_pages, + used_logical_blks); + return -ERANGE; + } else { + need_change_state = true; + new_seg_state = SSDFS_SEG_USED; + } + } + break; + + case SSDFS_SEG_BAD: + case SSDFS_SEG_RESERVED: + /* do nothing */ + break; + + default: + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_state %#x, new_state %#x, " + "need_change_state %#x, free_pages %d, " + "invalid_pages %d, used_logical_blks %u\n", + seg_state, new_seg_state, + need_change_state, free_pages, + invalid_pages, used_logical_blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_change_state) { +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("no need to change state\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + return 0; + } + + err = ssdfs_segbmap_change_state(segbmap, seg_id, + new_seg_state, &init_end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("segbmap init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_segbmap_change_state(segbmap, seg_id, + new_seg_state, + &init_end); + if (unlikely(err)) + goto fail_change_state; + } else if (unlikely(err)) { +fail_change_state: + SSDFS_ERR("fail to change segment state: " + "seg %llu, state %#x, err %d\n", + seg_id, new_seg_state, err); + return err; + } + + old_seg_state = atomic_cmpxchg(&si->seg_state, + seg_state, new_seg_state); + if (old_seg_state != seg_state) { + SSDFS_WARN("old_seg_state %#x != seg_state %#x\n", + old_seg_state, seg_state); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_current_segment_change_state() - change current segment state + * @cur_seg: pointer on current segment + */ +static +int ssdfs_current_segment_change_state(struct ssdfs_current_segment *cur_seg) +{ + struct ssdfs_segment_info *si; + u64 seg_id; + int seg_state; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cur_seg || !cur_seg->real_seg); + BUG_ON(!mutex_is_locked(&cur_seg->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = cur_seg->real_seg; + seg_id = si->seg_id; + seg_state = atomic_read(&cur_seg->real_seg->seg_state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_seg %p, si %p, seg_id %llu, seg_state %#x\n", + cur_seg, si, seg_id, seg_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (seg_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + err = ssdfs_segment_change_state(si); + if (unlikely(err)) { + SSDFS_ERR("fail to change segment's state: " + "seg_id %llu, err %d\n", + seg_id, err); + return err; + } + break; + + case SSDFS_SEG_DIRTY: + case SSDFS_SEG_BAD: + case SSDFS_SEG_RESERVED: + SSDFS_ERR("invalid segment state: %#x\n", + seg_state); + return -ERANGE; + + default: + BUG(); + } + + return 0; +} + +/* + * 
ssdfs_calculate_zns_reservation_threshold() - reservation threshold + */ +static inline +u32 ssdfs_calculate_zns_reservation_threshold(void) +{ + u32 threshold; + + threshold = SSDFS_CUR_SEGS_COUNT * 2; + threshold += SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX; + threshold += SSDFS_SEGBMAP_SEGS * SSDFS_SEGBMAP_SEG_COPY_MAX; + threshold += SSDFS_MAPTBL_RESERVED_EXTENTS * SSDFS_MAPTBL_SEG_COPY_MAX; + + return threshold; +} + +/* + * CHECKED_SEG_TYPE() - correct segment type + * @fsi: pointer on shared file system object + * @cur_seg_type: checking segment type + */ +static inline +int CHECKED_SEG_TYPE(struct ssdfs_fs_info *fsi, int cur_seg_type) +{ + u32 threshold = ssdfs_calculate_zns_reservation_threshold(); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fsi->is_zns_device) + return cur_seg_type; + + if (threshold < (fsi->max_open_zones / 2)) + return cur_seg_type; + + switch (cur_seg_type) { + case SSDFS_CUR_LNODE_SEG: + case SSDFS_CUR_HNODE_SEG: + case SSDFS_CUR_IDXNODE_SEG: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment type %#x is corrected to %#x\n", + cur_seg_type, SSDFS_CUR_LNODE_SEG); +#endif /* CONFIG_SSDFS_DEBUG */ + return SSDFS_CUR_LNODE_SEG; + + default: + /* do nothing */ + break; + } + + return cur_seg_type; +} + +/* + * can_current_segment_be_added() - check that current segment can be added + * @si: pointer on segment object + */ +static inline +bool can_current_segment_be_added(struct ssdfs_segment_info *si) +{ + struct ssdfs_fs_info *fsi; + u32 threshold = ssdfs_calculate_zns_reservation_threshold(); + int open_zones; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = si->fsi; + + if (!fsi->is_zns_device) + return true; + + switch (si->seg_type) { + case SSDFS_LEAF_NODE_SEG_TYPE: + case SSDFS_HYBRID_NODE_SEG_TYPE: + case SSDFS_INDEX_NODE_SEG_TYPE: + open_zones = atomic_read(&fsi->open_zones); + + if (threshold < ((fsi->max_open_zones - open_zones) / 2)) + return true; + else + return false; + + case SSDFS_USER_DATA_SEG_TYPE: + return true; + + default: + /* do nothing */ + break; + } + + SSDFS_WARN("unexpected segment type %#x\n", + si->seg_type); + + return false; +} + +/* + * __ssdfs_segment_add_block() - add new block into segment + * @cur_seg: current segment container + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
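+ *
+ * On success, @seg_id and @extent describe where the logical block has
+ * been reserved: @extent->start_lblk and @extent->len locate the block
+ * inside the segment (the logical extent concept). For illustration
+ * only, a sketch of the expected calling convention, based on the
+ * sync/async wrappers later in this file:
+ *
+ *	down_read(&fsi->cur_segs->lock);
+ *	cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)];
+ *	err = __ssdfs_segment_add_block(cur_seg, req, &seg_id, &extent);
+ *	up_read(&fsi->cur_segs->lock);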
+ */ +static +int __ssdfs_segment_add_block(struct ssdfs_current_segment *cur_seg, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + int seg_type; + u64 start = U64_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cur_seg || !req || !seg_id || !extent); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#else + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = cur_seg->fsi; + *seg_id = U64_MAX; + + ssdfs_current_segment_lock(cur_seg); + + seg_type = CHECKED_SEG_TYPE(fsi, SEG_TYPE(req->private.class)); + +try_current_segment: + if (is_ssdfs_current_segment_empty(cur_seg)) { +add_new_current_segment: + start = cur_seg->seg_id; + si = ssdfs_grab_segment(cur_seg->fsi, seg_type, + U64_MAX, start); + if (IS_ERR_OR_NULL(si)) { + err = (si == NULL ? -ENOMEM : PTR_ERR(si)); + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to create segment object: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + SSDFS_ERR("fail to create segment object: " + "err %d\n", err); + } + + goto finish_add_block; + } + + err = ssdfs_current_segment_add(cur_seg, si); + /* + * ssdfs_grab_segment() has got object already. + */ + ssdfs_segment_put_object(si); + if (unlikely(err)) { + SSDFS_ERR("fail to add segment %llu as current: " + "err %d\n", + si->seg_id, err); + goto finish_add_block; + } + + goto try_current_segment; + } else { + si = cur_seg->real_seg; + + err = ssdfs_segment_blk_bmap_reserve_block(&si->blk_bmap); + if (err == -E2BIG) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu hasn't enough free pages\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_current_segment_change_state(cur_seg); + if (unlikely(err)) { + SSDFS_ERR("fail to change segment state: " + "seg %llu, err %d\n", + si->seg_id, err); + goto finish_add_block; + } + + if (can_current_segment_be_added(si)) { + err = 0; + ssdfs_current_segment_remove(cur_seg); + goto add_new_current_segment; + } + + err = -ENOSPC; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add current segment: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_add_block; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve logical block: " + "seg %llu, err %d\n", + cur_seg->real_seg->seg_id, err); + goto finish_add_block; + } else { + struct ssdfs_blk2off_table *table; + struct ssdfs_requests_queue *create_rq; + wait_queue_head_t *wait; + u16 blk; + + table = si->blk2off_table; + + *seg_id = si->seg_id; + ssdfs_request_define_segment(si->seg_id, req); + + err = ssdfs_blk2off_table_allocate_block(table, &blk); + if (err == -EAGAIN) { + struct completion *end; + end = &table->partial_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + goto finish_add_block; + } + + err = ssdfs_blk2off_table_allocate_block(table, + &blk); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to allocate logical block\n"); + goto finish_add_block; + } + +#ifdef 
CONFIG_SSDFS_DEBUG
+			BUG_ON(blk > U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			extent->start_lblk = blk;
+			extent->len = 1;
+
+			ssdfs_request_define_volume_extent(blk, 1, req);
+
+			err = ssdfs_current_segment_change_state(cur_seg);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change segment state: "
+					  "seg %llu, err %d\n",
+					  cur_seg->real_seg->seg_id, err);
+				goto finish_add_block;
+			}
+
+			ssdfs_account_user_data_flush_request(si);
+			ssdfs_segment_create_request_cno(si);
+
+			create_rq = &si->create_rq;
+			ssdfs_requests_queue_add_tail_inc(si->fsi,
+							  create_rq, req);
+
+			wait = &si->wait_queue[SSDFS_PEB_FLUSH_THREAD];
+			wake_up_all(wait);
+		}
+	}
+
+finish_add_block:
+	ssdfs_current_segment_unlock(cur_seg);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	if (cur_seg->real_seg) {
+		SSDFS_ERR("finished: seg %llu\n",
+			  cur_seg->real_seg->seg_id);
+	}
+#else
+	if (cur_seg->real_seg) {
+		SSDFS_DBG("finished: seg %llu\n",
+			  cur_seg->real_seg->seg_id);
+	}
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to add block: "
+			  "ino %llu, logical_offset %llu, err %d\n",
+			  req->extent.ino, req->extent.logical_offset, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	} else if (err) {
+		SSDFS_ERR("fail to add block: "
+			  "ino %llu, logical_offset %llu, err %d\n",
+			  req->extent.ino, req->extent.logical_offset, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * __ssdfs_segment_add_extent() - add new extent into segment
+ * @cur_seg: current segment container
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to add new extent into segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+static
+int __ssdfs_segment_add_extent(struct ssdfs_current_segment *cur_seg,
+			       struct ssdfs_segment_request *req,
+			       u64 *seg_id,
+			       struct ssdfs_blk2off_range *extent)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_segment_info *si;
+	int seg_type;
+	u64 start = U64_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!cur_seg || !req || !seg_id || !extent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("ino %llu, logical_offset %llu, "
+		  "data_bytes %u, cno %llu, parent_snapshot %llu\n",
+		  req->extent.ino, req->extent.logical_offset,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot);
+#else
+	SSDFS_DBG("ino %llu, logical_offset %llu, "
+		  "data_bytes %u, cno %llu, parent_snapshot %llu\n",
+		  req->extent.ino, req->extent.logical_offset,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot);
+	SSDFS_DBG("current segment: type %#x, seg_id %llu, real_seg %px\n",
+		  cur_seg->type, cur_seg->seg_id, cur_seg->real_seg);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	fsi = cur_seg->fsi;
+	*seg_id = U64_MAX;
+
+	ssdfs_current_segment_lock(cur_seg);
+
+	seg_type = CHECKED_SEG_TYPE(fsi, SEG_TYPE(req->private.class));
+
+try_current_segment:
+	if (is_ssdfs_current_segment_empty(cur_seg)) {
+add_new_current_segment:
+		start = cur_seg->seg_id;
+		si = ssdfs_grab_segment(fsi, seg_type, U64_MAX, start);
+		if (IS_ERR_OR_NULL(si)) {
+			err = (si == NULL ?
-ENOMEM : PTR_ERR(si));
+			if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("unable to create segment object: "
+					  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			} else {
+				SSDFS_ERR("fail to create segment object: "
+					  "err %d\n", err);
+			}
+
+			goto finish_add_extent;
+		}
+
+		err = ssdfs_current_segment_add(cur_seg, si);
+		/*
+		 * ssdfs_grab_segment() has got object already.
+		 */
+		ssdfs_segment_put_object(si);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add segment %llu as current: "
+				  "err %d\n",
+				  si->seg_id, err);
+			goto finish_add_extent;
+		}
+
+		goto try_current_segment;
+	} else {
+		struct ssdfs_segment_blk_bmap *blk_bmap;
+		u32 extent_bytes = req->extent.data_bytes;
+		u16 blks_count;
+
+		/*
+		 * Round the byte count up to whole logical blocks
+		 * (or, at least, to whole memory pages).
+		 */
+		if (fsi->pagesize > PAGE_SIZE)
+			extent_bytes += fsi->pagesize - 1;
+		else
+			extent_bytes += PAGE_SIZE - 1;
+
+		si = cur_seg->real_seg;
+		blk_bmap = &si->blk_bmap;
+		blks_count = extent_bytes >> fsi->log_pagesize;
+
+		err = ssdfs_segment_blk_bmap_reserve_extent(blk_bmap,
+							    blks_count);
+		if (err == -E2BIG) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("segment %llu hasn't enough free pages\n",
+				  cur_seg->real_seg->seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = ssdfs_current_segment_change_state(cur_seg);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change segment state: "
+					  "seg %llu, err %d\n",
+					  cur_seg->real_seg->seg_id, err);
+				goto finish_add_extent;
+			}
+
+			if (can_current_segment_be_added(si)) {
+				err = 0;
+				ssdfs_current_segment_remove(cur_seg);
+				goto add_new_current_segment;
+			}
+
+			err = -ENOSPC;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to add current segment: "
+				  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_add_extent;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to reserve logical extent: "
+				  "seg %llu, err %d\n",
+				  cur_seg->real_seg->seg_id, err);
+			goto finish_add_extent;
+		} else {
+			struct ssdfs_blk2off_table *table;
+			struct ssdfs_requests_queue *create_rq;
+
+			table = si->blk2off_table;
+
+			*seg_id = si->seg_id;
+			ssdfs_request_define_segment(si->seg_id, req);
+
+			err = ssdfs_blk2off_table_allocate_extent(table,
+								  blks_count,
+								  extent);
+			if (err == -EAGAIN) {
+				struct completion *end;
+				end = &table->partial_init_end;
+
+				err = SSDFS_WAIT_COMPLETION(end);
+				if (unlikely(err)) {
+					SSDFS_ERR("blk2off init failed: "
+						  "err %d\n", err);
+					goto finish_add_extent;
+				}
+
+				err = ssdfs_blk2off_table_allocate_extent(table,
+								blks_count,
+								extent);
+			}
+
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to allocate logical extent\n");
+				goto finish_add_extent;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(extent->start_lblk >= U16_MAX);
+			BUG_ON(extent->len != blks_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_request_define_volume_extent(extent->start_lblk,
+							   extent->len, req);
+
+			err = ssdfs_current_segment_change_state(cur_seg);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change segment state: "
+					  "seg %llu, err %d\n",
+					  cur_seg->real_seg->seg_id, err);
+				goto finish_add_extent;
+			}
+
+			ssdfs_account_user_data_flush_request(si);
+			ssdfs_segment_create_request_cno(si);
+
+			create_rq = &si->create_rq;
+			ssdfs_requests_queue_add_tail_inc(si->fsi,
+							  create_rq, req);
+			wake_up_all(&si->wait_queue[SSDFS_PEB_FLUSH_THREAD]);
+		}
+	}
+
+finish_add_extent:
+	ssdfs_current_segment_unlock(cur_seg);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	if (cur_seg->real_seg) {
+		SSDFS_ERR("finished: seg %llu\n",
+			  cur_seg->real_seg->seg_id);
+	}
+#else
+	if (cur_seg->real_seg) {
+		SSDFS_DBG("finished: seg %llu\n",
cur_seg->real_seg->seg_id); + } +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add extent: " + "ino %llu, logical_offset %llu, err %d\n", + req->extent.ino, req->extent.logical_offset, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err) { + SSDFS_ERR("fail to add extent: " + "ino %llu, logical_offset %llu, err %d\n", + req->extent.ino, req->extent.logical_offset, err); + return err; + } + + return 0; +} + +/* + * __ssdfs_segment_add_block_sync() - add new block synchronously + * @fsi: pointer on shared file system object + * @req_class: request class + * @req_type: request type + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +static +int __ssdfs_segment_add_block_sync(struct ssdfs_fs_info *fsi, + int req_class, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_current_segment *cur_seg; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON(req_class <= SSDFS_PEB_READ_REQ || + req_class > SSDFS_PEB_CREATE_IDXNODE_REQ); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_SYNC: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(req_class, + SSDFS_CREATE_BLOCK, + req_type, req); + + down_read(&fsi->cur_segs->lock); + cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)]; + err = __ssdfs_segment_add_block(cur_seg, req, seg_id, extent); + up_read(&fsi->cur_segs->lock); + + return err; +} + +/* + * __ssdfs_segment_add_block_async() - add new block asynchronously + * @fsi: pointer on shared file system object + * @req_class: request class + * @req_type: request type + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
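+ *
+ * Only the SSDFS_REQ_ASYNC and SSDFS_REQ_ASYNC_NO_FREE request types are
+ * accepted here. Every public *_block_async() helper below reduces to
+ * this call with a concrete request class; e.g.,
+ * ssdfs_segment_add_data_block_async() is simply:
+ *
+ *	return __ssdfs_segment_add_block_async(fsi,
+ *						SSDFS_PEB_CREATE_DATA_REQ,
+ *						SSDFS_REQ_ASYNC,
+ *						req, seg_id, extent);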
+ */ +static +int __ssdfs_segment_add_block_async(struct ssdfs_fs_info *fsi, + int req_class, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_current_segment *cur_seg; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON(req_class <= SSDFS_PEB_READ_REQ || + req_class > SSDFS_PEB_CREATE_IDXNODE_REQ); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(req_class, + SSDFS_CREATE_BLOCK, + req_type, req); + + down_read(&fsi->cur_segs->lock); + cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)]; + err = __ssdfs_segment_add_block(cur_seg, req, seg_id, extent); + up_read(&fsi->cur_segs->lock); + + return err; +} + +/* + * ssdfs_segment_pre_alloc_data_block_sync() - synchronous pre-alloc data block + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a new data block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_data_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_PRE_ALLOCATE_DATA_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_data_block_async() - async pre-alloc data block + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a new data block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_data_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_PRE_ALLOCATE_DATA_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_leaf_node_block_sync() - sync pre-alloc leaf node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a leaf node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
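+ *
+ * Note: the pre_alloc_*() helpers differ from the corresponding add_*()
+ * helpers below only in the request class (SSDFS_PEB_PRE_ALLOCATE_* vs.
+ * SSDFS_PEB_CREATE_*); presumably the reserved logical block is tracked
+ * as pre-allocated in the block bitmap instead of receiving data
+ * immediately.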
+ */ +int ssdfs_segment_pre_alloc_leaf_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_leaf_node_block_async() - async pre-alloc leaf node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a leaf node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_leaf_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_hybrid_node_block_sync() - sync pre-alloc hybrid node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a hybrid node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_hybrid_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_hybrid_node_block_async() - pre-alloc hybrid node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a hybrid node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_hybrid_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_index_node_block_sync() - sync pre-alloc index node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate an index node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_pre_alloc_index_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_index_node_block_async() - pre-alloc index node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate an index node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_index_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_data_block_sync() - add new data block synchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new data block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_data_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_CREATE_DATA_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_data_block_async() - add new data block asynchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new data block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_data_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_CREATE_DATA_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_migrate_zone_block_sync() - migrate zone block synchronously + * @fsi: pointer on shared file system object + * @req_type: request type + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to migrate user data block from + * exhausted zone into current zone for user data updates. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_migrate_zone_block_sync(struct ssdfs_fs_info *fsi, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_current_segment *cur_seg; + int req_class = SSDFS_ZONE_USER_DATA_MIGRATE_REQ; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_SYNC: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(req_class, + SSDFS_MIGRATE_ZONE_USER_BLOCK, + req_type, req); + + down_read(&fsi->cur_segs->lock); + cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)]; + err = __ssdfs_segment_add_block(cur_seg, req, seg_id, extent); + up_read(&fsi->cur_segs->lock); + + return err; +} + +/* + * ssdfs_segment_migrate_zone_block_async() - migrate zone block asynchronously + * @fsi: pointer on shared file system object + * @req_type: request type + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to migrate user data block from + * exhausted zone into current zone for user data updates. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_migrate_zone_block_async(struct ssdfs_fs_info *fsi, + int req_type, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_current_segment *cur_seg; + int req_class = SSDFS_ZONE_USER_DATA_MIGRATE_REQ; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(req_class, + SSDFS_MIGRATE_ZONE_USER_BLOCK, + req_type, req); + + down_read(&fsi->cur_segs->lock); + cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)]; + err = __ssdfs_segment_add_block(cur_seg, req, seg_id, extent); + up_read(&fsi->cur_segs->lock); + + return err; +} + +/* + * ssdfs_segment_add_leaf_node_block_sync() - add new leaf node synchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new leaf node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_add_leaf_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_CREATE_LNODE_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_leaf_node_block_async() - add new leaf node asynchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new leaf node's block into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_leaf_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_CREATE_LNODE_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_hybrid_node_block_sync() - add new hybrid node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new hybrid node's block into segment + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_hybrid_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_CREATE_HNODE_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_hybrid_node_block_async() - add new hybrid node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new hybrid node's block into segment + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_hybrid_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_CREATE_HNODE_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_index_node_block_sync() - add new index node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new index node's block into segment + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_add_index_node_block_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_sync(fsi, + SSDFS_PEB_CREATE_IDXNODE_REQ, + SSDFS_REQ_SYNC, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_index_node_block_async() - add new index node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new index node's block into segment + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_index_node_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_block_async(fsi, + SSDFS_PEB_CREATE_IDXNODE_REQ, + SSDFS_REQ_ASYNC, + req, seg_id, extent); +} + +/* + * __ssdfs_segment_add_extent_sync() - add new extent synchronously + * @fsi: pointer on shared file system object + * @req_class: request class + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +static +int __ssdfs_segment_add_extent_sync(struct ssdfs_fs_info *fsi, + int req_class, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_current_segment *cur_seg; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON(req_class <= SSDFS_PEB_READ_REQ || + req_class > SSDFS_PEB_CREATE_IDXNODE_REQ); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(req_class, + SSDFS_CREATE_EXTENT, + SSDFS_REQ_SYNC, + req); + + down_read(&fsi->cur_segs->lock); + cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)]; + err = __ssdfs_segment_add_extent(cur_seg, req, seg_id, extent); + up_read(&fsi->cur_segs->lock); + + return err; +} + +/* + * __ssdfs_segment_add_extent_async() - add new extent asynchronously + * @fsi: pointer on shared file system object + * @req_class: request class + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
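+ *
+ * The number of logical blocks to reserve is derived from
+ * req->extent.data_bytes by rounding up to whole logical blocks (see
+ * __ssdfs_segment_add_extent() above); e.g., assuming a hypothetical
+ * 4K logical block size, a 5000 byte extent reserves two logical blocks.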
+ */ +static +int __ssdfs_segment_add_extent_async(struct ssdfs_fs_info *fsi, + int req_class, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + struct ssdfs_current_segment *cur_seg; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON(req_class <= SSDFS_PEB_READ_REQ || + req_class > SSDFS_PEB_CREATE_IDXNODE_REQ); + + SSDFS_DBG("ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(req_class, + SSDFS_CREATE_EXTENT, + SSDFS_REQ_ASYNC, + req); + + down_read(&fsi->cur_segs->lock); + cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)]; + err = __ssdfs_segment_add_extent(cur_seg, req, seg_id, extent); + up_read(&fsi->cur_segs->lock); + + return err; +} + +/* + * ssdfs_segment_pre_alloc_data_extent_sync() - sync pre-alloc a data extent + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a new data extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_data_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_sync(fsi, + SSDFS_PEB_PRE_ALLOCATE_DATA_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_data_extent_async() - async pre-alloc a data extent + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a new data extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_data_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_async(fsi, + SSDFS_PEB_PRE_ALLOCATE_DATA_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_leaf_node_extent_sync() - pre-alloc a leaf node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a leaf node's extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_pre_alloc_leaf_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_sync(fsi, + SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_pre_alloc_leaf_node_extent_async() - pre-alloc a leaf node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to pre-allocate a leaf node's extent into segment. 
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_segment_pre_alloc_leaf_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent)
+{
+	return __ssdfs_segment_add_extent_async(fsi,
+					SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ,
+					req, seg_id, extent);
+}
+
+/*
+ * ssdfs_segment_pre_alloc_hybrid_node_extent_sync() - pre-alloc a hybrid node
+ * @fsi: pointer on shared file system object
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to pre-allocate a hybrid node's extent into segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_segment_pre_alloc_hybrid_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent)
+{
+	return __ssdfs_segment_add_extent_sync(fsi,
+					SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ,
+					req, seg_id, extent);
+}
+
+/*
+ * ssdfs_segment_pre_alloc_hybrid_node_extent_async() - pre-alloc a hybrid node
+ * @fsi: pointer on shared file system object
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to pre-allocate a hybrid node's extent into segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_segment_pre_alloc_hybrid_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent)
+{
+	return __ssdfs_segment_add_extent_async(fsi,
+					SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ,
+					req, seg_id, extent);
+}
+
+/*
+ * ssdfs_segment_pre_alloc_index_node_extent_sync() - pre-alloc an index node
+ * @fsi: pointer on shared file system object
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to pre-allocate an index node's extent into segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_segment_pre_alloc_index_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent)
+{
+	return __ssdfs_segment_add_extent_sync(fsi,
+					SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ,
+					req, seg_id, extent);
+}
+
+/*
+ * ssdfs_segment_pre_alloc_index_node_extent_async() - pre-alloc an index node
+ * @fsi: pointer on shared file system object
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to pre-allocate an index node's extent into segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */ +int ssdfs_segment_pre_alloc_index_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_async(fsi, + SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_data_extent_sync() - add new data extent synchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new data extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_data_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_sync(fsi, + SSDFS_PEB_CREATE_DATA_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_data_extent_async() - add new data extent asynchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new data extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_data_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_async(fsi, + SSDFS_PEB_CREATE_DATA_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_migrate_zone_extent_sync() - migrate zone extent synchronously + * @fsi: pointer on shared file system object + * @req_type: request type + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to migrate user data extent from + * exhausted zone into current zone for user data updates. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. 
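+ *
+ * The migration path reuses the regular add-extent logic; only the
+ * request class (SSDFS_ZONE_USER_DATA_MIGRATE_REQ) and the command
+ * (SSDFS_MIGRATE_ZONE_USER_EXTENT) differ, so the current segment is
+ * selected by CUR_SEG_TYPE() for the migration request class.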
+ */
+int ssdfs_segment_migrate_zone_extent_sync(struct ssdfs_fs_info *fsi,
+					   int req_type,
+					   struct ssdfs_segment_request *req,
+					   u64 *seg_id,
+					   struct ssdfs_blk2off_range *extent)
+{
+	struct ssdfs_current_segment *cur_seg;
+	int req_class = SSDFS_ZONE_USER_DATA_MIGRATE_REQ;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req);
+
+	SSDFS_DBG("ino %llu, logical_offset %llu, "
+		  "data_bytes %u, cno %llu, parent_snapshot %llu\n",
+		  req->extent.ino, req->extent.logical_offset,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (req_type) {
+	case SSDFS_REQ_SYNC:
+		/* expected request type */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected request type %#x\n",
+			  req_type);
+		return -EINVAL;
+	}
+
+	ssdfs_request_prepare_internal_data(req_class,
+					    SSDFS_MIGRATE_ZONE_USER_EXTENT,
+					    req_type, req);
+
+	down_read(&fsi->cur_segs->lock);
+	cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)];
+	err = __ssdfs_segment_add_extent(cur_seg, req, seg_id, extent);
+	up_read(&fsi->cur_segs->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_segment_migrate_zone_extent_async() - migrate zone extent asynchronously
+ * @fsi: pointer on shared file system object
+ * @req_type: request type
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to migrate user data extent from
+ * exhausted zone into current zone for user data updates.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_segment_migrate_zone_extent_async(struct ssdfs_fs_info *fsi,
+					    int req_type,
+					    struct ssdfs_segment_request *req,
+					    u64 *seg_id,
+					    struct ssdfs_blk2off_range *extent)
+{
+	struct ssdfs_current_segment *cur_seg;
+	int req_class = SSDFS_ZONE_USER_DATA_MIGRATE_REQ;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req);
+
+	SSDFS_DBG("ino %llu, logical_offset %llu, "
+		  "data_bytes %u, cno %llu, parent_snapshot %llu\n",
+		  req->extent.ino, req->extent.logical_offset,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (req_type) {
+	case SSDFS_REQ_ASYNC:
+	case SSDFS_REQ_ASYNC_NO_FREE:
+		/* expected request type */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected request type %#x\n",
+			  req_type);
+		return -EINVAL;
+	}
+
+	ssdfs_request_prepare_internal_data(req_class,
+					    SSDFS_MIGRATE_ZONE_USER_EXTENT,
+					    req_type, req);
+
+	down_read(&fsi->cur_segs->lock);
+	cur_seg = fsi->cur_segs->objects[CUR_SEG_TYPE(req_class)];
+	err = __ssdfs_segment_add_extent(cur_seg, req, seg_id, extent);
+	up_read(&fsi->cur_segs->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_segment_add_leaf_node_extent_sync() - add new leaf node synchronously
+ * @fsi: pointer on shared file system object
+ * @req: segment request [in|out]
+ * @seg_id: segment ID [out]
+ * @extent: (pre-)allocated extent [out]
+ *
+ * This function tries to add new leaf node's extent into segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */ +int ssdfs_segment_add_leaf_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_sync(fsi, + SSDFS_PEB_CREATE_LNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_leaf_node_extent_async() - add new leaf node asynchronously + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new leaf node's extent into segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_leaf_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_async(fsi, + SSDFS_PEB_CREATE_LNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_hybrid_node_extent_sync() - add new hybrid node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new hybrid node's extent into segment + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_hybrid_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_sync(fsi, + SSDFS_PEB_CREATE_HNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_hybrid_node_extent_async() - add new hybrid node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new hybrid node's extent into segment + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_hybrid_node_extent_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_async(fsi, + SSDFS_PEB_CREATE_HNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_index_node_extent_sync() - add new index node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new index node's extent into segment + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - segment hasn't free pages. + * %-ERANGE - internal error. + */ +int ssdfs_segment_add_index_node_extent_sync(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req, + u64 *seg_id, + struct ssdfs_blk2off_range *extent) +{ + return __ssdfs_segment_add_extent_sync(fsi, + SSDFS_PEB_CREATE_IDXNODE_REQ, + req, seg_id, extent); +} + +/* + * ssdfs_segment_add_index_node_extent_async() - add new index node + * @fsi: pointer on shared file system object + * @req: segment request [in|out] + * @seg_id: segment ID [out] + * @extent: (pre-)allocated extent [out] + * + * This function tries to add new index node's extent into segment + * asynchronously. 
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOSPC - segment hasn't free pages.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_segment_add_index_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent)
+{
+	return __ssdfs_segment_add_extent_async(fsi,
+						SSDFS_PEB_CREATE_IDXNODE_REQ,
+						req, seg_id, extent);
+}

From patchwork Sat Feb 25 01:08:48 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151941
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 37/76] ssdfs: segment object's update/invalidate
 data/metadata
Date: Fri, 24 Feb 2023 17:08:48 -0800
Message-Id: <20230225010927.813929-38-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Any file or metadata structure can be updated, truncated, or deleted.
The segment object supports update and invalidate operations on user
data and metadata.

SSDFS uses the logical extent concept to track the location of any user
data or metadata: every metadata structure is described by a sequence of
extents. An inode object keeps inline extents or the root node of the
extents b-tree that tracks the location of a file's content. An extent
identifies a segment ID, a logical block ID, and the length of the
extent. The segment ID is used to create or access the segment object.
The segment object's offset translation table provides the mechanism to
convert a logical block ID into a "Physical" Erase Block (PEB) ID.
Finally, it is possible to add an update or invalidation request into
the PEB's update queue. The PEB's flush thread takes update/invalidate
requests from the queue and executes them. Executing a request means
creating a new log that contains the actual state of the updated or
invalidated data in the log's metadata (header, block bitmap, offset
translation table) and payload.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/current_segment.c |  682 ++++++++++++++++
 fs/ssdfs/current_segment.h |   76 ++
 fs/ssdfs/segment.c         | 1521 ++++++++++++++++++++++++++++++++++++
 3 files changed, 2279 insertions(+)
 create mode 100644 fs/ssdfs/current_segment.c
 create mode 100644 fs/ssdfs/current_segment.h

diff --git a/fs/ssdfs/current_segment.c b/fs/ssdfs/current_segment.c
new file mode 100644
index 000000000000..10067f0d8753
--- /dev/null
+++ b/fs/ssdfs/current_segment.c
@@ -0,0 +1,682 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/current_segment.c - current segment abstraction implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_vector.h"
+#include "peb_block_bitmap.h"
+#include "segment_block_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "current_segment.h"
+#include "segment_tree.h"
+
+#include
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_cur_seg_page_leaks;
+atomic64_t ssdfs_cur_seg_memory_leaks;
+atomic64_t ssdfs_cur_seg_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_cur_seg_cache_leaks_increment(void *kaddr)
+ * void ssdfs_cur_seg_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_cur_seg_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_cur_seg_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_cur_seg_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_cur_seg_kfree(void *kaddr)
+ * struct page *ssdfs_cur_seg_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_cur_seg_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_cur_seg_free_page(struct page *page)
+ * void ssdfs_cur_seg_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(cur_seg)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(cur_seg)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_cur_seg_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_cur_seg_page_leaks, 0);
+	atomic64_set(&ssdfs_cur_seg_memory_leaks, 0);
+	atomic64_set(&ssdfs_cur_seg_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_cur_seg_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_cur_seg_page_leaks) != 0) {
+		SSDFS_ERR("CURRENT SEGMENT: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_cur_seg_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_cur_seg_memory_leaks) != 0) {
+		SSDFS_ERR("CURRENT SEGMENT: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_cur_seg_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_cur_seg_cache_leaks) != 0) {
+		SSDFS_ERR("CURRENT SEGMENT: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_cur_seg_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/******************************************************************************
+ *                 CURRENT SEGMENT CONTAINER FUNCTIONALITY                    *
+ ******************************************************************************/
+
+/*
+ * ssdfs_current_segment_init() - init current segment container
+ * @fsi: pointer on shared file system object
+ * @type: current segment type
+ * @seg_id: segment ID
+ * @cur_seg: pointer on current segment container [out]
+ */
+static
+void ssdfs_current_segment_init(struct ssdfs_fs_info *fsi,
+				int type,
+				u64 seg_id,
+				struct ssdfs_current_segment *cur_seg)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !cur_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mutex_init(&cur_seg->lock);
+	cur_seg->type = type;
+	cur_seg->seg_id = seg_id;
+	cur_seg->real_seg = NULL;
+	cur_seg->fsi = fsi;
+}
+
+/*
+ * ssdfs_current_segment_destroy() - destroy current segment
+ * @cur_seg: pointer on current segment container
+ */
+static
+void ssdfs_current_segment_destroy(struct ssdfs_current_segment *cur_seg)
ssdfs_current_segment *cur_seg) +{ + if (!cur_seg) + return; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(mutex_is_locked(&cur_seg->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_ssdfs_current_segment_empty(cur_seg)) { + ssdfs_current_segment_lock(cur_seg); + ssdfs_current_segment_remove(cur_seg); + ssdfs_current_segment_unlock(cur_seg); + } +} + +/* + * ssdfs_current_segment_lock() - lock current segment + * @cur_seg: pointer on current segment container + */ +void ssdfs_current_segment_lock(struct ssdfs_current_segment *cur_seg) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cur_seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = mutex_lock_killable(&cur_seg->lock); + WARN_ON(err); +} + +/* + * ssdfs_current_segment_unlock() - unlock current segment + * @cur_seg: pointer on current segment container + */ +void ssdfs_current_segment_unlock(struct ssdfs_current_segment *cur_seg) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cur_seg); + WARN_ON(!mutex_is_locked(&cur_seg->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + mutex_unlock(&cur_seg->lock); +} + +/* + * need_select_flush_threads() - check necessity to select flush threads + * @seg_state: segment state + */ +static inline +bool need_select_flush_threads(int seg_state) +{ + bool need_select = true; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_state >= SSDFS_SEG_STATE_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (seg_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + need_select = true; + break; + + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + need_select = false; + break; + + default: + BUG(); + } + + return need_select; +} + +/* + * ssdfs_segment_select_flush_threads() - select flush threads + * @si: pointer on segment object + * @max_free_pages: max value and position pair + * + * This function selects PEBs' flush threads that will process + * new page requests.
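+ * The start position is @max_free_pages->pos rounded up to a multiple
+ * of @si->create_threads; the scan then wraps around the PEB array,
+ * skipping PEBs without free pages, until @si->create_threads flush
+ * threads have joined the create requests queue.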
+ */ +static +int ssdfs_segment_select_flush_threads(struct ssdfs_segment_info *si, + struct ssdfs_value_pair *max_free_pages) +{ + int start_pos; + u8 found_flush_threads = 0; + int peb_free_pages; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !max_free_pages); + BUG_ON(max_free_pages->value <= 0); + BUG_ON(max_free_pages->pos < 0); + BUG_ON(max_free_pages->pos >= si->pebs_count); + + SSDFS_DBG("seg %llu, max free pages: value %d, pos %d\n", + si->seg_id, max_free_pages->value, max_free_pages->pos); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_select_flush_threads(atomic_read(&si->seg_state)) || + atomic_read(&si->blk_bmap.seg_free_blks) == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu can't be used as current\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + start_pos = max_free_pages->pos + si->create_threads - 1; + start_pos /= si->create_threads; + start_pos *= si->create_threads; + + if (start_pos >= si->pebs_count) + start_pos = 0; + + for (i = start_pos; i < si->pebs_count; i++) { + struct ssdfs_peb_container *pebc = &si->peb_array[i]; + + if (found_flush_threads == si->create_threads) + break; + + peb_free_pages = ssdfs_peb_get_free_pages(pebc); + if (unlikely(peb_free_pages < 0)) { + err = peb_free_pages; + SSDFS_ERR("fail to calculate PEB's free pages: " + "pebc %p, seg %llu, peb index %d, err %d\n", + pebc, si->seg_id, i, err); + return err; + } + + if (peb_free_pages == 0 || + is_peb_joined_into_create_requests_queue(pebc)) + continue; + + err = ssdfs_peb_join_create_requests_queue(pebc, + &si->create_rq); + if (unlikely(err)) { + SSDFS_ERR("fail to join create requests queue: " + "seg %llu, peb index %d, err %d\n", + si->seg_id, i, err); + return err; + } + found_flush_threads++; + } + + for (i = 0; i < start_pos; i++) { + struct ssdfs_peb_container *pebc = &si->peb_array[i]; + + if (found_flush_threads == si->create_threads) + break; + + peb_free_pages = ssdfs_peb_get_free_pages(pebc); + if (unlikely(peb_free_pages < 0)) { + err = peb_free_pages; + SSDFS_ERR("fail to calculate PEB's free pages: " + "pebc %p, seg %llu, peb index %d, err %d\n", + pebc, si->seg_id, i, err); + return err; + } + + if (peb_free_pages == 0 || + is_peb_joined_into_create_requests_queue(pebc)) + continue; + + err = ssdfs_peb_join_create_requests_queue(pebc, + &si->create_rq); + if (unlikely(err)) { + SSDFS_ERR("fail to join create requests queue: " + "seg %llu, peb index %d, err %d\n", + si->seg_id, i, err); + return err; + } + found_flush_threads++; + } + + return 0; +} + +/* + * ssdfs_current_segment_add() - prepare current segment + * @cur_seg: pointer on current segment container + * @si: pointer on segment object + * + * This function tries to make segment object @si the current one. + * If the segment is "clean" or "using" then it can be a current + * segment that processes new page requests. + * In such a case, the segment object is initialized with a pointer on + * the new page requests queue. Also, it chooses the flush threads of + * several PEBs as actual threads for processing new page requests in + * parallel. It makes sense to restrict the count of such threads by + * the number of CPUs or of independent dies. The number of free pages + * in a PEB can be the basis for choosing a thread for processing new + * page requests. Namely, the first @flush_threads threads that have as + * many free pages as possible are chosen for this role. When some + * thread fills its log, it delegates its role + * to the next candidate thread in the chain.
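+ *
+ * For example, with @si->create_threads == 4, up to four PEB flush
+ * threads of the segment serve new page requests concurrently, and the
+ * segment remains usable as a current segment until all of its PEBs
+ * run out of free pages.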
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + */ +int ssdfs_current_segment_add(struct ssdfs_current_segment *cur_seg, + struct ssdfs_segment_info *si) +{ + struct ssdfs_value_pair max_free_pages; + int state; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cur_seg || !si); + + if (!mutex_is_locked(&cur_seg->lock)) { + SSDFS_WARN("current segment container should be locked\n"); + return -EINVAL; + } + + SSDFS_DBG("seg %llu, log_pages %u, create_threads %u, seg_type %#x\n", + si->seg_id, si->log_pages, + si->create_threads, si->seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + BUG_ON(!is_ssdfs_current_segment_empty(cur_seg)); + + max_free_pages.value = 0; + max_free_pages.pos = -1; + + for (i = 0; i < si->pebs_count; i++) { + int peb_free_pages; + struct ssdfs_peb_container *pebc = &si->peb_array[i]; + + peb_free_pages = ssdfs_peb_get_free_pages(pebc); + if (unlikely(peb_free_pages < 0)) { + err = peb_free_pages; + SSDFS_ERR("fail to calculate PEB's free pages: " + "pebc %p, seg %llu, peb index %d, err %d\n", + pebc, si->seg_id, i, err); + return err; + } else if (peb_free_pages == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb_index %u, free_pages %d\n", + si->seg_id, pebc->peb_index, + peb_free_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (max_free_pages.value < peb_free_pages) { + max_free_pages.value = peb_free_pages; + max_free_pages.pos = i; + } + } + + if (max_free_pages.value <= 0 || max_free_pages.pos < 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu can't be used as current: " + "max_free_pages.value %d, " + "max_free_pages.pos %d\n", + si->seg_id, + max_free_pages.value, + max_free_pages.pos); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + err = ssdfs_segment_select_flush_threads(si, &max_free_pages); + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu can't be used as current\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to select flush threads: " + "seg %llu, max free pages: value %d, pos %d, " + "err %d\n", + si->seg_id, max_free_pages.value, max_free_pages.pos, + err); + return err; + } + + ssdfs_segment_get_object(si); + + state = atomic_cmpxchg(&si->obj_state, + SSDFS_SEG_OBJECT_CREATED, + SSDFS_CURRENT_SEG_OBJECT); + if (state < SSDFS_SEG_OBJECT_CREATED || + state >= SSDFS_CURRENT_SEG_OBJECT) { + ssdfs_segment_put_object(si); + SSDFS_WARN("unexpected state %#x\n", + state); + return -ERANGE; + } + + cur_seg->real_seg = si; + cur_seg->seg_id = si->seg_id; + + return 0; +} + +/* + * ssdfs_current_segment_remove() - remove current segment + * @cur_seg: pointer on current segment container + */ +void ssdfs_current_segment_remove(struct ssdfs_current_segment *cur_seg) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cur_seg); + + if (!mutex_is_locked(&cur_seg->lock)) + SSDFS_WARN("current segment container should be locked\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_current_segment_empty(cur_seg)) { + SSDFS_WARN("current segment container is empty\n"); + return; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, log_pages %u, create_threads %u, seg_type %#x\n", + cur_seg->real_seg->seg_id, + cur_seg->real_seg->log_pages, + cur_seg->real_seg->create_threads, + cur_seg->real_seg->seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_cmpxchg(&cur_seg->real_seg->obj_state, + SSDFS_CURRENT_SEG_OBJECT, + SSDFS_SEG_OBJECT_CREATED); + if (state <= 
SSDFS_SEG_OBJECT_CREATED || + state > SSDFS_CURRENT_SEG_OBJECT) { + SSDFS_WARN("unexpected state %#x\n", + state); + } + + ssdfs_segment_put_object(cur_seg->real_seg); + cur_seg->real_seg = NULL; +} + +/****************************************************************************** + * CURRENT SEGMENTS ARRAY FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_current_segment_array_create() - create current segments array + * @fsi: pointer on shared file system object + */ +int ssdfs_current_segment_array_create(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_segment_info *si; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi->cur_segs = + ssdfs_cur_seg_kzalloc(sizeof(struct ssdfs_current_segs_array), + GFP_KERNEL); + if (!fsi->cur_segs) { + SSDFS_ERR("fail to allocate current segments array\n"); + return -ENOMEM; + } + + init_rwsem(&fsi->cur_segs->lock); + + for (i = 0; i < SSDFS_CUR_SEGS_COUNT; i++) { + u64 seg; + size_t offset = i * sizeof(struct ssdfs_current_segment); + u8 *start_ptr = fsi->cur_segs->buffer; + struct ssdfs_current_segment *object = NULL; + int seg_state, seg_type; + u16 log_pages; + + object = (struct ssdfs_current_segment *)(start_ptr + offset); + fsi->cur_segs->objects[i] = object; + seg = le64_to_cpu(fsi->vs->cur_segs[i]); + + ssdfs_current_segment_init(fsi, i, seg, object); + + if (seg == U64_MAX) + continue; + + switch (i) { + case SSDFS_CUR_DATA_SEG: + case SSDFS_CUR_DATA_UPDATE_SEG: + seg_state = SSDFS_SEG_DATA_USING; + seg_type = SSDFS_USER_DATA_SEG_TYPE; + log_pages = le16_to_cpu(fsi->vh->user_data_log_pages); + break; + + case SSDFS_CUR_LNODE_SEG: + seg_state = SSDFS_SEG_LEAF_NODE_USING; + seg_type = SSDFS_LEAF_NODE_SEG_TYPE; + log_pages = le16_to_cpu(fsi->vh->lnodes_seg_log_pages); + break; + + case SSDFS_CUR_HNODE_SEG: + seg_state = SSDFS_SEG_HYBRID_NODE_USING; + seg_type = SSDFS_HYBRID_NODE_SEG_TYPE; + log_pages = le16_to_cpu(fsi->vh->hnodes_seg_log_pages); + break; + + case SSDFS_CUR_IDXNODE_SEG: + seg_state = SSDFS_SEG_INDEX_NODE_USING; + seg_type = SSDFS_INDEX_NODE_SEG_TYPE; + log_pages = le16_to_cpu(fsi->vh->inodes_seg_log_pages); + break; + + default: + BUG(); + }; + + si = __ssdfs_create_new_segment(fsi, seg, + seg_state, seg_type, + log_pages, + fsi->create_threads_per_seg); + if (IS_ERR_OR_NULL(si)) { + err = (si == NULL ? -ENOMEM : PTR_ERR(si)); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto destroy_cur_segs; + } else { + SSDFS_WARN("fail to create segment object: " + "seg %llu, err %d\n", + seg, err); + goto destroy_cur_segs; + } + } + + ssdfs_current_segment_lock(object); + err = ssdfs_current_segment_add(object, si); + ssdfs_current_segment_unlock(object); + + if (err == -ENOSPC) { + err = ssdfs_segment_change_state(si); + if (unlikely(err)) { + SSDFS_ERR("fail to change segment's state: " + "seg %llu, err %d\n", + seg, err); + goto destroy_cur_segs; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current segment is absent\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_segment_put_object(si); + } else if (unlikely(err)) { + SSDFS_ERR("fail to make segment %llu as current: " + "err %d\n", + seg, err); + goto destroy_cur_segs; + } else { + /* + * Segment object was referenced two times + * in __ssdfs_create_new_segment() and + * ssdfs_current_segment_add(). 
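+ * One of these references is dropped here; the remaining
+ * one is held by the current segment container until
+ * ssdfs_current_segment_remove().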
+ */ + ssdfs_segment_put_object(si); + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("DONE: create current segment array\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + +destroy_cur_segs: + for (; i >= 0; i--) { + struct ssdfs_current_segment *cur_seg; + + cur_seg = fsi->cur_segs->objects[i]; + + ssdfs_current_segment_lock(cur_seg); + ssdfs_current_segment_remove(cur_seg); + ssdfs_current_segment_unlock(cur_seg); + } + + ssdfs_cur_seg_kfree(fsi->cur_segs); + fsi->cur_segs = NULL; + + return err; +} + +/* + * ssdfs_destroy_all_curent_segments() - destroy all current segments + * @fsi: pointer on shared file system object + */ +void ssdfs_destroy_all_curent_segments(struct ssdfs_fs_info *fsi) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi->cur_segs %p\n", fsi->cur_segs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fsi->cur_segs) + return; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(rwsem_is_locked(&fsi->cur_segs->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&fsi->cur_segs->lock); + for (i = 0; i < SSDFS_CUR_SEGS_COUNT; i++) + ssdfs_current_segment_destroy(fsi->cur_segs->objects[i]); + up_write(&fsi->cur_segs->lock); +} + +/* + * ssdfs_current_segment_array_destroy() - destroy current segments array + * @fsi: pointer on shared file system object + */ +void ssdfs_current_segment_array_destroy(struct ssdfs_fs_info *fsi) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi->cur_segs %p\n", fsi->cur_segs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!fsi->cur_segs) + return; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(rwsem_is_locked(&fsi->cur_segs->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_cur_seg_kfree(fsi->cur_segs); + fsi->cur_segs = NULL; +} diff --git a/fs/ssdfs/current_segment.h b/fs/ssdfs/current_segment.h new file mode 100644 index 000000000000..668ab3311b59 --- /dev/null +++ b/fs/ssdfs/current_segment.h @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/current_segment.h - current segment abstraction declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_CURRENT_SEGMENT_H +#define _SSDFS_CURRENT_SEGMENT_H + +/* + * struct ssdfs_current_segment - current segment container + * @lock: exclusive lock of current segment object + * @type: current segment type + * @seg_id: last known segment ID + * @real_seg: concrete current segment + * @fsi: pointer on shared file system object + */ +struct ssdfs_current_segment { + struct mutex lock; + int type; + u64 seg_id; + struct ssdfs_segment_info *real_seg; + struct ssdfs_fs_info *fsi; +}; + +/* + * struct ssdfs_current_segs_array - array of current segments + * @lock: current segments array's lock + * @objects: array of pointers on current segment objects + * @buffer: buffer for all current segment objects + */ +struct ssdfs_current_segs_array { + struct rw_semaphore lock; + struct ssdfs_current_segment *objects[SSDFS_CUR_SEGS_COUNT]; + u8 buffer[sizeof(struct ssdfs_current_segment) * SSDFS_CUR_SEGS_COUNT]; +}; + +/* + * Inline functions + */ +static inline +bool is_ssdfs_current_segment_empty(struct ssdfs_current_segment *cur_seg) +{ + return cur_seg->real_seg == NULL; +} + +/* + * Current segment container's API + */ +int ssdfs_current_segment_array_create(struct ssdfs_fs_info *fsi); +void ssdfs_destroy_all_curent_segments(struct ssdfs_fs_info *fsi); +void ssdfs_current_segment_array_destroy(struct ssdfs_fs_info *fsi); + +void ssdfs_current_segment_lock(struct ssdfs_current_segment *cur_seg); +void ssdfs_current_segment_unlock(struct ssdfs_current_segment *cur_seg); + +int ssdfs_current_segment_add(struct ssdfs_current_segment *cur_seg, + struct ssdfs_segment_info *si); +void ssdfs_current_segment_remove(struct ssdfs_current_segment *cur_seg); + +#endif /* _SSDFS_CURRENT_SEGMENT_H */ diff --git a/fs/ssdfs/segment.c b/fs/ssdfs/segment.c index 9496b18aa1f3..bf61804980cc 100644 --- a/fs/ssdfs/segment.c +++ b/fs/ssdfs/segment.c @@ -3739,3 +3739,1524 @@ int ssdfs_segment_add_index_node_extent_async(struct ssdfs_fs_info *fsi, SSDFS_PEB_CREATE_IDXNODE_REQ, req, seg_id, extent); } + +static inline +int ssdfs_account_user_data_pages_as_pending(struct ssdfs_peb_container *pebc, + u32 count) +{ + struct ssdfs_fs_info *fsi; + u64 updated = 0; + u32 pending = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + if (!is_ssdfs_peb_containing_user_data(pebc)) + return 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, count %u\n", + pebc->parent_si->seg_id, pebc->peb_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&fsi->volume_state_lock); + updated = fsi->updated_user_data_pages; + if (fsi->updated_user_data_pages >= count) { + fsi->updated_user_data_pages -= count; + } else { + err = -ERANGE; + fsi->updated_user_data_pages = 0; + } + spin_unlock(&fsi->volume_state_lock); + + if (err) { + SSDFS_WARN("count %u is bigger than updated %llu\n", + count, updated); + + spin_lock(&pebc->pending_lock); + pebc->pending_updated_user_data_pages += updated; + pending = pebc->pending_updated_user_data_pages; + spin_unlock(&pebc->pending_lock); + } else { + spin_lock(&pebc->pending_lock); + pebc->pending_updated_user_data_pages += count; + pending = pebc->pending_updated_user_data_pages; + spin_unlock(&pebc->pending_lock); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, peb_index %u, " + "updated %llu, pending 
%u\n", + pebc->parent_si->seg_id, pebc->peb_index, + updated, pending); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * __ssdfs_segment_update_block() - update block in segment + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to update a block in segment. + */ +static +int __ssdfs_segment_update_block(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ + struct ssdfs_blk2off_table *table; + struct ssdfs_phys_offset_descriptor *po_desc; + struct ssdfs_peb_container *pebc; + struct ssdfs_requests_queue *rq; + wait_queue_head_t *wait; + struct ssdfs_offset_position pos = {0}; + u16 peb_index = U16_MAX; + u16 logical_blk; + u16 len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#else + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, blks %u, " + "cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, req->place.len, + req->extent.cno, req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + table = si->blk2off_table; + logical_blk = req->place.start.blk_index; + len = req->place.len; + + po_desc = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, &pos); + if (IS_ERR(po_desc) && PTR_ERR(po_desc) == -EAGAIN) { + struct completion *end; + end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + po_desc = ssdfs_blk2off_table_convert(table, logical_blk, + &peb_index, NULL, + &pos); + } + + if (IS_ERR_OR_NULL(po_desc)) { + err = (po_desc == NULL ? 
-ERANGE : PTR_ERR(po_desc)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + return err; + } + + if (peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= si->pebs_count %u\n", + peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[peb_index]; + rq = &pebc->update_rq; + + if (req->private.cmd != SSDFS_COMMIT_LOG_NOW) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "logical_blk %u, data_bytes %u, blks %u, " + "cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->place.start.blk_index, + req->extent.data_bytes, req->place.len, + req->extent.cno, req->extent.parent_snapshot); + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (len > 0) { + err = ssdfs_account_user_data_pages_as_pending(pebc, + len); + if (unlikely(err)) { + SSDFS_ERR("fail to make pages as pending: " + "len %u, err %d\n", + len, err); + return err; + } + } else { + SSDFS_WARN("unexpected len %u\n", len); + } + } + + ssdfs_account_user_data_flush_request(si); + ssdfs_segment_create_request_cno(si); + + switch (req->private.class) { + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + ssdfs_requests_queue_add_head_inc(si->fsi, rq, req); + break; + + default: + ssdfs_requests_queue_add_tail_inc(si->fsi, rq, req); + break; + } + + wait = &si->wait_queue[SSDFS_PEB_FLUSH_THREAD]; + wake_up_all(wait); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_segment_update_block_sync() - update block synchronously + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to update the block synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_update_block_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_UPDATE_BLOCK, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_block(si, req); +} + +/* + * ssdfs_segment_update_block_async() - update block asynchronously + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to update the block asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
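+ *
+ * Only the SSDFS_REQ_ASYNC and SSDFS_REQ_ASYNC_NO_FREE request types
+ * are accepted here; synchronous updates should use
+ * ssdfs_segment_update_block_sync() instead.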
+ */ +int ssdfs_segment_update_block_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_UPDATE_BLOCK, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_block(si, req); +} + +/* + * __ssdfs_segment_update_extent() - update extent in segment + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to update an extent in segment. + */ +static +int __ssdfs_segment_update_extent(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ + struct ssdfs_blk2off_table *table; + struct ssdfs_phys_offset_descriptor *po_desc; + struct ssdfs_peb_container *pebc; + struct ssdfs_requests_queue *rq; + wait_queue_head_t *wait; + u16 blk, len; + u16 peb_index = U16_MAX; + struct ssdfs_offset_position pos = {0}; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, ino %llu, logical_offset %llu, " + "logical_blk %u, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->place.start.blk_index, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#else + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "logical_blk %u, data_bytes %u, blks %u, " + "cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->place.start.blk_index, + req->extent.data_bytes, req->place.len, + req->extent.cno, req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + table = si->blk2off_table; + blk = req->place.start.blk_index; + len = req->place.len; + + if (len == 0) { + SSDFS_WARN("empty extent\n"); + return -ERANGE; + } + + for (i = 0; i < len; i++) { + u16 cur_peb_index = U16_MAX; + + po_desc = ssdfs_blk2off_table_convert(table, blk + i, + &cur_peb_index, + NULL, &pos); + if (IS_ERR(po_desc) && PTR_ERR(po_desc) == -EAGAIN) { + struct completion *end; + end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + po_desc = ssdfs_blk2off_table_convert(table, blk + i, + &cur_peb_index, + NULL, &pos); + } + + if (IS_ERR_OR_NULL(po_desc)) { + err = (po_desc == NULL ? 
-ERANGE : PTR_ERR(po_desc)); + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + blk + i, err); + return err; + } + + if (cur_peb_index >= U16_MAX) { + SSDFS_ERR("invalid peb_index\n"); + return -ERANGE; + } + + if (peb_index == U16_MAX) + peb_index = cur_peb_index; + else if (peb_index != cur_peb_index) { + SSDFS_ERR("peb_index %u != cur_peb_index %u\n", + peb_index, cur_peb_index); + return -ERANGE; + } + } + + if (peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= si->pebs_count %u\n", + peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[peb_index]; + rq = &pebc->update_rq; + + if (req->private.cmd != SSDFS_COMMIT_LOG_NOW) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "logical_blk %u, data_bytes %u, blks %u, " + "cno %llu, parent_snapshot %llu\n", + si->seg_id, + req->extent.ino, req->extent.logical_offset, + req->place.start.blk_index, + req->extent.data_bytes, req->place.len, + req->extent.cno, req->extent.parent_snapshot); + SSDFS_DBG("req->private.class %#x, req->private.cmd %#x\n", + req->private.class, req->private.cmd); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (len > 0) { + err = ssdfs_account_user_data_pages_as_pending(pebc, + len); + if (unlikely(err)) { + SSDFS_ERR("fail to make pages as pending: " + "len %u, err %d\n", + len, err); + return err; + } + } else { + SSDFS_WARN("unexpected len %u\n", len); + } + } + + ssdfs_account_user_data_flush_request(si); + ssdfs_segment_create_request_cno(si); + + switch (req->private.class) { + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + ssdfs_requests_queue_add_head_inc(si->fsi, rq, req); + break; + + default: + ssdfs_requests_queue_add_tail_inc(si->fsi, rq, req); + break; + } + + wait = &si->wait_queue[SSDFS_PEB_FLUSH_THREAD]; + wake_up_all(wait); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_segment_update_extent_sync() - update extent synchronously + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to update the extent synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_update_extent_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_UPDATE_EXTENT, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_update_extent_async() - update extent asynchronously + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to update the extent asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
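+ *
+ * Note: every logical block of the extent has to resolve to the same
+ * PEB index; __ssdfs_segment_update_extent() verifies this before the
+ * request is added into the PEB's update queue.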
+ */ +int ssdfs_segment_update_extent_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_UPDATE_EXTENT, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_update_pre_alloc_block_sync() - update pre-allocated block + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to update the pre-allocated block synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_update_pre_alloc_block_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_PRE_ALLOC_UPDATE_REQ, + SSDFS_UPDATE_PRE_ALLOC_BLOCK, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_update_pre_alloc_block_async() - update pre-allocated block + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to update the pre-allocated block asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_update_pre_alloc_block_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_PRE_ALLOC_UPDATE_REQ, + SSDFS_UPDATE_PRE_ALLOC_BLOCK, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_update_pre_alloc_extent_sync() - update pre-allocated extent + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to update the pre-allocated extent synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_update_pre_alloc_extent_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_PRE_ALLOC_UPDATE_REQ, + SSDFS_UPDATE_PRE_ALLOC_EXTENT, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_update_pre_alloc_extent_async() - update pre-allocated extent + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to update the pre-allocated extent asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_update_pre_alloc_extent_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_PRE_ALLOC_UPDATE_REQ, + SSDFS_UPDATE_PRE_ALLOC_EXTENT, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_node_diff_on_write_sync() - Diff-On-Write btree node + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to execute Diff-On-Write operation + * on btree node synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_node_diff_on_write_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_DIFF_ON_WRITE_REQ, + SSDFS_BTREE_NODE_DIFF, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_node_diff_on_write_async() - Diff-On-Write btree node + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to execute Diff-On-Write operation + * on btree node asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
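+ *
+ * The diff request travels through the same per-PEB update queue as a
+ * regular extent update; only the request class and command
+ * (SSDFS_PEB_DIFF_ON_WRITE_REQ / SSDFS_BTREE_NODE_DIFF) differ.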
+ */ +int ssdfs_segment_node_diff_on_write_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_DIFF_ON_WRITE_REQ, + SSDFS_BTREE_NODE_DIFF, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_data_diff_on_write_sync() - Diff-On-Write user data + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to execute Diff-On-Write operation + * on user data synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_data_diff_on_write_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_DIFF_ON_WRITE_REQ, + SSDFS_USER_DATA_DIFF, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_block(si, req); +} + +/* + * ssdfs_segment_data_diff_on_write_async() - Diff-On-Write user data + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to execute Diff-On-Write operation + * on user data asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_data_diff_on_write_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_DIFF_ON_WRITE_REQ, + SSDFS_USER_DATA_DIFF, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_block(si, req); +} + +/* + * ssdfs_segment_prepare_migration_sync() - request to prepare migration + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to request to prepare or to start the migration + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
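+ *
+ * The request is queued with the SSDFS_START_MIGRATION_NOW command and
+ * is handled by the PEB's flush thread like any other update request.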
+ */ +int ssdfs_segment_prepare_migration_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_START_MIGRATION_NOW, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_prepare_migration_async() - request to prepare migration + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to request to prepare or to start the migration + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_prepare_migration_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_START_MIGRATION_NOW, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_commit_log_sync() - request the commit log operation + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to request the commit log operation + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_commit_log_sync(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_COMMIT_LOG_NOW, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_commit_log_async() - request the commit log operation + * @si: segment info + * @req_type: request type + * @req: segment request [in|out] + * + * This function tries to request the commit log operation + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
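+ *
+ * SSDFS_COMMIT_LOG_NOW requests bypass the pending-pages accounting in
+ * __ssdfs_segment_update_extent() and simply ask the PEB's flush
+ * thread to commit the current log.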
+ */ +int ssdfs_segment_commit_log_async(struct ssdfs_segment_info *si, + int req_type, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_COMMIT_LOG_NOW, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * __ssdfs_segment_commit_log2() - request the commit log operation + * @si: segment info + * @peb_index: PEB's index + * @req: segment request [in|out] + * + * This function tries to request the commit log operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_segment_commit_log2(struct ssdfs_segment_info *si, + u16 peb_index, + struct ssdfs_segment_request *req) +{ + struct ssdfs_peb_container *pebc; + struct ssdfs_requests_queue *rq; + wait_queue_head_t *wait; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, peb_index, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= si->pebs_count %u\n", + peb_index, si->pebs_count); + return -ERANGE; + } + + ssdfs_account_user_data_flush_request(si); + ssdfs_segment_create_request_cno(si); + + pebc = &si->peb_array[peb_index]; + rq = &pebc->update_rq; + + switch (req->private.class) { + case SSDFS_PEB_COLLECT_GARBAGE_REQ: + ssdfs_requests_queue_add_head_inc(si->fsi, rq, req); + break; + + default: + ssdfs_requests_queue_add_tail_inc(si->fsi, rq, req); + break; + } + + wait = &si->wait_queue[SSDFS_PEB_FLUSH_THREAD]; + wake_up_all(wait); + + return 0; +} + +/* + * ssdfs_segment_commit_log_sync2() - request the commit log operation + * @si: segment info + * @peb_index: PEB's index + * @req: segment request [in|out] + * + * This function tries to request the commit log operation + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
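+ *
+ * Unlike ssdfs_segment_commit_log_sync(), the target PEB is selected
+ * explicitly by @peb_index instead of being derived from the blk2off
+ * translation of the request's logical blocks.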
+ */ +int ssdfs_segment_commit_log_sync2(struct ssdfs_segment_info *si, + u16 peb_index, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, peb_index, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_COMMIT_LOG_NOW, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_commit_log2(si, peb_index, req); +} + +/* + * ssdfs_segment_commit_log_async2() - request the commit log operation + * @si: segment info + * @req_type: request type + * @peb_index: PEB's index + * @req: segment request [in|out] + * + * This function tries to request the commit log operation + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_commit_log_async2(struct ssdfs_segment_info *si, + int req_type, u16 peb_index, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, peb_index %u, " + "ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, peb_index, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (req_type) { + case SSDFS_REQ_ASYNC: + case SSDFS_REQ_ASYNC_NO_FREE: + /* expected request type */ + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + req_type); + return -EINVAL; + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_COMMIT_LOG_NOW, + req_type, req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_commit_log2(si, peb_index, req); +} + +/* + * ssdfs_segment_invalidate_logical_extent() - invalidate logical extent + * @si: segment info + * @start_off: starting logical block + * @blks_count: count of logical blocks in the extent + * + * This function tries to invalidate extent of logical blocks. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
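+ *
+ * For every logical block of the extent the function: (1) converts the
+ * block through the blk2off table, (2) frees the logical block, (3)
+ * invalidates it in the PEB container, and (4) queues an asynchronous
+ * SSDFS_EXTENT_WAS_INVALIDATED request for the PEB's flush thread.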
+ */ +int ssdfs_segment_invalidate_logical_extent(struct ssdfs_segment_info *si, + u32 start_off, u32 blks_count) +{ + struct ssdfs_blk2off_table *blk2off_tbl; + struct ssdfs_phys_offset_descriptor *off_desc = NULL; + struct ssdfs_phys_offset_descriptor old_desc; + size_t desc_size = sizeof(struct ssdfs_phys_offset_descriptor); + u32 blk; + u32 upper_blk = start_off + blks_count; + struct completion *init_end; + struct ssdfs_offset_position pos = {0}; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("si %p, seg %llu, start_off %u, blks_count %u\n", + si, si->seg_id, start_off, blks_count); +#else + SSDFS_DBG("si %p, seg %llu, start_off %u, blks_count %u\n", + si, si->seg_id, start_off, blks_count); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + blk2off_tbl = si->blk2off_table; + + ssdfs_account_invalidated_user_data_pages(si, blks_count); + + for (blk = start_off; blk < upper_blk; blk++) { + struct ssdfs_segment_request *req; + struct ssdfs_peb_container *pebc; + struct ssdfs_requests_queue *rq; + wait_queue_head_t *wait; + u16 peb_index = U16_MAX; + u16 peb_page; + + if (blk >= U16_MAX) { + SSDFS_ERR("invalid logical block number: %u\n", + blk); + return -ERANGE; + } + + off_desc = ssdfs_blk2off_table_convert(blk2off_tbl, + (u16)blk, + &peb_index, + NULL, &pos); + if (PTR_ERR(off_desc) == -EAGAIN) { + init_end = &blk2off_tbl->full_init_end; + + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + off_desc = ssdfs_blk2off_table_convert(blk2off_tbl, + (u16)blk, + &peb_index, + NULL, + &pos); + } + + if (IS_ERR_OR_NULL(off_desc)) { + err = !off_desc ? -ERANGE : PTR_ERR(off_desc); + SSDFS_ERR("fail to convert logical block: " + "blk %u, err %d\n", + blk, err); + return err; + } + + ssdfs_memcpy(&old_desc, 0, desc_size, + off_desc, 0, desc_size, + desc_size); + + peb_page = le16_to_cpu(old_desc.page_desc.peb_page); + + if (peb_index >= si->pebs_count) { + SSDFS_ERR("peb_index %u >= pebs_count %u\n", + peb_index, si->pebs_count); + return -ERANGE; + } + + pebc = &si->peb_array[peb_index]; + + err = ssdfs_blk2off_table_free_block(blk2off_tbl, + peb_index, + (u16)blk); + if (err == -EAGAIN) { + init_end = &blk2off_tbl->full_init_end; + + err = SSDFS_WAIT_COMPLETION(init_end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + return err; + } + + err = ssdfs_blk2off_table_free_block(blk2off_tbl, + peb_index, + (u16)blk); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to free logical block: " + "blk %u, err %d\n", + blk, err); + return err; + } + + mutex_lock(&pebc->migration_lock); + + err = ssdfs_peb_container_invalidate_block(pebc, &old_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate: " + "logical_blk %u, peb_index %u, " + "err %d\n", + blk, peb_index, err); + goto finish_invalidate_block; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("valid_blks %d, invalid_blks %d\n", + atomic_read(&si->blk_bmap.seg_valid_blks), + atomic_read(&si->blk_bmap.seg_invalid_blks)); +#endif /* CONFIG_SSDFS_DEBUG */ + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto finish_invalidate_block; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_EXTENT_WAS_INVALIDATED, + SSDFS_REQ_ASYNC, req); + ssdfs_request_define_segment(si->seg_id, req); + + ssdfs_account_user_data_flush_request(si); + ssdfs_segment_create_request_cno(si); + + rq = &pebc->update_rq; + ssdfs_requests_queue_add_tail_inc(si->fsi, rq, req); + +finish_invalidate_block: + mutex_unlock(&pebc->migration_lock); + + if (unlikely(err)) + return err; + + wait = &si->wait_queue[SSDFS_PEB_FLUSH_THREAD]; + wake_up_all(wait); + } + + err = ssdfs_segment_change_state(si); + if (unlikely(err)) { + SSDFS_ERR("fail to change segment state: " + "seg %llu, err %d\n", + si->seg_id, err); + return err; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_segment_invalidate_logical_block() - invalidate logical block + * @si: segment info + * @blk_offset: logical block number + * + * This function tries to invalidate a logical block. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_invalidate_logical_block(struct ssdfs_segment_info *si, + u32 blk_offset) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + + SSDFS_DBG("si %p, seg %llu, blk_offset %u\n", + si, si->seg_id, blk_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_segment_invalidate_logical_extent(si, blk_offset, 1); +} + +/* + * ssdfs_segment_migrate_range_async() - migrate range by flush thread + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to migrate the range by flush thread + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_migrate_range_async(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_COLLECT_GARBAGE_REQ, + SSDFS_MIGRATE_RANGE, + SSDFS_REQ_ASYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_migrate_pre_alloc_page_async() - migrate page by flush thread + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to migrate the pre-allocated page by flush thread + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_segment_migrate_pre_alloc_page_async(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_COLLECT_GARBAGE_REQ, + SSDFS_MIGRATE_PRE_ALLOC_PAGE, + SSDFS_REQ_ASYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} + +/* + * ssdfs_segment_migrate_fragment_async() - migrate fragment by flush thread + * @si: segment info + * @req: segment request [in|out] + * + * This function tries to migrate the fragment by flush thread + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_segment_migrate_fragment_async(struct ssdfs_segment_info *si, + struct ssdfs_segment_request *req) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si || !req); + + SSDFS_DBG("seg %llu, ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, parent_snapshot %llu\n", + si->seg_id, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_prepare_internal_data(SSDFS_PEB_COLLECT_GARBAGE_REQ, + SSDFS_MIGRATE_FRAGMENT, + SSDFS_REQ_ASYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + return __ssdfs_segment_update_extent(si, req); +} From patchwork Sat Feb 25 01:08:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151943 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 816D5C6FA8E for ; Sat, 25 Feb 2023 01:17:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229715AbjBYBRv (ORCPT ); Fri, 24 Feb 2023 20:17:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229716AbjBYBRB (ORCPT ); Fri, 24 Feb 2023 20:17:01 -0500 Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBF66136F7 for ; Fri, 24 Feb 2023 17:16:56 -0800 (PST) Received: by mail-oi1-x22e.google.com with SMTP id bk32so763421oib.10 for ; Fri, 24 Feb 2023 17:16:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+CAG4G4BRMu49/oFfT/FlFFOh/aHIajciKpBDkhGpHc=; b=p85qd9YCQiD2mC0dIh1VXPfQCr3/BaYbIRJuhTP8DZjggGYxbKm70xts5fy4k7BnPB CeFV5xA2w72MzeydfYcLe66GB4fIgWslhySeRQLNxwTInN1NCLBTFdotoe/rVq0Lssgd TGwF+jIqKExB4aBLKeenT/ziJffOfdNm7Ie1n39vfeTeWiDk7oo6beR9J/3NwcVkhLMk 8ZbcMBgeZ4Vfk3b6BEIEwD1Cdj+RyIolxfOjM68eh6miIY1FmkxEmyO/AQRRY28gfisr kOHOycsxyHqacaIad2SOnmePUDMaKNoUjhaV09o8TDUy5mE3nbawbB0GHsDsoY7OWjGa M5+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 38/76] ssdfs: introduce PEB mapping table
Date: Fri, 24 Feb 2023 17:08:49 -0800
Message-Id: <20230225010927.813929-39-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

The SSDFS file system is based on the concept of a logical segment that
aggregates Logical Erase Blocks (LEBs). Initially, a LEB has no association
with any particular "Physical" Erase Block (PEB). This means a segment can
have associations for only some of its LEBs or, in the case of a clean
segment, no association with any PEB at all. SSDFS therefore needs a
dedicated metadata structure (the PEB mapping table) that is capable of
associating any LEB with any PEB.

The PEB mapping table is a crucial metadata structure with several goals:
(1) mapping LEBs to PEBs, (2) implementing the logical extent concept,
(3) implementing the concept of PEB migration, and (4) implementing the
delayed erase operation performed by a specialized thread.

The PEB mapping table describes the state of all PEBs on a particular
SSDFS volume. These descriptors are split into several fragments that are
distributed among the PEBs of specialized segments. Every fragment of the
PEB mapping table represents a log's payload in a specialized segment.
The payload's content is split into: (1) the LEB table and (2) the PEB
table.

The LEB table starts with a header and contains an array of records
ordered by LEB ID, so the LEB ID plays the role of an index into the
array. The responsibility of the LEB table is therefore to define an index
into the PEB table. Every LEB table record defines two indexes: the first
one (the physical index) associates the LEB ID with some PEB ID, while the
second one (the relation index) identifies the PEB that serves as the
destination PEB during migration out of an exhausted PEB into a new one.
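To make the indexing scheme concrete, here is a minimal sketch of the
two-step lookup described above. The record layouts and the helper are
illustrative only (the real on-disk structures are defined elsewhere in
this series); the sketch assumes one fragment's LEB and PEB tables are
already in memory:

/* Illustrative, simplified record layouts -- not the on-disk structs. */
struct sketch_leb_record {
	u16 physical_index;	/* position of the mapped PEB's record */
	u16 relation_index;	/* destination PEB while migrating */
};

struct sketch_peb_record {
	u32 erase_cycles;
	u8 type;		/* user data, b-tree node, superblock, ... */
	u8 state;		/* clean, using, used, pre-dirty, ... */
};

/*
 * The LEB ID (relative to the fragment's starting LEB) is itself the
 * index into the LEB table; the record it selects then yields the
 * index into the PEB table.
 */
static inline struct sketch_peb_record *
sketch_leb_to_peb(struct sketch_leb_record *leb_tbl,
		  struct sketch_peb_record *peb_tbl,
		  u64 leb_id, u64 start_leb)
{
	struct sketch_leb_record *rec = &leb_tbl[leb_id - start_leb];

	return &peb_tbl[rec->physical_index];
}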
The PEB table likewise starts with a header and contains an array of PEB
state records ordered by PEB ID. The most important fields of a PEB state
record are: (1) erase cycles, (2) PEB type, and (3) PEB state.

The PEB type describes the kinds of data a PEB can contain: (1) user data,
(2) leaf b-tree node, (3) hybrid b-tree node, (4) index b-tree node,
(5) snapshot, (6) superblock, (7) segment bitmap, (8) PEB mapping table.

The PEB state tracks a PEB through its lifecycle: (1) clean - the PEB
contains only free NAND flash pages that are ready for write operations,
(2) using - the PEB can contain valid, invalid, and free pages, (3) used -
the PEB contains only valid pages, (4) pre-dirty - the PEB contains both
valid and invalid pages but no free ones, (5) dirty - the PEB contains
only invalid pages, (6) migrating - the PEB is under migration,
(7) pre-erase - the PEB has been added to the queue of PEBs awaiting the
erase operation, (8) recovering - the PEB is left untouched for some
amount of time so that it can recover the ability to fulfill the erase
operation, (9) bad - the PEB can no longer be used to store data. The
responsibility of the PEB state is to track PEBs through these phases of
their lifetime so that the volume's pool of PEBs can be managed
efficiently; a compact sketch of the lifecycle follows the patch
statistics below.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_mapping_queue.c |  334 ++++++
 fs/ssdfs/peb_mapping_queue.h |   67 ++
 fs/ssdfs/peb_mapping_table.c | 1954 ++++++++++++++++++++++++++++++++++
 fs/ssdfs/peb_mapping_table.h |  699 ++++++++++++
 4 files changed, 3054 insertions(+)
 create mode 100644 fs/ssdfs/peb_mapping_queue.c
 create mode 100644 fs/ssdfs/peb_mapping_queue.h
 create mode 100644 fs/ssdfs/peb_mapping_table.c
 create mode 100644 fs/ssdfs/peb_mapping_table.h
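As promised above, a compact sketch of the PEB lifecycle. The enumerator
names are placeholders for illustration, not the constants defined by this
patch:

/* Illustrative PEB lifecycle; names are not the patch's constants. */
enum sketch_peb_state {
	SKETCH_PEB_CLEAN,	/* only free NAND pages, ready for writes */
	SKETCH_PEB_USING,	/* mix of valid, invalid, and free pages */
	SKETCH_PEB_USED,	/* only valid pages */
	SKETCH_PEB_PRE_DIRTY,	/* valid and invalid pages, no free ones */
	SKETCH_PEB_DIRTY,	/* only invalid pages */
	SKETCH_PEB_MIGRATING,	/* content moving to a destination PEB */
	SKETCH_PEB_PRE_ERASE,	/* queued for the delayed erase thread */
	SKETCH_PEB_RECOVERING,	/* resting until erase can succeed again */
	SKETCH_PEB_BAD,		/* excluded from further use */
};

/*
 * Typical forward path:
 *   CLEAN -> USING -> USED -> PRE_DIRTY -> DIRTY -> PRE_ERASE -> CLEAN
 * MIGRATING, RECOVERING, and BAD are side branches entered when
 * migration starts, an erase fails, or a PEB wears out, respectively.
 */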
diff --git a/fs/ssdfs/peb_mapping_queue.c b/fs/ssdfs/peb_mapping_queue.c
new file mode 100644
index 000000000000..7d00060da941
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_queue.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_queue.c - PEB mappings queue implementation.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_map_queue_page_leaks;
+atomic64_t ssdfs_map_queue_memory_leaks;
+atomic64_t ssdfs_map_queue_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_map_queue_cache_leaks_increment(void *kaddr)
+ * void ssdfs_map_queue_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_map_queue_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_queue_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_queue_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_map_queue_kfree(void *kaddr)
+ * struct page *ssdfs_map_queue_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_map_queue_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_map_queue_free_page(struct page *page)
+ * void ssdfs_map_queue_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(map_queue)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(map_queue)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_map_queue_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_map_queue_page_leaks, 0);
+	atomic64_set(&ssdfs_map_queue_memory_leaks, 0);
+	atomic64_set(&ssdfs_map_queue_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_map_queue_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_map_queue_page_leaks) != 0) {
+		SSDFS_ERR("MAPPING QUEUE: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_map_queue_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_queue_memory_leaks) != 0) {
+		SSDFS_ERR("MAPPING QUEUE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_queue_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_queue_cache_leaks) != 0) {
+		SSDFS_ERR("MAPPING QUEUE: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_queue_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static struct kmem_cache *ssdfs_peb_mapping_info_cachep;
+
+void ssdfs_zero_peb_mapping_info_cache_ptr(void)
+{
+	ssdfs_peb_mapping_info_cachep = NULL;
+}
+
+static
+void ssdfs_init_peb_mapping_info_once(void *obj)
+{
+	struct ssdfs_peb_mapping_info *pmi_obj = obj;
+
+	memset(pmi_obj, 0, sizeof(struct ssdfs_peb_mapping_info));
+}
+
+void ssdfs_shrink_peb_mapping_info_cache(void)
+{
+	if (ssdfs_peb_mapping_info_cachep)
+		kmem_cache_shrink(ssdfs_peb_mapping_info_cachep);
+}
+
+void ssdfs_destroy_peb_mapping_info_cache(void)
+{
+	if (ssdfs_peb_mapping_info_cachep)
+		kmem_cache_destroy(ssdfs_peb_mapping_info_cachep);
+}
+
+int ssdfs_init_peb_mapping_info_cache(void)
+{
+	ssdfs_peb_mapping_info_cachep =
+		kmem_cache_create("ssdfs_peb_mapping_info_cache",
+				  sizeof(struct ssdfs_peb_mapping_info), 0,
+				  SLAB_RECLAIM_ACCOUNT |
+				  SLAB_MEM_SPREAD |
+				  SLAB_ACCOUNT,
+				  ssdfs_init_peb_mapping_info_once);
+	if (!ssdfs_peb_mapping_info_cachep) {
+		SSDFS_ERR("unable to create PEB mapping info objects cache\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_peb_mapping_queue_init() - initialize PEB mappings queue
+ * @pmq: initialized PEB mappings queue
+ */
+void ssdfs_peb_mapping_queue_init(struct ssdfs_peb_mapping_queue *pmq)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmq);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock_init(&pmq->lock);
+	INIT_LIST_HEAD(&pmq->list);
+}
+
+/*
+ * is_ssdfs_peb_mapping_queue_empty() - check that PEB mappings queue is empty
+ * @pmq: PEB mappings queue
+ */
+bool is_ssdfs_peb_mapping_queue_empty(struct ssdfs_peb_mapping_queue *pmq)
+{
+	bool is_empty;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmq);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pmq->lock);
+	is_empty = list_empty_careful(&pmq->list);
+	spin_unlock(&pmq->lock);
+
+	return is_empty;
+}
+
+/*
+ * ssdfs_peb_mapping_queue_add_head() - add PEB mapping at the head of queue
+ * @pmq: PEB mappings queue
+ * @pmi: PEB mapping info
+ */
+void ssdfs_peb_mapping_queue_add_head(struct ssdfs_peb_mapping_queue *pmq,
+				      struct ssdfs_peb_mapping_info *pmi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmq || !pmi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pmq->lock);
+	list_add(&pmi->list, &pmq->list);
+	spin_unlock(&pmq->lock);
+}
+
+/*
+ * ssdfs_peb_mapping_queue_add_tail() - add PEB mapping at the tail of queue
+ * @pmq: PEB mappings queue
+ * @pmi: PEB mapping info
+ */
+void ssdfs_peb_mapping_queue_add_tail(struct ssdfs_peb_mapping_queue *pmq,
+				      struct ssdfs_peb_mapping_info *pmi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmq || !pmi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pmq->lock);
+	list_add_tail(&pmi->list, &pmq->list);
+	spin_unlock(&pmq->lock);
+}
+
+/*
+ * ssdfs_peb_mapping_queue_remove_first() - get mapping and remove from queue
+ * @pmq: PEB mappings queue
+ * @pmi: first PEB mapping [out]
+ *
+ * This function gets the first PEB mapping in @pmq, removes it from
+ * the queue, and returns it via @pmi.
+ *
+ * RETURN:
+ * [success] - @pmi contains pointer on PEB mapping.
+ * [failure] - error code:
+ *
+ * %-ENODATA - queue is empty.
+ * %-ENOENT - first entry is NULL.
+ */
+int ssdfs_peb_mapping_queue_remove_first(struct ssdfs_peb_mapping_queue *pmq,
+					 struct ssdfs_peb_mapping_info **pmi)
+{
+	bool is_empty;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmq || !pmi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pmq->lock);
+	is_empty = list_empty_careful(&pmq->list);
+	if (!is_empty) {
+		*pmi = list_first_entry_or_null(&pmq->list,
+						struct ssdfs_peb_mapping_info,
+						list);
+		if (!*pmi) {
+			SSDFS_WARN("first entry is NULL\n");
+			err = -ENOENT;
+		} else
+			list_del(&(*pmi)->list);
+	}
+	spin_unlock(&pmq->lock);
+
+	if (is_empty) {
+		SSDFS_WARN("PEB mappings queue is empty\n");
+		err = -ENODATA;
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_peb_mapping_queue_remove_all() - remove all PEB mappings from queue
+ * @pmq: PEB mappings queue
+ *
+ * This function removes all PEB mappings from the queue.
+ */
+void ssdfs_peb_mapping_queue_remove_all(struct ssdfs_peb_mapping_queue *pmq)
+{
+	bool is_empty;
+	LIST_HEAD(tmp_list);
+	struct list_head *this, *next;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmq);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&pmq->lock);
+	is_empty = list_empty_careful(&pmq->list);
+	if (!is_empty)
+		list_replace_init(&pmq->list, &tmp_list);
+	spin_unlock(&pmq->lock);
+
+	if (is_empty)
+		return;
+
+	list_for_each_safe(this, next, &tmp_list) {
+		struct ssdfs_peb_mapping_info *pmi;
+
+		pmi = list_entry(this, struct ssdfs_peb_mapping_info, list);
+		list_del(&pmi->list);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("delete PEB mapping: "
+			  "leb_id %llu, peb_id %llu, consistency %d\n",
+			  pmi->leb_id, pmi->peb_id, pmi->consistency);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_peb_mapping_info_free(pmi);
+	}
+}
+
+/*
+ * ssdfs_peb_mapping_info_alloc() - allocate memory for PEB mapping info object
+ */
+struct ssdfs_peb_mapping_info *ssdfs_peb_mapping_info_alloc(void)
+{
+	struct ssdfs_peb_mapping_info *ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ssdfs_peb_mapping_info_cachep);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr = kmem_cache_alloc(ssdfs_peb_mapping_info_cachep, GFP_KERNEL);
+	if (!ptr) {
+		SSDFS_ERR("fail to allocate memory for PEB mapping\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ssdfs_map_queue_cache_leaks_increment(ptr);
+
+	return ptr;
+}
+
+/*
+ * ssdfs_peb_mapping_info_free() - free memory for PEB mapping info object
+ */
+void ssdfs_peb_mapping_info_free(struct ssdfs_peb_mapping_info *pmi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ssdfs_peb_mapping_info_cachep);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!pmi)
+		return;
+
+	ssdfs_map_queue_cache_leaks_decrement(pmi);
+	kmem_cache_free(ssdfs_peb_mapping_info_cachep, pmi);
+}
+
+/*
+ * ssdfs_peb_mapping_info_init() - PEB mapping info initialization
+ * @leb_id: LEB ID
+ * @peb_id: PEB ID
+ * @consistency: consistency state in PEB mapping table cache
+ * @pmi: PEB mapping info [out]
+ */
+void ssdfs_peb_mapping_info_init(u64 leb_id, u64 peb_id, int consistency,
+				 struct ssdfs_peb_mapping_info *pmi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pmi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(pmi, 0, sizeof(struct ssdfs_peb_mapping_info));
+
+	INIT_LIST_HEAD(&pmi->list);
+	pmi->leb_id = leb_id;
+	pmi->peb_id = peb_id;
+	pmi->consistency = consistency;
+}
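The queue API above is small; the following sketch (an editorial aside,
not part of the patch) shows the intended producer/consumer pattern.
Error handling is trimmed, and the consistency value is a placeholder
since the real constants are defined by the mapping table cache:

static int sketch_defer_mapping(struct ssdfs_peb_mapping_queue *pmq,
				u64 leb_id, u64 peb_id, int consistency)
{
	struct ssdfs_peb_mapping_info *pmi;

	/* slab-backed allocation; returns ERR_PTR(-ENOMEM) on failure */
	pmi = ssdfs_peb_mapping_info_alloc();
	if (IS_ERR(pmi))
		return PTR_ERR(pmi);

	ssdfs_peb_mapping_info_init(leb_id, peb_id, consistency, pmi);
	ssdfs_peb_mapping_queue_add_tail(pmq, pmi);

	return 0;
}

static void sketch_drain_queue(struct ssdfs_peb_mapping_queue *pmq)
{
	struct ssdfs_peb_mapping_info *pmi;

	while (!is_ssdfs_peb_mapping_queue_empty(pmq)) {
		if (ssdfs_peb_mapping_queue_remove_first(pmq, &pmi))
			break;

		/* ...apply pmi->leb_id -> pmi->peb_id to the cache... */

		ssdfs_peb_mapping_info_free(pmi);
	}
}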
diff --git a/fs/ssdfs/peb_mapping_queue.h b/fs/ssdfs/peb_mapping_queue.h
new file mode 100644
index 000000000000..0d9c7305c318
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_queue.h
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_queue.h - PEB mappings queue declarations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko
+ */
+
+#ifndef _SSDFS_PEB_MAPPING_QUEUE_H
+#define _SSDFS_PEB_MAPPING_QUEUE_H
+
+/*
+ * struct ssdfs_peb_mapping_queue - PEB mappings queue descriptor
+ * @lock: PEB mappings queue's lock
+ * @list: PEB mappings queue's list
+ */
+struct ssdfs_peb_mapping_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_peb_mapping_info - PEB mapping info
+ * @list: PEB mappings queue list
+ * @leb_id: LEB ID
+ * @peb_id: PEB ID
+ * @consistency: consistency state in the mapping table cache
+ */
+struct ssdfs_peb_mapping_info {
+	struct list_head list;
+	u64 leb_id;
+	u64 peb_id;
+	int consistency;
+};
+
+/*
+ * PEB mappings queue API
+ */
+void ssdfs_peb_mapping_queue_init(struct ssdfs_peb_mapping_queue *pmq);
+bool is_ssdfs_peb_mapping_queue_empty(struct ssdfs_peb_mapping_queue *pmq);
+void ssdfs_peb_mapping_queue_add_tail(struct ssdfs_peb_mapping_queue *pmq,
+				      struct ssdfs_peb_mapping_info *pmi);
+void ssdfs_peb_mapping_queue_add_head(struct ssdfs_peb_mapping_queue *pmq,
+				      struct ssdfs_peb_mapping_info *pmi);
+int ssdfs_peb_mapping_queue_remove_first(struct ssdfs_peb_mapping_queue *pmq,
+					 struct ssdfs_peb_mapping_info **pmi);
+void ssdfs_peb_mapping_queue_remove_all(struct ssdfs_peb_mapping_queue *pmq);
+
+/*
+ * PEB mapping info's API
+ */
+void ssdfs_zero_peb_mapping_info_cache_ptr(void);
+int ssdfs_init_peb_mapping_info_cache(void);
+void ssdfs_shrink_peb_mapping_info_cache(void);
+void ssdfs_destroy_peb_mapping_info_cache(void);
+
+struct ssdfs_peb_mapping_info *ssdfs_peb_mapping_info_alloc(void);
+void ssdfs_peb_mapping_info_free(struct ssdfs_peb_mapping_info *pmi);
+void ssdfs_peb_mapping_info_init(u64 leb_id, u64 peb_id, int consistency,
+				 struct ssdfs_peb_mapping_info *pmi);
+
+#endif /* _SSDFS_PEB_MAPPING_QUEUE_H */
diff --git a/fs/ssdfs/peb_mapping_table.c b/fs/ssdfs/peb_mapping_table.c
new file mode 100644
index 000000000000..cd5835eb96a2
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_table.c
@@ -0,0 +1,1954 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_table.c - PEB mapping table implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ *                  Cong Wang
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+#include <linux/wait.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "segment_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb.h"
+#include "peb_container.h"
+#include "segment.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "extents_tree.h"
+#include "extents_queue.h"
+#include "shared_extents_tree.h"
+#include "snapshots_tree.h"
+#include "peb_mapping_table.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_map_tbl_page_leaks;
+atomic64_t ssdfs_map_tbl_memory_leaks;
+atomic64_t ssdfs_map_tbl_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_map_tbl_cache_leaks_increment(void *kaddr)
+ * void ssdfs_map_tbl_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_map_tbl_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_tbl_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_tbl_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_map_tbl_kfree(void *kaddr)
+ * struct page *ssdfs_map_tbl_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_map_tbl_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_map_tbl_free_page(struct page *page)
+ * void ssdfs_map_tbl_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(map_tbl)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(map_tbl)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_map_tbl_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_map_tbl_page_leaks, 0);
+	atomic64_set(&ssdfs_map_tbl_memory_leaks, 0);
+	atomic64_set(&ssdfs_map_tbl_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_map_tbl_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_map_tbl_page_leaks) != 0) {
+		SSDFS_ERR("MAPPING TABLE: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_map_tbl_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_tbl_memory_leaks) != 0) {
+		SSDFS_ERR("MAPPING TABLE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_tbl_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_tbl_cache_leaks) != 0) {
+		SSDFS_ERR("MAPPING TABLE: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_tbl_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_unused_lebs_in_fragment() - calculate unused LEBs in fragment
+ * @fdesc: fragment descriptor
+ */
+static inline
+u32 ssdfs_unused_lebs_in_fragment(struct ssdfs_maptbl_fragment_desc *fdesc)
+{
+	u32 unused_lebs;
+	u32 reserved_pool;
+
+	reserved_pool = fdesc->reserved_pebs + fdesc->pre_erase_pebs;
+
+	unused_lebs = fdesc->lebs_count;
+	unused_lebs -= fdesc->mapped_lebs + fdesc->migrating_lebs;
+	unused_lebs -= reserved_pool;
+
+	return unused_lebs;
+}
+
+static inline
+u32 ssdfs_lebs_reservation_threshold(struct ssdfs_maptbl_fragment_desc *fdesc)
+{
+	u32 expected2migrate = 0;
+	u32 reserved_pool = 0;
+	u32 migration_NOT_guaranted = 0;
+	u32 threshold;
+
+	expected2migrate = fdesc->mapped_lebs - fdesc->migrating_lebs;
+	reserved_pool = fdesc->reserved_pebs + fdesc->pre_erase_pebs;
+
+	if
(expected2migrate > reserved_pool) + migration_NOT_guaranted = expected2migrate - reserved_pool; + else + migration_NOT_guaranted = 0; + + threshold = migration_NOT_guaranted / 10; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lebs_count %u, mapped_lebs %u, " + "migrating_lebs %u, reserved_pebs %u, " + "pre_erase_pebs %u, expected2migrate %u, " + "reserved_pool %u, migration_NOT_guaranted %u, " + "threshold %u\n", + fdesc->lebs_count, fdesc->mapped_lebs, + fdesc->migrating_lebs, fdesc->reserved_pebs, + fdesc->pre_erase_pebs, expected2migrate, + reserved_pool, migration_NOT_guaranted, + threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + return threshold; +} + +int ssdfs_maptbl_define_fragment_info(struct ssdfs_fs_info *fsi, + u64 leb_id, + u16 *pebs_per_fragment, + u16 *pebs_per_stripe, + u16 *stripes_per_fragment) +{ + struct ssdfs_peb_mapping_table *tbl; + u32 fragments_count; + u64 lebs_count; + u16 pebs_per_fragment_default; + u16 pebs_per_stripe_default; + u16 stripes_per_fragment_default; + u64 fragment_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->maptbl); + + SSDFS_DBG("leb_id %llu\n", leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + tbl = fsi->maptbl; + + *pebs_per_fragment = U16_MAX; + *pebs_per_stripe = U16_MAX; + *stripes_per_fragment = U16_MAX; + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + down_read(&tbl->tbl_lock); + fragments_count = tbl->fragments_count; + lebs_count = tbl->lebs_count; + pebs_per_fragment_default = tbl->pebs_per_fragment; + pebs_per_stripe_default = tbl->pebs_per_stripe; + stripes_per_fragment_default = tbl->stripes_per_fragment; + up_read(&tbl->tbl_lock); + + if (leb_id >= lebs_count) { + SSDFS_ERR("invalid request: " + "leb_id %llu, lebs_count %llu\n", + leb_id, lebs_count); + return -EINVAL; + } + + fragment_index = div_u64(leb_id, (u32)pebs_per_fragment_default); + + if ((fragment_index + 1) < fragments_count) { + *pebs_per_fragment = pebs_per_fragment_default; + *pebs_per_stripe = pebs_per_stripe_default; + *stripes_per_fragment = stripes_per_fragment_default; + } else { + u64 rest_pebs; + + rest_pebs = (u64)fragment_index * pebs_per_fragment_default; + rest_pebs = lebs_count - rest_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(rest_pebs >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + *pebs_per_fragment = (u16)rest_pebs; + *stripes_per_fragment = stripes_per_fragment_default; + + *pebs_per_stripe = *pebs_per_fragment / *stripes_per_fragment; + if (*pebs_per_fragment % *stripes_per_fragment) + *pebs_per_stripe += 1; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, pebs_per_fragment %u, " + "pebs_per_stripe %u, stripes_per_fragment %u\n", + leb_id, *pebs_per_fragment, + *pebs_per_stripe, *stripes_per_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_check_maptbl_sb_header() - check mapping table's sb_header + * @fsi: file system info object + * + * This method checks mapping table description in volume header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - maptbl_sb_header is corrupted. + * %-EROFS - mapping table has corrupted state. 
+ */ +static +int ssdfs_check_maptbl_sb_header(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_peb_mapping_table *ptr; + u64 calculated; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->maptbl); + + SSDFS_DBG("fsi %p, maptbl %p\n", fsi, fsi->maptbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = fsi->maptbl; + + if (atomic_read(&ptr->flags) & ~SSDFS_MAPTBL_FLAGS_MASK) { + SSDFS_CRIT("maptbl header corrupted: " + "unknown flags %#x\n", + atomic_read(&ptr->flags)); + return -EIO; + } + + if (atomic_read(&ptr->flags) & SSDFS_MAPTBL_ERROR) { + SSDFS_NOTICE("mapping table has corrupted state: " + "Please, run fsck utility\n"); + return -EROFS; + } + + calculated = (u64)ptr->fragments_per_seg * ptr->fragment_bytes; + if (calculated > fsi->segsize) { + SSDFS_CRIT("mapping table has corrupted state: " + "fragments_per_seg %u, fragment_bytes %u, " + "segsize %u\n", + ptr->fragments_per_seg, + ptr->fragment_bytes, + fsi->segsize); + return -EIO; + } + + calculated = (u64)ptr->fragments_per_peb * ptr->fragment_bytes; + if (calculated > fsi->erasesize) { + SSDFS_CRIT("mapping table has corrupted state: " + "fragments_per_peb %u, fragment_bytes %u, " + "erasesize %u\n", + ptr->fragments_per_peb, + ptr->fragment_bytes, + fsi->erasesize); + return -EIO; + } + + calculated = (u64)ptr->fragments_per_peb * fsi->pebs_per_seg; + if (calculated != ptr->fragments_per_seg) { + SSDFS_CRIT("mapping table has corrupted state: " + "fragments_per_peb %u, fragments_per_seg %u, " + "pebs_per_seg %u\n", + ptr->fragments_per_peb, + ptr->fragments_per_seg, + fsi->pebs_per_seg); + return -EIO; + } + + calculated = fsi->nsegs * fsi->pebs_per_seg; + if (ptr->lebs_count != calculated || ptr->pebs_count != calculated) { + SSDFS_CRIT("mapping table has corrupted state: " + "lebs_count %llu, pebs_count %llu, " + "nsegs %llu, pebs_per_seg %u\n", + ptr->lebs_count, ptr->pebs_count, + fsi->nsegs, fsi->pebs_per_seg); + return -EIO; + } + + calculated = (u64)ptr->fragments_count * ptr->lebs_per_fragment; + if (ptr->lebs_count > calculated || + calculated > (ptr->lebs_count + (2 * ptr->lebs_per_fragment))) { + SSDFS_CRIT("mapping table has corrupted state: " + "lebs_per_fragment %u, fragments_count %u, " + "lebs_per_fragment %u\n", + ptr->lebs_per_fragment, + ptr->fragments_count, + ptr->lebs_per_fragment); + return -EIO; + } + + calculated = (u64)ptr->fragments_count * ptr->pebs_per_fragment; + if (ptr->pebs_count > calculated || + calculated > (ptr->pebs_count + (2 * ptr->pebs_per_fragment))) { + SSDFS_CRIT("mapping table has corrupted state: " + "pebs_per_fragment %u, fragments_count %u, " + "pebs_per_fragment %u\n", + ptr->pebs_per_fragment, + ptr->fragments_count, + ptr->pebs_per_fragment); + return -EIO; + } + + calculated = (u64)ptr->pebs_per_stripe * ptr->stripes_per_fragment; + if (ptr->pebs_per_fragment != calculated) { + SSDFS_CRIT("mapping table has corrupted state: " + "pebs_per_stripe %u, stripes_per_fragment %u, " + "pebs_per_fragment %u\n", + ptr->pebs_per_stripe, + ptr->stripes_per_fragment, + ptr->pebs_per_fragment); + return -EIO; + } + + return 0; +} + +/* + * ssdfs_maptbl_create_fragment() - initial fragment preparation. 
+ * @fsi: file system info object
+ * @index: fragment index
+ */
+static
+int ssdfs_maptbl_create_fragment(struct ssdfs_fs_info *fsi, u32 index)
+{
+	struct ssdfs_maptbl_fragment_desc *ptr;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !fsi->maptbl || !fsi->maptbl->desc_array);
+	BUG_ON(index >= fsi->maptbl->fragments_count);
+
+	SSDFS_DBG("fsi %p, index %u\n", fsi, index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr = &fsi->maptbl->desc_array[index];
+
+	init_rwsem(&ptr->lock);
+	ptr->fragment_id = index;
+	ptr->fragment_pages = fsi->maptbl->fragment_pages;
+	ptr->start_leb = U64_MAX;
+	ptr->lebs_count = U32_MAX;
+	ptr->lebs_per_page = U16_MAX;
+	ptr->lebtbl_pages = U16_MAX;
+	ptr->pebs_per_page = U16_MAX;
+	ptr->stripe_pages = U16_MAX;
+	ptr->mapped_lebs = 0;
+	ptr->migrating_lebs = 0;
+	ptr->reserved_pebs = 0;
+	ptr->pre_erase_pebs = 0;
+	ptr->recovering_pebs = 0;
+
+	err = ssdfs_create_page_array(ptr->fragment_pages, &ptr->array);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create page array: "
+			  "capacity %u, err %d\n",
+			  ptr->fragment_pages, err);
+		return err;
+	}
+
+	init_completion(&ptr->init_end);
+
+	ptr->flush_req1 = NULL;
+	ptr->flush_req2 = NULL;
+	ptr->flush_req_count = 0;
+
+	ptr->flush_seq_size = min_t(u32, ptr->fragment_pages, PAGEVEC_SIZE);
+	ptr->flush_req1 = ssdfs_map_tbl_kcalloc(ptr->flush_seq_size,
+					sizeof(struct ssdfs_segment_request),
+					GFP_KERNEL);
+	if (!ptr->flush_req1) {
+		ssdfs_destroy_page_array(&ptr->array);
+		SSDFS_ERR("fail to allocate flush requests array: "
+			  "array_size %u\n",
+			  ptr->flush_seq_size);
+		return -ENOMEM;
+	}
+
+	ptr->flush_req2 = ssdfs_map_tbl_kcalloc(ptr->flush_seq_size,
+					sizeof(struct ssdfs_segment_request),
+					GFP_KERNEL);
+	if (!ptr->flush_req2) {
+		ssdfs_destroy_page_array(&ptr->array);
+		ssdfs_map_tbl_kfree(ptr->flush_req1);
+		ptr->flush_req1 = NULL;
+		SSDFS_ERR("fail to allocate flush requests array: "
+			  "array_size %u\n",
+			  ptr->flush_seq_size);
+		return -ENOMEM;
+	}
+
+	atomic_set(&ptr->state, SSDFS_MAPTBL_FRAG_CREATED);
+
+	return 0;
+}
+
+/*
+ * CHECK_META_EXTENT_TYPE() - check type of metadata area's extent
+ */
+static
+int CHECK_META_EXTENT_TYPE(struct ssdfs_meta_area_extent *extent)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!extent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (le16_to_cpu(extent->type)) {
+	case SSDFS_EMPTY_EXTENT_TYPE:
+		return -ENODATA;
+
+	case SSDFS_SEG_EXTENT_TYPE:
+		return 0;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+/*
+ * ssdfs_maptbl_define_segment_counts() - define total maptbl's segments count
+ * @tbl: mapping table object
+ *
+ * This method determines total count of segments that are allocated
+ * for mapping table.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO - extents are corrupted.
+ */ +static +int ssdfs_maptbl_define_segment_counts(struct ssdfs_peb_mapping_table *tbl) +{ + u32 segs_count1 = 0, segs_count2 = 0; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + + SSDFS_DBG("tbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_MAPTBL_RESERVED_EXTENTS; i++) { + struct ssdfs_meta_area_extent *extent; + u32 len1 = 0, len2 = 0; + + extent = &tbl->extents[i][SSDFS_MAIN_MAPTBL_SEG]; + + err = CHECK_META_EXTENT_TYPE(extent); + if (err == -ENODATA) { + /* do nothing */ + break; + } else if (unlikely(err)) { + SSDFS_WARN("invalid meta area extent: " + "index %d, err %d\n", + i, err); + return err; + } + + len1 = le32_to_cpu(extent->len); + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY) { + extent = &tbl->extents[i][SSDFS_COPY_MAPTBL_SEG]; + + err = CHECK_META_EXTENT_TYPE(extent); + if (err == -ENODATA) { + SSDFS_ERR("empty copy meta area extent: " + "index %d\n", i); + return -EIO; + } else if (unlikely(err)) { + SSDFS_WARN("invalid meta area extent: " + "index %d, err %d\n", + i, err); + return err; + } + + len2 = le32_to_cpu(extent->len); + + if (len1 != len2) { + SSDFS_ERR("different main and copy extents: " + "index %d, len1 %u, len2 %u\n", + i, len1, len2); + return -EIO; + } + } + + segs_count1 += len1; + segs_count2 += len2; + } + + if (segs_count1 == 0) { + SSDFS_CRIT("empty maptbl extents\n"); + return -EIO; + } else if (segs_count1 >= U16_MAX) { + SSDFS_CRIT("invalid segment count %u\n", + segs_count1); + return -EIO; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY && + segs_count1 != segs_count2) { + SSDFS_ERR("segs_count1 %u != segs_count2 %u\n", + segs_count1, segs_count2); + return -EIO; + } + + tbl->segs_count = (u16)segs_count1; + return 0; +} + +/* + * ssdfs_maptbl_create_segments() - create mapping table's segment objects + * @fsi: file system info object + * @array_type: main/backup segments chain + * @tbl: mapping table object + * + * This method tries to create mapping table's segment objects. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - fail to allocate memory. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_maptbl_create_segments(struct ssdfs_fs_info *fsi, + int array_type, + struct ssdfs_peb_mapping_table *tbl) +{ + u64 seg; + int seg_type = SSDFS_MAPTBL_SEG_TYPE; + int seg_state = SSDFS_SEG_LEAF_NODE_USING; + u16 log_pages; + u8 create_threads; + struct ssdfs_segment_info **kaddr = NULL; + int i, j; + u32 created_segs = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !tbl); + BUG_ON(array_type >= SSDFS_MAPTBL_SEG_COPY_MAX); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p, array_type %#x, tbl %p, segs_count %u\n", + fsi, array_type, tbl, tbl->segs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = le16_to_cpu(fsi->vh->maptbl_log_pages); + create_threads = fsi->create_threads_per_seg; + + tbl->segs[array_type] = ssdfs_map_tbl_kcalloc(tbl->segs_count, + sizeof(struct ssdfs_segment_info *), + GFP_KERNEL); + if (!tbl->segs[array_type]) { + SSDFS_ERR("fail to allocate segment array\n"); + return -ENOMEM; + } + + for (i = 0; i < SSDFS_MAPTBL_RESERVED_EXTENTS; i++) { + struct ssdfs_meta_area_extent *extent; + u64 start_seg; + u32 len; + + extent = &tbl->extents[i][array_type]; + + err = CHECK_META_EXTENT_TYPE(extent); + if (err == -ENODATA) { + /* do nothing */ + break; + } else if (unlikely(err)) { + SSDFS_WARN("invalid meta area extent: " + "index %d, err %d\n", + i, err); + return err; + } + + start_seg = le64_to_cpu(extent->start_id); + len = le32_to_cpu(extent->len); + + for (j = 0; j < len; j++) { + if (created_segs >= tbl->segs_count) { + SSDFS_ERR("created_segs %u >= segs_count %u\n", + created_segs, tbl->segs_count); + return -ERANGE; + } + + seg = start_seg + j; + BUG_ON(!tbl->segs[array_type]); + kaddr = &tbl->segs[array_type][created_segs]; + BUG_ON(*kaddr != NULL); + + *kaddr = ssdfs_segment_allocate_object(seg); + if (IS_ERR_OR_NULL(*kaddr)) { + err = !*kaddr ? -ENOMEM : PTR_ERR(*kaddr); + *kaddr = NULL; + SSDFS_ERR("fail to allocate segment object: " + "seg %llu, err %d\n", + seg, err); + return err; + } + + err = ssdfs_segment_create_object(fsi, seg, seg_state, + seg_type, log_pages, + create_threads, + *kaddr); + if (err == -EINTR) { + /* + * Ignore this error. 
+ */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create segment: " + "seg %llu, err %d\n", + seg, err); + return err; + } + + ssdfs_segment_get_object(*kaddr); + created_segs++; + } + } + + if (created_segs != tbl->segs_count) { + SSDFS_ERR("created_segs %u != tbl->segs_count %u\n", + created_segs, tbl->segs_count); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_maptbl_destroy_segments() - destroy mapping table's segment objects + * @tbl: mapping table object + */ +static +void ssdfs_maptbl_destroy_segments(struct ssdfs_peb_mapping_table *tbl) +{ + struct ssdfs_segment_info *si; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + + SSDFS_DBG("maptbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < tbl->segs_count; i++) { + for (j = 0; j < SSDFS_MAPTBL_SEG_COPY_MAX; j++) { + if (tbl->segs[j] == NULL) + continue; + + si = tbl->segs[j][i]; + + ssdfs_segment_put_object(si); + err = ssdfs_segment_destroy_object(si); + if (unlikely(err == -EBUSY)) + BUG(); + else if (unlikely(err)) { + SSDFS_WARN("issue during segment destroy: " + "err %d\n", + err); + } + } + } + + for (i = 0; i < SSDFS_MAPTBL_SEG_COPY_MAX; i++) { + ssdfs_map_tbl_kfree(tbl->segs[i]); + tbl->segs[i] = NULL; + } +} + +/* + * ssdfs_maptbl_destroy_fragment() - destroy mapping table's fragment + * @fsi: file system info object + * @index: fragment index + */ +inline +void ssdfs_maptbl_destroy_fragment(struct ssdfs_fs_info *fsi, u32 index) +{ + struct ssdfs_maptbl_fragment_desc *ptr; + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->maptbl || !fsi->maptbl->desc_array); + BUG_ON(index >= fsi->maptbl->fragments_count); + + SSDFS_DBG("fsi %p, index %u\n", fsi, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = &fsi->maptbl->desc_array[index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(rwsem_is_locked(&ptr->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&ptr->state); + + if (state == SSDFS_MAPTBL_FRAG_DIRTY) + SSDFS_WARN("fragment %u is dirty\n", index); + else if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + SSDFS_DBG("fragment %u init was failed\n", index); + return; + } else if (state >= SSDFS_MAPTBL_FRAG_STATE_MAX) + BUG(); + + if (ptr->flush_req1) { + ssdfs_map_tbl_kfree(ptr->flush_req1); + ptr->flush_req1 = NULL; + } + + if (ptr->flush_req2) { + ssdfs_map_tbl_kfree(ptr->flush_req2); + ptr->flush_req2 = NULL; + } + + ssdfs_destroy_page_array(&ptr->array); + complete_all(&ptr->init_end); +} + +/* + * ssdfs_maptbl_segment_init() - initiate mapping table's segment init + * @tbl: mapping table object + * @si: segment object + * @seg_index: index of segment in the sequence + */ +static +int ssdfs_maptbl_segment_init(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_segment_info *si, + int seg_index) +{ + u32 page_size; + u64 logical_offset; + u64 logical_blk; + u32 blks_count; + u32 fragment_bytes = tbl->fragment_bytes; + u64 bytes_per_peb = (u64)tbl->fragments_per_peb * fragment_bytes; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + + SSDFS_DBG("si %p, seg %llu, seg_index %d\n", + si, si->seg_id, seg_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_size = si->fsi->pagesize; + logical_offset = bytes_per_peb * si->pebs_count * seg_index; + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_peb_container *pebc = &si->peb_array[i]; + struct ssdfs_segment_request *req; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_container_empty(pebc)) { +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("PEB container empty: " + "seg %llu, peb_index %d\n", + si->seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + req = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + logical_offset += bytes_per_peb * i; + + ssdfs_request_prepare_logical_extent(SSDFS_MAPTBL_INO, + logical_offset, + fragment_bytes, + 0, 0, req); + ssdfs_request_define_segment(si->seg_id, req); + + logical_blk = (u64)i * fragment_bytes; + logical_blk = div64_u64(logical_blk, si->fsi->pagesize); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(logical_blk >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + blks_count = (fragment_bytes + page_size - 1) / page_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(blks_count >= U16_MAX); + + SSDFS_DBG("seg %llu, peb_index %d, " + "logical_blk %llu, blks_count %u, " + "fragment_bytes %u, page_size %u, " + "logical_offset %llu\n", + si->seg_id, i, + logical_blk, blks_count, + fragment_bytes, page_size, + logical_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_define_volume_extent((u16)logical_blk, + (u16) blks_count, + req); + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_INIT_MAPTBL, + SSDFS_REQ_ASYNC, + req); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req); + } + + wake_up_all(&si->wait_queue[SSDFS_PEB_READ_THREAD]); + + return 0; +} + +/* + * ssdfs_maptbl_init() - initiate mapping table's initialization procedure + * @tbl: mapping table object + */ +static +int ssdfs_maptbl_init(struct ssdfs_peb_mapping_table *tbl) +{ + struct ssdfs_segment_info *si; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + + SSDFS_DBG("maptbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < tbl->segs_count; i++) { + for (j = 0; j < SSDFS_MAPTBL_SEG_COPY_MAX; j++) { + if (tbl->segs[j] == NULL) + continue; + + si = tbl->segs[j][i]; + + if (!si) + continue; + + err = ssdfs_maptbl_segment_init(tbl, si, i); + if (unlikely(err)) { + SSDFS_ERR("fail to init segment: " + "seg %llu, err %d\n", + si->seg_id, err); + return err; + } + } + } + + return 0; +} + +/* + * ssdfs_maptbl_create() - create mapping table object + * @fsi: file system info object + */ +int ssdfs_maptbl_create(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_peb_mapping_table *ptr; + size_t maptbl_obj_size = sizeof(struct ssdfs_peb_mapping_table); + size_t frag_desc_size = sizeof(struct ssdfs_maptbl_fragment_desc); + void *kaddr; + size_t bytes_count; + size_t bmap_bytes; + int array_type; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, segs_count %llu\n", fsi, fsi->nsegs); +#else + SSDFS_DBG("fsi %p, segs_count %llu\n", fsi, fsi->nsegs); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + kaddr = ssdfs_map_tbl_kzalloc(maptbl_obj_size, GFP_KERNEL); + if (!kaddr) { + SSDFS_ERR("fail to allocate mapping table object\n"); + return -ENOMEM; + } + + fsi->maptbl = ptr = (struct ssdfs_peb_mapping_table *)kaddr; + + ptr->fsi = fsi; + + init_rwsem(&ptr->tbl_lock); + + atomic_set(&ptr->flags, le16_to_cpu(fsi->vh->maptbl.flags)); + ptr->fragments_count = le32_to_cpu(fsi->vh->maptbl.fragments_count); + ptr->fragment_bytes = le32_to_cpu(fsi->vh->maptbl.fragment_bytes); + ptr->fragment_pages = (ptr->fragment_bytes + PAGE_SIZE - 1) / 
PAGE_SIZE; + ptr->fragments_per_seg = le16_to_cpu(fsi->vh->maptbl.fragments_per_seg); + ptr->fragments_per_peb = le16_to_cpu(fsi->vh->maptbl.fragments_per_peb); + ptr->lebs_count = le64_to_cpu(fsi->vh->maptbl.lebs_count); + ptr->pebs_count = le64_to_cpu(fsi->vh->maptbl.pebs_count); + ptr->lebs_per_fragment = le16_to_cpu(fsi->vh->maptbl.lebs_per_fragment); + ptr->pebs_per_fragment = le16_to_cpu(fsi->vh->maptbl.pebs_per_fragment); + ptr->pebs_per_stripe = le16_to_cpu(fsi->vh->maptbl.pebs_per_stripe); + ptr->stripes_per_fragment = + le16_to_cpu(fsi->vh->maptbl.stripes_per_fragment); + + atomic_set(&ptr->erase_op_state, SSDFS_MAPTBL_NO_ERASE); + atomic_set(&ptr->pre_erase_pebs, + le16_to_cpu(fsi->vh->maptbl.pre_erase_pebs)); + /* + * TODO: the max_erase_ops field should be used by GC or + * special management thread for determination of + * upper bound of erase operations for one iteration + * with the goal to orchestrate I/O load with + * erasing load. But if it will be used TRIM command + * for erasing then maybe the erasing load will be + * no so sensitive. + */ + atomic_set(&ptr->max_erase_ops, ptr->pebs_count); + + init_waitqueue_head(&ptr->erase_ops_end_wq); + + atomic64_set(&ptr->last_peb_recover_cno, + le64_to_cpu(fsi->vh->maptbl.last_peb_recover_cno)); + + bytes_count = sizeof(struct ssdfs_meta_area_extent); + bytes_count *= SSDFS_MAPTBL_RESERVED_EXTENTS; + bytes_count *= SSDFS_MAPTBL_SEG_COPY_MAX; + ssdfs_memcpy(ptr->extents, 0, bytes_count, + fsi->vh->maptbl.extents, 0, bytes_count, + bytes_count); + + mutex_init(&ptr->bmap_lock); + bmap_bytes = ptr->fragments_count + BITS_PER_LONG - 1; + bmap_bytes /= BITS_PER_BYTE; + ptr->dirty_bmap = ssdfs_map_tbl_kzalloc(bmap_bytes, GFP_KERNEL); + if (!ptr->dirty_bmap) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate dirty_bmap\n"); + goto free_maptbl_object; + } + + init_waitqueue_head(&ptr->wait_queue); + + err = ssdfs_check_maptbl_sb_header(fsi); + if (unlikely(err)) { + SSDFS_ERR("mapping table is corrupted: err %d\n", err); + goto free_dirty_bmap; + } + + kaddr = ssdfs_map_tbl_kcalloc(ptr->fragments_count, + frag_desc_size, GFP_KERNEL); + if (!kaddr) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate fragment descriptors array\n"); + goto free_dirty_bmap; + } + + ptr->desc_array = (struct ssdfs_maptbl_fragment_desc *)kaddr; + + for (i = 0; i < ptr->fragments_count; i++) { + err = ssdfs_maptbl_create_fragment(fsi, i); + if (unlikely(err)) { + SSDFS_ERR("fail to create fragment: " + "index %d, err %d\n", + i, err); + + for (--i; i >= 0; i--) { + /* Destroy created fragments */ + ssdfs_maptbl_destroy_fragment(fsi, i); + } + + goto free_fragment_descriptors; + } + } + + err = ssdfs_maptbl_define_segment_counts(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to define segments count: err %d\n", err); + goto free_fragment_descriptors; + } + + array_type = SSDFS_MAIN_MAPTBL_SEG; + err = ssdfs_maptbl_create_segments(fsi, array_type, ptr); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto destroy_seg_objects; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create maptbl's segment objects: " + "err %d\n", err); + goto destroy_seg_objects; + } + + if (atomic_read(&ptr->flags) & SSDFS_MAPTBL_HAS_COPY) { + array_type = SSDFS_COPY_MAPTBL_SEG; + err = ssdfs_maptbl_create_segments(fsi, array_type, ptr); + if (err == -EINTR) { + /* + * Ignore this error. 
+			 */
+			goto destroy_seg_objects;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to create maptbl's segment objects: "
+				  "err %d\n", err);
+			goto destroy_seg_objects;
+		}
+	}
+
+	err = ssdfs_maptbl_init(ptr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to init mapping table: err %d\n",
+			  err);
+		goto destroy_seg_objects;
+	}
+
+	err = ssdfs_maptbl_start_thread(ptr);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		goto destroy_seg_objects;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start mapping table's thread: "
+			  "err %d\n", err);
+		goto destroy_seg_objects;
+	}
+
+	atomic_set(&ptr->state, SSDFS_MAPTBL_CREATED);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("DONE: create mapping table\n");
+#else
+	SSDFS_DBG("DONE: create mapping table\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+destroy_seg_objects:
+	ssdfs_maptbl_destroy_segments(ptr);
+
+free_fragment_descriptors:
+	ssdfs_map_tbl_kfree(ptr->desc_array);
+
+free_dirty_bmap:
+	ssdfs_map_tbl_kfree(fsi->maptbl->dirty_bmap);
+	fsi->maptbl->dirty_bmap = NULL;
+
+free_maptbl_object:
+	ssdfs_map_tbl_kfree(fsi->maptbl);
+	fsi->maptbl = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(err == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_destroy() - destroy mapping table object
+ * @fsi: file system info object
+ */
+void ssdfs_maptbl_destroy(struct ssdfs_fs_info *fsi)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("maptbl %p\n", fsi->maptbl);
+#else
+	SSDFS_DBG("maptbl %p\n", fsi->maptbl);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (!fsi->maptbl)
+		return;
+
+	ssdfs_maptbl_destroy_segments(fsi->maptbl);
+
+	for (i = 0; i < fsi->maptbl->fragments_count; i++)
+		ssdfs_maptbl_destroy_fragment(fsi, i);
+
+	ssdfs_map_tbl_kfree(fsi->maptbl->desc_array);
+	ssdfs_map_tbl_kfree(fsi->maptbl->dirty_bmap);
+	fsi->maptbl->dirty_bmap = NULL;
+	ssdfs_map_tbl_kfree(fsi->maptbl);
+	fsi->maptbl = NULL;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+}
+
+/*
+ * ssdfs_maptbl_fragment_desc_init() - prepare fragment descriptor
+ * @tbl: mapping table object
+ * @area: mapping table's area descriptor
+ * @fdesc: mapping table's fragment descriptor
+ */
+static
+void ssdfs_maptbl_fragment_desc_init(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_area *area,
+				     struct ssdfs_maptbl_fragment_desc *fdesc)
+{
+	u32 aligned_lebs_count;
+	u16 lebs_per_page;
+	u32 pebs_count;
+	u32 aligned_pebs_count, aligned_stripe_pebs;
+	u16 pebs_per_page;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !area || !fdesc);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	SSDFS_DBG("portion_id %u, tbl %p, "
+		  "area %p, fdesc %p\n",
+		  area->portion_id, tbl, area, fdesc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc->start_leb = (u64)area->portion_id * tbl->lebs_per_fragment;
+	fdesc->lebs_count = (u32)min_t(u64, (u64)tbl->lebs_per_fragment,
+					tbl->lebs_count - fdesc->start_leb);
+
+	lebs_per_page = SSDFS_LEB_DESC_PER_FRAGMENT(PAGE_SIZE);
+	aligned_lebs_count = fdesc->lebs_count + lebs_per_page - 1;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON((aligned_lebs_count / lebs_per_page) >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	fdesc->lebtbl_pages = (u16)(aligned_lebs_count / lebs_per_page);
+
+	fdesc->lebs_per_page = lebs_per_page;
+
+	pebs_count = fdesc->lebs_count;
+	pebs_per_page =
SSDFS_PEB_DESC_PER_FRAGMENT(PAGE_SIZE); + + aligned_pebs_count = pebs_count + + (pebs_count % tbl->stripes_per_fragment); + aligned_stripe_pebs = aligned_pebs_count / tbl->stripes_per_fragment; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(((aligned_stripe_pebs + pebs_per_page - 1) / + pebs_per_page) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + fdesc->stripe_pages = (aligned_stripe_pebs + pebs_per_page - 1) / + pebs_per_page; + + fdesc->pebs_per_page = pebs_per_page; +} + +/* + * ssdfs_maptbl_check_lebtbl_page() - check LEB table's page + * @page: memory page with LEB table's fragment + * @portion_id: portion identification number + * @fragment_id: portion's fragment identification number + * @fdesc: mapping table's fragment descriptor + * @page_index: index of page inside of LEB table + * @lebs_per_fragment: pointer on counter of LEBs in fragment [in|out] + * + * This method checks LEB table's page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EIO - fragment's LEB table is corrupted. + */ +static +int ssdfs_maptbl_check_lebtbl_page(struct page *page, + u16 portion_id, u16 fragment_id, + struct ssdfs_maptbl_fragment_desc *fdesc, + int page_index, + u16 *lebs_per_fragment) +{ + void *kaddr; + struct ssdfs_leb_table_fragment_header *hdr; + u32 bytes_count; + __le32 csum; + u64 start_leb; + u16 lebs_count, mapped_lebs, migrating_lebs; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page || !fdesc || !lebs_per_fragment); + BUG_ON(*lebs_per_fragment == U16_MAX); + + if (page_index >= fdesc->lebtbl_pages) { + SSDFS_ERR("page_index %d >= fdesc->lebtbl_pages %u\n", + page_index, fdesc->lebtbl_pages); + return -EINVAL; + } + + SSDFS_DBG("page %p, portion_id %u, fragment_id %u, " + "fdesc %p, page_index %d, " + "lebs_per_fragment %u\n", + page, portion_id, fragment_id, + fdesc, page_index, + *lebs_per_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PAGE DUMP: page_index %u\n", + page_index); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (le16_to_cpu(hdr->magic) != SSDFS_LEB_TABLE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid LEB table's magic signature: " + "page_index %d\n", + page_index); + goto finish_lebtbl_check; + } + + bytes_count = le32_to_cpu(hdr->bytes_count); + if (bytes_count > PAGE_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bytes_count %u\n", + bytes_count); + goto finish_lebtbl_check; + } + + csum = hdr->checksum; + hdr->checksum = 0; + hdr->checksum = ssdfs_crc32_le(kaddr, bytes_count); + if (hdr->checksum != csum) { + err = -EIO; + SSDFS_ERR("hdr->checksum %u != csum %u\n", + le32_to_cpu(hdr->checksum), + le32_to_cpu(csum)); + hdr->checksum = csum; + goto finish_lebtbl_check; + } + + if (le16_to_cpu(hdr->portion_id) != portion_id || + le16_to_cpu(hdr->fragment_id) != fragment_id) { + err = -EIO; + SSDFS_ERR("hdr->portion_id %u != portion_id %u OR " + "hdr->fragment_id %u != fragment_id %u\n", + le16_to_cpu(hdr->portion_id), + portion_id, + le16_to_cpu(hdr->fragment_id), + fragment_id); + goto finish_lebtbl_check; + } + + if (hdr->flags != 0) { + err = -EIO; + SSDFS_ERR("unsupported flags %#x\n", + le16_to_cpu(hdr->flags)); + goto finish_lebtbl_check; + } + + start_leb = fdesc->start_leb + ((u64)fdesc->lebs_per_page * page_index); + if (start_leb != le64_to_cpu(hdr->start_leb)) { + err = -EIO; + 
SSDFS_ERR("hdr->start_leb %llu != start_leb %llu\n", + le64_to_cpu(hdr->start_leb), + start_leb); + goto finish_lebtbl_check; + } + + lebs_count = le16_to_cpu(hdr->lebs_count); + mapped_lebs = le16_to_cpu(hdr->mapped_lebs); + migrating_lebs = le16_to_cpu(hdr->migrating_lebs); + + if (lebs_count > fdesc->lebs_per_page) { + err = -EIO; + SSDFS_ERR("lebs_count %u > fdesc->lebs_per_page %u\n", + lebs_count, fdesc->lebs_per_page); + goto finish_lebtbl_check; + } + + if (lebs_count < (mapped_lebs + migrating_lebs)) { + err = -EIO; + SSDFS_ERR("lebs_count %u, mapped_lebs %u, migrating_lebs %u\n", + lebs_count, mapped_lebs, migrating_lebs); + goto finish_lebtbl_check; + } + + fdesc->mapped_lebs += mapped_lebs; + fdesc->migrating_lebs += migrating_lebs; + + *lebs_per_fragment += lebs_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u\n", + fdesc->mapped_lebs, fdesc->migrating_lebs); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_lebtbl_check: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + return err; +} + +/* + * ssdfs_maptbl_check_pebtbl_page() - check page in stripe of PEB table + * @pebc: pointer on PEB container + * @page: memory page with PEB table's fragment + * @portion_id: portion identification number + * @fragment_id: portion's fragment identification number + * @fdesc: mapping table's fragment descriptor + * @stripe_id: PEB table's stripe identification number + * @page_index: index of page inside of PEB table's stripe + * @pebs_per_fragment: pointer on counter of PEBs in fragment [in|out] + * + * This method checks PEB table's page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EIO - fragment's PEB table is corrupted. + */ +static +int ssdfs_maptbl_check_pebtbl_page(struct ssdfs_peb_container *pebc, + struct page *page, + u16 portion_id, u16 fragment_id, + struct ssdfs_maptbl_fragment_desc *fdesc, + int stripe_id, + int page_index, + u16 *pebs_per_fragment) +{ + struct ssdfs_fs_info *fsi; + void *kaddr; + struct ssdfs_peb_table_fragment_header *hdr; + u32 bytes_count; + __le32 csum; + u16 pebs_count; + u16 reserved_pebs; + u16 used_pebs; + u16 unused_pebs = 0; + unsigned long *bmap; + int pre_erase_pebs, recovering_pebs; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !page || !fdesc || !pebs_per_fragment); + BUG_ON(*pebs_per_fragment == U16_MAX); + + if (page_index >= fdesc->stripe_pages) { + SSDFS_ERR("page_index %d >= fdesc->stripe_pages %u\n", + page_index, fdesc->stripe_pages); + return -EINVAL; + } + + SSDFS_DBG("seg %llu, peb_index %u\n", + pebc->parent_si->seg_id, + pebc->peb_index); + SSDFS_DBG("page %p, portion_id %u, fragment_id %u, " + "fdesc %p, stripe_id %d, page_index %d, " + "pebs_per_fragment %u\n", + page, portion_id, fragment_id, + fdesc, stripe_id, page_index, + *pebs_per_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = pebc->parent_si->fsi; + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PAGE DUMP: page_index %u\n", + page_index); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid PEB table's magic signature: " + "stripe_id %d, page_index %d\n", + stripe_id, page_index); + goto finish_pebtbl_check; + } + + bytes_count = le32_to_cpu(hdr->bytes_count); + if (bytes_count > PAGE_SIZE) { + err = 
-EIO; + SSDFS_ERR("invalid bytes_count %u\n", + bytes_count); + goto finish_pebtbl_check; + } + + csum = hdr->checksum; + hdr->checksum = 0; + hdr->checksum = ssdfs_crc32_le(kaddr, bytes_count); + if (hdr->checksum != csum) { + err = -EIO; + SSDFS_ERR("hdr->checksum %u != csum %u\n", + le32_to_cpu(hdr->checksum), + le32_to_cpu(csum)); + hdr->checksum = csum; + goto finish_pebtbl_check; + } + + if (le16_to_cpu(hdr->portion_id) != portion_id || + le16_to_cpu(hdr->fragment_id) != fragment_id) { + err = -EIO; + SSDFS_ERR("hdr->portion_id %u != portion_id %u OR " + "hdr->fragment_id %u != fragment_id %u\n", + le16_to_cpu(hdr->portion_id), + portion_id, + le16_to_cpu(hdr->fragment_id), + fragment_id); + goto finish_pebtbl_check; + } + + if (hdr->flags != 0) { + err = -EIO; + SSDFS_ERR("unsupported flags %#x\n", + hdr->flags); + goto finish_pebtbl_check; + } + + if (le16_to_cpu(hdr->stripe_id) != stripe_id) { + err = -EIO; + SSDFS_ERR("hdr->stripe_id %u != stripe_id %d\n", + le16_to_cpu(hdr->stripe_id), + stripe_id); + goto finish_pebtbl_check; + } + + pebs_count = le16_to_cpu(hdr->pebs_count); + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + fdesc->reserved_pebs += reserved_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hdr->start_peb %llu, hdr->pebs_count %u\n", + le64_to_cpu(hdr->start_peb), pebs_count); + SSDFS_DBG("hdr->reserved_pebs %u\n", reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pebs_count > fdesc->pebs_per_page) { + err = -EIO; + SSDFS_ERR("pebs_count %u > fdesc->pebs_per_page %u\n", + pebs_count, fdesc->pebs_per_page); + goto finish_pebtbl_check; + } + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + used_pebs = bitmap_weight(bmap, pebs_count); + + if (used_pebs > pebs_count) { + err = -EIO; + SSDFS_ERR("used_pebs %u > pebs_count %u\n", + used_pebs, pebs_count); + goto finish_pebtbl_check; + } + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0]; + pre_erase_pebs = bitmap_weight(bmap, pebs_count); + fdesc->pre_erase_pebs += pre_erase_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_id %u, stripe_id %u, pre_erase_pebs %u\n", + fragment_id, stripe_id, fdesc->pre_erase_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_RECOVER_BMAP][0]; + recovering_pebs = bitmap_weight(bmap, pebs_count); + fdesc->recovering_pebs += recovering_pebs; + + *pebs_per_fragment += pebs_count; + + unused_pebs = pebs_count - used_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebs_count %u, used_pebs %u, " + "unused_pebs %u, reserved_pebs %u\n", + pebs_count, used_pebs, + unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unused_pebs < reserved_pebs) { + err = -EIO; + SSDFS_ERR("unused_pebs %u < reserved_pebs %u\n", + unused_pebs, reserved_pebs); + goto finish_pebtbl_check; + } + + unused_pebs -= reserved_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebs_count %u, used_pebs %u, " + "reserved_pebs %u, unused_pebs %u\n", + pebs_count, used_pebs, + reserved_pebs, unused_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_pebtbl_check: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (!err) { + u32 unused_lebs; + u64 free_pages; + u64 unused_pages = 0; + u32 threshold; + + unused_lebs = ssdfs_unused_lebs_in_fragment(fdesc); + threshold = ssdfs_lebs_reservation_threshold(fdesc); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unused_pebs %u, unused_lebs %u, threshold %u\n", + unused_pebs, unused_lebs, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unused_lebs > threshold) { + unused_pages 
= (u64)unused_pebs * fsi->pages_per_peb; + + spin_lock(&fsi->volume_state_lock); + fsi->free_pages += unused_pages; + free_pages = fsi->free_pages; + spin_unlock(&fsi->volume_state_lock); + } else { +#ifdef CONFIG_SSDFS_DEBUG + spin_lock(&fsi->volume_state_lock); + free_pages = fsi->free_pages; + spin_unlock(&fsi->volume_state_lock); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg %llu, peb_index %u, " + "free_pages %llu, unused_pages %llu\n", + pebc->parent_si->seg_id, + pebc->peb_index, free_pages, unused_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +/* + * ssdfs_maptbl_move_fragment_pages() - move fragment's pages + * @req: segment request + * @area: fragment's pages + * @pages_count: pages count in area + */ +void ssdfs_maptbl_move_fragment_pages(struct ssdfs_segment_request *req, + struct ssdfs_maptbl_area *area, + u16 pages_count) +{ + struct page *page; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req || !area); + + SSDFS_DBG("req %p, area %p\n", + req, area); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < pages_count; i++) { + page = req->result.pvec.pages[i]; + area->pages[area->pages_count] = page; + area->pages_count++; + ssdfs_map_tbl_account_page(page); + ssdfs_request_unlock_and_remove_page(req, i); + } + +#ifdef CONFIG_SSDFS_DEBUG + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + page = req->result.pvec.pages[i]; + + if (page) { + SSDFS_ERR("page %d is valid\n", i); + BUG_ON(page); + } + } +#endif /* CONFIG_SSDFS_DEBUG */ + + pagevec_reinit(&req->result.pvec); +} + +/* + * ssdfs_maptbl_fragment_init() - init mapping table's fragment + * @pebc: pointer on PEB container + * @area: mapping table's area descriptor + * + * This method tries to initialize mapping table's fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EIO - fragment is corrupted. 
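+ * + * Implementation note: every LEB table page and every PEB table page of + * the portion is checksummed and cross-checked against the fragment + * descriptor, then the pages are moved into the fragment's page array + * and the fragment state is switched from CREATED to INITIALIZED + * (or to INIT_FAILED on error).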
+ */ +int ssdfs_maptbl_fragment_init(struct ssdfs_peb_container *pebc, + struct ssdfs_maptbl_area *area) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_mapping_table *tbl; + struct ssdfs_maptbl_fragment_desc *fdesc; + int state; + u16 lebs_per_fragment = 0, pebs_per_fragment = 0; + u32 calculated; + int page_index; + int fragment_id; + int i, j; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->fsi->maptbl || !area); + BUG_ON(!rwsem_is_locked(&pebc->parent_si->fsi->maptbl->tbl_lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb_index %u, portion_id %u, " + "pages_count %zu, pages_capacity %zu\n", + pebc->parent_si->seg_id, + pebc->peb_index, area->portion_id, + area->pages_count, area->pages_capacity); +#else + SSDFS_DBG("seg %llu, peb_index %u, portion_id %u, " + "pages_count %zu, pages_capacity %zu\n", + pebc->parent_si->seg_id, + pebc->peb_index, area->portion_id, + area->pages_count, area->pages_capacity); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = pebc->parent_si->fsi; + tbl = fsi->maptbl; + + if (area->pages_count > area->pages_capacity) { + SSDFS_ERR("area->pages_count %zu > area->pages_capacity %zu\n", + area->pages_count, + area->pages_capacity); + return -EINVAL; + } + + if (area->pages_count > tbl->fragment_pages) { + SSDFS_ERR("area->pages_count %zu > tbl->fragment_pages %u\n", + area->pages_count, + tbl->fragment_pages); + return -EINVAL; + } + + if (area->portion_id >= tbl->fragments_count) { + SSDFS_ERR("invalid index: " + "portion_id %u, fragments_count %u\n", + area->portion_id, + tbl->fragments_count); + return -EINVAL; + } + + fdesc = &tbl->desc_array[area->portion_id]; + + state = atomic_read(&fdesc->state); + if (state != SSDFS_MAPTBL_FRAG_CREATED) { + SSDFS_ERR("invalid fragment state %#x\n", state); + return -ERANGE; + } + + down_write(&fdesc->lock); + + ssdfs_maptbl_fragment_desc_init(tbl, area, fdesc); + + calculated = fdesc->lebtbl_pages; + calculated += fdesc->stripe_pages * tbl->stripes_per_fragment; + if (calculated != area->pages_count) { + err = -EIO; + SSDFS_ERR("calculated %u != area->pages_count %zu\n", + calculated, area->pages_count); + goto finish_fragment_init; + } + + page_index = 0; + + for (i = 0; i < fdesc->lebtbl_pages; i++) { + struct page *page; + + if (page_index >= area->pages_count) { + err = -ERANGE; + SSDFS_ERR("page_index %d >= pages_count %zu\n", + page_index, area->pages_count); + goto finish_fragment_init; + } + + page = area->pages[page_index]; + if (!page) { + err = -ERANGE; + SSDFS_ERR("page %d is absent\n", i); + goto finish_fragment_init; + } + + err = ssdfs_maptbl_check_lebtbl_page(page, + area->portion_id, i, + fdesc, i, + &lebs_per_fragment); + if (unlikely(err)) { + SSDFS_ERR("maptbl's page %d is corrupted: " + "err %d\n", + page_index, err); + goto finish_fragment_init; + } + + page_index++; + } + + if (fdesc->lebs_count < (fdesc->mapped_lebs + fdesc->migrating_lebs)) { + err = -EIO; + SSDFS_ERR("lebs_count %u, mapped_lebs %u, migrating_lebs %u\n", + fdesc->lebs_count, + fdesc->mapped_lebs, + fdesc->migrating_lebs); + goto finish_fragment_init; + } + + if (fdesc->lebs_count < fdesc->pre_erase_pebs) { + err = -EIO; + SSDFS_ERR("lebs_count %u, pre_erase_pebs %u\n", + fdesc->lebs_count, + fdesc->pre_erase_pebs); + goto finish_fragment_init; + } + + for (i = 0, fragment_id = 0; i < tbl->stripes_per_fragment; i++) { + for (j = 0; j < fdesc->stripe_pages; j++) { + struct page *page; + + 
if (page_index >= area->pages_count) { + err = -ERANGE; + SSDFS_ERR("page_index %d >= pages_count %zu\n", + page_index, area->pages_count); + goto finish_fragment_init; + } + + page = area->pages[page_index]; + if (!page) { + err = -ERANGE; + SSDFS_ERR("page %d is absent\n", page_index); + goto finish_fragment_init; + } + + err = ssdfs_maptbl_check_pebtbl_page(pebc, page, + area->portion_id, + fragment_id, + fdesc, i, j, + &pebs_per_fragment); + if (unlikely(err)) { + SSDFS_ERR("maptbl's page %d is corrupted: " + "err %d\n", + page_index, err); + goto finish_fragment_init; + } + + page_index++; + fragment_id++; + } + } + + if (lebs_per_fragment > pebs_per_fragment) { + err = -EIO; + SSDFS_ERR("lebs_per_fragment %u > pebs_per_fragment %u\n", + lebs_per_fragment, pebs_per_fragment); + goto finish_fragment_init; + } else if (lebs_per_fragment < pebs_per_fragment) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lebs_per_fragment %u < pebs_per_fragment %u\n", + lebs_per_fragment, pebs_per_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (lebs_per_fragment > tbl->lebs_per_fragment || + lebs_per_fragment != fdesc->lebs_count) { + err = -EIO; + SSDFS_ERR("lebs_per_fragment %u, tbl->lebs_per_fragment %u, " + "fdesc->lebs_count %u\n", + lebs_per_fragment, + tbl->lebs_per_fragment, + fdesc->lebs_count); + goto finish_fragment_init; + } + + if (pebs_per_fragment > tbl->pebs_per_fragment || + fdesc->lebs_count > pebs_per_fragment) { + err = -EIO; + SSDFS_ERR("pebs_per_fragment %u, tbl->pebs_per_fragment %u, " + "fdesc->lebs_count %u\n", + pebs_per_fragment, + tbl->pebs_per_fragment, + fdesc->lebs_count); + goto finish_fragment_init; + } + + for (i = 0; i < area->pages_count; i++) { + struct page *page; + + page = area->pages[i]; + if (!page) { + err = -ERANGE; + SSDFS_ERR("page %d is absent\n", i); + goto finish_fragment_init; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %d, page %p\n", + i, page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_add_page(&fdesc->array, + page, i); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to add page %d: err %d\n", + i, err); + goto finish_fragment_init; + } + + ssdfs_map_tbl_forget_page(page); + area->pages[i] = NULL; + } + +finish_fragment_init: + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment init failed: portion_id %u\n", + area->portion_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_cmpxchg(&fdesc->state, + SSDFS_MAPTBL_FRAG_CREATED, + SSDFS_MAPTBL_FRAG_INIT_FAILED); + if (state != SSDFS_MAPTBL_FRAG_CREATED) { + /* don't change error code */ + SSDFS_WARN("invalid fragment state %#x\n", state); + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment init finished; portion_id %u\n", + area->portion_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_cmpxchg(&fdesc->state, + SSDFS_MAPTBL_FRAG_CREATED, + SSDFS_MAPTBL_FRAG_INITIALIZED); + if (state != SSDFS_MAPTBL_FRAG_CREATED) { + err = -ERANGE; + SSDFS_ERR("invalid fragment state %#x\n", state); + } + } + + up_write(&fdesc->lock); + + complete_all(&fdesc->init_end); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} diff --git a/fs/ssdfs/peb_mapping_table.h b/fs/ssdfs/peb_mapping_table.h new 
file mode 100644 index 000000000000..89f9fcefc6fb --- /dev/null +++ b/fs/ssdfs/peb_mapping_table.h @@ -0,0 +1,699 @@ +/* SPDX-License-Identifier: BSD-3-Clause-Clear */ +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb_mapping_table.h - PEB mapping table declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_PEB_MAPPING_TABLE_H +#define _SSDFS_PEB_MAPPING_TABLE_H + +#define SSDFS_MAPTBL_FIRST_PROTECTED_INDEX 0 +#define SSDFS_MAPTBL_PROTECTION_STEP 50 +#define SSDFS_MAPTBL_PROTECTION_RANGE 3 + +#define SSDFS_PRE_ERASE_PEB_THRESHOLD_PCT (3) +#define SSDFS_UNUSED_LEB_THRESHOLD_PCT (1) + +/* + * struct ssdfs_maptbl_fragment_desc - fragment descriptor + * @lock: fragment lock + * @state: fragment state + * @fragment_id: fragment's ID in the whole sequence + * @fragment_pages: count of memory pages in fragment + * @start_leb: start LEB of fragment + * @lebs_count: count of LEB descriptors in the whole fragment + * @lebs_per_page: count of LEB descriptors in memory page + * @lebtbl_pages: count of memory pages used for LEBs description + * @pebs_per_page: count of PEB descriptors in memory page + * @stripe_pages: count of memory pages in one stripe + * @mapped_lebs: mapped LEBs count in the fragment + * @migrating_lebs: migrating LEBs count in the fragment + * @reserved_pebs: count of reserved PEBs in fragment + * @pre_erase_pebs: count of PEBs in pre-erase state per fragment + * @recovering_pebs: count of recovering PEBs per fragment + * @array: fragment's memory pages + * @init_end: completion for waiting the end of init + * @flush_req1: main flush requests array + * @flush_req2: backup flush requests array + * @flush_req_count: number of flush requests in the array + * @flush_seq_size: flush requests' array capacity + */ +struct ssdfs_maptbl_fragment_desc { + struct rw_semaphore lock; + atomic_t state; + + u32 fragment_id; + u32 fragment_pages; + + u64 start_leb; + u32 lebs_count; + + u16 lebs_per_page; + u16 lebtbl_pages; + + u16 pebs_per_page; + u16 stripe_pages; + + u32 mapped_lebs; + u32 migrating_lebs; + u32 reserved_pebs; + u32 pre_erase_pebs; + u32 recovering_pebs; + + struct ssdfs_page_array array; + struct completion init_end; + + struct ssdfs_segment_request *flush_req1; + struct ssdfs_segment_request *flush_req2; + u32 flush_req_count; + u32 flush_seq_size; +}; + +/* Fragment's state */ +enum { + SSDFS_MAPTBL_FRAG_CREATED = 0, + SSDFS_MAPTBL_FRAG_INIT_FAILED = 1, + SSDFS_MAPTBL_FRAG_INITIALIZED = 2, + SSDFS_MAPTBL_FRAG_DIRTY = 3, + SSDFS_MAPTBL_FRAG_TOWRITE = 4, + SSDFS_MAPTBL_FRAG_STATE_MAX = 5, +}; + +/* + * struct ssdfs_maptbl_area - mapping table area + * @portion_id: sequential ID of mapping table fragment + * @pages: array of memory page pointers + * @pages_capacity: capacity of array + * @pages_count: count of pages in array + */ +struct ssdfs_maptbl_area { + u16 portion_id; + struct page **pages; + size_t pages_capacity; + size_t pages_count; +}; + +/* + * struct ssdfs_peb_mapping_table - mapping table object + * @tbl_lock: mapping table lock + * @fragments_count: count of fragments + * @fragments_per_seg: count of fragments in segment + * @fragments_per_peb: count of fragments in PEB + * @fragment_bytes: 
count of bytes in one fragment + * @fragment_pages: count of memory pages in one fragment + * @flags: mapping table flags + * @lebs_count: count of LEBs described by mapping table + * @pebs_count: count of PEBs described by mapping table + * @lebs_per_fragment: count of LEB descriptors in fragment + * @pebs_per_fragment: count of PEB descriptors in fragment + * @pebs_per_stripe: count of PEB descriptors in stripe + * @stripes_per_fragment: count of stripes in fragment + * @extents: metadata extents that describe mapping table location + * @segs: array of pointers on segment objects + * @segs_count: count of segment objects used for mapping table + * @state: mapping table's state + * @erase_op_state: state of erase operation + * @pre_erase_pebs: count of PEBs in pre-erase state + * @max_erase_ops: upper bound of erase operations for one iteration + * @erase_ops_end_wq: wait queue of threads waiting for the end of erase operation + * @last_peb_recover_cno: checkpoint number of the last PEB recovering operation + * @bmap_lock: dirty bitmap's lock + * @dirty_bmap: bitmap of dirty fragments + * @desc_array: array of fragment descriptors + * @wait_queue: wait queue of mapping table's thread + * @thread: descriptor of mapping table's thread + * @fsi: pointer on shared file system object + */ +struct ssdfs_peb_mapping_table { + struct rw_semaphore tbl_lock; + u32 fragments_count; + u16 fragments_per_seg; + u16 fragments_per_peb; + u32 fragment_bytes; + u32 fragment_pages; + atomic_t flags; + u64 lebs_count; + u64 pebs_count; + u16 lebs_per_fragment; + u16 pebs_per_fragment; + u16 pebs_per_stripe; + u16 stripes_per_fragment; + struct ssdfs_meta_area_extent extents[MAPTBL_LIMIT1][MAPTBL_LIMIT2]; + struct ssdfs_segment_info **segs[SSDFS_MAPTBL_SEG_COPY_MAX]; + u16 segs_count; + + atomic_t state; + + atomic_t erase_op_state; + atomic_t pre_erase_pebs; + atomic_t max_erase_ops; + wait_queue_head_t erase_ops_end_wq; + + atomic64_t last_peb_recover_cno; + + struct mutex bmap_lock; + unsigned long *dirty_bmap; + struct ssdfs_maptbl_fragment_desc *desc_array; + + wait_queue_head_t wait_queue; + struct ssdfs_thread_info thread; + struct ssdfs_fs_info *fsi; +}; + +/* PEB mapping table's state */ +enum { + SSDFS_MAPTBL_CREATED = 0, + SSDFS_MAPTBL_GOING_TO_BE_DESTROY = 1, + SSDFS_MAPTBL_STATE_MAX = 2, +}; + +/* + * struct ssdfs_maptbl_peb_descriptor - PEB descriptor + * @peb_id: PEB identification number + * @shared_peb_index: index of external shared destination PEB + * @erase_cycles: P/E cycles + * @type: PEB type + * @state: PEB state + * @flags: PEB flags + * @consistency: PEB state consistency type + */ +struct ssdfs_maptbl_peb_descriptor { + u64 peb_id; + u8 shared_peb_index; + u32 erase_cycles; + u8 type; + u8 state; + u8 flags; + u8 consistency; +}; + +/* + * struct ssdfs_maptbl_peb_relation - PEBs association + * @pebs: array of PEB descriptors + */ +struct ssdfs_maptbl_peb_relation { + struct ssdfs_maptbl_peb_descriptor pebs[SSDFS_MAPTBL_RELATION_MAX]; +}; + +/* + * Erase operation state + */ +enum { + SSDFS_MAPTBL_NO_ERASE, + SSDFS_MAPTBL_ERASE_IN_PROGRESS +}; + +/* Stage of recovering try */ +enum { + SSDFS_CHECK_RECOVERABILITY, + SSDFS_MAKE_RECOVERING, + SSDFS_RECOVER_STAGE_MAX +}; + +/* Possible states of erase operation */ +enum { + SSDFS_ERASE_RESULT_UNKNOWN, + SSDFS_ERASE_DONE, + SSDFS_ERASE_SB_PEB_DONE, + SSDFS_IGNORE_ERASE, + SSDFS_ERASE_FAILURE, + SSDFS_BAD_BLOCK_DETECTED, + SSDFS_ERASE_RESULT_MAX +}; + +/* + * struct ssdfs_erase_result - PEB's erase operation result + * @fragment_index: index of mapping table's fragment + * @peb_index: PEB's index in fragment + 
* @peb_id: PEB ID number + * @state: state of erase operation + */ +struct ssdfs_erase_result { + u32 fragment_index; + u16 peb_index; + u64 peb_id; + int state; +}; + +/* + * struct ssdfs_erase_result_array - array of erase operation results + * @ptr: pointer on memory buffer + * @capacity: maximal number of erase operation results in array + * @size: count of erase operation results in array + */ +struct ssdfs_erase_result_array { + struct ssdfs_erase_result *ptr; + u32 capacity; + u32 size; +}; + +#define SSDFS_ERASE_RESULTS_PER_FRAGMENT (10) + +/* + * Inline functions + */ + +/* + * SSDFS_ERASE_RESULT_INIT() - init erase result + * @fragment_index: index of mapping table's fragment + * @peb_index: PEB's index in fragment + * @peb_id: PEB ID number + * @state: state of erase operation + * @result: erase operation result [out] + * + * This method initializes the erase operation result. + */ +static inline +void SSDFS_ERASE_RESULT_INIT(u32 fragment_index, u16 peb_index, + u64 peb_id, int state, + struct ssdfs_erase_result *result) +{ + result->fragment_index = fragment_index; + result->peb_index = peb_index; + result->peb_id = peb_id; + result->state = state; +} + +/* + * DEFINE_PEB_INDEX_IN_FRAGMENT() - define PEB index in the whole fragment + * @fdesc: fragment descriptor + * @page_index: page index in the fragment + * @item_index: item index in the memory page + */ +static inline +u16 DEFINE_PEB_INDEX_IN_FRAGMENT(struct ssdfs_maptbl_fragment_desc *fdesc, + pgoff_t page_index, + u16 item_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + BUG_ON(page_index < fdesc->lebtbl_pages); + + SSDFS_DBG("fdesc %p, page_index %lu, item_index %u\n", + fdesc, page_index, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index -= fdesc->lebtbl_pages; + page_index *= fdesc->pebs_per_page; + page_index += item_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u16)page_index; +} + +/* + * GET_PEB_ID() - define PEB ID for the index + * @kaddr: pointer on memory page's content + * @item_index: item index inside of the page + * + * This method tries to convert @item_index into + * PEB ID value. 
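+ * After validating the PEB table magic and checking @item_index + * against pebs_count, the ID is computed as + * le64_to_cpu(hdr->start_peb) + item_index.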
+ * + * RETURN: + * [success] - PEB ID + * [failure] - U64_MAX + */ +static inline +u64 GET_PEB_ID(void *kaddr, u16 item_index) +{ + struct ssdfs_peb_table_fragment_header *hdr; + u64 start_peb; + u16 pebs_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, item_index %u\n", + kaddr, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + + if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) { + SSDFS_ERR("corrupted page\n"); + return U64_MAX; + } + + start_peb = le64_to_cpu(hdr->start_peb); + pebs_count = le16_to_cpu(hdr->pebs_count); + + if (item_index >= pebs_count) { + SSDFS_ERR("item_index %u >= pebs_count %u\n", + item_index, pebs_count); + return U64_MAX; + } + + return start_peb + item_index; +} + +/* + * PEBTBL_PAGE_INDEX() - define PEB table page index + * @fdesc: fragment descriptor + * @peb_index: index of PEB in the fragment + */ +static inline +pgoff_t PEBTBL_PAGE_INDEX(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 peb_index) +{ + pgoff_t page_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, peb_index %u\n", + fdesc, peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = fdesc->lebtbl_pages; + page_index += peb_index / fdesc->pebs_per_page; + return page_index; +} + +/* + * GET_PEB_DESCRIPTOR() - retrieve PEB descriptor + * @kaddr: pointer on memory page's content + * @item_index: item index inside of the page + * + * This method tries to return the pointer on + * PEB descriptor for @item_index. + * + * RETURN: + * [success] - pointer on PEB descriptor + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static inline +struct ssdfs_peb_descriptor *GET_PEB_DESCRIPTOR(void *kaddr, u16 item_index) +{ + struct ssdfs_peb_table_fragment_header *hdr; + u16 pebs_count; + u32 peb_desc_off; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, item_index %u\n", + kaddr, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + + if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) { + SSDFS_ERR("corrupted page\n"); + return ERR_PTR(-ERANGE); + } + + pebs_count = le16_to_cpu(hdr->pebs_count); + + if (item_index >= pebs_count) { + SSDFS_ERR("item_index %u >= pebs_count %u\n", + item_index, pebs_count); + return ERR_PTR(-ERANGE); + } + + peb_desc_off = SSDFS_PEBTBL_FRAGMENT_HDR_SIZE; + peb_desc_off += item_index * sizeof(struct ssdfs_peb_descriptor); + + if (peb_desc_off >= PAGE_SIZE) { + SSDFS_ERR("invalid offset %u\n", peb_desc_off); + return ERR_PTR(-ERANGE); + } + + return (struct ssdfs_peb_descriptor *)((u8 *)kaddr + peb_desc_off); +} + +/* + * SEG2PEB_TYPE() - convert segment into PEB type + */ +static inline +int SEG2PEB_TYPE(int seg_type) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_type %d\n", seg_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (seg_type) { + case SSDFS_USER_DATA_SEG_TYPE: + return SSDFS_MAPTBL_DATA_PEB_TYPE; + + case SSDFS_LEAF_NODE_SEG_TYPE: + return SSDFS_MAPTBL_LNODE_PEB_TYPE; + + case SSDFS_HYBRID_NODE_SEG_TYPE: + return SSDFS_MAPTBL_HNODE_PEB_TYPE; + + case SSDFS_INDEX_NODE_SEG_TYPE: + return SSDFS_MAPTBL_IDXNODE_PEB_TYPE; + + case SSDFS_INITIAL_SNAPSHOT_SEG_TYPE: + return SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE; + + case SSDFS_SB_SEG_TYPE: + return SSDFS_MAPTBL_SBSEG_PEB_TYPE; + + case SSDFS_SEGBMAP_SEG_TYPE: + return SSDFS_MAPTBL_SEGBMAP_PEB_TYPE; + + case SSDFS_MAPTBL_SEG_TYPE: + return SSDFS_MAPTBL_MAPTBL_PEB_TYPE; + } + + return SSDFS_MAPTBL_PEB_TYPE_MAX; 
+} + +/* + * PEB2SEG_TYPE() - convert PEB into segment type + */ +static inline +int PEB2SEG_TYPE(int peb_type) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_type %d\n", peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (peb_type) { + case SSDFS_MAPTBL_DATA_PEB_TYPE: + return SSDFS_USER_DATA_SEG_TYPE; + + case SSDFS_MAPTBL_LNODE_PEB_TYPE: + return SSDFS_LEAF_NODE_SEG_TYPE; + + case SSDFS_MAPTBL_HNODE_PEB_TYPE: + return SSDFS_HYBRID_NODE_SEG_TYPE; + + case SSDFS_MAPTBL_IDXNODE_PEB_TYPE: + return SSDFS_INDEX_NODE_SEG_TYPE; + + case SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE: + return SSDFS_INITIAL_SNAPSHOT_SEG_TYPE; + + case SSDFS_MAPTBL_SBSEG_PEB_TYPE: + return SSDFS_SB_SEG_TYPE; + + case SSDFS_MAPTBL_SEGBMAP_PEB_TYPE: + return SSDFS_SEGBMAP_SEG_TYPE; + + case SSDFS_MAPTBL_MAPTBL_PEB_TYPE: + return SSDFS_MAPTBL_SEG_TYPE; + } + + return SSDFS_UNKNOWN_SEG_TYPE; +} + +static inline +bool is_ssdfs_maptbl_under_flush(struct ssdfs_fs_info *fsi) +{ + return atomic_read(&fsi->maptbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH; +} + +/* + * is_peb_protected() - check that PEB is protected + * @found_item: PEB index in the fragment + */ +static inline +bool is_peb_protected(unsigned long found_item) +{ + unsigned long remainder; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_item %lu\n", found_item); +#endif /* CONFIG_SSDFS_DEBUG */ + + remainder = found_item % SSDFS_MAPTBL_PROTECTION_STEP; + return remainder == 0; +} + +static inline +bool is_ssdfs_maptbl_going_to_be_destroyed(struct ssdfs_peb_mapping_table *tbl) +{ + return atomic_read(&tbl->state) == SSDFS_MAPTBL_GOING_TO_BE_DESTROY; +} + +static inline +void set_maptbl_going_to_be_destroyed(struct ssdfs_fs_info *fsi) +{ + atomic_set(&fsi->maptbl->state, SSDFS_MAPTBL_GOING_TO_BE_DESTROY); +} + +static inline +void ssdfs_account_updated_user_data_pages(struct ssdfs_fs_info *fsi, + u32 count) +{ +#ifdef CONFIG_SSDFS_DEBUG + u64 updated = 0; + + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p, count %u\n", + fsi, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&fsi->volume_state_lock); + fsi->updated_user_data_pages += count; +#ifdef CONFIG_SSDFS_DEBUG + updated = fsi->updated_user_data_pages; +#endif /* CONFIG_SSDFS_DEBUG */ + spin_unlock(&fsi->volume_state_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("updated %llu\n", updated); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * PEB mapping table's API + */ +int ssdfs_maptbl_create(struct ssdfs_fs_info *fsi); +void ssdfs_maptbl_destroy(struct ssdfs_fs_info *fsi); +int ssdfs_maptbl_fragment_init(struct ssdfs_peb_container *pebc, + struct ssdfs_maptbl_area *area); +int ssdfs_maptbl_flush(struct ssdfs_peb_mapping_table *tbl); +int ssdfs_maptbl_resize(struct ssdfs_peb_mapping_table *tbl, + u64 new_pebs_count); + +int ssdfs_maptbl_convert_leb2peb(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct ssdfs_maptbl_peb_relation *pebr, + struct completion **end); +int ssdfs_maptbl_map_leb2peb(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct ssdfs_maptbl_peb_relation *pebr, + struct completion **end); +int ssdfs_maptbl_recommend_search_range(struct ssdfs_fs_info *fsi, + u64 *start_leb, + u64 *end_leb, + struct completion **end); +int ssdfs_maptbl_change_peb_state(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + int peb_state, + struct completion **end); +int ssdfs_maptbl_prepare_pre_erase_state(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct completion **end); +int ssdfs_maptbl_set_pre_erased_snapshot_peb(struct ssdfs_fs_info *fsi, + u64 peb_id, + struct completion **end); +int 
ssdfs_maptbl_add_migration_peb(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct ssdfs_maptbl_peb_relation *pebr, + struct completion **end); +int ssdfs_maptbl_exclude_migration_peb(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + u64 peb_create_time, + u64 last_log_time, + struct completion **end); +int ssdfs_maptbl_set_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + u64 dst_leb_id, u16 dst_peb_index, + struct completion **end); +int ssdfs_maptbl_break_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + u64 dst_leb_id, int dst_peb_refs, + struct completion **end); +int ssdfs_maptbl_set_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + struct completion **end); +int ssdfs_maptbl_break_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + struct completion **end); + +int ssdfs_reserve_free_pages(struct ssdfs_fs_info *fsi, + u32 count, int type); + +/* + * It makes sense to have special thread for the whole mapping table. + * The goal of the thread will be clearing of dirty PEBs, + * tracking P/E cycles, excluding bad PEBs and recovering PEBs + * in the background. Knowledge about PEBs will be hidden by + * mapping table. All other subsystems will operate by LEBs. + */ + +/* + * PEB mapping table's internal API + */ +int ssdfs_maptbl_start_thread(struct ssdfs_peb_mapping_table *tbl); +int ssdfs_maptbl_stop_thread(struct ssdfs_peb_mapping_table *tbl); + +int ssdfs_maptbl_define_fragment_info(struct ssdfs_fs_info *fsi, + u64 leb_id, + u16 *pebs_per_fragment, + u16 *pebs_per_stripe, + u16 *stripes_per_fragment); +struct ssdfs_maptbl_fragment_desc * +ssdfs_maptbl_get_fragment_descriptor(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id); +void ssdfs_maptbl_set_fragment_dirty(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id); +int ssdfs_maptbl_solve_inconsistency(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr); +int ssdfs_maptbl_solve_pre_deleted_state(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr); +void ssdfs_maptbl_move_fragment_pages(struct ssdfs_segment_request *req, + struct ssdfs_maptbl_area *area, + u16 pages_count); +int ssdfs_maptbl_erase_peb(struct ssdfs_fs_info *fsi, + struct ssdfs_erase_result *result); +int ssdfs_maptbl_correct_dirty_peb(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + struct ssdfs_erase_result *result); +int ssdfs_maptbl_erase_reserved_peb_now(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct completion **end); + +#ifdef CONFIG_SSDFS_TESTING +int ssdfs_maptbl_erase_dirty_pebs_now(struct ssdfs_peb_mapping_table *tbl); +#else +static inline +int ssdfs_maptbl_erase_dirty_pebs_now(struct ssdfs_peb_mapping_table *tbl) +{ + SSDFS_ERR("function is not supported\n"); + return -EOPNOTSUPP; +} +#endif /* CONFIG_SSDFS_TESTING */ + +void ssdfs_debug_maptbl_object(struct ssdfs_peb_mapping_table *tbl); + +#endif /* _SSDFS_PEB_MAPPING_TABLE_H */ From patchwork Sat Feb 25 01:08:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151944 
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 39/76] ssdfs: flush PEB mapping table Date: Fri, 24 Feb 2023 17:08:50 -0800 Message-Id: <20230225010927.813929-40-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The "Physical" Erase Block (PEB) mapping table is represented by a sequence of fragments distributed among several segments. Every map or unmap operation marks a fragment as dirty. 
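As a rough illustration (a sketch based on the declarations in peb_mapping_table.h, not code from this patch), marking a fragment dirty boils down to setting the fragment's bit in the table's dirty bitmap under the bitmap lock; the real ssdfs_maptbl_set_fragment_dirty() additionally takes the fragment descriptor and the LEB ID:

	/* hypothetical helper, for illustration only */
	static void mark_fragment_dirty(struct ssdfs_peb_mapping_table *tbl,
					u32 fragment_index)
	{
		mutex_lock(&tbl->bmap_lock);
		bitmap_set(tbl->dirty_bmap, fragment_index, 1);
		mutex_unlock(&tbl->bmap_lock);
	}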
The flush operation has to check the dirty state of all fragments and to flush the dirty fragments onto the volume by creating log(s) in the PEB(s) dedicated to storing the mapping table's content. The flush operation is executed in several steps: (1) prepare migration, (2) flush dirty fragments, (3) commit logs. The prepare migration operation is requested before the mapping table flush in order to check whether migration needs to be finished or started, because starting/finishing migration modifies the mapping table, while the flush operation itself must complete without any modifications of the mapping table. The flush dirty fragments step searches for dirty fragments and prepares update requests for the PEB(s) flush thread. Finally, a commit log has to be requested because a metadata flush operation must end with storing the new metadata state persistently. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_mapping_table.c | 2811 ++++++++++++++++++++++++++++++++++ 1 file changed, 2811 insertions(+) diff --git a/fs/ssdfs/peb_mapping_table.c b/fs/ssdfs/peb_mapping_table.c index cd5835eb96a2..bfc11bb73360 100644 --- a/fs/ssdfs/peb_mapping_table.c +++ b/fs/ssdfs/peb_mapping_table.c @@ -1952,3 +1952,2814 @@ int ssdfs_maptbl_fragment_init(struct ssdfs_peb_container *pebc, return err; } + +/* + * ssdfs_sb_maptbl_header_correct_state() - save maptbl's state in superblock + * @tbl: mapping table object + */ +static +void ssdfs_sb_maptbl_header_correct_state(struct ssdfs_peb_mapping_table *tbl) +{ + struct ssdfs_maptbl_sb_header *hdr; + size_t bytes_count; + u32 flags = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&tbl->fsi->volume_sem)); + + SSDFS_DBG("maptbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = &tbl->fsi->vh->maptbl; + + hdr->fragments_count = cpu_to_le32(tbl->fragments_count); + hdr->fragment_bytes = cpu_to_le32(tbl->fragment_bytes); + hdr->last_peb_recover_cno = + cpu_to_le64(atomic64_read(&tbl->last_peb_recover_cno)); + hdr->lebs_count = cpu_to_le64(tbl->lebs_count); + hdr->pebs_count = cpu_to_le64(tbl->pebs_count); + hdr->fragments_per_seg = cpu_to_le16(tbl->fragments_per_seg); + hdr->fragments_per_peb = cpu_to_le16(tbl->fragments_per_peb); + + flags = atomic_read(&tbl->flags); + /* exclude run-time flags */ + flags &= ~SSDFS_MAPTBL_UNDER_FLUSH; + hdr->flags = cpu_to_le16(flags); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(atomic_read(&tbl->pre_erase_pebs) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + hdr->pre_erase_pebs = cpu_to_le16(atomic_read(&tbl->pre_erase_pebs)); + + hdr->lebs_per_fragment = cpu_to_le16(tbl->lebs_per_fragment); + hdr->pebs_per_fragment = cpu_to_le16(tbl->pebs_per_fragment); + hdr->pebs_per_stripe = cpu_to_le16(tbl->pebs_per_stripe); + hdr->stripes_per_fragment = cpu_to_le16(tbl->stripes_per_fragment); + + bytes_count = sizeof(struct ssdfs_meta_area_extent); + bytes_count *= SSDFS_MAPTBL_RESERVED_EXTENTS; + bytes_count *= SSDFS_MAPTBL_SEG_COPY_MAX; + ssdfs_memcpy(hdr->extents, 0, bytes_count, + tbl->extents, 0, bytes_count, + bytes_count); +} + +/* + * ssdfs_maptbl_copy_dirty_page() - copy dirty page into request + * @tbl: mapping table object + * @pvec: pagevec with dirty pages + * @spage_index: index of page in pagevec + * @dpage_index: index of page in request + * @req: segment request + * + * This method tries to copy dirty page into request. 
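+ * + * Implementation note: the source page's magic and checksum are verified + * first; the page content is then copied into the request's page, which + * is marked uptodate, dirty, and under writeback.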
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_copy_dirty_page(struct ssdfs_peb_mapping_table *tbl, + struct pagevec *pvec, + int spage_index, int dpage_index, + struct ssdfs_segment_request *req) +{ + struct page *spage, *dpage; + void *kaddr1, *kaddr2; + struct ssdfs_leb_table_fragment_header *lhdr; + struct ssdfs_peb_table_fragment_header *phdr; + __le16 *magic; + __le32 csum; + u32 bytes_count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !pvec || !req); + BUG_ON(spage_index >= pagevec_count(pvec)); + + SSDFS_DBG("maptbl %p, pvec %p, spage_index %d, " + "dpage_index %d, req %p\n", + tbl, pvec, spage_index, dpage_index, req); +#endif /* CONFIG_SSDFS_DEBUG */ + + spage = pvec->pages[spage_index]; + + ssdfs_lock_page(spage); + kaddr1 = kmap_local_page(spage); + + magic = (__le16 *)kaddr1; + if (*magic == cpu_to_le16(SSDFS_LEB_TABLE_MAGIC)) { + lhdr = (struct ssdfs_leb_table_fragment_header *)kaddr1; + bytes_count = le32_to_cpu(lhdr->bytes_count); + csum = lhdr->checksum; + lhdr->checksum = 0; + lhdr->checksum = ssdfs_crc32_le(kaddr1, bytes_count); + if (csum != lhdr->checksum) { + err = -ERANGE; + SSDFS_ERR("csum %#x != lhdr->checksum %#x\n", + le32_to_cpu(csum), + le32_to_cpu(lhdr->checksum)); + lhdr->checksum = csum; + goto end_copy_dirty_page; + } + } else if (*magic == cpu_to_le16(SSDFS_PEB_TABLE_MAGIC)) { + phdr = (struct ssdfs_peb_table_fragment_header *)kaddr1; + bytes_count = le32_to_cpu(phdr->bytes_count); + csum = phdr->checksum; + phdr->checksum = 0; + phdr->checksum = ssdfs_crc32_le(kaddr1, bytes_count); + if (csum != phdr->checksum) { + err = -ERANGE; + SSDFS_ERR("csum %#x != phdr->checksum %#x\n", + le32_to_cpu(csum), + le32_to_cpu(phdr->checksum)); + phdr->checksum = csum; + goto end_copy_dirty_page; + } + } else { + err = -ERANGE; + SSDFS_ERR("corrupted maptbl's page: index %lu\n", + spage->index); + goto end_copy_dirty_page; + } + + dpage = req->result.pvec.pages[dpage_index]; + + if (!dpage) { + err = -ERANGE; + SSDFS_ERR("invalid page: page_index %d\n", + dpage_index); + goto end_copy_dirty_page; + } + + kaddr2 = kmap_local_page(dpage); + ssdfs_memcpy(kaddr2, 0, PAGE_SIZE, + kaddr1, 0, PAGE_SIZE, + PAGE_SIZE); + flush_dcache_page(dpage); + kunmap_local(kaddr2); + + SetPageUptodate(dpage); + if (!PageDirty(dpage)) + SetPageDirty(dpage); + set_page_writeback(dpage); + +end_copy_dirty_page: + flush_dcache_page(spage); + kunmap_local(kaddr1); + ssdfs_unlock_page(spage); + + return err; +} + +/* + * ssdfs_maptbl_replicate_dirty_page() - replicate dirty page content + * @req1: source request + * @page_index: index of replicated page in @req1 + * @req2: destination request + */ +static +void ssdfs_maptbl_replicate_dirty_page(struct ssdfs_segment_request *req1, + int page_index, + struct ssdfs_segment_request *req2) +{ + struct page *spage, *dpage; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req1 || !req2); + BUG_ON(page_index >= pagevec_count(&req1->result.pvec)); + BUG_ON(page_index >= pagevec_count(&req2->result.pvec)); + + SSDFS_DBG("req1 %p, req2 %p, page_index %d\n", + req1, req2, page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + spage = req1->result.pvec.pages[page_index]; + dpage = req2->result.pvec.pages[page_index]; + + ssdfs_memcpy_page(dpage, 0, PAGE_SIZE, + spage, 0, PAGE_SIZE, + PAGE_SIZE); + + SetPageUptodate(dpage); + if (!PageDirty(dpage)) + SetPageDirty(dpage); + set_page_writeback(dpage); +} + +/* + * ssdfs_check_portion_id() - check portion_id in the pagevec + * @pvec: 
pagevec to check + */ +static inline +int ssdfs_check_portion_id(struct pagevec *pvec) +{ + struct ssdfs_leb_table_fragment_header *lhdr; + struct ssdfs_peb_table_fragment_header *phdr; + u32 portion_id = U32_MAX; + void *kaddr; + __le16 *magic; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); + + SSDFS_DBG("pvec %p\n", pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(pvec) == 0) { + SSDFS_ERR("empty pagevec\n"); + return -EINVAL; + } + + for (i = 0; i < pagevec_count(pvec); i++) { + kaddr = kmap_local_page(pvec->pages[i]); + magic = (__le16 *)kaddr; + if (le16_to_cpu(*magic) == SSDFS_LEB_TABLE_MAGIC) { + lhdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + if (portion_id == U32_MAX) + portion_id = le32_to_cpu(lhdr->portion_id); + else if (portion_id != le32_to_cpu(lhdr->portion_id)) + err = -ERANGE; + } else if (le16_to_cpu(*magic) == SSDFS_PEB_TABLE_MAGIC) { + phdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + if (portion_id == U32_MAX) + portion_id = le32_to_cpu(phdr->portion_id); + else if (portion_id != le32_to_cpu(phdr->portion_id)) + err = -ERANGE; + } else { + err = -ERANGE; + SSDFS_ERR("corrupted maptbl's page: index %d\n", + i); + } + kunmap_local(kaddr); + + if (unlikely(err)) + return err; + } + + return 0; +} + +/* + * ssdfs_maptbl_define_volume_extent() - define volume extent for request + * @tbl: mapping table object + * @req: segment request + * @fragment: pointer on raw fragment + * @area_start: index of memory page inside of fragment + * @pages_count: number of memory pages in the area + * @seg_index: index of segment in maptbl's array [out] + */ +static +int ssdfs_maptbl_define_volume_extent(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_segment_request *req, + void *fragment, + pgoff_t area_start, + u32 pages_count, + u16 *seg_index) +{ + struct ssdfs_leb_table_fragment_header *lhdr; + struct ssdfs_peb_table_fragment_header *phdr; + u32 portion_id = U32_MAX; + __le16 *magic; + u64 fragment_offset; + u16 item_index; + u32 pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !req || !fragment || !seg_index); + + SSDFS_DBG("maptbl %p, req %p, fragment %p, " + "area_start %lu, pages_count %u, " + "seg_index %p\n", + tbl, req, fragment, area_start, + pages_count, seg_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + pagesize = tbl->fsi->pagesize; + + magic = (__le16 *)fragment; + if (le16_to_cpu(*magic) == SSDFS_LEB_TABLE_MAGIC) { + lhdr = (struct ssdfs_leb_table_fragment_header *)fragment; + portion_id = le32_to_cpu(lhdr->portion_id); + } else if (le16_to_cpu(*magic) == SSDFS_PEB_TABLE_MAGIC) { + phdr = (struct ssdfs_peb_table_fragment_header *)fragment; + portion_id = le32_to_cpu(phdr->portion_id); + } else { + SSDFS_ERR("corrupted maptbl's page\n"); + return -ERANGE; + } + + if (portion_id >= tbl->fragments_count) { + SSDFS_ERR("portion_id %u >= tbl->fragments_count %u\n", + portion_id, tbl->fragments_count); + return -ERANGE; + } + + *seg_index = portion_id / tbl->fragments_per_seg; + + fragment_offset = portion_id % tbl->fragments_per_seg; + fragment_offset *= tbl->fragment_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(div_u64(fragment_offset, PAGE_SIZE) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = (u16)div_u64(fragment_offset, PAGE_SIZE); + item_index += area_start; + + if (tbl->fsi->pagesize < PAGE_SIZE) { + u32 pages_per_item; + u32 items_count = pages_count; + + pages_per_item = PAGE_SIZE + pagesize - 1; + pages_per_item /= pagesize; + req->place.start.blk_index = item_index * pages_per_item; + 
req->place.len = items_count * pages_per_item; + } else if (tbl->fsi->pagesize > PAGE_SIZE) { + u32 items_per_page; + u32 items_count = pages_count; + + items_per_page = pagesize + PAGE_SIZE - 1; + items_per_page /= PAGE_SIZE; + req->place.start.blk_index = item_index / items_per_page; + req->place.len = items_count + items_per_page - 1; + req->place.len /= items_per_page; + } else { + req->place.start.blk_index = item_index; + req->place.len = pages_count; + } + + return 0; +} + +/* + * ssdfs_maptbl_set_fragment_checksum() - calculate checksum of dirty fragment + * @pvec: pagevec with dirty pages + * + * This method tries to calculate checksum of dirty fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_fragment_checksum(struct pagevec *pvec) +{ + struct ssdfs_leb_table_fragment_header *lhdr; + struct ssdfs_peb_table_fragment_header *phdr; + struct page *page; + void *kaddr; + __le16 *magic; + u32 bytes_count; + unsigned count; + unsigned i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = pagevec_count(pvec); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pvec %p, pages_count %u\n", + pvec, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (count == 0) { + SSDFS_WARN("empty pagevec\n"); + return -ERANGE; + } + + for (i = 0; i < count; i++) { + page = pvec->pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + magic = (__le16 *)kaddr; + if (le16_to_cpu(*magic) == SSDFS_LEB_TABLE_MAGIC) { + lhdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + bytes_count = le32_to_cpu(lhdr->bytes_count); + lhdr->checksum = 0; + lhdr->checksum = ssdfs_crc32_le(kaddr, bytes_count); + } else if (le16_to_cpu(*magic) == SSDFS_PEB_TABLE_MAGIC) { + phdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + bytes_count = le32_to_cpu(phdr->bytes_count); + phdr->checksum = 0; + phdr->checksum = ssdfs_crc32_le(kaddr, bytes_count); + } else { + err = -ERANGE; + SSDFS_ERR("corrupted maptbl's page: index %d\n", + i); + } + flush_dcache_page(page); + kunmap_local(kaddr); + + if (unlikely(err)) + return err; + } + + return 0; +} + +/* + * ssdfs_realloc_flush_reqs_array() - check necessity to realloc reqs array + * @fdesc: pointer on fragment descriptor + * + * This method checks the necessity to realloc the flush + * requests array. Finally, it tries to realloc the memory + * for the flush requests array. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
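+ * + * Implementation note: when the requests array is full, its capacity + * (@flush_seq_size) is doubled and both the main and backup arrays are + * reallocated by krealloc().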
+ */ +static inline +int ssdfs_realloc_flush_reqs_array(struct ssdfs_maptbl_fragment_desc *fdesc) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (fdesc->flush_req_count > fdesc->flush_seq_size) { + SSDFS_ERR("request_index %u > flush_seq_size %u\n", + fdesc->flush_req_count, fdesc->flush_seq_size); + return -ERANGE; + } else if (fdesc->flush_req_count == fdesc->flush_seq_size) { + size_t seg_req_size = sizeof(struct ssdfs_segment_request); + struct ssdfs_segment_request *reqs; + + fdesc->flush_seq_size *= 2; + + /* use a temporary pointer to avoid losing + * the old buffer if krealloc() fails + */ + reqs = krealloc(fdesc->flush_req1, + fdesc->flush_seq_size * seg_req_size, + GFP_KERNEL | __GFP_ZERO); + if (!reqs) { + SSDFS_ERR("fail to reallocate buffer\n"); + return -ENOMEM; + } + fdesc->flush_req1 = reqs; + + reqs = krealloc(fdesc->flush_req2, + fdesc->flush_seq_size * seg_req_size, + GFP_KERNEL | __GFP_ZERO); + if (!reqs) { + SSDFS_ERR("fail to reallocate buffer\n"); + return -ENOMEM; + } + fdesc->flush_req2 = reqs; + } + + return 0; +} + +/* + * ssdfs_maptbl_update_fragment() - update dirty fragment + * @tbl: mapping table object + * @fragment_index: index of fragment in the array + * + * This method tries to update dirty fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_maptbl_update_fragment(struct ssdfs_peb_mapping_table *tbl, + u32 fragment_index) +{ + struct ssdfs_maptbl_fragment_desc *fdesc; + struct ssdfs_segment_request *req1 = NULL, *req2 = NULL; + struct ssdfs_segment_info *si; + int state; + struct pagevec pvec; + bool has_backup; + pgoff_t page_index, end, range_len; + int i, j; + pgoff_t area_start; + unsigned area_size; + u64 ino = SSDFS_MAPTBL_INO; + u64 offset; + u32 size; + u16 seg_index; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(fragment_index >= tbl->fragments_count); + + SSDFS_DBG("maptbl %p, fragment_index %u\n", + tbl, fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fdesc = &tbl->desc_array[fragment_index]; + has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY; + + state = atomic_read(&fdesc->state); + if (state != SSDFS_MAPTBL_FRAG_DIRTY) { + SSDFS_ERR("fragment %u hasn't dirty state: state %#x\n", + fragment_index, state); + return -ERANGE; + } + + page_index = 0; + range_len = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)(tbl->fragment_pages - page_index)); + end = page_index + range_len - 1; + + down_write(&fdesc->lock); + + fdesc->flush_req_count = 0; + +retrieve_dirty_pages: + pagevec_init(&pvec); + + err = ssdfs_page_array_lookup_range(&fdesc->array, + &page_index, end, + SSDFS_DIRTY_PAGE_TAG, + tbl->fragment_pages, + &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to find dirty pages: " + "fragment_index %u, start %lu, " + "end %lu, err %d\n", + fragment_index, page_index, end, err); + goto finish_fragment_update; + } + + if (pagevec_count(&pvec) == 0) { + page_index += range_len; + + if (page_index >= tbl->fragment_pages) + goto finish_fragment_update; + + range_len = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)(tbl->fragment_pages - page_index)); + end = page_index + range_len - 1; + goto retrieve_dirty_pages; + } + + err = ssdfs_page_array_clear_dirty_range(&fdesc->array, + page_index, end); + if (unlikely(err)) { + SSDFS_ERR("fail to clear dirty range: " + "start %lu, end %lu, err %d\n", + page_index, end, err); + goto finish_fragment_update; + } + + err = 
ssdfs_maptbl_set_fragment_checksum(&pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to set fragment checksum: " + "fragment_index %u, err %d\n", + fragment_index, err); + goto finish_fragment_update; + } + + i = 0; + +define_update_area: + area_start = pvec.pages[i]->index; + area_size = 0; + for (; i < pagevec_count(&pvec); i++) { + if ((area_start + area_size) != pvec.pages[i]->index) + break; + else + area_size++; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_index %u, area_start %lu, area_size %u\n", + fragment_index, area_start, area_size); + + BUG_ON(area_size == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_realloc_flush_reqs_array(fdesc); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc the reqs array\n"); + goto finish_fragment_update; + } + + req1 = &fdesc->flush_req1[fdesc->flush_req_count]; + req2 = &fdesc->flush_req2[fdesc->flush_req_count]; + fdesc->flush_req_count++; + + ssdfs_request_init(req1); + ssdfs_get_request(req1); + if (has_backup) { + ssdfs_request_init(req2); + ssdfs_get_request(req2); + } + + for (j = 0; j < area_size; j++) { + err = ssdfs_request_add_allocated_page_locked(req1); + if (!err && has_backup) + err = ssdfs_request_add_allocated_page_locked(req2); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory page: err %d\n", err); + goto fail_issue_fragment_updates; + } + + err = ssdfs_maptbl_copy_dirty_page(tbl, &pvec, + (i - area_size) + j, + j, req1); + if (unlikely(err)) { + SSDFS_ERR("fail to copy dirty page: " + "spage_index %d, dpage_index %d, err %d\n", + (i - area_size) + j, j, err); + goto fail_issue_fragment_updates; + } + + if (has_backup) + ssdfs_maptbl_replicate_dirty_page(req1, j, req2); + } + + offset = area_start * PAGE_SIZE; + offset += fragment_index * tbl->fragment_bytes; + size = area_size * PAGE_SIZE; + + ssdfs_request_prepare_logical_extent(ino, offset, size, 0, 0, req1); + if (has_backup) { + ssdfs_request_prepare_logical_extent(ino, offset, size, + 0, 0, req2); + } + + err = ssdfs_check_portion_id(&req1->result.pvec); + if (unlikely(err)) { + SSDFS_ERR("corrupted maptbl's page was found: " + "err %d\n", err); + goto fail_issue_fragment_updates; + } + + kaddr = kmap_local_page(req1->result.pvec.pages[0]); + err = ssdfs_maptbl_define_volume_extent(tbl, req1, kaddr, + area_start, area_size, + &seg_index); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to define volume extent: " + "err %d\n", + err); + goto fail_issue_fragment_updates; + } + + if (has_backup) { + ssdfs_memcpy(&req2->place, + 0, sizeof(struct ssdfs_volume_extent), + &req1->place, + 0, sizeof(struct ssdfs_volume_extent), + sizeof(struct ssdfs_volume_extent)); + } + + si = tbl->segs[SSDFS_MAIN_MAPTBL_SEG][seg_index]; + err = ssdfs_segment_update_extent_async(si, + SSDFS_REQ_ASYNC_NO_FREE, + req1); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("update extent async: seg %llu\n", si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!err && has_backup) { + if (!tbl->segs[SSDFS_COPY_MAPTBL_SEG]) { + err = -ERANGE; + SSDFS_ERR("copy of maptbl doesn't exist\n"); + goto fail_issue_fragment_updates; + } + + si = tbl->segs[SSDFS_COPY_MAPTBL_SEG][seg_index]; + err = ssdfs_segment_update_extent_async(si, + SSDFS_REQ_ASYNC_NO_FREE, + req2); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to update extent: " + "seg_index %u, err %d\n", + seg_index, err); + goto fail_issue_fragment_updates; + } + + if (err) { +fail_issue_fragment_updates: + ssdfs_request_unlock_and_remove_pages(req1); + ssdfs_put_request(req1); + if (has_backup) { + 
ssdfs_request_unlock_and_remove_pages(req2); + ssdfs_put_request(req2); + } + goto finish_fragment_update; + } + + if (i < pagevec_count(&pvec)) + goto define_update_area; + + for (j = 0; j < pagevec_count(&pvec); j++) { + ssdfs_put_page(pvec.pages[j]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + pvec.pages[j], + page_ref_count(pvec.pages[j])); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + page_index += range_len; + + if (page_index < tbl->fragment_pages) { + range_len = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)(tbl->fragment_pages - page_index)); + end = page_index + range_len - 1; + pagevec_reinit(&pvec); + goto retrieve_dirty_pages; + } + +finish_fragment_update: + if (!err) { + state = atomic_cmpxchg(&fdesc->state, + SSDFS_MAPTBL_FRAG_DIRTY, + SSDFS_MAPTBL_FRAG_TOWRITE); + if (state != SSDFS_MAPTBL_FRAG_DIRTY) { + err = -ERANGE; + SSDFS_ERR("invalid fragment state %#x\n", state); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_index %u, state %#x\n", + fragment_index, + atomic_read(&fdesc->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + for (j = 0; j < pagevec_count(&pvec); j++) { + ssdfs_put_page(pvec.pages[j]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + pvec.pages[j], + page_ref_count(pvec.pages[j])); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } + + up_write(&fdesc->lock); + + pagevec_reinit(&pvec); + return err; +} + +/* + * ssdfs_maptbl_issue_fragments_update() - issue update of fragments + * @tbl: mapping table object + * @start_fragment: index of start fragment in the dirty bmap + * @dirty_bmap: bmap of dirty fragments + * + * This method tries to find the dirty fragments in @dirty_bmap. + * It updates the state of every found dirty fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - @dirty_bmap doesn't contain the dirty fragments. + */ +static +int ssdfs_maptbl_issue_fragments_update(struct ssdfs_peb_mapping_table *tbl, + u32 start_fragment, + unsigned long dirty_bmap) +{ + bool is_bit_found; + int i = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + + SSDFS_DBG("maptbl %p, start_fragment %u, dirty_bmap %#lx\n", + tbl, start_fragment, dirty_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dirty_bmap == 0) { + SSDFS_DBG("bmap doesn't contain dirty bits\n"); + return -ENODATA; + } + + for (i = 0; i < BITS_PER_LONG; i++) { + is_bit_found = test_bit(i, &dirty_bmap); + + if (!is_bit_found) + continue; + + err = ssdfs_maptbl_update_fragment(tbl, start_fragment + i); + if (unlikely(err)) { + SSDFS_ERR("fail to update fragment: " + "fragment_index %u, err %d\n", + start_fragment + i, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_maptbl_flush_dirty_fragments() - find and flush dirty fragments + * @tbl: mapping table object + * + * This method tries to find and to flush all dirty fragments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
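+ * + * Implementation note: the dirty bitmap is walked under @bmap_lock one + * unsigned long at a time; every set bit triggers an update of the + * corresponding fragment and the processed word is cleared afterwards.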
+ */ +static +int ssdfs_maptbl_flush_dirty_fragments(struct ssdfs_peb_mapping_table *tbl) +{ + unsigned long *bmap; + int size; + unsigned long *found; + u32 start_fragment; +#ifdef CONFIG_SSDFS_DEBUG + size_t bytes_count; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + + SSDFS_DBG("maptbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_debug_maptbl_object(tbl); + + mutex_lock(&tbl->bmap_lock); + + bmap = tbl->dirty_bmap; + +#ifdef CONFIG_SSDFS_DEBUG + bytes_count = tbl->fragments_count + BITS_PER_LONG - 1; + bytes_count /= BITS_PER_BYTE; + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + tbl->dirty_bmap, bytes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + size = tbl->fragments_count; + err = ssdfs_find_first_dirty_fragment(bmap, size, &found); + if (err == -ENODATA) { + SSDFS_DBG("maptbl hasn't dirty fragments\n"); + goto finish_flush_dirty_fragments; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find dirty fragments: " + "err %d\n", + err); + goto finish_flush_dirty_fragments; + } else if (!found) { + err = -ERANGE; + SSDFS_ERR("invalid bitmap pointer\n"); + goto finish_flush_dirty_fragments; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("bmap %p, found %p\n", bmap, found); + + BUG_ON(((found - bmap) * BITS_PER_LONG) >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_fragment = (u32)((found - bmap) * BITS_PER_LONG); + + err = ssdfs_maptbl_issue_fragments_update(tbl, start_fragment, + *found); + if (unlikely(err)) { + SSDFS_ERR("fail to issue fragments update: " + "start_fragment %u, found %#lx, err %d\n", + start_fragment, *found, err); + goto finish_flush_dirty_fragments; + } + + err = ssdfs_clear_dirty_state(found); + if (unlikely(err)) { + SSDFS_ERR("fail to clear dirty state: " + "err %d\n", + err); + goto finish_flush_dirty_fragments; + } + + if ((start_fragment + BITS_PER_LONG) >= tbl->fragments_count) + goto finish_flush_dirty_fragments; + + size = tbl->fragments_count - (start_fragment + BITS_PER_LONG); + while (size > 0) { + err = ssdfs_find_first_dirty_fragment(++found, size, &found); + if (err == -ENODATA) { + err = 0; + goto finish_flush_dirty_fragments; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find dirty fragments: " + "err %d\n", + err); + goto finish_flush_dirty_fragments; + } else if (!found) { + err = -ERANGE; + SSDFS_ERR("invalid bitmap pointer\n"); + goto finish_flush_dirty_fragments; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(((found - bmap) * BITS_PER_LONG) >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_fragment = (u32)((found - bmap) * BITS_PER_LONG); + + err = ssdfs_maptbl_issue_fragments_update(tbl, start_fragment, + *found); + if (unlikely(err)) { + SSDFS_ERR("fail to issue fragments update: " + "start_fragment %u, found %#lx, err %d\n", + start_fragment, *found, err); + goto finish_flush_dirty_fragments; + } + + err = ssdfs_clear_dirty_state(found); + if (unlikely(err)) { + SSDFS_ERR("fail to clear dirty state: " + "err %d\n", + err); + goto finish_flush_dirty_fragments; + } + + size = tbl->fragments_count - (start_fragment + BITS_PER_LONG); + } + +finish_flush_dirty_fragments: + mutex_unlock(&tbl->bmap_lock); + return err; +} + +/* + * ssdfs_maptbl_check_request() - check request + * @fdesc: pointer on fragment descriptor + * @req: segment request + * + * This method tries to check the state of request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
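+ * + * Implementation note: while the request has not been executed yet, the + * fragment's lock is dropped and the thread waits on the request's wait + * queue with a timeout before re-checking the result state.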
+ */
+static
+int ssdfs_maptbl_check_request(struct ssdfs_maptbl_fragment_desc *fdesc,
+				struct ssdfs_segment_request *req)
+{
+	wait_queue_head_t *wq = NULL;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc || !req);
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	SSDFS_DBG("fdesc %p, req %p\n", fdesc, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+check_req_state:
+	switch (atomic_read(&req->result.state)) {
+	case SSDFS_REQ_CREATED:
+	case SSDFS_REQ_STARTED:
+		wq = &req->private.wait_queue;
+
+		up_write(&fdesc->lock);
+		err = wait_event_killable_timeout(*wq,
+					has_request_been_executed(req),
+					SSDFS_DEFAULT_TIMEOUT);
+		down_write(&fdesc->lock);
+
+		if (err < 0)
+			WARN_ON(err < 0);
+		else
+			err = 0;
+
+		goto check_req_state;
+
+	case SSDFS_REQ_FINISHED:
+		/* do nothing */
+		break;
+
+	case SSDFS_REQ_FAILED:
+		err = req->result.err;
+
+		if (!err) {
+			SSDFS_ERR("error code is absent: "
+				  "req %p, err %d\n",
+				  req, err);
+			err = -ERANGE;
+		}
+
+		SSDFS_ERR("flush request is failed: "
+			  "err %d\n", err);
+		return err;
+
+	default:
+		SSDFS_ERR("invalid result's state %#x\n",
+			  atomic_read(&req->result.state));
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_wait_flush_end() - wait for flush to finish
+ * @tbl: mapping table object
+ *
+ * This method waits for the flush operation to finish.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_wait_flush_end(struct ssdfs_peb_mapping_table *tbl)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	struct ssdfs_segment_request *req1 = NULL, *req2 = NULL;
+	bool has_backup;
+	u32 fragments_count;
+	u32 i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragments_count = tbl->fragments_count;
+	has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY;
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		down_write(&fdesc->lock);
+
+		switch (atomic_read(&fdesc->state)) {
+		case SSDFS_MAPTBL_FRAG_DIRTY:
+			err = -ERANGE;
+			SSDFS_ERR("found unprocessed dirty fragment: "
+				  "index %d\n", i);
+			goto finish_fragment_processing;
+
+		case SSDFS_MAPTBL_FRAG_TOWRITE:
+			for (j = 0; j < fdesc->flush_req_count; j++) {
+				req1 = &fdesc->flush_req1[j];
+				req2 = &fdesc->flush_req2[j];
+
+				err = ssdfs_maptbl_check_request(fdesc, req1);
+				if (unlikely(err)) {
+					SSDFS_ERR("flush request failed: "
+						  "err %d\n", err);
+					goto finish_fragment_processing;
+				}
+
+				if (!has_backup)
+					continue;
+
+				err = ssdfs_maptbl_check_request(fdesc, req2);
+				if (unlikely(err)) {
+					SSDFS_ERR("flush request failed: "
+						  "err %d\n", err);
+					goto finish_fragment_processing;
+				}
+			}
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+finish_fragment_processing:
+		up_write(&fdesc->lock);
+
+		if (unlikely(err))
+			return err;
+	}
+
+	return 0;
+}
+
+/*
+ * __ssdfs_maptbl_commit_logs() - issue commit log requests
+ * @tbl: mapping table object
+ * @fdesc: pointer on fragment descriptor
+ * @fragment_index: index of fragment in the array
+ *
+ * This method tries to issue the commit log requests.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
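+ *
+ * The fragment is processed in portions of PAGEVEC_SIZE pages;
+ * every portion produces one commit log request for the main
+ * mapping table segment and, optionally, one for the backup copy.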
+ */ +static +int __ssdfs_maptbl_commit_logs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u32 fragment_index) +{ + struct ssdfs_segment_request *req1 = NULL, *req2 = NULL; + struct ssdfs_segment_info *si; + u64 ino = SSDFS_MAPTBL_INO; + int state; + bool has_backup; + pgoff_t area_start; + pgoff_t area_size, processed_pages; + u64 offset; + u16 seg_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !fdesc); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("maptbl %p, fragment_index %u\n", + tbl, fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY; + + state = atomic_read(&fdesc->state); + if (state != SSDFS_MAPTBL_FRAG_TOWRITE) { + SSDFS_ERR("fragment isn't under flush: state %#x\n", + state); + return -ERANGE; + } + + area_start = 0; + area_size = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)tbl->fragment_pages); + processed_pages = 0; + + fdesc->flush_req_count = 0; + + do { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(area_size == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_realloc_flush_reqs_array(fdesc); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc the reqs array\n"); + goto finish_issue_commit_request; + } + + req1 = &fdesc->flush_req1[fdesc->flush_req_count]; + req2 = &fdesc->flush_req2[fdesc->flush_req_count]; + fdesc->flush_req_count++; + + ssdfs_request_init(req1); + ssdfs_get_request(req1); + if (has_backup) { + ssdfs_request_init(req2); + ssdfs_get_request(req2); + } + + offset = area_start * PAGE_SIZE; + offset += fragment_index * tbl->fragment_bytes; + + ssdfs_request_prepare_logical_extent(ino, offset, + 0, 0, 0, req1); + if (has_backup) { + ssdfs_request_prepare_logical_extent(ino, + offset, + 0, 0, 0, + req2); + } + + page = ssdfs_page_array_get_page_locked(&fdesc->array, + area_start); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? 
-ERANGE : PTR_ERR(page);
+			SSDFS_ERR("fail to get page: "
+				  "index %lu, err %d\n",
+				  area_start, err);
+			goto finish_issue_commit_request;
+		}
+
+		kaddr = kmap_local_page(page);
+		err = ssdfs_maptbl_define_volume_extent(tbl, req1, kaddr,
+							area_start, area_size,
+							&seg_index);
+		kunmap_local(kaddr);
+
+		ssdfs_unlock_page(page);
+		ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page %p, count %d\n",
+			  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to define volume extent: "
+				  "err %d\n",
+				  err);
+			goto finish_issue_commit_request;
+		}
+
+		if (has_backup) {
+			ssdfs_memcpy(&req2->place,
+				     0, sizeof(struct ssdfs_volume_extent),
+				     &req1->place,
+				     0, sizeof(struct ssdfs_volume_extent),
+				     sizeof(struct ssdfs_volume_extent));
+		}
+
+		si = tbl->segs[SSDFS_MAIN_MAPTBL_SEG][seg_index];
+		err = ssdfs_segment_commit_log_async(si,
+						     SSDFS_REQ_ASYNC_NO_FREE,
+						     req1);
+
+		if (!err && has_backup) {
+			if (!tbl->segs[SSDFS_COPY_MAPTBL_SEG]) {
+				err = -ERANGE;
+				SSDFS_ERR("copy of maptbl doesn't exist\n");
+				goto finish_issue_commit_request;
+			}
+
+			si = tbl->segs[SSDFS_COPY_MAPTBL_SEG][seg_index];
+			err = ssdfs_segment_commit_log_async(si,
+						     SSDFS_REQ_ASYNC_NO_FREE,
+						     req2);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to update extent: "
+				  "seg_index %u, err %d\n",
+				  seg_index, err);
+			goto finish_issue_commit_request;
+		}
+
+		area_start += area_size;
+		processed_pages += area_size;
+		area_size = min_t(pgoff_t,
+				  (pgoff_t)PAGEVEC_SIZE,
+				  (pgoff_t)(tbl->fragment_pages -
+					    processed_pages));
+	} while (processed_pages < tbl->fragment_pages);
+
+finish_issue_commit_request:
+	if (err) {
+		ssdfs_put_request(req1);
+		if (has_backup)
+			ssdfs_put_request(req2);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_commit_logs() - issue commit log requests
+ * @tbl: mapping table object
+ *
+ * This method tries to issue the commit log requests.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_commit_logs(struct ssdfs_peb_mapping_table *tbl)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	u32 fragments_count;
+	bool has_backup;
+	u32 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragments_count = tbl->fragments_count;
+	has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY;
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		down_write(&fdesc->lock);
+
+		switch (atomic_read(&fdesc->state)) {
+		case SSDFS_MAPTBL_FRAG_DIRTY:
+			err = -ERANGE;
+			SSDFS_ERR("found unprocessed dirty fragment: "
+				  "index %d\n", i);
+			goto finish_fragment_processing;
+
+		case SSDFS_MAPTBL_FRAG_TOWRITE:
+			err = __ssdfs_maptbl_commit_logs(tbl, fdesc, i);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to commit logs: "
+					  "fragment_index %u, err %d\n",
+					  i, err);
+				goto finish_fragment_processing;
+			}
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+finish_fragment_processing:
+		up_write(&fdesc->lock);
+
+		if (unlikely(err))
+			return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_wait_commit_logs_end() - wait for commit logs to finish
+ * @tbl: mapping table object
+ *
+ * This method waits for the commit logs operation to finish.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_wait_commit_logs_end(struct ssdfs_peb_mapping_table *tbl)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	struct ssdfs_segment_request *req1 = NULL, *req2 = NULL;
+	bool has_backup;
+	u32 fragments_count;
+	int state;
+	u32 i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragments_count = tbl->fragments_count;
+	has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY;
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		down_write(&fdesc->lock);
+
+		switch (atomic_read(&fdesc->state)) {
+		case SSDFS_MAPTBL_FRAG_DIRTY:
+			err = -ERANGE;
+			SSDFS_ERR("found unprocessed dirty fragment: "
+				  "index %d\n", i);
+			goto finish_fragment_processing;
+
+		case SSDFS_MAPTBL_FRAG_TOWRITE:
+			for (j = 0; j < fdesc->flush_req_count; j++) {
+				req1 = &fdesc->flush_req1[j];
+				req2 = &fdesc->flush_req2[j];
+
+				err = ssdfs_maptbl_check_request(fdesc, req1);
+				if (unlikely(err)) {
+					SSDFS_ERR("flush request failed: "
+						  "err %d\n", err);
+					goto finish_fragment_processing;
+				}
+
+				if (!has_backup)
+					continue;
+
+				err = ssdfs_maptbl_check_request(fdesc, req2);
+				if (unlikely(err)) {
+					SSDFS_ERR("flush request failed: "
+						  "err %d\n", err);
+					goto finish_fragment_processing;
+				}
+			}
+
+			state = atomic_cmpxchg(&fdesc->state,
+						SSDFS_MAPTBL_FRAG_TOWRITE,
+						SSDFS_MAPTBL_FRAG_INITIALIZED);
+			if (state != SSDFS_MAPTBL_FRAG_TOWRITE) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid fragment state %#x\n",
+					  state);
+				goto finish_fragment_processing;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment_index %u, state %#x\n",
+				  i,
+				  atomic_read(&fdesc->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+finish_fragment_processing:
+		up_write(&fdesc->lock);
+
+		if (unlikely(err))
+			return err;
+	}
+
+	return 0;
+}
+
+/*
+ * __ssdfs_maptbl_prepare_migration() - issue prepare migration requests
+ * @tbl: mapping table object
+ * @fdesc: pointer on fragment descriptor
+ * @fragment_index: index of fragment in the array
+ *
+ * This method tries to issue prepare migration requests.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */ +static +int __ssdfs_maptbl_prepare_migration(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u32 fragment_index) +{ + struct ssdfs_segment_request *req1 = NULL, *req2 = NULL; + struct ssdfs_segment_info *si; + u64 ino = SSDFS_MAPTBL_INO; + bool has_backup; + pgoff_t area_start; + pgoff_t area_size, processed_pages; + u64 offset; + u16 seg_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !fdesc); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("maptbl %p, fragment_index %u\n", + tbl, fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY; + + area_start = 0; + area_size = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)tbl->fragment_pages); + processed_pages = 0; + + fdesc->flush_req_count = 0; + + do { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(area_size == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_realloc_flush_reqs_array(fdesc); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc the reqs array\n"); + goto finish_issue_prepare_migration_request; + } + + req1 = &fdesc->flush_req1[fdesc->flush_req_count]; + req2 = &fdesc->flush_req2[fdesc->flush_req_count]; + fdesc->flush_req_count++; + + ssdfs_request_init(req1); + ssdfs_get_request(req1); + if (has_backup) { + ssdfs_request_init(req2); + ssdfs_get_request(req2); + } + + offset = area_start * PAGE_SIZE; + offset += fragment_index * tbl->fragment_bytes; + + ssdfs_request_prepare_logical_extent(ino, offset, + 0, 0, 0, req1); + if (has_backup) { + ssdfs_request_prepare_logical_extent(ino, + offset, + 0, 0, 0, + req2); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area_start %lu, area_size %lu, " + "processed_pages %lu, tbl->fragment_pages %u\n", + area_start, area_size, processed_pages, + tbl->fragment_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_page_array_get_page_locked(&fdesc->array, + area_start); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? 
-ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to get page: " + "index %lu, err %d\n", + area_start, err); + goto finish_issue_prepare_migration_request; + } + + kaddr = kmap_local_page(page); + err = ssdfs_maptbl_define_volume_extent(tbl, req1, kaddr, + area_start, area_size, + &seg_index); + kunmap_local(kaddr); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("fail to define volume extent: " + "err %d\n", + err); + goto finish_issue_prepare_migration_request; + } + + if (has_backup) { + ssdfs_memcpy(&req2->place, + 0, sizeof(struct ssdfs_volume_extent), + &req1->place, + 0, sizeof(struct ssdfs_volume_extent), + sizeof(struct ssdfs_volume_extent)); + } + + si = tbl->segs[SSDFS_MAIN_MAPTBL_SEG][seg_index]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start migration now: seg %llu\n", si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_segment_prepare_migration_async(si, + SSDFS_REQ_ASYNC_NO_FREE, + req1); + if (!err && has_backup) { + if (!tbl->segs[SSDFS_COPY_MAPTBL_SEG]) { + err = -ERANGE; + SSDFS_ERR("copy of maptbl doesn't exist\n"); + goto finish_issue_prepare_migration_request; + } + + si = tbl->segs[SSDFS_COPY_MAPTBL_SEG][seg_index]; + err = ssdfs_segment_prepare_migration_async(si, + SSDFS_REQ_ASYNC_NO_FREE, + req2); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to update extent: " + "seg_index %u, err %d\n", + seg_index, err); + goto finish_issue_prepare_migration_request; + } + + area_start += area_size; + processed_pages += area_size; + area_size = min_t(pgoff_t, + (pgoff_t)PAGEVEC_SIZE, + (pgoff_t)(tbl->fragment_pages - + processed_pages)); + } while (processed_pages < tbl->fragment_pages); + +finish_issue_prepare_migration_request: + if (err) { + ssdfs_put_request(req1); + if (has_backup) + ssdfs_put_request(req2); + } + + return err; +} + +/* + * ssdfs_maptbl_prepare_migration() - issue prepare migration requests + * @tbl: mapping table object + * + * This method tries to issue prepare migration requests. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+static
+int ssdfs_maptbl_prepare_migration(struct ssdfs_peb_mapping_table *tbl)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	u32 fragments_count;
+	int state;
+	u32 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragments_count = tbl->fragments_count;
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		state = atomic_read(&fdesc->state);
+		if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+			SSDFS_ERR("fragment is corrupted: index %u\n",
+				  i);
+			return -EFAULT;
+		} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+			struct completion *end = &fdesc->init_end;
+
+			up_read(&tbl->tbl_lock);
+
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("maptbl's fragment init failed: "
+					  "index %u\n", i);
+				return -ERANGE;
+			}
+
+			down_read(&tbl->tbl_lock);
+		}
+
+		state = atomic_read(&fdesc->state);
+		switch (state) {
+		case SSDFS_MAPTBL_FRAG_INITIALIZED:
+		case SSDFS_MAPTBL_FRAG_DIRTY:
+			/* expected state */
+			break;
+
+		case SSDFS_MAPTBL_FRAG_CREATED:
+		case SSDFS_MAPTBL_FRAG_INIT_FAILED:
+			SSDFS_WARN("fragment is not initialized: "
+				   "index %u, state %#x\n",
+				   i, state);
+			return -EFAULT;
+
+		default:
+			SSDFS_WARN("unexpected fragment state: "
+				   "index %u, state %#x\n",
+				   i, atomic_read(&fdesc->state));
+			return -ERANGE;
+		}
+
+		down_write(&fdesc->lock);
+		err = __ssdfs_maptbl_prepare_migration(tbl, fdesc, i);
+		up_write(&fdesc->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare migration: "
+				  "fragment_index %u, err %d\n",
+				  i, err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_wait_prepare_migration_end() - wait for migration preparation
+ * @tbl: mapping table object
+ *
+ * This method waits for the prepare migration operation to finish.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_wait_prepare_migration_end(struct ssdfs_peb_mapping_table *tbl)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	struct ssdfs_segment_request *req1 = NULL, *req2 = NULL;
+	bool has_backup;
+	u32 fragments_count;
+	u32 i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragments_count = tbl->fragments_count;
+	has_backup = atomic_read(&tbl->flags) & SSDFS_MAPTBL_HAS_COPY;
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		down_write(&fdesc->lock);
+
+		for (j = 0; j < fdesc->flush_req_count; j++) {
+			req1 = &fdesc->flush_req1[j];
+			req2 = &fdesc->flush_req2[j];
+
+			err = ssdfs_maptbl_check_request(fdesc, req1);
+			if (unlikely(err)) {
+				SSDFS_ERR("flush request failed: "
+					  "err %d\n", err);
+				goto finish_fragment_processing;
+			}
+
+			if (!has_backup)
+				continue;
+
+			err = ssdfs_maptbl_check_request(fdesc, req2);
+			if (unlikely(err)) {
+				SSDFS_ERR("flush request failed: "
+					  "err %d\n", err);
+				goto finish_fragment_processing;
+			}
+		}
+
+finish_fragment_processing:
+		up_write(&fdesc->lock);
+
+		if (unlikely(err))
+			return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_maptbl_create_checkpoint(struct ssdfs_peb_mapping_table *tbl)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	/* TODO: implement */
+	SSDFS_DBG("TODO: implement %s\n", __func__);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_flush() - flush dirty mapping table object
+ * @tbl: mapping table object
+ *
+ * This method tries to flush dirty mapping table object.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - mapping table is corrupted.
+ */
+int ssdfs_maptbl_flush(struct ssdfs_peb_mapping_table *tbl)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("maptbl %p\n", tbl);
+#else
+	SSDFS_DBG("maptbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("prepare migration\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&tbl->tbl_lock);
+
+	err = ssdfs_maptbl_prepare_migration(tbl);
+	if (unlikely(err)) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"fail to prepare migration: err %d\n",
+				err);
+		goto finish_prepare_migration;
+	}
+
+	err = ssdfs_maptbl_wait_prepare_migration_end(tbl);
+	if (unlikely(err)) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"fail to prepare migration: err %d\n",
+				err);
+		goto finish_prepare_migration;
+	}
+
+finish_prepare_migration:
+	up_read(&tbl->tbl_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finish prepare migration\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (unlikely(err))
+		return err;
+
+	/*
+	 * This flag should not be included into the header.
+	 * The flag is used only during the flush operation.
+	 * The presence of the flag in the on-disk layout's
+	 * state means volume corruption.
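+	 * The flag is set just before the flush sequence below and
+	 * cleared at finish_maptbl_flush.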
+ */ + atomic_or(SSDFS_MAPTBL_UNDER_FLUSH, &tbl->flags); + + down_write(&tbl->tbl_lock); + + ssdfs_sb_maptbl_header_correct_state(tbl); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("flush dirty fragments\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_flush_dirty_fragments(tbl); + if (err == -ENODATA) { + err = 0; + up_write(&tbl->tbl_lock); + SSDFS_DBG("maptbl hasn't dirty fragments\n"); + goto finish_maptbl_flush; + } else if (unlikely(err)) { + up_write(&tbl->tbl_lock); + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush maptbl: err %d\n", + err); + goto finish_maptbl_flush; + } + + err = ssdfs_maptbl_wait_flush_end(tbl); + if (unlikely(err)) { + up_write(&tbl->tbl_lock); + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush maptbl: err %d\n", + err); + goto finish_maptbl_flush; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finish flush dirty fragments\n"); + + SSDFS_DBG("commit logs\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_commit_logs(tbl); + if (unlikely(err)) { + up_write(&tbl->tbl_lock); + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush maptbl: err %d\n", + err); + goto finish_maptbl_flush; + } + + err = ssdfs_maptbl_wait_commit_logs_end(tbl); + if (unlikely(err)) { + up_write(&tbl->tbl_lock); + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush maptbl: err %d\n", + err); + goto finish_maptbl_flush; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finish commit logs\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + downgrade_write(&tbl->tbl_lock); + + err = ssdfs_maptbl_create_checkpoint(tbl); + if (unlikely(err)) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to create maptbl's checkpoint: " + "err %d\n", + err); + } + + up_read(&tbl->tbl_lock); + +finish_maptbl_flush: + atomic_and(~SSDFS_MAPTBL_UNDER_FLUSH, &tbl->flags); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +int ssdfs_maptbl_resize(struct ssdfs_peb_mapping_table *tbl, + u64 new_pebs_count) +{ + /* TODO: implement */ + SSDFS_WARN("TODO: implement %s\n", __func__); + return -ENOSYS; +} + +/* + * ssdfs_maptbl_get_peb_descriptor() - retrieve PEB descriptor + * @fdesc: fragment descriptor + * @index: index of PEB descriptor in the PEB table + * @peb_id: pointer on PEB ID value [out] + * @peb_desc: pointer on PEB descriptor value [out] + * + * This method tries to extract PEB ID and PEB descriptor + * for the index of PEB descriptor in the PEB table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_get_peb_descriptor(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index, u64 *peb_id, + struct ssdfs_peb_descriptor *peb_desc) +{ + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !peb_id || !peb_desc); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("fdesc %p, index %u, peb_id %p, peb_desc %p\n", + fdesc, index, peb_id, peb_desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + *peb_id = U64_MAX; + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? 
-ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + *peb_id = GET_PEB_ID(kaddr, item_index); + if (*peb_id == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to define peb_id: " + "page_index %lu, item_index %u\n", + page_index, item_index); + goto finish_page_processing; + } + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + ssdfs_memcpy(peb_desc, + 0, sizeof(struct ssdfs_peb_descriptor), + ptr, + 0, sizeof(struct ssdfs_peb_descriptor), + sizeof(struct ssdfs_peb_descriptor)); + +finish_page_processing: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * GET_LEB_DESCRIPTOR() - retrieve LEB descriptor + * @kaddr: pointer on memory page's content + * @leb_id: LEB ID number + * + * This method tries to return the pointer on + * LEB descriptor for @leb_id. + * + * RETURN: + * [success] - pointer on LEB descriptor + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static inline +struct ssdfs_leb_descriptor *GET_LEB_DESCRIPTOR(void *kaddr, u64 leb_id) +{ + struct ssdfs_leb_table_fragment_header *hdr; + u64 start_leb; + u16 lebs_count; + u64 leb_id_diff; + u32 leb_desc_off; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, leb_id %llu\n", + kaddr, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + + if (le16_to_cpu(hdr->magic) != SSDFS_LEB_TABLE_MAGIC) { + SSDFS_ERR("corrupted page\n"); + return ERR_PTR(-ERANGE); + } + + start_leb = le64_to_cpu(hdr->start_leb); + lebs_count = le16_to_cpu(hdr->lebs_count); + + if (leb_id < start_leb || + leb_id >= (start_leb + lebs_count)) { + SSDFS_ERR("corrupted page: " + "leb_id %llu, start_leb %llu, lebs_count %u\n", + leb_id, start_leb, lebs_count); + return ERR_PTR(-ERANGE); + } + + leb_id_diff = leb_id - start_leb; + leb_desc_off = SSDFS_LEBTBL_FRAGMENT_HDR_SIZE; + leb_desc_off += leb_id_diff * sizeof(struct ssdfs_leb_descriptor); + + if (leb_desc_off >= PAGE_SIZE) { + SSDFS_ERR("invalid offset %u\n", leb_desc_off); + return ERR_PTR(-ERANGE); + } + + return (struct ssdfs_leb_descriptor *)((u8 *)kaddr + leb_desc_off); +} + +/* + * LEBTBL_PAGE_INDEX() - define LEB table's page index + * @fdesc: fragment descriptor + * @leb_id: LEB identification number + * + * RETURN: + * [success] - page index. + * [failure] - ULONG_MAX. 
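+ *
+ * For example, with @lebs_per_page descriptors per page, LEB
+ * (start_leb + N) resides in LEB table's page (N / lebs_per_page).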
+ */ +static inline +pgoff_t LEBTBL_PAGE_INDEX(struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id) +{ + u64 leb_id_diff; + pgoff_t page_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, leb_id %llu\n", + fdesc, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (leb_id < fdesc->start_leb || + leb_id >= (fdesc->start_leb + fdesc->lebs_count)) { + SSDFS_ERR("invalid leb_id: leb_id %llu, " + "start_leb %llu, lebs_count %u\n", + leb_id, fdesc->start_leb, fdesc->lebs_count); + return ULONG_MAX; + } + + leb_id_diff = leb_id - fdesc->start_leb; + page_index = (pgoff_t)(leb_id_diff / fdesc->lebs_per_page); + + if (page_index >= fdesc->lebtbl_pages) { + SSDFS_ERR("page_index %lu >= fdesc->lebtbl_pages %u\n", + page_index, fdesc->lebtbl_pages); + return ULONG_MAX; + } + + return page_index; +} + +/* + * ssdfs_maptbl_get_leb_descriptor() - retrieve LEB descriptor + * @fdesc: fragment descriptor + * @leb_id: LEB ID number + * @leb_desc: pointer on LEB descriptor value [out] + * + * This method tries to extract LEB descriptor + * for the LEB ID number. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_get_leb_descriptor(struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, + struct ssdfs_leb_descriptor *leb_desc) +{ + struct ssdfs_leb_descriptor *ptr; + pgoff_t page_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !leb_desc); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("fdesc %p, leb_id %llu, leb_desc %p\n", + fdesc, leb_id, leb_desc); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = LEBTBL_PAGE_INDEX(fdesc, leb_id); + if (page_index == ULONG_MAX) { + SSDFS_ERR("fail to define page_index: " + "leb_id %llu\n", + leb_id); + return -ERANGE; + } + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_LEB_DESCRIPTOR(kaddr, leb_id); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get leb_descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_page_processing; + } + + ssdfs_memcpy(leb_desc, + 0, sizeof(struct ssdfs_leb_descriptor), + ptr, + 0, sizeof(struct ssdfs_leb_descriptor), + sizeof(struct ssdfs_leb_descriptor)); + +finish_page_processing: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * FRAGMENT_INDEX() - define fragment index + * @tbl: pointer on mapping table object + * @leb_id: LEB ID number + * + * RETURN: + * [success] - fragment index. + * [failure] - U32_MAX. 
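+ *
+ * Every fragment covers @lebs_per_fragment LEBs, so the index is
+ * simply leb_id / tbl->lebs_per_fragment.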
+ */
+static inline
+u32 FRAGMENT_INDEX(struct ssdfs_peb_mapping_table *tbl, u64 leb_id)
+{
+	u32 fragment_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p, leb_id %llu\n",
+		  tbl, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (leb_id >= tbl->lebs_count) {
+		SSDFS_ERR("leb_id %llu >= tbl->lebs_count %llu\n",
+			  leb_id, tbl->lebs_count);
+		return U32_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(div_u64(leb_id, tbl->lebs_per_fragment) >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragment_index = (u32)div_u64(leb_id, tbl->lebs_per_fragment);
+	if (fragment_index >= tbl->fragments_count) {
+		SSDFS_ERR("fragment_index %u >= tbl->fragments_count %u\n",
+			  fragment_index, tbl->fragments_count);
+		return U32_MAX;
+	}
+
+	return fragment_index;
+}
+
+/*
+ * ssdfs_maptbl_get_fragment_descriptor() - get fragment descriptor
+ * @tbl: pointer on mapping table object
+ * @leb_id: LEB ID number
+ *
+ * RETURN:
+ * [success] - pointer on fragment descriptor.
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+struct ssdfs_maptbl_fragment_desc *
+ssdfs_maptbl_get_fragment_descriptor(struct ssdfs_peb_mapping_table *tbl,
+				     u64 leb_id)
+{
+	u32 fragment_index = FRAGMENT_INDEX(tbl, leb_id);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("leb_id %llu, fragment index %u\n",
+		  leb_id, fragment_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fragment_index == U32_MAX) {
+		SSDFS_ERR("invalid fragment_index: leb_id %llu\n",
+			  leb_id);
+		return ERR_PTR(-ERANGE);
+	}
+
+	return &tbl->desc_array[fragment_index];
+}
+
+/*
+ * ssdfs_maptbl_get_peb_relation() - retrieve PEB relation
+ * @fdesc: fragment descriptor
+ * @leb_desc: LEB descriptor
+ * @pebr: PEB relation [out]
+ *
+ * This method tries to retrieve PEB relation for @leb_desc.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA - uninitialized LEB descriptor.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_get_peb_relation(struct ssdfs_maptbl_fragment_desc *fdesc,
+				  struct ssdfs_leb_descriptor *leb_desc,
+				  struct ssdfs_maptbl_peb_relation *pebr)
+{
+	u16 physical_index, relation_index;
+	u64 peb_id;
+	struct ssdfs_peb_descriptor peb_desc;
+	struct ssdfs_maptbl_peb_descriptor *ptr;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc || !leb_desc || !pebr);
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	SSDFS_DBG("fdesc %p, leb_desc %p, pebr %p\n",
+		  fdesc, leb_desc, pebr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	physical_index = le16_to_cpu(leb_desc->physical_index);
+	relation_index = le16_to_cpu(leb_desc->relation_index);
+
+	if (physical_index == U16_MAX) {
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		return -ENODATA;
+	}
+
+	err = ssdfs_maptbl_get_peb_descriptor(fdesc, physical_index,
+					      &peb_id, &peb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get peb descriptor: "
+			  "physical_index %u, err %d\n",
+			  physical_index, err);
+		return err;
+	}
+
+	ptr = &pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX];
+
+	if (peb_id == U64_MAX) {
+		SSDFS_ERR("invalid peb_id\n");
+		return -ERANGE;
+	}
+
+	ptr->peb_id = peb_id;
+	ptr->shared_peb_index = peb_desc.shared_peb_index;
+	ptr->erase_cycles = le32_to_cpu(peb_desc.erase_cycles);
+	ptr->type = peb_desc.type;
+	ptr->state = peb_desc.state;
+	ptr->flags = peb_desc.flags;
+
+	if (relation_index == U16_MAX) {
+		SSDFS_DBG("relation peb_id is absent\n");
+		return 0;
+	}
+
+	err = ssdfs_maptbl_get_peb_descriptor(fdesc, relation_index,
+					      &peb_id, &peb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get peb descriptor: "
+			  "relation_index %u, err %d\n",
+			  relation_index, err);
+		return err;
+	}
+
+	ptr = &pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX];
+
+	if (peb_id == U64_MAX) {
+		SSDFS_ERR("invalid peb_id\n");
+		return -ERANGE;
+	}
+
+	ptr->peb_id = peb_id;
+	ptr->erase_cycles = le32_to_cpu(peb_desc.erase_cycles);
+	ptr->type = peb_desc.type;
+	ptr->state = peb_desc.state;
+	ptr->flags = peb_desc.flags;
+
+	return 0;
+}
+
+/*
+ * should_cache_peb_info() - check whether PEB info should be cached
+ * @peb_type: PEB type
+ */
+static inline
+bool should_cache_peb_info(u8 peb_type)
+{
+	return peb_type == SSDFS_MAPTBL_SBSEG_PEB_TYPE ||
+		peb_type == SSDFS_MAPTBL_SEGBMAP_PEB_TYPE ||
+		peb_type == SSDFS_MAPTBL_MAPTBL_PEB_TYPE;
+}
+
+/*
+ * ssdfs_maptbl_define_pebtbl_page() - define PEB table's page index
+ * @tbl: pointer on mapping table object
+ * @desc: fragment descriptor
+ * @leb_id: LEB ID number
+ * @peb_desc_index: PEB descriptor index
+ *
+ * RETURN:
+ * [success] - page index.
+ * [failure] - ULONG_MAX.
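+ *
+ * PEB table pages follow the LEB table pages of a fragment and are
+ * grouped by stripe (see the index arithmetic below).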
+ */
+static
+pgoff_t ssdfs_maptbl_define_pebtbl_page(struct ssdfs_peb_mapping_table *tbl,
+					struct ssdfs_maptbl_fragment_desc *desc,
+					u64 leb_id,
+					u16 peb_desc_index)
+{
+	u64 leb_id_diff;
+	u64 stripe_index;
+	u64 page_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !desc);
+
+	if (leb_id < desc->start_leb ||
+	    leb_id >= (desc->start_leb + desc->lebs_count)) {
+		SSDFS_ERR("invalid leb_id: leb_id %llu, "
+			  "start_leb %llu, lebs_count %u\n",
+			  leb_id, desc->start_leb, desc->lebs_count);
+		return ULONG_MAX;
+	}
+
+	if (peb_desc_index != U16_MAX) {
+		if (peb_desc_index >= tbl->pebs_per_fragment) {
+			SSDFS_ERR("peb_desc_index %u >= pebs_per_fragment %u\n",
+				  peb_desc_index, tbl->pebs_per_fragment);
+			return ULONG_MAX;
+		}
+	}
+
+	SSDFS_DBG("tbl %p, desc %p, leb_id %llu\n", tbl, desc, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (peb_desc_index >= U16_MAX) {
+		leb_id_diff = leb_id - desc->start_leb;
+		stripe_index = div_u64(leb_id_diff, tbl->pebs_per_stripe);
+		page_index = leb_id_diff -
+				(stripe_index * tbl->pebs_per_stripe);
+		page_index = div_u64(page_index, desc->pebs_per_page);
+		page_index += stripe_index * desc->stripe_pages;
+		page_index += desc->lebtbl_pages;
+	} else {
+		page_index = PEBTBL_PAGE_INDEX(desc, peb_desc_index);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(page_index > ULONG_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (pgoff_t)page_index;
+}
+
+/*
+ * is_pebtbl_stripe_recovering() - check whether PEB stripe is under recovering
+ * @hdr: PEB table fragment's header
+ */
+static inline
+bool is_pebtbl_stripe_recovering(struct ssdfs_peb_table_fragment_header *hdr)
+{
+	u16 flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr);
+
+	SSDFS_DBG("pebtbl_hdr %p\n", hdr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	flags = hdr->flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(flags & ~SSDFS_PEBTBL_FLAGS_MASK);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return flags & SSDFS_PEBTBL_UNDER_RECOVERING;
+}
+
+/*
+ * ssdfs_maptbl_solve_inconsistency() - resolve PEB state inconsistency
+ * @tbl: pointer on mapping table object
+ * @fdesc: fragment descriptor
+ * @leb_id: LEB ID number
+ * @pebr: cached PEB relation
+ *
+ * This method tries to change the PEB state in the mapping table
+ * for the case if cached PEB state is inconsistent.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - uninitialized LEB descriptor.
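+ * %-EACCES - PEB stripe is under recovering.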
+ */
+int ssdfs_maptbl_solve_inconsistency(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id,
+				     struct ssdfs_maptbl_peb_relation *pebr)
+{
+	struct ssdfs_leb_descriptor leb_desc;
+	pgoff_t page_index;
+	struct page *page;
+	void *kaddr;
+	struct ssdfs_peb_table_fragment_header *hdr;
+	u16 physical_index, relation_index;
+	struct ssdfs_peb_descriptor *peb_desc;
+	struct ssdfs_maptbl_peb_descriptor *cached;
+	u16 item_index;
+	u64 peb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !fdesc || !pebr);
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	SSDFS_DBG("fdesc %p, leb_id %llu\n",
+		  fdesc, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+
+	if (physical_index == U16_MAX) {
+		SSDFS_ERR("uninitialized leb descriptor: "
+			  "leb_id %llu\n", leb_id);
+		return -ENODATA;
+	}
+
+	page_index = ssdfs_maptbl_define_pebtbl_page(tbl, fdesc,
+						     leb_id, physical_index);
+	if (page_index == ULONG_MAX) {
+		SSDFS_ERR("fail to define PEB table's page_index: "
+			  "leb_id %llu, physical_index %u\n",
+			  leb_id, physical_index);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("leb_id %llu, physical_index %u, page_index %lu\n",
+		  leb_id, physical_index, page_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index);
+	if (IS_ERR_OR_NULL(page)) {
+		err = page == NULL ? -ERANGE : PTR_ERR(page);
+		SSDFS_ERR("fail to find page: page_index %lu\n",
+			  page_index);
+		return err;
+	}
+
+	kaddr = kmap_local_page(page);
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	if (is_pebtbl_stripe_recovering(hdr)) {
+		err = -EACCES;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to change the PEB state: "
+			  "leb_id %llu: "
+			  "stripe %u is under recovering\n",
+			  leb_id,
+			  le16_to_cpu(hdr->stripe_id));
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_physical_index_processing;
+	}
+
+	item_index = physical_index % fdesc->pebs_per_page;
+
+	peb_id = GET_PEB_ID(kaddr, item_index);
+	if (peb_id == U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("fail to define peb_id: "
+			  "page_index %lu, item_index %u\n",
+			  page_index, item_index);
+		goto finish_physical_index_processing;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("physical_index %u, item_index %u, "
+		  "pebs_per_page %u, peb_id %llu\n",
+		  physical_index, item_index,
+		  fdesc->pebs_per_page, peb_id);
+
+	SSDFS_DBG("PAGE DUMP: page_index %lu\n",
+		  page_index);
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     kaddr,
+			     PAGE_SIZE);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb_desc = GET_PEB_DESCRIPTOR(kaddr, item_index);
+	if (IS_ERR_OR_NULL(peb_desc)) {
+		err = IS_ERR(peb_desc) ?
PTR_ERR(peb_desc) : -ERANGE;
+		SSDFS_ERR("fail to get peb_descriptor: "
+			  "page_index %lu, item_index %u, err %d\n",
+			  page_index, item_index, err);
+		goto finish_physical_index_processing;
+	}
+
+	cached = &pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX];
+
+	if (cached->peb_id != peb_id) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid main index: "
+			  "cached->peb_id %llu, peb_id %llu\n",
+			  cached->peb_id, peb_id);
+		goto finish_physical_index_processing;
+	}
+
+	peb_desc->state = cached->state;
+	peb_desc->flags = cached->flags;
+	peb_desc->shared_peb_index = cached->shared_peb_index;
+
+finish_physical_index_processing:
+	kunmap_local(kaddr);
+
+	if (!err) {
+		ssdfs_set_page_private(page, 0);
+		SetPageUptodate(page);
+		err = ssdfs_page_array_set_page_dirty(&fdesc->array,
+						      page_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set page %lu dirty: err %d\n",
+				  page_index, err);
+		}
+	}
+
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (err)
+		return err;
+
+	cached = &pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX];
+	relation_index = le16_to_cpu(leb_desc.relation_index);
+
+	if (cached->peb_id >= U64_MAX && relation_index == U16_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("LEB %llu hasn't relation\n", leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	} else if (relation_index == U16_MAX) {
+		SSDFS_ERR("uninitialized leb descriptor: "
+			  "leb_id %llu\n", leb_id);
+		return -ENODATA;
+	}
+
+	page_index = ssdfs_maptbl_define_pebtbl_page(tbl, fdesc,
+						     leb_id, relation_index);
+	if (page_index == ULONG_MAX) {
+		SSDFS_ERR("fail to define PEB table's page_index: "
+			  "leb_id %llu, relation_index %u\n",
+			  leb_id, relation_index);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("leb_id %llu, relation_index %u, page_index %lu\n",
+		  leb_id, relation_index, page_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index);
+	if (IS_ERR_OR_NULL(page)) {
+		err = page == NULL ? -ERANGE : PTR_ERR(page);
+		SSDFS_ERR("fail to find page: page_index %lu\n",
+			  page_index);
+		return err;
+	}
+
+	kaddr = kmap_local_page(page);
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	if (is_pebtbl_stripe_recovering(hdr)) {
+		err = -EACCES;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to change the PEB state: "
+			  "leb_id %llu: "
+			  "stripe %u is under recovering\n",
+			  leb_id,
+			  le16_to_cpu(hdr->stripe_id));
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_relation_index_processing;
+	}
+
+	item_index = relation_index % fdesc->pebs_per_page;
+
+	peb_id = GET_PEB_ID(kaddr, item_index);
+	if (peb_id == U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("fail to define peb_id: "
+			  "page_index %lu, item_index %u\n",
+			  page_index, item_index);
+		goto finish_relation_index_processing;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("relation_index %u, item_index %u, "
+		  "pebs_per_page %u, peb_id %llu\n",
+		  relation_index, item_index,
+		  fdesc->pebs_per_page, peb_id);
+
+	SSDFS_DBG("PAGE DUMP: page_index %lu\n",
+		  page_index);
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     kaddr,
+			     PAGE_SIZE);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb_desc = GET_PEB_DESCRIPTOR(kaddr, item_index);
+	if (IS_ERR_OR_NULL(peb_desc)) {
+		err = IS_ERR(peb_desc) ?
PTR_ERR(peb_desc) : -ERANGE;
+		SSDFS_ERR("fail to get peb_descriptor: "
+			  "page_index %lu, item_index %u, err %d\n",
+			  page_index, item_index, err);
+		goto finish_relation_index_processing;
+	}
+
+	if (cached->peb_id != peb_id) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid relation index: "
+			  "cached->peb_id %llu, peb_id %llu\n",
+			  cached->peb_id, peb_id);
+		goto finish_relation_index_processing;
+	}
+
+	peb_desc->state = cached->state;
+	peb_desc->flags = cached->flags;
+	peb_desc->shared_peb_index = cached->shared_peb_index;
+
+finish_relation_index_processing:
+	kunmap_local(kaddr);
+
+	if (!err) {
+		ssdfs_set_page_private(page, 0);
+		SetPageUptodate(page);
+		err = ssdfs_page_array_set_page_dirty(&fdesc->array,
+						      page_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set page %lu dirty: err %d\n",
+				  page_index, err);
+		}
+	}
+
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}

From patchwork Sat Feb 25 01:08:51 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151945
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 40/76] ssdfs: convert/map LEB to PEB functionality
Date: Fri, 24 Feb 2023 17:08:51 -0800
Message-Id: <20230225010927.813929-41-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

The logical extent is a fundamental concept of the SSDFS file system.
Any piece of data or metadata on a file system volume is identified by:
(1) segment ID, (2) logical block ID, and (3) length. As a result, a
logical block always resides in the same segment, because a segment is
a logical portion of the file system volume that always keeps the same
position. However, the logical block's content has to be stored in some
erase block. The "Physical" Erase Block (PEB) mapping table implements
the mapping of Logical Erase Blocks (LEBs) into PEBs, because any
segment is a container for one or several LEBs. Moreover, the mapping
table supports the migration scheme implementation. The migration
scheme guarantees that a logical block always stays in the same
segment, even in the case of update requests.

The PEB mapping table implements two fundamental methods:
(1) convert LEB to PEB;
(2) map LEB to PEB.

The conversion operation is required to identify which particular PEB
contains the data of a LEB of a particular segment. The mapping
operation is required when a clean segment has been allocated, because
the LEB(s) of a clean segment need to be associated with PEB(s) that
can store logs with user data or metadata.
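For reference, a minimal sketch (not part of the patch) of how the
conversion path chains the helpers introduced by this series; the
fragment state checks, the mapping table cache lookup, and the
inconsistency resolution of the real conversion routine are omitted,
and the read-mode locking is an assumption:

static int sketch_leb2peb(struct ssdfs_peb_mapping_table *tbl, u64 leb_id,
			  struct ssdfs_maptbl_peb_relation *pebr)
{
	struct ssdfs_maptbl_fragment_desc *fdesc;
	struct ssdfs_leb_descriptor leb_desc;
	int err;

	down_read(&tbl->tbl_lock);

	/* 1) leb_id -> fragment (FRAGMENT_INDEX() under the hood) */
	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
	if (IS_ERR_OR_NULL(fdesc)) {
		err = !fdesc ? -ERANGE : PTR_ERR(fdesc);
		goto finish_conversion;
	}

	down_read(&fdesc->lock);

	/* 2) read the LEB descriptor from the fragment's LEB table */
	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);

	/* 3) an unmapped LEB has no physical_index yet */
	if (!err && !__is_mapped_leb2peb(&leb_desc))
		err = -ENODATA;

	/* 4) resolve physical_index/relation_index into PEB descriptors */
	if (!err)
		err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr);

	up_read(&fdesc->lock);

finish_conversion:
	up_read(&tbl->tbl_lock);
	return err;
}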
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_mapping_table.c | 3289 ++++++++++++++++++++++++++++++++++ 1 file changed, 3289 insertions(+) diff --git a/fs/ssdfs/peb_mapping_table.c b/fs/ssdfs/peb_mapping_table.c index bfc11bb73360..44995170fe75 100644 --- a/fs/ssdfs/peb_mapping_table.c +++ b/fs/ssdfs/peb_mapping_table.c @@ -4763,3 +4763,3292 @@ int ssdfs_maptbl_solve_inconsistency(struct ssdfs_peb_mapping_table *tbl, return err; } + +/* + * __is_mapped_leb2peb() - check that LEB is mapped + * @leb_desc: LEB descriptor + */ +static inline +bool __is_mapped_leb2peb(struct ssdfs_leb_descriptor *leb_desc) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!leb_desc); + + SSDFS_DBG("physical_index %u, relation_index %u\n", + le16_to_cpu(leb_desc->physical_index), + le16_to_cpu(leb_desc->relation_index)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return le16_to_cpu(leb_desc->physical_index) != U16_MAX; +} + +/* + * is_leb_migrating() - check that LEB is migrating + * @leb_desc: LEB descriptor + */ +static inline +bool is_leb_migrating(struct ssdfs_leb_descriptor *leb_desc) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!leb_desc); + + SSDFS_DBG("physical_index %u, relation_index %u\n", + le16_to_cpu(leb_desc->physical_index), + le16_to_cpu(leb_desc->relation_index)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return le16_to_cpu(leb_desc->relation_index) != U16_MAX; +} + +/* + * ssdfs_maptbl_set_under_erase_state() - set source PEB as under erase + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_under_erase_state(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index) +{ + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + ptr->state = SSDFS_MAPTBL_UNDER_ERASE_STATE; + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_set_pre_erase_state() - set source PEB as pre-erased + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
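+ *
+ * Besides changing the state, the method marks the PEB in the
+ * fragment header's dirty bitmap (SSDFS_PEBTBL_DIRTY_BMAP).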
+ */ +static +int ssdfs_maptbl_set_pre_erase_state(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + unsigned long *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + ptr->state = SSDFS_MAPTBL_PRE_ERASE_STATE; + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0]; + bitmap_set(bmap, item_index, 1); + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_set_snapshot_state() - set PEB in snapshot state + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_snapshot_state(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + unsigned long *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? 
PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + ptr->state = SSDFS_MAPTBL_SNAPSHOT_STATE; + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0]; + bitmap_set(bmap, item_index, 1); + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_set_source_state() - set destination PEB as source + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * @peb_state: PEB's state + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_source_state(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index, u8 peb_state) +{ + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? 
PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + if (peb_state == SSDFS_MAPTBL_UNKNOWN_PEB_STATE) { + switch (ptr->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + ptr->state = SSDFS_MAPTBL_CLEAN_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + ptr->state = SSDFS_MAPTBL_USING_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + ptr->state = SSDFS_MAPTBL_USED_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + ptr->state = SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + ptr->state = SSDFS_MAPTBL_DIRTY_PEB_STATE; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid PEB state: " + "state %#x\n", + ptr->state); + goto finish_page_processing; + } + } else { + switch (ptr->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + ptr->state = peb_state; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid PEB state: " + "state %#x\n", + ptr->state); + goto finish_page_processing; + break; + } + } + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_maptbl_exclude_migration_peb() - correct LEB table state + * @ptr: fragment descriptor + * @leb_id: LEB ID number + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_maptbl_exclude_migration_peb(struct ssdfs_maptbl_fragment_desc *ptr, + u64 leb_id) +{ + struct ssdfs_leb_table_fragment_header *hdr; + struct ssdfs_leb_descriptor *leb_desc; + pgoff_t page_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("fdesc %p, leb_id %llu\n", + ptr, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = LEBTBL_PAGE_INDEX(ptr, leb_id); + if (page_index == ULONG_MAX) { + SSDFS_ERR("fail to define page_index: " + "leb_id %llu\n", + leb_id); + return -ERANGE; + } + + page = ssdfs_page_array_get_page_locked(&ptr->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + leb_desc = GET_LEB_DESCRIPTOR(kaddr, leb_id); + if (IS_ERR_OR_NULL(leb_desc)) { + err = IS_ERR(leb_desc) ? 
PTR_ERR(leb_desc) : -ERANGE; + SSDFS_ERR("fail to get leb_descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_page_processing; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("INITIAL: page_index %lu, " + "physical_index %u, relation_index %u\n", + page_index, + le16_to_cpu(leb_desc->physical_index), + le16_to_cpu(leb_desc->relation_index)); +#endif /* CONFIG_SSDFS_DEBUG */ + + leb_desc->physical_index = leb_desc->relation_index; + leb_desc->relation_index = cpu_to_le16(U16_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MODIFIED: page_index %lu, " + "physical_index %u, relation_index %u\n", + page_index, + le16_to_cpu(leb_desc->physical_index), + le16_to_cpu(leb_desc->relation_index)); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(le16_to_cpu(hdr->migrating_lebs) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + le16_add_cpu(&hdr->migrating_lebs, -1); + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&ptr->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_solve_pre_deleted_state() - exclude pre-deleted migration PEB + * @tbl: pointer on mapping table object + * @fdesc: fragment descriptor + * @leb_id: LEB ID number + * @pebr: cached PEB relation + * + * This method tries to exclude the pre-deleted migration PEB + * from the relation by means of mapping table modification if + * the migration PEB is marked as pre-deleted in the mapping + * table cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
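+ *
+ * NOTE: the caller must hold @fdesc->lock; the pre-deleted case in
+ * ssdfs_maptbl_convert_leb2peb() takes this lock in write mode
+ * before calling the method.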
+ */
+int
+ssdfs_maptbl_solve_pre_deleted_state(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id,
+				     struct ssdfs_maptbl_peb_relation *pebr)
+{
+	struct ssdfs_leb_descriptor leb_desc;
+	u16 physical_index, relation_index;
+	int peb_state;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !fdesc);
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	SSDFS_DBG("fdesc %p, leb_id %llu\n",
+		  fdesc, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		SSDFS_ERR("leb %llu isn't mapped yet\n",
+			  leb_id);
+		return -ERANGE;
+	}
+
+	if (!is_leb_migrating(&leb_desc)) {
+		SSDFS_ERR("leb %llu isn't under migration\n",
+			  leb_id);
+		return -ERANGE;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+	relation_index = le16_to_cpu(leb_desc.relation_index);
+
+	peb_state = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state;
+
+	switch (peb_state) {
+	case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid state %#x of source PEB\n",
+			  peb_state);
+		return -ERANGE;
+	}
+
+	err = ssdfs_maptbl_set_pre_erase_state(fdesc, physical_index);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move PEB into pre-erase state: "
+			  "index %u, err %d\n",
+			  physical_index, err);
+		return err;
+	}
+
+	peb_state = pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state;
+
+	err = ssdfs_maptbl_set_source_state(fdesc, relation_index,
+					    (u8)peb_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move PEB into source state: "
+			  "index %u, peb_state %#x, err %d\n",
+			  relation_index, peb_state, err);
+		return err;
+	}
+
+	err = __ssdfs_maptbl_exclude_migration_peb(fdesc, leb_id);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(fdesc->migrating_lebs == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc->migrating_lebs--;
+	fdesc->pre_erase_pebs++;
+	atomic_inc(&tbl->pre_erase_pebs);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("mapped_lebs %u, migrating_lebs %u\n",
+		  fdesc->mapped_lebs, fdesc->migrating_lebs);
+	SSDFS_DBG("fdesc->pre_erase_pebs %u, tbl->pre_erase_pebs %d\n",
+		  fdesc->pre_erase_pebs,
+		  atomic_read(&tbl->pre_erase_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up(&tbl->wait_queue);
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_set_fragment_dirty() - set fragment as dirty
+ * @tbl: pointer on mapping table object
+ * @fdesc: fragment descriptor
+ * @leb_id: LEB ID number
+ */
+void ssdfs_maptbl_set_fragment_dirty(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id)
+{
+	u32 fragment_index;
+#ifdef CONFIG_SSDFS_DEBUG
+	size_t bytes_count;
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !fdesc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fragment_index = FRAGMENT_INDEX(tbl, leb_id);
+
+	if (is_ssdfs_maptbl_going_to_be_destroyed(tbl)) {
+		SSDFS_WARN("maptbl %p, leb_id %llu, "
+			   "fdesc %p, fragment_index %u, "
+			   "start_leb %llu, lebs_count %u\n",
+			   tbl, leb_id,
+			   fdesc, fragment_index,
+			   fdesc->start_leb, fdesc->lebs_count);
+	} else {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("maptbl %p, leb_id %llu, "
+			  "fdesc %p, fragment_index %u, "
+			  "start_leb %llu, lebs_count %u\n",
+			  tbl, leb_id,
+			  fdesc, fragment_index,
+			  fdesc->start_leb,
fdesc->lebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(fragment_index == U32_MAX);
+	BUG_ON(fragment_index >= tbl->fragments_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mutex_lock(&tbl->bmap_lock);
+#ifdef CONFIG_SSDFS_DEBUG
+	bytes_count = tbl->fragments_count + BITS_PER_LONG - 1;
+	bytes_count /= BITS_PER_BYTE;
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     tbl->dirty_bmap, bytes_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+	atomic_set(&fdesc->state, SSDFS_MAPTBL_FRAG_DIRTY);
+	bitmap_set(tbl->dirty_bmap, fragment_index, 1);
+	mutex_unlock(&tbl->bmap_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fragment_index %u, state %#x\n",
+		  fragment_index,
+		  atomic_read(&fdesc->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_maptbl_convert_leb2peb() - get description of PEBs
+ * @fsi: file system info object
+ * @leb_id: LEB ID number
+ * @peb_type: PEB type
+ * @pebr: description of PEBs relation [out]
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to get description of PEBs for the
+ * LEB ID number.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EAGAIN  - fragment is still under initialization.
+ * %-EFAULT  - maptbl has inconsistent state.
+ * %-ENODATA - LEB isn't mapped to PEB yet.
+ * %-ERANGE  - internal error.
+ */
+int ssdfs_maptbl_convert_leb2peb(struct ssdfs_fs_info *fsi,
+				 u64 leb_id,
+				 u8 peb_type,
+				 struct ssdfs_maptbl_peb_relation *pebr,
+				 struct completion **end)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_cache *cache;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	struct ssdfs_leb_descriptor leb_desc;
+	struct ssdfs_maptbl_peb_relation cached_pebr;
+	size_t peb_relation_size = sizeof(struct ssdfs_maptbl_peb_relation);
+	u8 consistency = SSDFS_PEB_STATE_CONSISTENT;
+	int state;
+	u64 peb_id;
+	u8 peb_state;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pebr || !end);
+
+	SSDFS_DBG("fsi %p, leb_id %llu, peb_type %#x, "
+		  "pebr %p, init_end %p\n",
+		  fsi, leb_id, peb_type, pebr, end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*end = NULL;
+	memset(pebr, 0xFF, peb_relation_size);
+
+	tbl = fsi->maptbl;
+	/* take the cache from fsi: tbl can be NULL here */
+	cache = &fsi->maptbl_cache;
+
+	if (!tbl) {
+		err = 0;
+
+		if (should_cache_peb_info(peb_type)) {
+			err = ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id,
+								 pebr);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to convert LEB to PEB: "
+					  "leb_id %llu, err %d\n",
+					  leb_id, err);
+			}
+		} else {
+			err = -ERANGE;
+			SSDFS_CRIT("mapping table is absent\n");
+		}
+
+		return err;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	if (rwsem_is_locked(&tbl->tbl_lock) &&
+	    atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) {
+		if (should_cache_peb_info(peb_type)) {
+			err = ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id,
+								 pebr);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to convert LEB to PEB: "
+					  "leb_id %llu, err %d\n",
+					  leb_id, err);
+			}
+
+			return err;
+		}
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	if (peb_type == SSDFS_MAPTBL_UNKNOWN_PEB_TYPE) {
+		/*
+		 * GC thread requested the conversion
+		 * without the knowledge of PEB's type.
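+		 * In this case the cache is consulted only after the
+		 * conversion: the real type is taken from the result
+		 * at finish_conversion and the cached value, if any,
+		 * is preferred.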
+ */ + goto start_convert_leb2peb; + } + + if (should_cache_peb_info(peb_type)) { + struct ssdfs_maptbl_peb_descriptor *peb_desc; + + err = __ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id, + &cached_pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_conversion; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_conversion; + } + + peb_desc = &cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + consistency = peb_desc->consistency; + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + peb_desc = + &cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX]; + switch (peb_desc->consistency) { + case SSDFS_PEB_STATE_INCONSISTENT: + consistency = peb_desc->consistency; + break; + + default: + /* do nothing */ + break; + } + break; + + default: + /* do nothing */ + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +start_convert_leb2peb: + fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id); + if (IS_ERR_OR_NULL(fdesc)) { + err = IS_ERR(fdesc) ? 
PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_conversion; + } + + *end = &fdesc->init_end; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: leb_id %llu\n", + leb_id); + goto finish_conversion; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: " + "leb_id %llu\n", leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -EAGAIN; + goto finish_conversion; + } + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + down_read(&fdesc->lock); + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_consistent_case; + } + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_consistent_case; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_consistent_case; + } + +finish_consistent_case: + up_read(&fdesc->lock); + break; + + case SSDFS_PEB_STATE_INCONSISTENT: + down_write(&cache->lock); + down_write(&fdesc->lock); + + err = ssdfs_maptbl_cache_convert_leb2peb_nolock(cache, + leb_id, + &cached_pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_inconsistent_case; + } + + err = ssdfs_maptbl_solve_inconsistency(tbl, fdesc, leb_id, + &cached_pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to resolve inconsistency: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_inconsistent_case; + } + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_inconsistent_case; + } + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_inconsistent_case; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_inconsistent_case; + } + + peb_id = cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + peb_state = cached_pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].state; + if (peb_id != U64_MAX) { + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_change_peb_state_nolock(cache, + leb_id, + peb_state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change PEB state: " + "leb_id %llu, peb_state %#x, " + "err %d\n", + leb_id, peb_state, err); + goto finish_inconsistent_case; + } + } + + peb_id = cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id; + peb_state = cached_pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].state; + if (peb_id != U64_MAX) { + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_change_peb_state_nolock(cache, + leb_id, + peb_state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change PEB state: " + "leb_id %llu, peb_state %#x, " + "err %d\n", + leb_id, peb_state, err); + goto finish_inconsistent_case; + } + } + 
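+	/*
+	 * Both cached PEB descriptors are consistent at this point;
+	 * on success the fragment is marked dirty below so that the
+	 * resolved state reaches the volume on the next flush.
+	 */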
+finish_inconsistent_case: + up_write(&fdesc->lock); + up_write(&cache->lock); + + if (!err) { + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, + leb_id); + } + break; + + case SSDFS_PEB_STATE_PRE_DELETED: + down_write(&cache->lock); + down_write(&fdesc->lock); + + err = ssdfs_maptbl_cache_convert_leb2peb_nolock(cache, + leb_id, + &cached_pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_pre_deleted_case; + } + + err = ssdfs_maptbl_solve_pre_deleted_state(tbl, fdesc, leb_id, + &cached_pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to resolve pre-deleted state: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_pre_deleted_case; + } + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_pre_deleted_case; + } + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_pre_deleted_case; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_pre_deleted_case; + } + + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_forget_leb2peb_nolock(cache, + leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_pre_deleted_case; + } + +finish_pre_deleted_case: + up_write(&fdesc->lock); + up_write(&cache->lock); + + if (!err) { + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, + leb_id); + } + break; + + default: + err = -EFAULT; + SSDFS_ERR("invalid consistency %#x\n", + consistency); + goto finish_conversion; + } + +finish_conversion: + up_read(&tbl->tbl_lock); + + if (!err && peb_type == SSDFS_MAPTBL_UNKNOWN_PEB_TYPE) { + peb_type = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type; + + if (should_cache_peb_info(peb_type)) { + err = ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id, + &cached_pebr); + if (err == -ENODATA) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cache has nothing for leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + return err; + } else { + /* use the cached value */ + ssdfs_memcpy(pebr, 0, peb_relation_size, + &cached_pebr, 0, peb_relation_size, + peb_relation_size); + } + } + } else if (err == -EAGAIN && should_cache_peb_info(peb_type)) { + err = ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id, + pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); + + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 
err;
+}
+
+/*
+ * is_mapped_leb2peb() - check that LEB is mapped
+ * @fdesc: fragment descriptor
+ * @leb_id: LEB ID number
+ */
+static inline
+bool is_mapped_leb2peb(struct ssdfs_maptbl_fragment_desc *fdesc,
+			u64 leb_id)
+{
+	struct ssdfs_leb_descriptor leb_desc;
+	bool is_mapped;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc);
+
+	SSDFS_DBG("leb_id %llu, fdesc %p\n",
+		  leb_id, fdesc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return false;
+	}
+
+	is_mapped = __is_mapped_leb2peb(&leb_desc);
+
+	if (!is_mapped) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("uninitialized leb descriptor: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return is_mapped;
+}
+
+static inline
+bool need_try2reserve_peb(struct ssdfs_fs_info *fsi)
+{
+#define SSDFS_PEB_RESERVATION_THRESHOLD 1
+	return fsi->pebs_per_seg == SSDFS_PEB_RESERVATION_THRESHOLD;
+}
+
+/*
+ * can_be_mapped_leb2peb() - check that LEB can be mapped
+ * @tbl: pointer on mapping table object
+ * @fdesc: fragment descriptor
+ * @leb_id: LEB ID number
+ */
+static inline
+bool can_be_mapped_leb2peb(struct ssdfs_peb_mapping_table *tbl,
+			   struct ssdfs_maptbl_fragment_desc *fdesc,
+			   u64 leb_id)
+{
+	u32 unused_lebs;
+	u32 expected2migrate = 0;
+	u32 reserved_pool = 0;
+	u32 migration_NOT_guaranted = 0;
+	u32 threshold;
+	bool is_mapping_possible = false;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !fdesc);
+	BUG_ON(!tbl->fsi);
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, fdesc %p\n",
+		  tbl, leb_id, fdesc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	expected2migrate = fdesc->mapped_lebs - fdesc->migrating_lebs;
+	reserved_pool = fdesc->reserved_pebs + fdesc->pre_erase_pebs;
+
+	if (expected2migrate > reserved_pool)
+		migration_NOT_guaranted = expected2migrate - reserved_pool;
+	else
+		migration_NOT_guaranted = 0;
+
+	unused_lebs = ssdfs_unused_lebs_in_fragment(fdesc);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lebs_count %u, mapped_lebs %u, "
+		  "migrating_lebs %u, reserved_pebs %u, "
+		  "pre_erase_pebs %u, expected2migrate %u, "
+		  "reserved_pool %u, migration_NOT_guaranted %u, "
+		  "unused_lebs %u\n",
+		  fdesc->lebs_count, fdesc->mapped_lebs,
+		  fdesc->migrating_lebs, fdesc->reserved_pebs,
+		  fdesc->pre_erase_pebs, expected2migrate,
+		  reserved_pool, migration_NOT_guaranted,
+		  unused_lebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	threshold = ssdfs_lebs_reservation_threshold(fdesc);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("unused_lebs %u, migration_NOT_guaranted %u, "
+		  "threshold %u, stripe_pages %u\n",
+		  unused_lebs,
+		  migration_NOT_guaranted,
+		  threshold,
+		  fdesc->stripe_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((reserved_pool + 1) >= unused_lebs) {
+		is_mapping_possible = false;
+		goto finish_check;
+	}
+
+	if (need_try2reserve_peb(tbl->fsi)) {
+		threshold = max_t(u32, threshold,
+				  (u32)tbl->stripes_per_fragment);
+
+		if (unused_lebs > threshold) {
+			is_mapping_possible = true;
+			goto finish_check;
+		}
+
+		if (migration_NOT_guaranted == 0 &&
+		    unused_lebs > tbl->stripes_per_fragment) {
+			is_mapping_possible = true;
+			goto finish_check;
+		}
+	} else {
+		if (unused_lebs > threshold) {
+			is_mapping_possible = true;
+			goto finish_check;
+		}
+
+		if (migration_NOT_guaranted == 0 && unused_lebs > 0) {
+			is_mapping_possible = true;
+			goto finish_check;
+		}
+	}
+
+finish_check:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("is_mapping_possible %#x\n",
+		  is_mapping_possible);
+#endif /* CONFIG_SSDFS_DEBUG */ + + return is_mapping_possible; +} + +/* + * has_fragment_unused_pebs() - check that fragment has unused PEBs + * @hdr: PEB table fragment's header + */ +static inline +bool has_fragment_unused_pebs(struct ssdfs_peb_table_fragment_header *hdr) +{ + unsigned long *bmap; + u16 pebs_count; + int used_pebs, unused_pebs; + u16 reserved_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebs_count = le16_to_cpu(hdr->pebs_count); + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + used_pebs = bitmap_weight(bmap, pebs_count); + unused_pebs = pebs_count - used_pebs; + + WARN_ON(unused_pebs < 0); + + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + + if (reserved_pebs > unused_pebs) { + SSDFS_ERR("reserved_pebs %u > unused_pebs %u\n", + reserved_pebs, unused_pebs); + return false; + } + + unused_pebs -= reserved_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hdr %p, unused_pebs %d, reserved_pebs %u\n", + hdr, unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + return unused_pebs > 0; +} + +/* + * ssdfs_maptbl_decrease_reserved_pebs() - decrease amount of reserved PEBs + * @fsi: file system info object + * @desc: fragment descriptor + * @hdr: PEB table fragment's header + * + * This method tries to move some amount of reserved PEBs into + * unused state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - unable to decrease amount of reserved PEBs. + */ +static +int ssdfs_maptbl_decrease_reserved_pebs(struct ssdfs_fs_info *fsi, + struct ssdfs_maptbl_fragment_desc *desc, + struct ssdfs_peb_table_fragment_header *hdr) +{ + unsigned long *bmap; + u32 expected2migrate; + u16 pebs_count; + u16 reserved_pebs; + u16 used_pebs; + u16 unused_pebs; + u16 new_reservation; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebs_count = le16_to_cpu(hdr->pebs_count); + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("desc %p, hdr %p\n", desc, hdr); + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u, " + "pebs_count %u, reserved_pebs %u\n", + desc->mapped_lebs, desc->migrating_lebs, + pebs_count, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + expected2migrate = (desc->mapped_lebs - desc->migrating_lebs); + expected2migrate /= desc->stripe_pages; + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + used_pebs = bitmap_weight(bmap, pebs_count); + unused_pebs = pebs_count - used_pebs; + + if (reserved_pebs > unused_pebs) { + SSDFS_ERR("reserved_pebs %u > unused_pebs %u\n", + reserved_pebs, unused_pebs); + return -ERANGE; + } + + unused_pebs -= reserved_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebs_count %u, used_pebs %u, unused_pebs %u, " + "expected2migrate %u\n", + pebs_count, used_pebs, + unused_pebs, expected2migrate); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unused_pebs > reserved_pebs) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no necessity to decrease: " + "unused_pebs %u, reserved_pebs %u\n", + unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + new_reservation = max_t(u16, expected2migrate, + (unused_pebs * 20) / 100); + + if (reserved_pebs > new_reservation) { + u64 free_pages; + u64 new_free_pages; + u16 new_unused_pebs = reserved_pebs - new_reservation; + + hdr->reserved_pebs = cpu_to_le16(new_reservation); + desc->reserved_pebs -= new_unused_pebs; + + spin_lock(&fsi->volume_state_lock); + new_free_pages = (u64)new_unused_pebs * 
fsi->pages_per_peb; + fsi->free_pages += new_free_pages; + free_pages = fsi->free_pages; + spin_unlock(&fsi->volume_state_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %llu, new_free_pages %llu\n", + free_pages, new_free_pages); + SSDFS_DBG("reserved_pebs %u, new_reservation %u, " + "desc->reserved_pebs %u\n", + reserved_pebs, new_reservation, + desc->reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to decrease reserved PEBs\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENOSPC; +} + +static inline +u32 ssdfs_mandatory_reserved_pebs_pct(struct ssdfs_fs_info *fsi) +{ + u32 percentage = 50; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + percentage /= fsi->pebs_per_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebs_per_seg %u, percentage %u\n", + fsi->pebs_per_seg, percentage); +#endif /* CONFIG_SSDFS_DEBUG */ + + return percentage; +} + +/* + * ssdfs_maptbl_increase_reserved_pebs() - increase amount of reserved PEBs + * @fsi: file system info object + * @desc: fragment descriptor + * @hdr: PEB table fragment's header + * + * This method tries to move some amount of unused PEBs into + * reserved state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOSPC - unable to increase amount of reserved PEBs. + */ +static +int ssdfs_maptbl_increase_reserved_pebs(struct ssdfs_fs_info *fsi, + struct ssdfs_maptbl_fragment_desc *desc, + struct ssdfs_peb_table_fragment_header *hdr) +{ + unsigned long *bmap; + u32 expected2migrate; + u16 pebs_count; + u16 reserved_pebs; + u16 used_pebs; + u16 unused_pebs; + u64 free_pages = 0; + u64 free_pebs = 0; + u64 reserved_pages = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !hdr); + + if (desc->migrating_lebs > desc->mapped_lebs) { + SSDFS_ERR("fragment is corrupted: " + "migrating_lebs %u, mapped_lebs %u\n", + desc->migrating_lebs, + desc->mapped_lebs); + return -ERANGE; + } + + SSDFS_DBG("desc %p, hdr %p\n", desc, hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebs_count = le16_to_cpu(hdr->pebs_count); + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u, " + "pebs_count %u, reserved_pebs %u\n", + desc->mapped_lebs, desc->migrating_lebs, + pebs_count, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + expected2migrate = desc->mapped_lebs - desc->migrating_lebs; + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + used_pebs = bitmap_weight(bmap, pebs_count); + unused_pebs = pebs_count - used_pebs; + + if (reserved_pebs > unused_pebs) { + SSDFS_ERR("reserved_pebs %u > unused_pebs %u\n", + reserved_pebs, unused_pebs); + return -ERANGE; + } + + unused_pebs -= reserved_pebs; + + if (need_try2reserve_peb(fsi)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("used_pebs %u, unused_pebs %u, " + "reserved_pebs %u\n", + used_pebs, unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (reserved_pebs < used_pebs && unused_pebs >= used_pebs) { + reserved_pebs = used_pebs; + + spin_lock(&fsi->volume_state_lock); + free_pages = fsi->free_pages; + free_pebs = div64_u64(free_pages, fsi->pages_per_peb); + if (reserved_pebs <= free_pebs) { + reserved_pages = (u64)reserved_pebs * + fsi->pages_per_peb; + fsi->free_pages -= reserved_pages; + free_pages = fsi->free_pages; + hdr->reserved_pebs = cpu_to_le16(reserved_pebs); + desc->reserved_pebs += reserved_pebs; + } else + err = -ENOSPC; + 
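+			/*
+			 * Either free_pages has been charged for the
+			 * whole reservation or -ENOSPC is returned.
+			 */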
spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("free_pages %llu, reserved_pages %llu, "
+				  "reserved_pebs %u, err %d\n",
+				  free_pages, reserved_pages,
+				  reserved_pebs, err);
+			SSDFS_DBG("hdr->reserved_pebs %u\n",
+				  le16_to_cpu(hdr->reserved_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			return err;
+		}
+	}
+
+	if (reserved_pebs > 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("no need to increase reserved pebs: "
+			  "reserved_pebs %u\n",
+			  reserved_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	reserved_pebs = min_t(u16, unused_pebs / 2, expected2migrate);
+
+	if (reserved_pebs == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("reserved_pebs %u, unused_pebs %u, "
+			  "expected2migrate %u\n",
+			  reserved_pebs, unused_pebs,
+			  expected2migrate);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOSPC;
+	}
+
+	spin_lock(&fsi->volume_state_lock);
+	free_pages = fsi->free_pages;
+	free_pebs = div64_u64(free_pages, fsi->pages_per_peb);
+	if (reserved_pebs <= free_pebs) {
+		reserved_pages = (u64)reserved_pebs * fsi->pages_per_peb;
+		fsi->free_pages -= reserved_pages;
+		free_pages = fsi->free_pages;
+		le16_add_cpu(&hdr->reserved_pebs, reserved_pebs);
+		desc->reserved_pebs += reserved_pebs;
+	} else
+		err = -ENOSPC;
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_pages %llu, reserved_pages %llu, "
+		  "reserved_pebs %u, err %d\n",
+		  free_pages, reserved_pages,
+		  reserved_pebs, err);
+	SSDFS_DBG("hdr->reserved_pebs %u\n",
+		  le16_to_cpu(hdr->reserved_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_get_erase_threshold() - detect erase threshold for fragment
+ * @hdr: PEB table fragment's header
+ * @start: start item for search
+ * @max: upper bound for the search
+ * @used_pebs: number of used PEBs
+ * @found: found item index [out]
+ * @threshold: erase threshold for found item [out]
+ *
+ * This method tries to detect the erase threshold of
+ * PEB table's fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL  - invalid input.
+ * %-ENODATA - unable to detect the erase threshold.
+ */
+static int
+ssdfs_maptbl_get_erase_threshold(struct ssdfs_peb_table_fragment_header *hdr,
+				 unsigned long start, unsigned long max,
+				 unsigned long used_pebs,
+				 unsigned long *found, u32 *threshold)
+{
+	struct ssdfs_peb_descriptor *desc;
+	unsigned long *bmap;
+	unsigned long index, index1;
+	u32 found_cycles;
+	int step = 1;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr || !found || !threshold);
+
+	SSDFS_DBG("hdr %p, start_peb %llu, pebs_count %u, "
+		  "start %lu, max %lu, used_pebs %lu\n",
+		  hdr,
+		  le64_to_cpu(hdr->start_peb),
+		  le16_to_cpu(hdr->pebs_count),
+		  start, max, used_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0];
+
+	*found = ULONG_MAX;
+	*threshold = U32_MAX;
+
+	index = max - 1;
+	while (index > 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index %lu, used_pebs %lu\n",
+			  index, used_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		index1 = bitmap_find_next_zero_area(bmap,
+						    max, index,
+						    1, 0);
+		if (index1 >= max) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("try next: index1 %lu >= max %lu\n",
+				  index1, max);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			desc = GET_PEB_DESCRIPTOR(hdr, (u16)index);
+			if (IS_ERR_OR_NULL(desc)) {
+				err = IS_ERR(desc) ?
PTR_ERR(desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "index %lu, err %d\n", + index, err); + return err; + } + + if (desc->state != SSDFS_MAPTBL_BAD_PEB_STATE) { + found_cycles = le32_to_cpu(desc->erase_cycles); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %lu, found_cycles %u, " + "threshold %u\n", + index, found_cycles, *threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*threshold > found_cycles) + *threshold = found_cycles; + } + + goto try_next_index; + } else + index = index1; + + if (index == *found) + goto finish_search; + + desc = GET_PEB_DESCRIPTOR(hdr, (u16)index); + if (IS_ERR_OR_NULL(desc)) { + err = IS_ERR(desc) ? PTR_ERR(desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "index %lu, err %d\n", + index, err); + return err; + } + + if (desc->state != SSDFS_MAPTBL_BAD_PEB_STATE) { + found_cycles = le32_to_cpu(desc->erase_cycles); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %lu, found_cycles %u, threshold %u\n", + index, found_cycles, *threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*found >= ULONG_MAX) { + *threshold = found_cycles; + *found = index; + } else if (*threshold > found_cycles) { + *threshold = found_cycles; + *found = index; + } else if (*threshold == found_cycles) { + /* continue search */ + *found = index; + } else if ((*threshold + 1) <= found_cycles) { + *found = index; + goto finish_search; + } + } + +try_next_index: + if (index <= step) + break; + + index -= step; + step *= 2; + + while ((index - start) < step && step >= 2) + step /= 2; + } + + if (*found >= ULONG_MAX) { + index = bitmap_find_next_zero_area(bmap, + max, 0, + 1, 0); + if (index < max) { + desc = GET_PEB_DESCRIPTOR(hdr, (u16)index); + if (IS_ERR_OR_NULL(desc)) { + err = IS_ERR(desc) ? PTR_ERR(desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "index %lu, err %d\n", + index, err); + return err; + } + + if (desc->state != SSDFS_MAPTBL_BAD_PEB_STATE) { + found_cycles = le32_to_cpu(desc->erase_cycles); + *threshold = found_cycles; + *found = index; + } + } + } + +finish_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found %lu, threshold %u\n", + *found, *threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * __ssdfs_maptbl_find_unused_peb() - find unused PEB + * @hdr: PEB table fragment's header + * @start: start item for search + * @max: upper bound for the search + * @threshold: erase threshold for fragment + * @found: found item index [out] + * + * This method tries to find unused PEB in the bitmap of + * PEB table's fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENODATA - unable to find unused PEB. 
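+ *
+ * The search walks the used-PEBs bitmap from @start and returns
+ * the first clean PEB whose erase cycles don't exceed @threshold;
+ * bad PEBs are skipped.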
+ */ +static +int __ssdfs_maptbl_find_unused_peb(struct ssdfs_peb_table_fragment_header *hdr, + unsigned long start, unsigned long max, + u32 threshold, unsigned long *found) +{ + struct ssdfs_peb_descriptor *desc; + unsigned long *bmap; + unsigned long index; + int err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr || !found); + + SSDFS_DBG("hdr %p, start %lu, max %lu, threshold %u\n", + hdr, start, max, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + + *found = ULONG_MAX; + + if (start >= max) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %lu >= max %lu\n", + start, max); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + do { + index = bitmap_find_next_zero_area(bmap, max, start, 1, 0); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %lu, max %lu, index %lu\n", + start, max, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (index >= max) { + SSDFS_DBG("unable to find the unused peb\n"); + return -ENODATA; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc = GET_PEB_DESCRIPTOR(hdr, (u16)index); + if (IS_ERR_OR_NULL(desc)) { + err = IS_ERR(desc) ? PTR_ERR(desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "index %lu, err %d\n", + index, err); + return err; + } + + if (desc->state != SSDFS_MAPTBL_BAD_PEB_STATE) { + u32 found_cycles = le32_to_cpu(desc->erase_cycles); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %lu, found_cycles %u, threshold %u\n", + index, found_cycles, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found_cycles <= threshold) { + *found = index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found: index %lu, " + "found_cycles %u, threshold %u\n", + *found, found_cycles, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else { + /* continue to search */ + *found = ULONG_MAX; + } + } + + start = index + 1; + } while (start < max); + + return err; +} + +/* + * ssdfs_maptbl_find_unused_peb() - find unused PEB + * @hdr: PEB table fragment's header + * @start: start item for search + * @max: upper bound for the search + * @used_pebs: number of used PEBs + * @found: found item index [out] + * @erase_cycles: erase cycles for found item [out] + * + * This method tries to find unused PEB in the bitmap of + * PEB table's fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENODATA - unable to find unused PEB. 
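+ *
+ * The erase threshold is detected first; then the search covers
+ * the range [@start, @max) and wraps around to [0, @start) if
+ * nothing suitable has been found.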
+ */ +static +int ssdfs_maptbl_find_unused_peb(struct ssdfs_peb_table_fragment_header *hdr, + unsigned long start, unsigned long max, + unsigned long used_pebs, + unsigned long *found, u32 *erase_cycles) +{ + u32 threshold = U32_MAX; + unsigned long found_for_threshold; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr || !found || !erase_cycles); + + SSDFS_DBG("hdr %p, start %lu, max %lu\n", + hdr, start, max); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start >= max) { + SSDFS_ERR("start %lu >= max %lu\n", + start, max); + return -EINVAL; + } + + err = ssdfs_maptbl_get_erase_threshold(hdr, 0, max, used_pebs, + found, &threshold); + if (unlikely(err)) { + SSDFS_ERR("fail to detect erase threshold: err %d\n", err); + return err; + } else if (threshold >= U32_MAX) { + SSDFS_ERR("invalid erase threshold %u\n", threshold); + return -ERANGE; + } + + *erase_cycles = threshold; + found_for_threshold = *found; + + err = __ssdfs_maptbl_find_unused_peb(hdr, start, max, + threshold, found); + if (err == -ENODATA) { + err = __ssdfs_maptbl_find_unused_peb(hdr, + 0, start, + threshold, found); + } + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_peb_descriptor *desc; + unsigned long *bmap; + u64 start_peb; + u16 pebs_count; + u16 reserved_pebs; + u16 last_selected_peb; + unsigned long used_pebs; + u32 found_cycles; + int i; + + SSDFS_DBG("unable to find unused PEB: " + "found_for_threshold %lu, threshold %u\n", + found_for_threshold, threshold); + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + start_peb = le64_to_cpu(hdr->start_peb); + pebs_count = le16_to_cpu(hdr->pebs_count); + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + last_selected_peb = le16_to_cpu(hdr->last_selected_peb); + used_pebs = bitmap_weight(bmap, pebs_count); + + SSDFS_DBG("hdr %p, start_peb %llu, pebs_count %u, " + "last_selected_peb %u, " + "reserved_pebs %u, used_pebs %lu\n", + hdr, start_peb, pebs_count, last_selected_peb, + reserved_pebs, used_pebs); + + for (i = 0; i < max; i++) { + desc = GET_PEB_DESCRIPTOR(hdr, (u16)i); + if (IS_ERR_OR_NULL(desc)) + continue; + + found_cycles = le32_to_cpu(desc->erase_cycles); + + SSDFS_DBG("index %d, found_cycles %u\n", + i, found_cycles); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find unused PEB: err %d\n", err); + return err; + } + + return 0; +} + +enum { + SSDFS_MAPTBL_MAPPING_PEB, + SSDFS_MAPTBL_MIGRATING_PEB, + SSDFS_MAPTBL_PEB_PURPOSE_MAX +}; + +/* + * ssdfs_maptbl_select_unused_peb() - select unused PEB + * @fdesc: fragment descriptor + * @hdr: PEB table fragment's header + * @pebs_per_volume: number of PEBs per whole volume + * @peb_goal: PEB purpose + * + * This method tries to find unused PEB and to set this + * PEB as used. + * + * RETURN: + * [success] - item index. + * [failure] - U16_MAX. 
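+ *
+ * On success the selected PEB is marked as used in the bitmap and
+ * hdr->last_selected_peb is updated; for the migration case one
+ * reserved PEB is consumed.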
+ */
+static
+u16 ssdfs_maptbl_select_unused_peb(struct ssdfs_maptbl_fragment_desc *fdesc,
+				   struct ssdfs_peb_table_fragment_header *hdr,
+				   u64 pebs_per_volume,
+				   int peb_goal)
+{
+	unsigned long *bmap;
+	u64 start_peb;
+	u16 pebs_count;
+	u16 unused_pebs;
+	u16 reserved_pebs;
+	u16 last_selected_peb;
+	unsigned long used_pebs;
+	unsigned long start = 0;
+	unsigned long found = ULONG_MAX;
+	u32 erase_cycles = U32_MAX;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr || !fdesc);
+	BUG_ON(peb_goal >= SSDFS_MAPTBL_PEB_PURPOSE_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0];
+	start_peb = le64_to_cpu(hdr->start_peb);
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+	reserved_pebs = le16_to_cpu(hdr->reserved_pebs);
+	last_selected_peb = le16_to_cpu(hdr->last_selected_peb);
+	used_pebs = bitmap_weight(bmap, pebs_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("hdr %p, start_peb %llu, pebs_count %u, "
+		  "last_selected_peb %u, "
+		  "reserved_pebs %u, used_pebs %lu\n",
+		  hdr, start_peb, pebs_count, last_selected_peb,
+		  reserved_pebs, used_pebs);
+	SSDFS_DBG("mapped_lebs %u, migrating_lebs %u, "
+		  "pre_erase_pebs %u, recovering_pebs %u\n",
+		  fdesc->mapped_lebs, fdesc->migrating_lebs,
+		  fdesc->pre_erase_pebs, fdesc->recovering_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((start_peb + pebs_count) > pebs_per_volume) {
+		/* correct value */
+		pebs_count = (u16)(pebs_per_volume - start_peb);
+	}
+
+	if (used_pebs > pebs_count) {
+		SSDFS_ERR("used_pebs %lu > pebs_count %u\n",
+			  used_pebs, pebs_count);
+		return U16_MAX;
+	}
+
+	unused_pebs = pebs_count - used_pebs;
+
+	switch (peb_goal) {
+	case SSDFS_MAPTBL_MAPPING_PEB:
+		if (unused_pebs <= reserved_pebs) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unused_pebs %u, reserved_pebs %u\n",
+				  unused_pebs, reserved_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return U16_MAX;
+		}
+		break;
+
+	case SSDFS_MAPTBL_MIGRATING_PEB:
+		if (reserved_pebs == 0 && unused_pebs == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("reserved_pebs %u, unused_pebs %u\n",
+				  reserved_pebs, unused_pebs);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return U16_MAX;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	if ((last_selected_peb + 1) >= pebs_count)
+		last_selected_peb = 0;
+
+	err = ssdfs_maptbl_find_unused_peb(hdr, last_selected_peb,
+					   pebs_count, used_pebs,
+					   &found, &erase_cycles);
+	if (err == -ENODATA) {
+		SSDFS_DBG("unable to find the unused peb\n");
+		return U16_MAX;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find unused peb: "
+			  "start %lu, pebs_count %u, err %d\n",
+			  start, pebs_count, err);
+		return U16_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(found >= U16_MAX);
+	BUG_ON(erase_cycles >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bitmap_set(bmap, found, 1);
+	hdr->last_selected_peb = cpu_to_le16((u16)found);
+
+	switch (peb_goal) {
+	case SSDFS_MAPTBL_MAPPING_PEB:
+		/* do nothing */
+		break;
+
+	case SSDFS_MAPTBL_MIGRATING_PEB:
+		if (reserved_pebs > 0) {
+			le16_add_cpu(&hdr->reserved_pebs, -1);
+			fdesc->reserved_pebs--;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("hdr->reserved_pebs %u\n",
+				  le16_to_cpu(hdr->reserved_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found %lu, erase_cycles %u\n",
+		  found, erase_cycles);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u16)found;
+}
+
+/*
+ * __ssdfs_maptbl_map_leb2peb() - map LEB into PEB
+ * @tbl: pointer on mapping table object
+ * @fdesc: fragment descriptor
+ * @hdr: PEB table fragment's header
+ * @leb_id: LEB ID number
+ * @page_index:
page index in the fragment + * @peb_type: type of the PEB + * @pebr: description of PEBs relation [out] + * + * This method sets mapping association between LEB and PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - unable to select unused PEB. + */ +static +int __ssdfs_maptbl_map_leb2peb(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + struct ssdfs_peb_table_fragment_header *hdr, + u64 leb_id, pgoff_t page_index, u8 peb_type, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct ssdfs_peb_descriptor *peb_desc; + struct ssdfs_leb_table_fragment_header *lebtbl_hdr; + struct ssdfs_leb_descriptor *leb_desc; + struct ssdfs_maptbl_peb_descriptor *ptr = NULL; + u16 item_index; + u16 peb_index = 0; + pgoff_t lebtbl_page; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !hdr || !pebr); + + if (peb_type >= SSDFS_MAPTBL_PEB_TYPE_MAX) { + SSDFS_ERR("invalid peb_type %#x\n", + peb_type); + return -EINVAL; + } + + SSDFS_DBG("fdesc %p, hdr %p, leb_id %llu, peb_type %#x, pebr %p\n", + fdesc, hdr, leb_id, peb_type, pebr); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = ssdfs_maptbl_select_unused_peb(fdesc, hdr, + tbl->pebs_count, + SSDFS_MAPTBL_MAPPING_PEB); + if (item_index == U16_MAX) { + SSDFS_DBG("unable to select unused peb\n"); + return -ENOENT; + } + + memset(pebr, 0xFF, sizeof(struct ssdfs_maptbl_peb_relation)); + + peb_desc = GET_PEB_DESCRIPTOR(hdr, item_index); + if (IS_ERR_OR_NULL(peb_desc)) { + err = IS_ERR(peb_desc) ? PTR_ERR(peb_desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "index %u, err %d\n", + item_index, err); + return err; + } + + peb_desc->type = peb_type; + peb_desc->state = SSDFS_MAPTBL_CLEAN_PEB_STATE; + + lebtbl_page = LEBTBL_PAGE_INDEX(fdesc, leb_id); + if (lebtbl_page == ULONG_MAX) { + SSDFS_ERR("fail to define page_index: " + "leb_id %llu\n", + leb_id); + return -ERANGE; + } + + page = ssdfs_page_array_get_page_locked(&fdesc->array, lebtbl_page); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + lebtbl_page); + return err; + } + + kaddr = kmap_local_page(page); + + leb_desc = GET_LEB_DESCRIPTOR(kaddr, leb_id); + if (IS_ERR_OR_NULL(leb_desc)) { + err = IS_ERR(leb_desc) ? 
PTR_ERR(leb_desc) : -ERANGE; + SSDFS_ERR("fail to get leb_descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_page_processing; + } + + peb_index = DEFINE_PEB_INDEX_IN_FRAGMENT(fdesc, page_index, item_index); + if (peb_index == U16_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to define peb index\n"); + goto finish_page_processing; + } + + leb_desc->physical_index = cpu_to_le16(peb_index); + leb_desc->relation_index = U16_MAX; + + lebtbl_hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + le16_add_cpu(&lebtbl_hdr->mapped_lebs, 1); + + ptr = &pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX]; + ptr->peb_id = le64_to_cpu(hdr->start_peb) + item_index; + ptr->shared_peb_index = peb_desc->shared_peb_index; + ptr->erase_cycles = le32_to_cpu(peb_desc->erase_cycles); + ptr->type = peb_desc->type; + ptr->state = peb_desc->state; + ptr->flags = peb_desc->flags; + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, item_index %u, peb_index %u, " + "start_peb %llu, peb_id %llu\n", + leb_id, item_index, peb_index, + le64_to_cpu(hdr->start_peb), + ptr->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + lebtbl_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + lebtbl_page, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +static +int ssdfs_maptbl_reserve_free_pages(struct ssdfs_fs_info *fsi) +{ + u64 free_pebs = 0; + u64 free_pages = 0; + u64 reserved_pages = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&fsi->volume_state_lock); + free_pages = fsi->free_pages; + free_pebs = div64_u64(free_pages, fsi->pages_per_peb); + if (free_pebs >= 1) { + reserved_pages = fsi->pages_per_peb; + if (fsi->free_pages >= reserved_pages) { + fsi->free_pages -= reserved_pages; + free_pages = fsi->free_pages; + } else + err = -ERANGE; + } else + err = -ENOSPC; + spin_unlock(&fsi->volume_state_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to reserve PEB: " + "free_pages %llu, err %d\n", + free_pages, err); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %llu, reserved_pages %llu\n", + free_pages, reserved_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +static +void ssdfs_maptbl_free_reserved_pages(struct ssdfs_fs_info *fsi) +{ + u64 free_pages = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&fsi->volume_state_lock); + fsi->free_pages += fsi->pages_per_peb; + free_pages = fsi->free_pages; + spin_unlock(&fsi->volume_state_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_pages %llu\n", + free_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + return; +} + +static inline +bool can_peb_be_reserved(struct ssdfs_fs_info *fsi, + struct ssdfs_peb_table_fragment_header *hdr) +{ + unsigned long *bmap; + u16 pebs_count; + u16 used_pebs; + u16 unused_pebs; + u16 reserved_pebs; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebs_count = le16_to_cpu(hdr->pebs_count); + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + used_pebs = bitmap_weight(bmap, pebs_count); + unused_pebs = pebs_count - used_pebs; + +#ifdef 
CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebs_count %u, used_pebs %u, " + "unused_pebs %u, reserved_pebs %u\n", + pebs_count, used_pebs, + unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unused_pebs == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to reserve PEB: " + "pebs_count %u, used_pebs %u, " + "unused_pebs %u, reserved_pebs %u\n", + pebs_count, used_pebs, + unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } else if ((reserved_pebs + 1) >= unused_pebs) { + /* + * Mapping operation takes one PEB + + * reservation needs another one. + */ + if (reserved_pebs > unused_pebs) { + SSDFS_WARN("fail to reserve PEB: " + "pebs_count %u, used_pebs %u, " + "unused_pebs %u, reserved_pebs %u\n", + pebs_count, used_pebs, + unused_pebs, reserved_pebs); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to reserve PEB: " + "pebs_count %u, used_pebs %u, " + "unused_pebs %u, reserved_pebs %u\n", + pebs_count, used_pebs, + unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return false; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB can be reserved\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return true; +} + +/* + * __ssdfs_maptbl_try_map_leb2peb() - try to map LEB into PEB + * @tbl: pointer on mapping table object + * @fdesc: fragment descriptor + * @leb_id: LEB ID number + * @peb_type: type of the PEB + * @pebr: description of PEBs relation [out] + * + * This method tries to set association between LEB identification + * number and PEB identification number. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EACCES - PEB stripe is under recovering. + * %-ENOENT - provided @leb_id cannot be mapped. + */ +static +int __ssdfs_maptbl_try_map_leb2peb(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, u64 start_peb_id, u8 peb_type, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct ssdfs_fs_info *fsi; + pgoff_t page_index; + struct page *page; + void *kaddr; + struct ssdfs_peb_table_fragment_header *hdr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !fdesc || !pebr); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + if (peb_type >= SSDFS_MAPTBL_PEB_TYPE_MAX) { + SSDFS_ERR("invalid peb_type %#x\n", + peb_type); + return -EINVAL; + } + + SSDFS_DBG("tbl %p, fdesc %p, leb_id %llu, " + "start_peb_id %llu, peb_type %#x\n", + tbl, fdesc, leb_id, start_peb_id, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tbl->fsi; + + page_index = ssdfs_maptbl_define_pebtbl_page(tbl, fdesc, + start_peb_id, + U16_MAX); + if (page_index == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to define PEB table's page_index: " + "start_peb_id %llu\n", start_peb_id); + goto finish_fragment_change; + } + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? 
-ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + goto finish_fragment_change; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + + if (is_pebtbl_stripe_recovering(hdr)) { + err = -EACCES; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to map leb_id %llu: " + "stripe %u is under recovering\n", + leb_id, + le16_to_cpu(hdr->stripe_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } + + if (!can_be_mapped_leb2peb(tbl, fdesc, leb_id)) { + err = ssdfs_maptbl_decrease_reserved_pebs(fsi, fdesc, hdr); + if (err == -ENOSPC) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to decrease reserved_pebs %u\n", + le16_to_cpu(hdr->reserved_pebs)); + SSDFS_DBG("unable to map leb_id %llu: " + "value is out of threshold\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to decrease reserved_pebs: err %d\n", + err); + goto finish_page_processing; + } + } + + if (!has_fragment_unused_pebs(hdr)) { + err = ssdfs_maptbl_decrease_reserved_pebs(fsi, fdesc, hdr); + if (err == -ENOSPC) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to decrease reserved_pebs %u\n", + le16_to_cpu(hdr->reserved_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to decrease reserved_pebs: err %d\n", + err); + goto finish_page_processing; + } + } + + if (!has_fragment_unused_pebs(hdr)) { + err = -ERANGE; + SSDFS_ERR("fail to map leb_id %llu\n", leb_id); + goto finish_page_processing; + } + + if (need_try2reserve_peb(fsi)) { + /* + * Reservation could be not aligned with + * already mapped PEBs. Simply, try to align + * the number of reserved PEBs. 
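+		 * The alignment is best effort: -ENOSPC from the
+		 * increase below is ignored, while failure to reserve
+		 * a PEB for the new mapping itself aborts the attempt
+		 * with -ENOENT.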
+ */ + err = ssdfs_maptbl_increase_reserved_pebs(fsi, fdesc, hdr); + if (err == -ENOSPC) { + err = 0; + SSDFS_DBG("no space to reserve PEBs\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to increase reserved PEBs: " + "err %d\n", err); + goto finish_page_processing; + } + + if (can_peb_be_reserved(fsi, hdr)) { + err = ssdfs_maptbl_reserve_free_pages(fsi); + if (err == -ENOSPC) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to reserve PEB: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to reserve PEB: " + "err %d\n", err); + goto finish_page_processing; + } + } else { + err = -ENOENT; + SSDFS_DBG("unable to reserve PEB\n"); + goto finish_page_processing; + } + } + + err = __ssdfs_maptbl_map_leb2peb(tbl, fdesc, hdr, leb_id, + page_index, peb_type, pebr); + if (err == -ENOENT) { + if (need_try2reserve_peb(fsi)) { + ssdfs_maptbl_free_reserved_pages(fsi); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to map: leb_id %llu, page_index %lu\n", + leb_id, page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_page_processing; + } else if (unlikely(err)) { + if (need_try2reserve_peb(fsi)) { + ssdfs_maptbl_free_reserved_pages(fsi); + } + + SSDFS_ERR("fail to map leb_id %llu, err %d\n", + leb_id, err); + goto finish_page_processing; + } + + fdesc->mapped_lebs++; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u\n", + fdesc->mapped_lebs, fdesc->migrating_lebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_try2reserve_peb(fsi)) { + le16_add_cpu(&hdr->reserved_pebs, 1); + fdesc->reserved_pebs++; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("reserved_pebs %u\n", + le16_to_cpu(hdr->reserved_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_page_processing: + flush_dcache_page(page); + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_change: + return err; +} + +/* + * ssdfs_maptbl_try_map_leb2peb() - try to map LEB into PEB + * @tbl: pointer on mapping table object + * @fdesc: fragment descriptor + * @leb_id: LEB ID number + * @peb_type: type of the PEB + * @pebr: description of PEBs relation [out] + * + * This method tries to set association between LEB identification + * number and PEB identification number. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EACCES - PEB stripe is under recovering. + * %-ENOENT - provided @leb_id cannot be mapped. 
+ */
+static
+int ssdfs_maptbl_try_map_leb2peb(struct ssdfs_peb_mapping_table *tbl,
+				 struct ssdfs_maptbl_fragment_desc *fdesc,
+				 u64 leb_id, u8 peb_type,
+				 struct ssdfs_maptbl_peb_relation *pebr)
+{
+	u64 start_peb;
+	u64 end_peb;
+	int err = -ENOENT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !fdesc || !pebr);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	if (peb_type >= SSDFS_MAPTBL_PEB_TYPE_MAX) {
+		SSDFS_ERR("invalid peb_type %#x\n",
+			  peb_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("tbl %p, fdesc %p, leb_id %llu, peb_type %#x\n",
+		  tbl, fdesc, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	start_peb = fdesc->start_leb;
+	end_peb = fdesc->start_leb + fdesc->lebs_count;
+
+	while (start_peb < end_peb) {
+		err = __ssdfs_maptbl_try_map_leb2peb(tbl, fdesc,
+						     leb_id, start_peb,
+						     peb_type, pebr);
+		if (err == -ENOENT) {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to map: "
+				  "leb_id %llu, start_peb %llu\n",
+				  leb_id, start_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			start_peb += fdesc->pebs_per_page;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to map: leb_id %llu, err %d\n",
+				  leb_id, err);
+			return err;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("leb_id %llu has been mapped\n", leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("unable to map: leb_id %llu\n", leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return -ENOENT;
+}
+
+/*
+ * ssdfs_maptbl_map_leb2peb() - map LEB into PEB
+ * @fsi: file system info object
+ * @leb_id: LEB ID number
+ * @peb_type: type of the PEB
+ * @pebr: description of PEBs relation [out]
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to set association between LEB identification
+ * number and PEB identification number.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EACCES - PEB stripe is under recovering.
+ * %-ENOENT - provided @leb_id cannot be mapped.
+ * %-EEXIST - LEB is mapped already.
+ */
+int ssdfs_maptbl_map_leb2peb(struct ssdfs_fs_info *fsi,
+			     u64 leb_id, u8 peb_type,
+			     struct ssdfs_maptbl_peb_relation *pebr,
+			     struct completion **end)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_cache *cache;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	struct ssdfs_leb_descriptor leb_desc;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pebr || !end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("fsi %p, leb_id %llu, pebr %p, init_end %p\n",
+		  fsi, leb_id, pebr, end);
+#else
+	SSDFS_DBG("fsi %p, leb_id %llu, pebr %p, init_end %p\n",
+		  fsi, leb_id, pebr, end);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	*end = NULL;
+	memset(pebr, 0xFF, sizeof(struct ssdfs_maptbl_peb_relation));
+
+	tbl = fsi->maptbl;
+	cache = &fsi->maptbl_cache;
+
+	if (!tbl) {
+		SSDFS_CRIT("mapping table is absent\n");
+		return -ERANGE;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ?
PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_mapping; + } + + *end = &fdesc->init_end; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id); + goto finish_mapping; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_mapping; + } + + down_write(&fdesc->lock); + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr); + if (err != -ENODATA) { + if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } else { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu is mapped yet\n", leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_change; + } + } else + err = 0; + + err = ssdfs_maptbl_try_map_leb2peb(tbl, fdesc, leb_id, peb_type, pebr); + if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to map: leb_id %llu, peb_type %#x\n", + leb_id, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_change; + } else if (unlikely(err)) { + SSDFS_ERR("fail to map: leb_id %llu, peb_type %#x, err %d\n", + leb_id, peb_type, err); + goto finish_fragment_change; + } + +finish_fragment_change: + up_write(&fdesc->lock); + + if (!err) + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id); + +finish_mapping: + up_read(&tbl->tbl_lock); + + if (err == -EAGAIN && should_cache_peb_info(peb_type)) { + err = ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id, + pebr); + if (err == -ENODATA) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to convert LEB to PEB: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + } else { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu is mapped yet\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + } else if (!err && should_cache_peb_info(peb_type)) { + err = ssdfs_maptbl_cache_map_leb2peb(cache, leb_id, pebr, + SSDFS_PEB_STATE_CONSISTENT); + if (unlikely(err)) { + SSDFS_ERR("fail to cache LEB/PEB mapping: " + "leb_id %llu, peb_id %llu, err %d\n", + leb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + err); + err = -EFAULT; + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("leb_id %llu, pebs_count %llu\n", + leb_id, tbl->pebs_count); + SSDFS_ERR("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("leb_id %llu, pebs_count %llu\n", + leb_id, tbl->pebs_count); + 
SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!err) { + u64 peb_id = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + loff_t offset = peb_id * fsi->erasesize; + + err = fsi->devops->open_zone(fsi->sb, offset); + if (err == -EOPNOTSUPP && !fsi->is_zns_device) { + /* ignore error */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to open zone: " + "offset %llu, err %d\n", + offset, err); + return err; + } + } + + return err; +} + +/* + * ssdfs_maptbl_find_pebtbl_page() - find next page of PEB table + * @tbl: pointer on mapping table object + * @fdesc: fragment descriptor + * @cur_index: current page index + * @start_index: page index in the start of searching + * + * This method tries to find a next page of PEB table. + */ +static +pgoff_t ssdfs_maptbl_find_pebtbl_page(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + pgoff_t cur_index, + pgoff_t start_index) +{ + pgoff_t index; + u32 pebtbl_pages, fragment_pages; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("maptbl %p, fdesc %p, cur_index %lu, start_index %lu\n", + tbl, fdesc, cur_index, start_index); + + BUG_ON(!tbl || !fdesc); + BUG_ON((tbl->stripes_per_fragment * fdesc->stripe_pages) < cur_index); + BUG_ON((tbl->stripes_per_fragment * fdesc->stripe_pages) < start_index); + BUG_ON(cur_index < fdesc->lebtbl_pages); + BUG_ON(start_index < fdesc->lebtbl_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + pebtbl_pages = tbl->stripes_per_fragment * fdesc->stripe_pages; + fragment_pages = (u32)fdesc->lebtbl_pages + pebtbl_pages; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cur_index >= fragment_pages); + BUG_ON(start_index >= fragment_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + index = cur_index + fdesc->stripe_pages; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pebtbl_pages %u, fragment_pages %u, " + "fdesc->stripe_pages %u, cur_index %lu, " + "index %lu\n", + pebtbl_pages, fragment_pages, + fdesc->stripe_pages, cur_index, + index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (index >= fragment_pages) + index = ULONG_MAX; + + return index; +} + +/* + * ssdfs_maptbl_try_decrease_reserved_pebs() - try decrease reserved PEBs + * @tbl: pointer on mapping table object + * @fdesc: fragment descriptor + * + * This method tries to decrease number of reserved PEBs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EACCES - fragment is recovering. + * %-ENOENT - unable to decrease the number of reserved PEBs. + * %-ERANGE - internal error. 
+ */ +static int +ssdfs_maptbl_try_decrease_reserved_pebs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc) +{ + struct ssdfs_fs_info *fsi; + pgoff_t start_page; + pgoff_t page_index; + struct page *page; + void *kaddr; + struct ssdfs_peb_table_fragment_header *hdr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !fdesc); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("start_leb %llu, end_leb %llu\n", + fdesc->start_leb, + fdesc->start_leb + fdesc->lebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tbl->fsi; + + start_page = ssdfs_maptbl_define_pebtbl_page(tbl, fdesc, + fdesc->start_leb, + U16_MAX); + if (start_page == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to define PEB table's page_index: " + "start_peb_id %llu\n", fdesc->start_leb); + goto finish_fragment_change; + } + + page_index = start_page; + +try_next_page: + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + goto finish_fragment_change; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + + if (is_pebtbl_stripe_recovering(hdr)) { + err = -EACCES; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to decrease reserved_pebs: " + "stripe %u is under recovering\n", + le16_to_cpu(hdr->stripe_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } + + err = ssdfs_maptbl_decrease_reserved_pebs(fsi, fdesc, hdr); + if (err == -ENOSPC) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to decrease reserved_pebs %u\n", + le16_to_cpu(hdr->reserved_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to decrease reserved_pebs: err %d\n", + err); + goto finish_page_processing; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u, " + "reserved_pebs %u, pre_erase_pebs %u, " + "recovering_pebs %u\n", + fdesc->mapped_lebs, fdesc->migrating_lebs, + fdesc->reserved_pebs, fdesc->pre_erase_pebs, + fdesc->recovering_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_page_processing: + flush_dcache_page(page); + kunmap_local(kaddr); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (err == -EACCES || err == -ENOENT) { + page_index = ssdfs_maptbl_find_pebtbl_page(tbl, fdesc, + page_index, + start_page); + if (page_index == ULONG_MAX) + goto finish_fragment_change; + else + goto try_next_page; + } + +finish_fragment_change: + return err; +} + +/* + * ssdfs_maptbl_recommend_search_range() - recommend search range + * @fsi: file system info object + * @start_leb: recommended start LEB ID [in|out] + * @end_leb: recommended end LEB ID [out] + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to find not exhausted fragment and + * to share the starting/ending LEB ID of this fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - all fragments have been exhausted. 
+ */
+int ssdfs_maptbl_recommend_search_range(struct ssdfs_fs_info *fsi,
+					u64 *start_leb,
+					u64 *end_leb,
+					struct completion **end)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	u64 start_search_leb;
+	u64 found_start_leb = 0;
+	u64 found_end_leb = 0;
+	int start_index;
+	bool is_found = false;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !start_leb || !end_leb || !end);
+
+	SSDFS_DBG("fsi %p, start_leb %llu, end_leb %p, init_end %p\n",
+		  fsi, *start_leb, end_leb, end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (*start_leb >= fsi->nsegs) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start_leb %llu >= nsegs %llu\n",
+			  *start_leb, fsi->nsegs);
+#endif /* CONFIG_SSDFS_DEBUG */
+		*start_leb = U64_MAX;
+		*end_leb = U64_MAX;
+		return -ENOENT;
+	}
+
+	start_search_leb = *start_leb;
+
+	*start_leb = U64_MAX;
+	*end_leb = U64_MAX;
+	*end = NULL;
+
+	tbl = fsi->maptbl;
+	if (!tbl) {
+		SSDFS_CRIT("mapping table is absent\n");
+		return -ERANGE;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	err = -ENOENT;
+
+	down_read(&tbl->tbl_lock);
+
+	start_index = FRAGMENT_INDEX(tbl, start_search_leb);
+
+	for (i = start_index; i < tbl->fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		*end = &fdesc->init_end;
+
+		state = atomic_read(&fdesc->state);
+		if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+			err = -EFAULT;
+			SSDFS_ERR("fragment is corrupted: index %d\n", i);
+			goto finish_check;
+		} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+			err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment is under initialization: "
+				  "index %d\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_check;
+		}
+
+		down_read(&fdesc->lock);
+
+		found_start_leb = fdesc->start_leb;
+		found_end_leb = fdesc->start_leb + fdesc->lebs_count;
+		is_found = can_be_mapped_leb2peb(tbl, fdesc, found_start_leb);
+
+		if (!is_found) {
+			err = ssdfs_maptbl_try_decrease_reserved_pebs(tbl,
+								      fdesc);
+			if (err == -ENOENT) {
+				err = 0;
+				SSDFS_DBG("unable to decrease reserved pebs\n");
+			} else if (unlikely(err)) {
+				SSDFS_ERR("fail to decrease reserved pebs: "
+					  "err %d\n", err);
+				goto finish_fragment_processing;
+			}
+
+			is_found = can_be_mapped_leb2peb(tbl, fdesc,
+							 found_start_leb);
+		}
+
+finish_fragment_processing:
+		up_read(&fdesc->lock);
+
+		*start_leb = max_t(u64, start_search_leb, found_start_leb);
+		*end_leb = found_end_leb;
+
+		if (is_found) {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("recommend: start_leb %llu, end_leb %llu\n",
+				  *start_leb, *end_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			break;
+		} else {
+			err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment %d (leb_id %llu) is exhausted\n",
+				  i, found_start_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+	}
+
+finish_check:
+	up_read(&tbl->tbl_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished: start_leb %llu, end_leb %llu, err %d\n",
+		  *start_leb, *end_leb, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
From patchwork Sat Feb 25 01:08:52 2023
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 41/76] ssdfs: support migration scheme by PEB state
Date: Fri, 24 Feb 2023 17:08:52 -0800
Message-Id: <20230225010927.813929-42-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

Migration scheme is the fundamental technique of GC overhead management
in the SSDFS file system. The key responsibility of the migration scheme
is to guarantee the presence of data at the same segment for any update
operations.
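The paragraphs below describe the scheme in detail; as a quick
orientation, here is a conceptual sketch in plain C. It is purely
illustrative: the structure, field, and helper names are invented for
the example and are not the actual SSDFS on-disk structures or API.

	#include <stdint.h>

	#define PEB_NONE UINT64_MAX

	/* One LEB position in a segment, backed by up to two PEBs. */
	struct leb_association {
		uint64_t src_peb;	/* exhausted PEB, migration source */
		uint64_t dst_peb;	/* clean PEB, migration destination */
	};

	/* "Add migration PEB": start migration by associating a clean
	 * PEB with the exhausted one. Regular updates are then written
	 * into dst_peb, gradually invalidating data in src_peb.
	 */
	static void add_migration_peb(struct leb_association *a,
				      uint64_t clean_peb)
	{
		a->dst_peb = clean_peb;
	}

	/* "Exclude migration PEB": finish migration. The completely
	 * invalidated source PEB leaves the association and can be
	 * TRIMmed/erased; the destination PEB becomes the only PEB
	 * for this position in the segment.
	 */
	static uint64_t exclude_migration_peb(struct leb_association *a)
	{
		uint64_t invalidated = a->src_peb;

		a->src_peb = a->dst_peb;
		a->dst_peb = PEB_NONE;
		return invalidated;	/* caller TRIMs/erases this PEB */
	}

In this patch, ssdfs_maptbl_add_migration_peb() plays the first role
against the real mapping table state machine.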
Generally speaking, the migration scheme is implemented by associating
an exhausted PEB with a clean one. The goal of such an association of
two PEBs is to migrate data gradually, by means of regular update
operations, out of the initial (exhausted) PEB. As a result, the old,
exhausted PEB becomes completely invalidated when data migration
finishes, and an erase operation can then convert it back into the
clean state. Moreover, the destination PEB of the association replaces
the initial PEB for the given index in the segment and finally becomes
the only PEB for this position. This technique implements the concept
of a logical extent, whose goal is to decrease write amplification and
to manage the GC overhead, because the logical extent concept excludes
the necessity to update the metadata that tracks the position of user
data on the file system's volume. Overall, the migration scheme is
capable of decreasing the GC activity significantly by excluding these
metadata updates and by means of self-migration of data between PEBs
that is triggered by regular update operations.

The mapping table supports two principal operations: (1) add migration
PEB, (2) exclude migration PEB. The add migration PEB operation is
required to start migration: it associates an exhausted PEB with a
clean one. The exclude migration PEB operation is executed to finish
migration: it removes the completely invalidated PEB from the
association and requests a TRIM/erase of this PEB.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_mapping_table.c | 2985 ++++++++++++++++++++++++++++++++++
 1 file changed, 2985 insertions(+)

diff --git a/fs/ssdfs/peb_mapping_table.c b/fs/ssdfs/peb_mapping_table.c
index 44995170fe75..490114d77c67 100644
--- a/fs/ssdfs/peb_mapping_table.c
+++ b/fs/ssdfs/peb_mapping_table.c
@@ -8052,3 +8052,2988 @@ int ssdfs_maptbl_recommend_search_range(struct ssdfs_fs_info *fsi,
 	return err;
 }
+
+/*
+ * __ssdfs_maptbl_change_peb_state() - change PEB state
+ * @tbl: pointer on mapping table object
+ * @fdesc: fragment descriptor
+ * @leb_id: LEB ID number
+ * @selected_index: index of item in the whole fragment
+ * @new_peb_state: new state of the PEB
+ * @old_peb_state: old state of the PEB [out]
+ *
+ * This method tries to change the state of the PEB
+ * in the mapping table.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EACCES - PEB stripe is under recovering.
+ * %-EEXIST - PEB has this state already.
+ */ +static +int __ssdfs_maptbl_change_peb_state(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, + u16 selected_index, + int new_peb_state, + int *old_peb_state) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *peb_desc; + pgoff_t page_index; + struct page *page; + void *kaddr; + u16 item_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tbl %p, fdesc %p, leb_id %llu, " + "selected_index %u, new_peb_state %#x\n", + tbl, fdesc, leb_id, + selected_index, new_peb_state); + + BUG_ON(!tbl || !fdesc || !old_peb_state); + BUG_ON(selected_index >= U16_MAX); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + if (new_peb_state <= SSDFS_MAPTBL_UNKNOWN_PEB_STATE || + new_peb_state >= SSDFS_MAPTBL_PEB_STATE_MAX) { + SSDFS_ERR("invalid PEB state %#x\n", + new_peb_state); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + *old_peb_state = SSDFS_MAPTBL_PEB_STATE_MAX; + + page_index = ssdfs_maptbl_define_pebtbl_page(tbl, fdesc, + leb_id, selected_index); + if (page_index == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to define PEB table's page_index: " + "leb_id %llu\n", leb_id); + goto finish_fragment_change; + } + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + goto finish_fragment_change; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + + if (is_pebtbl_stripe_recovering(hdr)) { + err = -EACCES; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to change the PEB state: " + "leb_id %llu: " + "stripe %u is under recovering\n", + leb_id, + le16_to_cpu(hdr->stripe_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } + + item_index = selected_index % fdesc->pebs_per_page; + + peb_desc = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(peb_desc)) { + err = IS_ERR(peb_desc) ? 
PTR_ERR(peb_desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, item_index %u, " + "old_peb_state %#x, new_peb_state %#x\n", + leb_id, item_index, peb_desc->state, new_peb_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + *old_peb_state = peb_desc->state; + + if (peb_desc->state == (u8)new_peb_state) { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_state1 %#x == peb_state2 %#x\n", + peb_desc->state, + (u8)new_peb_state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_processing; + } else + peb_desc->state = (u8)new_peb_state; + +finish_page_processing: + flush_dcache_page(page); + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_change: + return err; +} + +/* + * ssdfs_maptbl_change_peb_state() - change PEB state + * @fsi: file system info object + * @leb_id: LEB ID number + * @peb_type: type of the PEB + * @peb_state: new state of the PEB + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to change the state of the PEB + * in the mapping table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EACCES - PEB stripe is under recovering. + * %-ENODATA - uninitialized LEB descriptor. 
+ */
+int ssdfs_maptbl_change_peb_state(struct ssdfs_fs_info *fsi,
+				  u64 leb_id, u8 peb_type, int peb_state,
+				  struct completion **end)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_cache *cache;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	struct ssdfs_leb_descriptor leb_desc;
+	struct ssdfs_maptbl_peb_relation pebr;
+	int state;
+	u16 selected_index;
+	int consistency;
+	int old_peb_state = SSDFS_MAPTBL_PEB_STATE_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("fsi %p, leb_id %llu, peb_type %#x, "
+		  "peb_state %#x, init_end %p\n",
+		  fsi, leb_id, peb_type, peb_state, end);
+#else
+	SSDFS_DBG("fsi %p, leb_id %llu, peb_type %#x, "
+		  "peb_state %#x, init_end %p\n",
+		  fsi, leb_id, peb_type, peb_state, end);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tbl = fsi->maptbl;
+	/* tbl can be NULL here; take the cache from fsi directly */
+	cache = &fsi->maptbl_cache;
+	*end = NULL;
+
+	if (peb_state <= SSDFS_MAPTBL_UNKNOWN_PEB_STATE ||
+	    peb_state >= SSDFS_MAPTBL_PEB_STATE_MAX) {
+		SSDFS_ERR("invalid PEB state %#x\n",
+			  peb_state);
+		return -EINVAL;
+	}
+
+	if (!tbl) {
+		err = 0;
+
+		if (should_cache_peb_info(peb_type)) {
+			consistency = SSDFS_PEB_STATE_INCONSISTENT;
+			err = ssdfs_maptbl_cache_change_peb_state(cache,
+								  leb_id,
+								  peb_state,
+								  consistency);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change PEB state: "
+					  "leb_id %llu, peb_state %#x, "
+					  "err %d\n",
+					  leb_id, peb_state, err);
+			}
+		} else {
+			err = -ERANGE;
+			SSDFS_CRIT("mapping table is absent\n");
+		}
+
+		return err;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) {
+		if (should_cache_peb_info(peb_type)) {
+			consistency = SSDFS_PEB_STATE_INCONSISTENT;
+			err = ssdfs_maptbl_cache_change_peb_state(cache,
+								  leb_id,
+								  peb_state,
+								  consistency);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change PEB state: "
+					  "leb_id %llu, peb_state %#x, "
+					  "err %d\n",
+					  leb_id, peb_state, err);
+			}
+
+			return err;
+		}
+	}
+
+	if (should_cache_peb_info(peb_type)) {
+		/* resolve potential inconsistency */
+		err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type,
+						   &pebr, end);
+		if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment is under initialization: "
+				  "leb_id %llu\n",
+				  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to resolve inconsistency: "
+				  "leb_id %llu, err %d\n",
+				  leb_id, err);
+			return err;
+		}
+	}
+
+	if (rwsem_is_locked(&tbl->tbl_lock) &&
+	    atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) {
+		if (should_cache_peb_info(peb_type)) {
+			consistency = SSDFS_PEB_STATE_INCONSISTENT;
+			err = ssdfs_maptbl_cache_change_peb_state(cache,
+								  leb_id,
+								  peb_state,
+								  consistency);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change PEB state: "
+					  "leb_id %llu, peb_state %#x, "
+					  "err %d\n",
+					  leb_id, peb_state, err);
+			}
+
+			return err;
+		}
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_change_state;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n",
+			  leb_id);
+		goto finish_change_state;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_change_state;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (rwsem_is_locked(&fdesc->lock)) {
+		SSDFS_DBG("fragment is locked -> lock fragment: "
+			  "leb_id %llu\n", leb_id);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, &pebr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get peb relation: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	switch (peb_state) {
+	case SSDFS_MAPTBL_BAD_PEB_STATE:
+	case SSDFS_MAPTBL_CLEAN_PEB_STATE:
+	case SSDFS_MAPTBL_USING_PEB_STATE:
+	case SSDFS_MAPTBL_USED_PEB_STATE:
+	case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE:
+	case SSDFS_MAPTBL_DIRTY_PEB_STATE:
+	case SSDFS_MAPTBL_PRE_ERASE_STATE:
+	case SSDFS_MAPTBL_RECOVERING_STATE:
+	case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE:
+	case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE:
+	case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE:
+		selected_index = le16_to_cpu(leb_desc.physical_index);
+		break;
+
+	case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE:
+		selected_index = le16_to_cpu(leb_desc.relation_index);
+		break;
+
+	default:
+		BUG();
+	}
+
+	if (selected_index == U16_MAX) {
+		err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("uninitialized leb descriptor: "
+			  "leb_id %llu\n", leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_fragment_change;
+	}
+
+	err = __ssdfs_maptbl_change_peb_state(tbl, fdesc, leb_id,
+					      selected_index,
+					      peb_state,
+					      &old_peb_state);
+	if (err == -EEXIST) {
+		/*
+		 * PEB has this state already.
+		 * Don't set fragment dirty!!!
+		 */
+		goto finish_fragment_change;
+	} else if (err == -EACCES) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to change the PEB state: "
+			  "leb_id %llu: "
+			  "stripe is under recovering\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_fragment_change;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, peb_state %#x, err %d\n",
+			  leb_id, peb_state, err);
+		goto finish_fragment_change;
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+finish_change_state:
+	up_read(&tbl->tbl_lock);
+
+	if (err == -EAGAIN && should_cache_peb_info(peb_type)) {
+		consistency = SSDFS_PEB_STATE_INCONSISTENT;
+		err = ssdfs_maptbl_cache_change_peb_state(cache,
+							  leb_id,
+							  peb_state,
+							  consistency);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change PEB state: "
+				  "leb_id %llu, peb_state %#x, "
+				  "err %d\n",
+				  leb_id, peb_state, err);
+		}
+	} else if (!err && should_cache_peb_info(peb_type)) {
+		consistency = SSDFS_PEB_STATE_CONSISTENT;
+		err = ssdfs_maptbl_cache_change_peb_state(cache,
+							  leb_id,
+							  peb_state,
+							  consistency);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change PEB state: "
+				  "leb_id %llu, peb_state %#x, "
+				  "err %d\n",
+				  leb_id, peb_state, err);
+		}
+	} else if (err == -EEXIST) {
+		/* PEB has this state already */
+		err = 0;
+
+		if (should_cache_peb_info(peb_type)) {
+			consistency = SSDFS_PEB_STATE_CONSISTENT;
+			err = ssdfs_maptbl_cache_change_peb_state(cache,
+								  leb_id,
+								  peb_state,
+								  consistency);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change PEB state: "
+					  "leb_id %llu, peb_state %#x, "
+					  "err %d\n",
+					  leb_id, peb_state, err);
+			}
+		}
+	}
+
+	if (!err && fsi->is_zns_device) {
+		u64 peb_id = U64_MAX;
+
+		err = -ENODATA;
+
+		switch (old_peb_state) {
+		case SSDFS_MAPTBL_CLEAN_PEB_STATE:
+		case SSDFS_MAPTBL_USING_PEB_STATE:
+			switch (peb_state) {
+			case SSDFS_MAPTBL_USED_PEB_STATE:
+			case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE:
+			case SSDFS_MAPTBL_DIRTY_PEB_STATE:
+			case SSDFS_MAPTBL_PRE_ERASE_STATE:
+			case SSDFS_MAPTBL_RECOVERING_STATE:
+			case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE:
+			case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE:
+			case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE:
+				err = 0;
+				selected_index = SSDFS_MAPTBL_MAIN_INDEX;
+				peb_id = pebr.pebs[selected_index].peb_id;
+				break;
+
+			default:
+				/* do nothing */
+				break;
+			}
+			break;
+
+		case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE:
+		case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE:
+			switch (peb_state) {
+			case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE:
+			case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE:
+			case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE:
+				err = 0;
+				selected_index = SSDFS_MAPTBL_RELATION_INDEX;
+				peb_id = pebr.pebs[selected_index].peb_id;
+				break;
+
+			default:
+				/* do nothing */
+				break;
+			}
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+		if (!err) {
+			loff_t offset = peb_id * fsi->erasesize;
+
+			err = fsi->devops->close_zone(fsi->sb, offset);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to close zone: "
+					  "offset %llu, err %d\n",
+					  offset, err);
+				return err;
+			}
+		} else
+			err = 0;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/*
+ * __ssdfs_maptbl_unmap_dirty_peb() - unmap dirty PEB
+ * @ptr: fragment descriptor
+ * @leb_id: LEB ID number
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */ +static +int __ssdfs_maptbl_unmap_dirty_peb(struct ssdfs_maptbl_fragment_desc *ptr, + u64 leb_id) +{ + struct ssdfs_leb_table_fragment_header *hdr; + struct ssdfs_leb_descriptor *leb_desc; + pgoff_t page_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("fdesc %p, leb_id %llu\n", + ptr, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = LEBTBL_PAGE_INDEX(ptr, leb_id); + if (page_index == ULONG_MAX) { + SSDFS_ERR("fail to define page_index: " + "leb_id %llu\n", + leb_id); + return -ERANGE; + } + + page = ssdfs_page_array_get_page_locked(&ptr->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + leb_desc = GET_LEB_DESCRIPTOR(kaddr, leb_id); + if (IS_ERR_OR_NULL(leb_desc)) { + err = IS_ERR(leb_desc) ? PTR_ERR(leb_desc) : -ERANGE; + SSDFS_ERR("fail to get leb_descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_page_processing; + } + + leb_desc->physical_index = cpu_to_le16(U16_MAX); + leb_desc->relation_index = cpu_to_le16(U16_MAX); + + hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(le16_to_cpu(hdr->mapped_lebs) == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + le16_add_cpu(&hdr->mapped_lebs, -1); + +finish_page_processing: + kunmap_local(kaddr); + + if (!err) { + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&ptr->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + } + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_prepare_pre_erase_state() - convert dirty PEB into pre-erased + * @fsi: file system info object + * @leb_id: LEB ID number + * @peb_type: type of the PEB + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to convert dirty PEB into pre-erase state + * in the mapping table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EACCES - PEB stripe is under recovering. + * %-ENODATA - uninitialized LEB descriptor. + * %-EBUSY - maptbl is under flush operation. 
+ */
+int ssdfs_maptbl_prepare_pre_erase_state(struct ssdfs_fs_info *fsi,
+					 u64 leb_id, u8 peb_type,
+					 struct completion **end)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_cache *cache;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	struct ssdfs_leb_descriptor leb_desc;
+	int state;
+	u16 physical_index, relation_index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !end);
+
+	SSDFS_DBG("fsi %p, leb_id %llu, peb_type %#x, "
+		  "init_end %p\n",
+		  fsi, leb_id, peb_type, end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tbl = fsi->maptbl;
+	cache = &fsi->maptbl_cache;
+	*end = NULL;
+
+	if (!tbl) {
+		SSDFS_WARN("operation is not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) {
+		SSDFS_DBG("maptbl is under flush\n");
+		return -EBUSY;
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_change_state;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n",
+			  leb_id);
+		goto finish_change_state;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_change_state;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (rwsem_is_locked(&fdesc->lock)) {
+		SSDFS_DBG("fragment is locked -> lock fragment: "
+			  "leb_id %llu\n", leb_id);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu is not mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu is under migration\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+	relation_index = le16_to_cpu(leb_desc.relation_index);
+
+	if (relation_index != U16_MAX) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_set_pre_erase_state(fdesc, physical_index);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move PEB into pre-erase state: "
+			  "index %u, err %d\n",
+			  physical_index, err);
+		goto finish_fragment_change;
+	}
+
+	err = __ssdfs_maptbl_unmap_dirty_peb(fdesc, leb_id);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(fdesc->mapped_lebs == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc->mapped_lebs--;
+	fdesc->pre_erase_pebs++;
+	atomic_inc(&tbl->pre_erase_pebs);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fdesc->pre_erase_pebs %u, tbl->pre_erase_pebs %d\n",
+		  fdesc->pre_erase_pebs,
+		  atomic_read(&tbl->pre_erase_pebs));
+#endif /*
CONFIG_SSDFS_DEBUG */ + +finish_fragment_change: + up_write(&fdesc->lock); + + if (!err) + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id); + + if (should_cache_peb_info(peb_type)) { + err = ssdfs_maptbl_cache_forget_leb2peb(cache, leb_id, + SSDFS_PEB_STATE_CONSISTENT); + if (err == -ENODATA || err == -EFAULT) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu is not in cache already\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to forget leb_id %llu, err %d\n", + leb_id, err); + goto finish_change_state; + } + } + +finish_change_state: + wake_up(&tbl->wait_queue); + up_read(&tbl->tbl_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_set_pre_erased_snapshot_peb() - set snapshot PEB as pre-erased + * @fsi: file system info object + * @peb_id: PEB ID number + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to convert snapshot PEB into pre-erase state + * in the mapping table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EACCES - PEB stripe is under recovering. + * %-ENODATA - uninitialized LEB descriptor. + * %-EBUSY - maptbl is under flush operation. + */ +int ssdfs_maptbl_set_pre_erased_snapshot_peb(struct ssdfs_fs_info *fsi, + u64 peb_id, + struct completion **end) +{ + struct ssdfs_peb_mapping_table *tbl; + struct ssdfs_maptbl_fragment_desc *fdesc; + struct ssdfs_peb_descriptor peb_desc; + int state; + u16 physical_index; + u64 found_peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !end); + + SSDFS_DBG("fsi %p, peb_id %llu, init_end %p\n", + fsi, peb_id, end); +#endif /* CONFIG_SSDFS_DEBUG */ + + tbl = fsi->maptbl; + *end = NULL; + + if (!tbl) { + SSDFS_WARN("operation is not supported\n"); + return -EOPNOTSUPP; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) { + SSDFS_DBG("maptbl is under flush\n"); + return -EBUSY; + } + + down_read(&tbl->tbl_lock); + + fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, peb_id); + if (IS_ERR_OR_NULL(fdesc)) { + err = IS_ERR(fdesc) ? 
PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "peb_id %llu, err %d\n", + peb_id, err); + goto finish_change_state; + } + + *end = &fdesc->init_end; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: peb_id %llu\n", + peb_id); + goto finish_change_state; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: peb_id %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_change_state; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (rwsem_is_locked(&fdesc->lock)) { + SSDFS_DBG("fragment is locked -> lock fragment: " + "peb_id %llu\n", peb_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&fdesc->lock); + + if (peb_id < fdesc->start_leb || + peb_id > (fdesc->start_leb + fdesc->lebs_count)) { + err = -ERANGE; + SSDFS_ERR("peb_id %llu is out of range: " + "start_leb %llu, lebs_count %u\n", + peb_id, fdesc->start_leb, fdesc->lebs_count); + goto finish_fragment_change; + } + + physical_index = peb_id - fdesc->start_leb; + + err = ssdfs_maptbl_get_peb_descriptor(fdesc, physical_index, + &found_peb_id, &peb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get peb descriptor: " + "peb_id %llu, err %d\n", + peb_id, err); + goto finish_fragment_change; + } + + if (found_peb_id != peb_id) { + err = -ERANGE; + SSDFS_ERR("corrupted mapping table: " + "found_peb_id %llu != peb_id %llu\n", + found_peb_id, peb_id); + goto finish_fragment_change; + } + + if (peb_desc.state != SSDFS_MAPTBL_SNAPSHOT_STATE) { + err = -ERANGE; + SSDFS_ERR("unexpected PEB state: " + "peb_id %llu, state %#x\n", + peb_id, peb_desc.state); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_pre_erase_state(fdesc, physical_index); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB into pre-erase state: " + "index %u, err %d\n", + physical_index, err); + goto finish_fragment_change; + } + + fdesc->pre_erase_pebs++; + atomic_inc(&tbl->pre_erase_pebs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fdesc->pre_erase_pebs %u, tbl->pre_erase_pebs %d\n", + fdesc->pre_erase_pebs, + atomic_read(&tbl->pre_erase_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_change: + up_write(&fdesc->lock); + + if (!err) + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, peb_id); + +finish_change_state: + wake_up(&tbl->wait_queue); + up_read(&tbl->tbl_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * has_fragment_reserved_pebs() - check that fragment has reserved PEBs + * @hdr: PEB table fragment's header + */ +static inline +bool has_fragment_reserved_pebs(struct ssdfs_peb_table_fragment_header *hdr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("hdr %p, reserved_pebs %u\n", + hdr, le16_to_cpu(hdr->reserved_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return le16_to_cpu(hdr->reserved_pebs) != 0; +} + +/* + * ssdfs_maptbl_select_pebtbl_page() - select page of PEB table + * @tbl: pointer on mapping table object + * @fdesc: fragment descriptor + * @leb_id: LEB ID number + * + * This method tries to select a page of PEB table. 
+ */ +static +int ssdfs_maptbl_select_pebtbl_page(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, pgoff_t *page_index) +{ + pgoff_t start_page; + pgoff_t first_valid_page = ULONG_MAX; + struct page *page; + void *kaddr; + struct ssdfs_peb_table_fragment_header *hdr; + unsigned long *bmap; + u16 pebs_count, used_pebs; + u16 unused_pebs, reserved_pebs; + bool is_recovering = false; + bool has_reserved_pebs = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("maptbl %p, fdesc %p, leb_id %llu\n", + tbl, fdesc, leb_id); + + BUG_ON(!tbl || !fdesc || !page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *page_index = ssdfs_maptbl_define_pebtbl_page(tbl, fdesc, + leb_id, U16_MAX); + if (*page_index == ULONG_MAX) { + SSDFS_ERR("fail to define PEB table's page_index: " + "leb_id %llu\n", leb_id); + return -ERANGE; + } + + start_page = *page_index; + +try_next_page: + page = ssdfs_page_array_get_page_locked(&fdesc->array, *page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: " + "page_index %lu, err %d\n", + *page_index, err); + return -ERANGE; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + pebs_count = le16_to_cpu(hdr->pebs_count); + used_pebs = bitmap_weight(bmap, pebs_count); + unused_pebs = pebs_count - used_pebs; + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + is_recovering = is_pebtbl_stripe_recovering(hdr); + + has_reserved_pebs = has_fragment_reserved_pebs(hdr); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d, page_index %lu\n", + page, page_ref_count(page), *page_index); + SSDFS_DBG("pebs_count %u, used_pebs %u, unused_pebs %u, " + "reserved_pebs %u, is_recovering %#x, " + "has_reserved_pebs %#x\n", + pebs_count, used_pebs, unused_pebs, + reserved_pebs, is_recovering, + has_reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!has_reserved_pebs) { + err = ssdfs_maptbl_increase_reserved_pebs(tbl->fsi, fdesc, hdr); + if (!err) { + reserved_pebs = le16_to_cpu(hdr->reserved_pebs); + has_reserved_pebs = has_fragment_reserved_pebs(hdr); + } else if (err == -ENOSPC && unused_pebs > 0) { + /* we can take from the unused pool, anyway */ + err = 0; + } + } + + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find PEB table page: " + "leb_id %llu, page_index %lu\n", + leb_id, *page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *page_index = ssdfs_maptbl_find_pebtbl_page(tbl, fdesc, + *page_index, + start_page); + if (*page_index == ULONG_MAX) + goto use_first_valid_page; + else { + err = 0; + goto try_next_page; + } + } else if (unlikely(err)) { + *page_index = ULONG_MAX; + SSDFS_ERR("fail to increase reserved pebs: " + "err %d\n", err); + goto finish_select_pebtbl_page; + } + + if (is_recovering) { + *page_index = ssdfs_maptbl_find_pebtbl_page(tbl, fdesc, + *page_index, + start_page); + if (*page_index == ULONG_MAX) + goto use_first_valid_page; + else + goto try_next_page; + } else if (!has_reserved_pebs) { + if (unused_pebs > 0) { + first_valid_page = *page_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("take from unused pool: " + "leb_id %llu, unused_pebs %u, " + "reserved_pebs %u\n", + leb_id, unused_pebs, reserved_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + *page_index = ULONG_MAX; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable 
to find PEB table page: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + goto finish_select_pebtbl_page; + } else if (unused_pebs > 0) { + first_valid_page = *page_index; + + if (unused_pebs < reserved_pebs) { + *page_index = ssdfs_maptbl_find_pebtbl_page(tbl, fdesc, + *page_index, + start_page); + if (*page_index == ULONG_MAX) + goto use_first_valid_page; + else + goto try_next_page; + } else + goto finish_select_pebtbl_page; + } else + goto finish_select_pebtbl_page; + +use_first_valid_page: + if (first_valid_page >= ULONG_MAX) { + if (fdesc->pre_erase_pebs > 0) + err = -EBUSY; + else + err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find PEB table page: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + *page_index = first_valid_page; + +finish_select_pebtbl_page: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %lu\n", *page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; +} + +/* + * ssdfs_maptbl_set_peb_descriptor() - change PEB descriptor + * @fdesc: fragment descriptor + * @pebtbl_page: page index of PEB table + * @peb_goal: PEB purpose + * @peb_type: type of the PEB + * @item_index: item index in the memory page [out] + * + * This method tries to change PEB descriptor in the PEB table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_peb_descriptor(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + pgoff_t pebtbl_page, + int peb_goal, + u8 peb_type, + u16 *item_index) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *peb_desc; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !item_index); + + SSDFS_DBG("fdesc %p, pebtbl_page %lu, " + "peb_goal %#x, peb_type %#x\n", + fdesc, pebtbl_page, peb_goal, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + *item_index = U16_MAX; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, pebtbl_page); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: " + "page_index %lu, err %d\n", + pebtbl_page, err); + return err; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + + *item_index = ssdfs_maptbl_select_unused_peb(fdesc, hdr, + tbl->pebs_count, + peb_goal); + if (*item_index >= U16_MAX) { + err = -ERANGE; + SSDFS_DBG("unable to select unused peb\n"); + goto finish_set_peb_descriptor; + } + + peb_desc = GET_PEB_DESCRIPTOR(hdr, *item_index); + if (IS_ERR_OR_NULL(peb_desc)) { + err = IS_ERR(peb_desc) ? 
PTR_ERR(peb_desc) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "index %u, err %d\n", + *item_index, err); + goto finish_set_peb_descriptor; + } + + peb_desc->type = peb_type; + peb_desc->state = SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE; + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + pebtbl_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + pebtbl_page, err); + } + +finish_set_peb_descriptor: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_set_leb_descriptor() - change LEB descriptor + * @fdesc: fragment descriptor + * @leb_id: LEB ID number + * @pebtbl_page: page index of PEB table + * @item_index: item index in the memory page + * + * This method tries to change LEB descriptor in the LEB table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_leb_descriptor(struct ssdfs_maptbl_fragment_desc *fdesc, + u64 leb_id, pgoff_t pebtbl_page, + u16 item_index) +{ + struct ssdfs_leb_descriptor *leb_desc; + struct ssdfs_leb_table_fragment_header *lebtbl_hdr; + pgoff_t lebtbl_page; + u16 peb_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, leb_id %llu, pebtbl_page %lu, " + "item_index %u\n", + fdesc, leb_id, pebtbl_page, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + lebtbl_page = LEBTBL_PAGE_INDEX(fdesc, leb_id); + if (lebtbl_page == ULONG_MAX) { + SSDFS_ERR("fail to define page_index: " + "leb_id %llu\n", + leb_id); + return -ERANGE; + } + + page = ssdfs_page_array_get_page_locked(&fdesc->array, lebtbl_page); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + lebtbl_page); + return err; + } + + kaddr = kmap_local_page(page); + + leb_desc = GET_LEB_DESCRIPTOR(kaddr, leb_id); + if (IS_ERR_OR_NULL(leb_desc)) { + err = IS_ERR(leb_desc) ? 
PTR_ERR(leb_desc) : -ERANGE; + SSDFS_ERR("fail to get leb_descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_page_processing; + } + + peb_index = DEFINE_PEB_INDEX_IN_FRAGMENT(fdesc, + pebtbl_page, + item_index); + if (peb_index == U16_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to define peb index\n"); + goto finish_page_processing; + } + + leb_desc->relation_index = cpu_to_le16(peb_index); + + lebtbl_hdr = (struct ssdfs_leb_table_fragment_header *)kaddr; + le16_add_cpu(&lebtbl_hdr->migrating_lebs, 1); + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + lebtbl_page); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + lebtbl_page, err); + } + +finish_page_processing: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_add_migration_peb() - associate PEB for migration + * @fsi: file system info object + * @leb_id: LEB ID number + * @peb_type: type of the PEB + * @pebr: description of PEBs relation [out] + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to add in the pair destination PEB for + * data migration. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - unable to find PEB for migration. + * %-EEXIST - LEB is under migration yet. + */ +int ssdfs_maptbl_add_migration_peb(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct ssdfs_maptbl_peb_relation *pebr, + struct completion **end) +{ + struct ssdfs_peb_mapping_table *tbl; + struct ssdfs_maptbl_cache *cache; + struct ssdfs_maptbl_fragment_desc *fdesc; + int state; + struct ssdfs_leb_descriptor leb_desc; + pgoff_t pebtbl_page = ULONG_MAX; + u16 item_index; + int consistency; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pebr || !end); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, leb_id %llu, pebr %p, init_end %p\n", + fsi, leb_id, pebr, end); +#else + SSDFS_DBG("fsi %p, leb_id %llu, pebr %p, init_end %p\n", + fsi, leb_id, pebr, end); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tbl = fsi->maptbl; + cache = &tbl->fsi->maptbl_cache; + *end = NULL; + + memset(pebr, 0xFF, sizeof(struct ssdfs_maptbl_peb_relation)); + + if (!tbl) { + SSDFS_CRIT("mapping table is absent\n"); + return -ERANGE; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + if (should_cache_peb_info(peb_type)) { + struct ssdfs_maptbl_peb_relation prev_pebr; + + /* resolve potential inconsistency */ + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type, + &prev_pebr, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to resolve inconsistency: " + "leb_id %llu, err %d\n", + leb_id, err); + return err; + } + } + + down_read(&tbl->tbl_lock); + + fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id); + if (IS_ERR_OR_NULL(fdesc)) { + err = 
IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_add_migrating_peb; + } + + *end = &fdesc->init_end; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id); + goto finish_add_migrating_peb; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_add_migrating_peb; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (rwsem_is_locked(&fdesc->lock)) { + SSDFS_DBG("fragment is locked -> lock fragment: " + "leb_id %llu\n", leb_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&fdesc->lock); + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + if (!__is_mapped_leb2peb(&leb_desc)) { + err = -ERANGE; + SSDFS_ERR("leb %llu doesn't be mapped yet\n", + leb_id); + goto finish_fragment_change; + } + + if (is_leb_migrating(&leb_desc)) { + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + } else { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb %llu is under migration yet\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + goto finish_fragment_change; + } + + err = ssdfs_maptbl_select_pebtbl_page(tbl, fdesc, leb_id, &pebtbl_page); + if (unlikely(err)) { + SSDFS_DBG("unable to find the peb table's page\n"); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_peb_descriptor(tbl, fdesc, pebtbl_page, + SSDFS_MAPTBL_MIGRATING_PEB, + peb_type, &item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set PEB descriptor: " + "pebtbl_page %lu, " + "peb_type %#x, err %d\n", + pebtbl_page, + peb_type, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_leb_descriptor(fdesc, leb_id, + pebtbl_page, + item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set LEB descriptor: " + "leb_id %llu, pebtbl_page %lu, " + "item_index %u, err %d\n", + leb_id, pebtbl_page, + item_index, err); + goto finish_fragment_change; + } + + fdesc->migrating_lebs++; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u\n", + fdesc->mapped_lebs, fdesc->migrating_lebs); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + 
pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_fragment_change: + up_write(&fdesc->lock); + + if (!err) + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id); + +finish_add_migrating_peb: + up_read(&tbl->tbl_lock); + + if (!err && should_cache_peb_info(peb_type)) { + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_add_migration_peb(cache, leb_id, + pebr, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to add migration PEB: " + "leb_id %llu, peb_id %llu, err %d\n", + leb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + err); + err = -EFAULT; + } + } + + if (!err) { + u64 peb_id = pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id; + loff_t offset = peb_id * fsi->erasesize; + + err = fsi->devops->open_zone(fsi->sb, offset); + if (err == -EOPNOTSUPP && !fsi->is_zns_device) { + /* ignore error */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to open zone: " + "offset %llu, err %d\n", + offset, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * need_erase_peb_now() - does it need to erase PEB now? + * @fdesc: fragment descriptor + */ +static inline +bool need_erase_peb_now(struct ssdfs_maptbl_fragment_desc *fdesc) +{ + u32 percentage; + u32 unused_lebs; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + percentage = (fdesc->pre_erase_pebs * 100) / fdesc->lebs_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lebs_count %u, pre_erase_pebs %u, " + "percentage %u\n", + fdesc->lebs_count, + fdesc->pre_erase_pebs, + percentage); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (percentage > SSDFS_PRE_ERASE_PEB_THRESHOLD_PCT) + return true; + + unused_lebs = fdesc->lebs_count; + unused_lebs -= fdesc->mapped_lebs; + unused_lebs -= fdesc->migrating_lebs; + unused_lebs -= fdesc->pre_erase_pebs; + unused_lebs -= fdesc->recovering_pebs; + + percentage = (unused_lebs * 100) / fdesc->lebs_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lebs_count %u, mapped_lebs %u, " + "migrating_lebs %u, pre_erase_pebs %u, " + "recovering_pebs %u, reserved_pebs %u, " + "percentage %u\n", + fdesc->lebs_count, fdesc->mapped_lebs, + fdesc->migrating_lebs, fdesc->pre_erase_pebs, + fdesc->recovering_pebs, fdesc->reserved_pebs, + percentage); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (percentage <= SSDFS_UNUSED_LEB_THRESHOLD_PCT) + return true; + + return false; +} + +/* + * ssdfs_maptbl_erase_reserved_peb_now() - erase reserved dirty PEB + * @fsi: file system info object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to erase a reserved dirty PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
+ */ +int ssdfs_maptbl_erase_reserved_peb_now(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + struct completion **end) +{ + struct ssdfs_peb_mapping_table *tbl; + struct ssdfs_maptbl_fragment_desc *fdesc; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *ptr; + struct ssdfs_erase_result res; + int state; + struct ssdfs_leb_descriptor leb_desc; + u16 physical_index; + u64 peb_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !end); + + SSDFS_DBG("fsi %p, leb_id %llu, init_end %p\n", + fsi, leb_id, end); +#endif /* CONFIG_SSDFS_DEBUG */ + + tbl = fsi->maptbl; + *end = NULL; + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) + BUG(); + + down_read(&tbl->tbl_lock); + + fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id); + if (IS_ERR_OR_NULL(fdesc)) { + err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_erase_reserved_peb; + } + + *end = &fdesc->init_end; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id); + goto finish_erase_reserved_peb; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_erase_reserved_peb; + } + + down_write(&fdesc->lock); + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + if (!__is_mapped_leb2peb(&leb_desc)) { + err = -ERANGE; + SSDFS_ERR("leb %llu has not been mapped yet\n", + leb_id); + goto finish_fragment_change; + } + + physical_index = le16_to_cpu(leb_desc.physical_index); + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_change; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_under_erase_state(fdesc, physical_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set PEB as under erase state: " + "index %u, err %d\n", + physical_index, err); + goto finish_fragment_change; + } + + ptr = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + peb_id = ptr->peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("erase peb_id %llu now\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + SSDFS_ERASE_RESULT_INIT(fdesc->fragment_id, physical_index, + peb_id, SSDFS_ERASE_RESULT_UNKNOWN, + &res); + + up_write(&fdesc->lock); + err = ssdfs_maptbl_erase_peb(fsi, &res); + if (unlikely(err)) { + SSDFS_ERR("fail to erase: " + "peb_id %llu, err %d\n", + peb_id, err); + goto finish_erase_reserved_peb; + } + down_write(&fdesc->lock); + + switch (res.state) { + case SSDFS_ERASE_DONE: + res.state = SSDFS_ERASE_SB_PEB_DONE; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to erase: peb_id %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + + fdesc->pre_erase_pebs++; + 
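/* mirror the fragment's pre-erase counter in the table-wide counter */ +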
atomic_inc(&tbl->pre_erase_pebs); + + err = ssdfs_maptbl_correct_dirty_peb(tbl, fdesc, &res); + if (unlikely(err)) { + SSDFS_ERR("fail to correct dirty PEB's state: " + "err %d\n", err); + goto finish_fragment_change; + } + +finish_fragment_change: + up_write(&fdesc->lock); + + if (!err) + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id); + +finish_erase_reserved_peb: + up_read(&tbl->tbl_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * is_ssdfs_peb_contains_snapshot() - check that PEB contains snapshot + * @fsi: file system info object + * @peb_type: PEB type + * @peb_create_time: PEB creation time + * @last_log_time: last log creation time + * + * This method tries to check that PEB contains a snapshot. + */ +static +bool is_ssdfs_peb_contains_snapshot(struct ssdfs_fs_info *fsi, + u8 peb_type, + u64 peb_create_time, + u64 last_log_time) +{ + struct ssdfs_snapshots_btree_info *tree; + struct ssdfs_btree_search *search = NULL; + struct ssdfs_timestamp_range range; + bool is_contains_snapshot = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("peb_type %#x, peb_create_time %llu, " + "last_log_time %llu\n", + peb_type, peb_create_time, + last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (peb_type) { + case SSDFS_MAPTBL_DATA_PEB_TYPE: + case SSDFS_MAPTBL_LNODE_PEB_TYPE: + case SSDFS_MAPTBL_HNODE_PEB_TYPE: + case SSDFS_MAPTBL_IDXNODE_PEB_TYPE: + /* continue logic */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB hasn't snapshot: " + "peb_type %#x\n", + peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + tree = fsi->snapshots.tree; + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto finish_search_snapshots_range; + } + + range.start = peb_create_time; + range.end = last_log_time; + + ssdfs_btree_search_init(search); + err = ssdfs_snapshots_btree_check_range(tree, &range, search); + if (err == -ENODATA) { + err = 0; + is_contains_snapshot = false; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find snapshot: " + "start_timestamp %llu, end_timestamp %llu\n", + peb_create_time, last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (err == -EAGAIN) { + err = 0; + is_contains_snapshot = true; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("snapshots have been found: " + "start_timestamp %llu, end_timestamp %llu\n", + peb_create_time, last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_WARN("fail to find snapshot: " + "start_timestamp %llu, end_timestamp %llu, " + "err %d\n", + peb_create_time, last_log_time, err); + } else { + is_contains_snapshot = true; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("snapshots have been found: " + "start_timestamp %llu, end_timestamp %llu\n", + peb_create_time, last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +finish_search_snapshots_range: + ssdfs_btree_search_free(search); + + if (unlikely(err)) + return false; + + return is_contains_snapshot; +} + +/* + * ssdfs_maptbl_exclude_migration_peb() - exclude PEB from migration + * @fsi: file system info object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @peb_create_time: PEB creation time + * @last_log_time: last log creation time + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to exclude PEB from migration association. 
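+ * If the PEB keeps a snapshot, it is moved into the snapshot state instead of being erased.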
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_maptbl_exclude_migration_peb(struct ssdfs_fs_info *fsi, + u64 leb_id, u8 peb_type, + u64 peb_create_time, + u64 last_log_time, + struct completion **end) +{ + struct ssdfs_peb_mapping_table *tbl; + struct ssdfs_maptbl_cache *cache; + struct ssdfs_snapshots_btree_info *snap_tree; + struct ssdfs_maptbl_fragment_desc *fdesc; + struct ssdfs_maptbl_peb_relation pebr; + struct ssdfs_maptbl_peb_descriptor *ptr; + struct ssdfs_erase_result res; + int state; + struct ssdfs_leb_descriptor leb_desc; + u16 physical_index, relation_index; + int consistency; + u64 peb_id; + bool need_erase = false; + bool peb_contains_snapshot = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !end); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, leb_id %llu, init_end %p\n", + fsi, leb_id, end); +#else + SSDFS_DBG("fsi %p, leb_id %llu, init_end %p\n", + fsi, leb_id, end); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tbl = fsi->maptbl; + cache = &tbl->fsi->maptbl_cache; + snap_tree = fsi->snapshots.tree; + *end = NULL; + + if (!tbl) { + err = 0; + + if (should_cache_peb_info(peb_type)) { + consistency = SSDFS_PEB_STATE_PRE_DELETED; + err = ssdfs_maptbl_cache_exclude_migration_peb(cache, + leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + } + } else { + err = -ERANGE; + SSDFS_CRIT("mapping table is absent\n"); + } + + return err; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) { + if (should_cache_peb_info(peb_type)) { + consistency = SSDFS_PEB_STATE_PRE_DELETED; + err = ssdfs_maptbl_cache_exclude_migration_peb(cache, + leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + } + + return err; + } + } + + if (should_cache_peb_info(peb_type)) { + struct ssdfs_maptbl_peb_relation prev_pebr; + + /* resolve potential inconsistency */ + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type, + &prev_pebr, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to resolve inconsistency: " + "leb_id %llu, err %d\n", + leb_id, err); + return err; + } + } + + if (rwsem_is_locked(&tbl->tbl_lock) && + atomic_read(&tbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH) { + if (should_cache_peb_info(peb_type)) { + consistency = SSDFS_PEB_STATE_PRE_DELETED; + err = ssdfs_maptbl_cache_exclude_migration_peb(cache, + leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + } + + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_create_time %llx, last_log_time %llx\n", + peb_create_time, last_log_time); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_contains_snapshot = is_ssdfs_peb_contains_snapshot(fsi, peb_type, + peb_create_time, + last_log_time); + + down_read(&tbl->tbl_lock); + + fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, 
leb_id); + if (IS_ERR_OR_NULL(fdesc)) { + err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_exclude_migrating_peb; + } + + *end = &fdesc->init_end; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id); + goto finish_exclude_migrating_peb; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_exclude_migrating_peb; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (rwsem_is_locked(&fdesc->lock)) { + SSDFS_DBG("fragment is locked -> lock fragment: " + "leb_id %llu\n", leb_id); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&fdesc->lock); + + err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc); + if (unlikely(err)) { + SSDFS_ERR("fail to get leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + if (!__is_mapped_leb2peb(&leb_desc)) { + err = -ERANGE; + SSDFS_ERR("leb %llu has not been mapped yet\n", + leb_id); + goto finish_fragment_change; + } + + if (!is_leb_migrating(&leb_desc)) { + err = -ERANGE; + SSDFS_ERR("leb %llu isn't under migration\n", + leb_id); + goto finish_fragment_change; + } + + physical_index = le16_to_cpu(leb_desc.physical_index); + relation_index = le16_to_cpu(leb_desc.relation_index); + + need_erase = need_erase_peb_now(fdesc); + + if (peb_contains_snapshot) { + struct ssdfs_peb_timestamps peb2time; + struct ssdfs_btree_search *search = NULL; + + need_erase = false; + + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_change; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_snapshot_state(fdesc, physical_index); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB into snapshot state: " + "index %u, err %d\n", + physical_index, err); + goto finish_fragment_change; + } + + peb2time.peb_id = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + peb2time.create_time = peb_create_time; + peb2time.last_log_time = last_log_time; + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto finish_fragment_change; + } + + ssdfs_btree_search_init(search); + err = ssdfs_snapshots_btree_add_peb2time(snap_tree, &peb2time, + search); + ssdfs_btree_search_free(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to add peb2time: " + "peb_id %llu, peb_create_time %llu, " + "last_log_time %llu, err %d\n", + peb2time.peb_id, peb2time.create_time, + peb2time.last_log_time, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_source_state(fdesc, relation_index, + SSDFS_MAPTBL_UNKNOWN_PEB_STATE); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB into source state: " + "index %u, err %d\n", + relation_index, err); + goto finish_fragment_change; + } + + err = __ssdfs_maptbl_exclude_migration_peb(fdesc, leb_id); + if (unlikely(err)) { + SSDFS_ERR("fail to change leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + } else 
if (need_erase) { + err = ssdfs_maptbl_get_peb_relation(fdesc, &leb_desc, &pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_fragment_change; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb relation: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_under_erase_state(fdesc, physical_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set PEB as under erase state: " + "index %u, err %d\n", + physical_index, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_source_state(fdesc, relation_index, + SSDFS_MAPTBL_UNKNOWN_PEB_STATE); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB into source state: " + "index %u, err %d\n", + relation_index, err); + goto finish_fragment_change; + } + + err = __ssdfs_maptbl_exclude_migration_peb(fdesc, leb_id); + if (unlikely(err)) { + SSDFS_ERR("fail to change leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + + ptr = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX]; + peb_id = ptr->peb_id; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("erase peb_id %llu now\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + SSDFS_ERASE_RESULT_INIT(fdesc->fragment_id, physical_index, + peb_id, SSDFS_ERASE_RESULT_UNKNOWN, + &res); + + up_write(&fdesc->lock); + err = ssdfs_maptbl_erase_peb(fsi, &res); + if (unlikely(err)) { + SSDFS_ERR("fail to erase: " + "peb_id %llu, err %d\n", + peb_id, err); + goto finish_exclude_migrating_peb; + } + down_write(&fdesc->lock); + + switch (res.state) { + case SSDFS_ERASE_DONE: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to erase: peb_id %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + } else { + err = ssdfs_maptbl_set_pre_erase_state(fdesc, physical_index); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB into pre-erase state: " + "index %u, err %d\n", + physical_index, err); + goto finish_fragment_change; + } + + err = ssdfs_maptbl_set_source_state(fdesc, relation_index, + SSDFS_MAPTBL_UNKNOWN_PEB_STATE); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB into source state: " + "index %u, err %d\n", + relation_index, err); + goto finish_fragment_change; + } + + err = __ssdfs_maptbl_exclude_migration_peb(fdesc, leb_id); + if (unlikely(err)) { + SSDFS_ERR("fail to change leb descriptor: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_fragment_change; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(fdesc->migrating_lebs == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + fdesc->migrating_lebs--; + fdesc->pre_erase_pebs++; + atomic_inc(&tbl->pre_erase_pebs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mapped_lebs %u, migrating_lebs %u\n", + fdesc->mapped_lebs, fdesc->migrating_lebs); + SSDFS_DBG("fdesc->pre_erase_pebs %u, tbl->pre_erase_pebs %d\n", + fdesc->pre_erase_pebs, + atomic_read(&tbl->pre_erase_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_erase) { + err = ssdfs_maptbl_correct_dirty_peb(tbl, fdesc, &res); + if (unlikely(err)) { + SSDFS_ERR("fail to correct dirty PEB's state: " + "err %d\n", err); + goto finish_fragment_change; + } + } + + wake_up(&tbl->wait_queue); + +finish_fragment_change: + up_write(&fdesc->lock); + + if (!err) + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id); + +finish_exclude_migrating_peb: + up_read(&tbl->tbl_lock); + + if (err == -EAGAIN && should_cache_peb_info(peb_type)) { + 
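/* fragment is still initializing; record the exclusion in the maptbl cache only */ +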
consistency = SSDFS_PEB_STATE_PRE_DELETED; + err = ssdfs_maptbl_cache_exclude_migration_peb(cache, + leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + } + } else if (!err && should_cache_peb_info(peb_type)) { + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_exclude_migration_peb(cache, + leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + leb_id, err); + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_maptbl_set_peb_as_shared() - set destination PEB as shared + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * @peb_type: PEB type + * + * This method tries to set SSDFS_MAPTBL_SHARED_DESTINATION_PEB flag + * in destination PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_set_peb_as_shared(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index, u8 peb_type) +{ + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + if (peb_type != ptr->type) { + err = -ERANGE; + SSDFS_ERR("peb_type %#x != ptr->type %#x\n", + peb_type, ptr->type); + goto finish_page_processing; + } + + switch (ptr->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + /* valid state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid PEB state %#x\n", + ptr->state); + goto finish_page_processing; + } + + if (ptr->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR || + ptr->shared_peb_index != U16_MAX) { + err = -ERANGE; + SSDFS_ERR("corrupted PEB desriptor\n"); + goto finish_page_processing; + } + + ptr->flags |= SSDFS_MAPTBL_SHARED_DESTINATION_PEB; + +finish_page_processing: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_set_shared_destination_peb() - set destination PEB as shared + * @tbl: pointer on mapping table object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to set SSDFS_MAPTBL_SHARED_DESTINATION_PEB flag + * in destination PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-ERANGE - internal error. 
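+ * %-ENODATA - uninitialized LEB descriptor.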
+ */
+static
+int ssdfs_maptbl_set_shared_destination_peb(struct ssdfs_peb_mapping_table *tbl,
+					    u64 leb_id, u8 peb_type,
+					    struct completion **end)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	struct ssdfs_leb_descriptor leb_desc;
+	u16 relation_index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x\n",
+		  tbl, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id);
+		return err;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu hasn't been mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (!is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu isn't under migration\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	relation_index = le16_to_cpu(leb_desc.relation_index);
+
+	if (relation_index == U16_MAX) {
+		err = -ENODATA;
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_set_peb_as_shared(fdesc, relation_index,
+					     peb_type);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set shared destination PEB: "
+			  "relation_index %u, err %d\n",
+			  relation_index, err);
+		goto finish_fragment_change;
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_set_external_peb_ptr() - set external PEB pointer
+ * @fdesc: fragment descriptor
+ * @index: PEB index in the fragment
+ * @peb_type: PEB type
+ * @dst_peb_index: destination PEB index
+ *
+ * This method tries to store the index of the destination PEB and to set
+ * the SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_set_external_peb_ptr(struct ssdfs_maptbl_fragment_desc *fdesc,
+				      u16 index, u8 peb_type,
+				      u16 dst_peb_index)
+{
+	struct ssdfs_peb_descriptor *ptr;
+	pgoff_t page_index;
+	u16 item_index;
+	struct page *page;
+	void *kaddr;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc);
+
+	SSDFS_DBG("fdesc %p, index %u\n",
+		  fdesc, index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_index = PEBTBL_PAGE_INDEX(fdesc, index);
+	item_index = index % fdesc->pebs_per_page;
+
+	page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index);
+	if (IS_ERR_OR_NULL(page)) {
+		err = page == NULL ?
-ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + if (peb_type != ptr->type) { + err = -ERANGE; + SSDFS_ERR("peb_type %#x != ptr->type %#x\n", + peb_type, ptr->type); + goto finish_page_processing; + } + + if (ptr->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB) { + err = -ERANGE; + SSDFS_ERR("corrupted PEB desriptor\n"); + goto finish_page_processing; + } + + switch (ptr->state) { + case SSDFS_MAPTBL_USED_PEB_STATE: + ptr->state = SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE; + break; + + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + ptr->state = SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid PEB state %#x\n", + ptr->state); + goto finish_page_processing; + } + + if (dst_peb_index >= U8_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid dst_peb_index %u\n", + dst_peb_index); + goto finish_page_processing; + } + + ptr->shared_peb_index = (u8)dst_peb_index; + ptr->flags |= SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR; + +finish_page_processing: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_maptbl_set_indirect_relation() - set destination PEB as shared + * @tbl: pointer on mapping table object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @dst_peb_index: destination PEB index + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to define index of destination PEB and to set + * SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR flag. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-ERANGE - internal error. + */ +static +int __ssdfs_maptbl_set_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + u16 dst_peb_index, + struct completion **end) +{ + struct ssdfs_maptbl_fragment_desc *fdesc; + int state; + struct ssdfs_leb_descriptor leb_desc; + u16 physical_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !end); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + + SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x, " + "dst_peb_index %u\n", + tbl, leb_id, peb_type, dst_peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id); + if (IS_ERR_OR_NULL(fdesc)) { + err = IS_ERR(fdesc) ? 
PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id);
+		return err;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu hasn't been mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu has direct relation\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+
+	if (physical_index == U16_MAX) {
+		err = -ENODATA;
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_set_external_peb_ptr(fdesc, physical_index,
+						peb_type, dst_peb_index);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set external PEB pointer: "
+			  "physical_index %u, err %d\n",
+			  physical_index, err);
+		goto finish_fragment_change;
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_set_indirect_relation() - set PEBs indirect relation
+ * @tbl: pointer on mapping table object
+ * @leb_id: source LEB ID number
+ * @peb_type: PEB type
+ * @dst_leb_id: destination LEB ID number
+ * @dst_peb_index: destination PEB index
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to set the SSDFS_MAPTBL_SHARED_DESTINATION_PEB flag
+ * in the destination PEB. Then it tries to store the index of the
+ * destination PEB and to set the SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-ERANGE - internal error.
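+ * %-ENODATA - uninitialized LEB descriptor.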
+ */ +int ssdfs_maptbl_set_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + u64 dst_leb_id, u16 dst_peb_index, + struct completion **end) +{ + struct ssdfs_fs_info *fsi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !end); + + SSDFS_DBG("maptbl %p, leb_id %llu, " + "peb_type %#x, dst_peb_index %u\n", + tbl, leb_id, peb_type, dst_peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *end = NULL; + fsi = tbl->fsi; + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + if (should_cache_peb_info(peb_type)) { + struct ssdfs_maptbl_peb_relation prev_pebr; + + /* resolve potential inconsistency */ + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type, + &prev_pebr, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to resolve inconsistency: " + "leb_id %llu, err %d\n", + leb_id, err); + return err; + } + } + + down_read(&tbl->tbl_lock); + + err = ssdfs_maptbl_set_shared_destination_peb(tbl, dst_leb_id, + peb_type, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + dst_leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_set_indirect_relation; + } else if (unlikely(err)) { + SSDFS_ERR("fail to set shared destination PEB: " + "dst_leb_id %llu, err %u\n", + dst_leb_id, err); + goto finish_set_indirect_relation; + } + + err = __ssdfs_maptbl_set_indirect_relation(tbl, leb_id, peb_type, + dst_peb_index, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_set_indirect_relation; + } else if (unlikely(err)) { + SSDFS_ERR("fail to set indirect relation: " + "leb_id %llu, err %u\n", + leb_id, err); + goto finish_set_indirect_relation; + } + +finish_set_indirect_relation: + up_read(&tbl->tbl_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} From patchwork Sat Feb 25 01:08:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151947 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0139C64ED8 for ; Sat, 25 Feb 2023 01:19:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229647AbjBYBTK (ORCPT ); Fri, 24 Feb 2023 20:19:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49068 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229723AbjBYBRL (ORCPT ); Fri, 24 Feb 2023 20:17:11 -0500 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9D3F136C5 for ; Fri, 24 Feb 2023 17:17:06 -0800 (PST) Received: by mail-oi1-x22b.google.com with SMTP id y184so797764oiy.8 for ; Fri, 24 Feb 2023 17:17:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; 
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 42/76] ssdfs: PEB mapping table thread logic
Date: Fri, 24 Feb 2023 17:08:53 -0800
Message-Id: <20230225010927.813929-43-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

The "Physical" Erase Block (PEB) mapping table has a dedicated thread. The goal of this thread is to track the presence of dirty PEBs in the mapping table and to execute TRIM/erase operations for dirty PEBs in the background. However, if the number of dirty PEBs is big enough, then erase operations can be executed in the context of the thread that marks a PEB as dirty.
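To make this policy concrete, the sketch below models the foreground-versus-background decision in plain, self-contained C. It mirrors the two-threshold logic of need_erase_peb_now() from the previous patch; the helper name and the threshold values here are illustrative only, because the real SSDFS_PRE_ERASE_PEB_THRESHOLD_PCT and SSDFS_UNUSED_LEB_THRESHOLD_PCT constants are defined elsewhere in SSDFS.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative values only: the real SSDFS_PRE_ERASE_PEB_THRESHOLD_PCT and
 * SSDFS_UNUSED_LEB_THRESHOLD_PCT constants live elsewhere in the SSDFS tree.
 */
#define PRE_ERASE_PEB_THRESHOLD_PCT	25
#define UNUSED_LEB_THRESHOLD_PCT	10

struct fragment_counters {
	uint32_t lebs_count;
	uint32_t mapped_lebs;
	uint32_t migrating_lebs;
	uint32_t pre_erase_pebs;
	uint32_t recovering_pebs;
};

/* Hypothetical mirror of need_erase_peb_now(): erase in the caller's context
 * when the backlog of pre-erase PEBs is too large or unused LEBs are nearly
 * exhausted; otherwise leave the work to the background maptbl thread.
 */
static bool should_erase_in_foreground(const struct fragment_counters *c)
{
	uint32_t unused_lebs;
	uint32_t percentage;

	/* too many PEBs already waiting for erasure */
	percentage = (c->pre_erase_pebs * 100) / c->lebs_count;
	if (percentage > PRE_ERASE_PEB_THRESHOLD_PCT)
		return true;

	/* almost no free LEBs left: reclaim space right now */
	unused_lebs = c->lebs_count - c->mapped_lebs - c->migrating_lebs -
		      c->pre_erase_pebs - c->recovering_pebs;
	percentage = (unused_lebs * 100) / c->lebs_count;
	return percentage <= UNUSED_LEB_THRESHOLD_PCT;
}

int main(void)
{
	struct fragment_counters calm = {
		.lebs_count = 1000, .mapped_lebs = 300,
		.migrating_lebs = 10, .pre_erase_pebs = 50,
		.recovering_pebs = 0,
	};
	struct fragment_counters urgent = {
		.lebs_count = 1000, .mapped_lebs = 850,
		.migrating_lebs = 40, .pre_erase_pebs = 60,
		.recovering_pebs = 10,
	};

	printf("calm:   %s\n", should_erase_in_foreground(&calm) ?
	       "erase now" : "defer to thread");
	printf("urgent: %s\n", should_erase_in_foreground(&urgent) ?
	       "erase now" : "defer to thread");
	return 0;
}

With these sample counters, the first fragment still has plenty of unused LEBs, so erasure is left to the mapping table thread; the second fragment is almost out of unused LEBs, so the erase happens immediately in the caller's context.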
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/peb_mapping_table.c | 513 +++++ fs/ssdfs/peb_mapping_table_thread.c | 2817 +++++++++++++++++++++++++++ 2 files changed, 3330 insertions(+) create mode 100644 fs/ssdfs/peb_mapping_table_thread.c diff --git a/fs/ssdfs/peb_mapping_table.c b/fs/ssdfs/peb_mapping_table.c index 490114d77c67..738de2d62c9f 100644 --- a/fs/ssdfs/peb_mapping_table.c +++ b/fs/ssdfs/peb_mapping_table.c @@ -11037,3 +11037,516 @@ int ssdfs_maptbl_set_indirect_relation(struct ssdfs_peb_mapping_table *tbl, return err; } + +/* + * ssdfs_maptbl_set_zns_external_peb_ptr() - define zone as external pointer + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * @peb_type: PEB type + * + * This method tries to set SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR flag. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static int +ssdfs_maptbl_set_zns_external_peb_ptr(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index, u8 peb_type) +{ + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "page_index %lu, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_page_processing; + } + + if (peb_type != ptr->type) { + err = -ERANGE; + SSDFS_ERR("peb_type %#x != ptr->type %#x\n", + peb_type, ptr->type); + goto finish_page_processing; + } + + if (ptr->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB) { + err = -ERANGE; + SSDFS_ERR("corrupted PEB desriptor\n"); + goto finish_page_processing; + } + + switch (ptr->state) { + case SSDFS_MAPTBL_USED_PEB_STATE: + ptr->state = SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE; + break; + + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + ptr->state = SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid PEB state %#x\n", + ptr->state); + goto finish_page_processing; + } + + ptr->shared_peb_index = U8_MAX; + ptr->flags |= SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR; + +finish_page_processing: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_maptbl_set_zns_indirect_relation() - set PEBs indirect relation + * @tbl: pointer on mapping table object + * @leb_id: LEB ID number + * @peb_type: PEB type + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to set SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR flag. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - maptbl has inconsistent state. + * %-EAGAIN - fragment is under initialization yet. + * %-ERANGE - internal error. 
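+ * %-ENODATA - uninitialized LEB descriptor.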
+ */
+static int
+__ssdfs_maptbl_set_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					 u64 leb_id, u8 peb_type,
+					 struct completion **end)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	struct ssdfs_leb_descriptor leb_desc;
+	u16 physical_index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x\n",
+		  tbl, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id);
+		return err;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu hasn't been mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu has direct relation\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+
+	if (physical_index == U16_MAX) {
+		err = -ENODATA;
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_set_zns_external_peb_ptr(fdesc, physical_index,
+						    peb_type);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set external PEB pointer: "
+			  "physical_index %u, err %d\n",
+			  physical_index, err);
+		goto finish_fragment_change;
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_set_zns_indirect_relation() - set PEBs indirect relation
+ * @tbl: pointer on mapping table object
+ * @leb_id: source LEB ID number
+ * @peb_type: PEB type
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to set the SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-ERANGE - internal error.
+ */ +int ssdfs_maptbl_set_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl, + u64 leb_id, u8 peb_type, + struct completion **end) +{ + struct ssdfs_fs_info *fsi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !end); + + SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x\n", + tbl, leb_id, peb_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + *end = NULL; + fsi = tbl->fsi; + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) { + ssdfs_fs_error(tbl->fsi->sb, + __FILE__, __func__, __LINE__, + "maptbl has corrupted state\n"); + return -EFAULT; + } + + if (should_cache_peb_info(peb_type)) { + struct ssdfs_maptbl_peb_relation prev_pebr; + + /* resolve potential inconsistency */ + err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type, + &prev_pebr, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: " + "leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to resolve inconsistency: " + "leb_id %llu, err %d\n", + leb_id, err); + return err; + } + } + + down_read(&tbl->tbl_lock); + + err = __ssdfs_maptbl_set_zns_indirect_relation(tbl, leb_id, + peb_type, end); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment is under initialization: leb_id %llu\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_set_indirect_relation; + } else if (unlikely(err)) { + SSDFS_ERR("fail to set indirect relation: " + "leb_id %llu, err %u\n", + leb_id, err); + goto finish_set_indirect_relation; + } + +finish_set_indirect_relation: + up_read(&tbl->tbl_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_clear_peb_as_shared() - clear destination PEB as shared + * @fdesc: fragment descriptor + * @index: PEB index in the fragment + * @peb_type: PEB type + * + * This method tries to clear SSDFS_MAPTBL_SHARED_DESTINATION_PEB flag + * in destination PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_clear_peb_as_shared(struct ssdfs_maptbl_fragment_desc *fdesc, + u16 index, u8 peb_type) +{ + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc); + + SSDFS_DBG("fdesc %p, index %u\n", + fdesc, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page_index = PEBTBL_PAGE_INDEX(fdesc, index); + item_index = index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + ptr = GET_PEB_DESCRIPTOR(kaddr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? 
+		SSDFS_ERR("fail to get peb_descriptor: "
+			  "page_index %lu, item_index %u, err %d\n",
+			  page_index, item_index, err);
+		goto finish_page_processing;
+	}
+
+	if (peb_type != ptr->type) {
+		err = -ERANGE;
+		SSDFS_ERR("peb_type %#x != ptr->type %#x\n",
+			  peb_type, ptr->type);
+		goto finish_page_processing;
+	}
+
+	switch (ptr->state) {
+	case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE:
+	case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE:
+		/* valid state */
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid PEB state %#x\n",
+			  ptr->state);
+		goto finish_page_processing;
+	}
+
+	if (ptr->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR ||
+	    ptr->shared_peb_index != U16_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("corrupted PEB descriptor\n");
+		goto finish_page_processing;
+	}
+
+	if (!(ptr->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB))
+		SSDFS_WARN("it is not shared destination PEB\n");
+
+	ptr->flags &= ~SSDFS_MAPTBL_SHARED_DESTINATION_PEB;
+
+finish_page_processing:
+	kunmap_local(kaddr);
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_clear_shared_destination_peb() - clear destination PEB as shared
+ * @tbl: pointer on mapping table object
+ * @leb_id: LEB ID number
+ * @peb_type: PEB type
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to clear SSDFS_MAPTBL_SHARED_DESTINATION_PEB flag
+ * in destination PEB.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is still under initialization.
+ * %-ERANGE - internal error.
+ */
+static int
+ssdfs_maptbl_clear_shared_destination_peb(struct ssdfs_peb_mapping_table *tbl,
+					  u64 leb_id, u8 peb_type,
+					  struct completion **end)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	struct ssdfs_leb_descriptor leb_desc;
+	u16 relation_index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x\n",
+		  tbl, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id);
+		return err;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu isn't mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (!is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu isn't under migration\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	relation_index = le16_to_cpu(leb_desc.relation_index);
+
+	if (relation_index == U16_MAX) {
+		err = -ENODATA;
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_clear_peb_as_shared(fdesc, relation_index,
+					       peb_type);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to clear PEB as shared: "
+			  "relation_index %u, err %d\n",
+			  relation_index, err);
+		goto finish_fragment_change;
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+	return err;
+}
diff --git a/fs/ssdfs/peb_mapping_table_thread.c b/fs/ssdfs/peb_mapping_table_thread.c
new file mode 100644
index 000000000000..25236cce7b18
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_table_thread.c
@@ -0,0 +1,2817 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_table_thread.c - PEB mapping table thread functionality.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+#include <linux/kthread.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_array.h"
+#include "peb_mapping_table.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_map_thread_page_leaks;
+atomic64_t ssdfs_map_thread_memory_leaks;
+atomic64_t ssdfs_map_thread_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_map_thread_cache_leaks_increment(void *kaddr)
+ * void ssdfs_map_thread_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_map_thread_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_thread_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_thread_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_map_thread_kfree(void *kaddr)
+ * struct page *ssdfs_map_thread_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_map_thread_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_map_thread_free_page(struct page *page)
+ * void ssdfs_map_thread_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(map_thread)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(map_thread)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_map_thread_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_map_thread_page_leaks, 0);
+	atomic64_set(&ssdfs_map_thread_memory_leaks, 0);
+	atomic64_set(&ssdfs_map_thread_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_map_thread_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_map_thread_page_leaks) != 0) {
+		SSDFS_ERR("MAPPING TABLE THREAD: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_map_thread_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_thread_memory_leaks) != 0) {
+		SSDFS_ERR("MAPPING TABLE THREAD: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_thread_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_thread_cache_leaks) != 0) {
+		SSDFS_ERR("MAPPING TABLE THREAD: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_thread_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * is_time_to_erase_peb() - check that PEB can be erased
+ * @hdr: fragment's header
+ * @found_item: PEB index in the fragment
+ */
+static
+bool is_time_to_erase_peb(struct ssdfs_peb_table_fragment_header *hdr,
+			  unsigned long found_item)
+{
+	unsigned long *used_bmap;
+	unsigned long *dirty_bmap;
+	u16 pebs_count;
+	unsigned long protected_item = found_item;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr);
+
+	SSDFS_DBG("hdr %p, found_item %lu\n",
+		  hdr, found_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_peb_protected(found_item))
+		return true;
+
+	used_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0];
+	dirty_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0];
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+
+	if (found_item >= pebs_count) {
+		SSDFS_ERR("found_item %lu >= pebs_count %u\n",
+			  found_item, pebs_count);
+		return false;
+	}
+
+	for (i = 0; i < SSDFS_MAPTBL_PROTECTION_RANGE; i++) {
+		unsigned long found;
+
+		protected_item += SSDFS_MAPTBL_PROTECTION_STEP;
+
+		if (protected_item >= pebs_count)
+			protected_item =
SSDFS_MAPTBL_FIRST_PROTECTED_INDEX; + + if (protected_item == found_item) + return false; + + found = find_next_bit(used_bmap, pebs_count, + protected_item); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, protected_item %lu, found %lu\n", + i, protected_item, found); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found == protected_item) + continue; + + found = find_next_bit(dirty_bmap, pebs_count, + protected_item); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, protected_item %lu, found %lu\n", + i, protected_item, found); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found == protected_item) + continue; + + /* the item is protected */ + return false; + } + + return true; +} + +/* + * does_peb_contain_snapshot() - check that PEB contains snapshot + * @ptr: PEB descriptor + */ +static inline +bool does_peb_contain_snapshot(struct ssdfs_peb_descriptor *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("ptr->state %#x\n", + ptr->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ptr->state == SSDFS_MAPTBL_SNAPSHOT_STATE) + return true; + + return false; +} + +/* + * ssdfs_maptbl_collect_stripe_dirty_pebs() - collect dirty PEBs in stripe + * @tbl: mapping table object + * @fdesc: fragment descriptor + * @fragment_index: index of fragment + * @stripe_index: index of stripe + * @erases_per_stripe: count of erases per stripe + * @array: array of erase operation results [out] + * + * This method tries to collect information about dirty PEBs + * in the stripe. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static int +ssdfs_maptbl_collect_stripe_dirty_pebs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + u32 fragment_index, + int stripe_index, + int erases_per_stripe, + struct ssdfs_erase_result_array *array) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *ptr; + int found_pebs = 0; + u16 stripe_pages = fdesc->stripe_pages; + pgoff_t start_page; + unsigned long *dirty_bmap; + bool has_protected_peb_collected = false; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !array); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("fdesc %p, stripe_index %u, " + "erases_per_stripe %d\n", + fdesc, stripe_index, + erases_per_stripe); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_page = stripe_index * stripe_pages; + start_page += fdesc->lebtbl_pages; + + for (i = 0; i < stripe_pages; i++) { + pgoff_t page_index = start_page + i; + struct page *page; + void *kaddr; + unsigned long found_item = 0; + u16 peb_index; + u64 start_peb; + u16 pebs_count; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, + page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? 
-ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + dirty_bmap = + (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0]; + start_peb = le64_to_cpu(hdr->start_peb); + pebs_count = le16_to_cpu(hdr->pebs_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_index %u, stripe_index %d, " + "stripe_page %d, dirty_bits %d\n", + fragment_index, stripe_index, i, + bitmap_weight(dirty_bmap, pebs_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + while (found_pebs < erases_per_stripe) { + found_item = find_next_bit(dirty_bmap, pebs_count, + found_item); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_item %lu, pebs_count %u\n", + found_item, pebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found_item >= pebs_count) { + /* all dirty PEBs were found */ + goto finish_page_processing; + } + + if ((start_peb + found_item) >= tbl->pebs_count) { + /* all dirty PEBs were found */ + goto finish_page_processing; + } + + if (!is_time_to_erase_peb(hdr, found_item)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB %llu is protected yet\n", + GET_PEB_ID(kaddr, found_item)); +#endif /* CONFIG_SSDFS_DEBUG */ + found_item++; + continue; + } + + if (is_peb_protected(found_item)) + has_protected_peb_collected = true; + + ptr = GET_PEB_DESCRIPTOR(hdr, found_item); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "found_item %lu, err %d\n", + found_item, err); + goto finish_page_processing; + } + + if (ptr->state == SSDFS_MAPTBL_UNDER_ERASE_STATE) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB %llu is under erase\n", + GET_PEB_ID(kaddr, found_item)); +#endif /* CONFIG_SSDFS_DEBUG */ + found_item++; + continue; + } + + if (does_peb_contain_snapshot(ptr)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB %llu contains snapshot\n", + GET_PEB_ID(kaddr, found_item)); +#endif /* CONFIG_SSDFS_DEBUG */ + found_item++; + continue; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(array->size >= array->capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_index = DEFINE_PEB_INDEX_IN_FRAGMENT(fdesc, + page_index, + found_item); + SSDFS_ERASE_RESULT_INIT(fragment_index, peb_index, + GET_PEB_ID(kaddr, found_item), + SSDFS_ERASE_RESULT_UNKNOWN, + &array->ptr[array->size]); + + array->size++; + found_pebs++; + found_item++; + + if (has_protected_peb_collected) + goto finish_page_processing; + }; + +finish_page_processing: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) + return err; + } + + return 0; +} + +/* + * ssdfs_maptbl_collect_dirty_pebs() - collect dirty PEBs in fragment + * @tbl: mapping table object + * @fragment_index: index of fragment + * @erases_per_fragment: maximal amount of erases per fragment + * @array: array of erase operation results [out] + * + * This method tries to collect information about dirty PEBs + * in fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - no dirty PEBs. 
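+ *
+ * The per-stripe erase budget is a simple division of
+ * @erases_per_fragment (illustrative numbers, not from a real volume):
+ *
+ *	erases_per_stripe = erases_per_fragment / stripes_per_fragment;
+ *	e.g. 16 erases per fragment / 8 stripes = at most 2 PEBs per stripe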
+ */ +static +int ssdfs_maptbl_collect_dirty_pebs(struct ssdfs_peb_mapping_table *tbl, + u32 fragment_index, + int erases_per_fragment, + struct ssdfs_erase_result_array *array) +{ + struct ssdfs_maptbl_fragment_desc *fdesc; + int state; + u16 stripes_per_fragment; + int erases_per_stripe; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !array); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + + if (fragment_index >= tbl->fragments_count) { + SSDFS_ERR("fragment_index %u >= tbl->fragments_count %u\n", + fragment_index, tbl->fragments_count); + return -EINVAL; + } + + SSDFS_DBG("tbl %p, fragment_index %u, " + "erases_per_fragment %d\n", + tbl, fragment_index, + erases_per_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(array->ptr, 0, + array->capacity * sizeof(struct ssdfs_erase_result)); + array->size = 0; + + fdesc = &tbl->desc_array[fragment_index]; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED || + state == SSDFS_MAPTBL_FRAG_CREATED) { + /* do nothing */ + return -ENOENT; + } + + stripes_per_fragment = tbl->stripes_per_fragment; + erases_per_stripe = erases_per_fragment / stripes_per_fragment; + if (erases_per_stripe == 0) + erases_per_stripe = 1; + + down_read(&fdesc->lock); + + if (fdesc->pre_erase_pebs == 0) { + /* no dirty PEBs */ + err = -ENOENT; + goto finish_gathering; + } + + for (i = 0; i < stripes_per_fragment; i++) { + err = ssdfs_maptbl_collect_stripe_dirty_pebs(tbl, fdesc, + fragment_index, + i, + erases_per_stripe, + array); + if (unlikely(err)) { + SSDFS_ERR("fail to collect dirty PEBs: " + "fragment_index %u, stripe_index %d, " + "err %d\n", + fragment_index, i, err); + goto finish_gathering; + } + } + +finish_gathering: + up_read(&fdesc->lock); + + return err; +} + +/* + * ssdfs_maptbl_erase_peb() - erase particular PEB + * @fsi: file system info object + * @result: erase operation result [in|out] + * + * This method tries to erase dirty PEB. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO state. 
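+ *
+ * The erase position is computed by plain multiplication, which is why
+ * the LLONG_MAX overflow guard below is needed (illustrative numbers):
+ *
+ *	offset = peb_id * fsi->erasesize;
+ *	e.g. peb_id 1024, erasesize 8 MiB: offset = 8 GiB, len = 8 MiB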
+ */ +int ssdfs_maptbl_erase_peb(struct ssdfs_fs_info *fsi, + struct ssdfs_erase_result *result) +{ + u64 peb_id; + loff_t offset; + size_t len = fsi->erasesize; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !result); +#endif /* CONFIG_SSDFS_DEBUG */ + + peb_id = result->peb_id; + + if (((LLONG_MAX - 1) / fsi->erasesize) < peb_id) { + SSDFS_NOTICE("ignore erasing peb %llu\n", peb_id); + result->state = SSDFS_IGNORE_ERASE; + return 0; + } + + offset = peb_id * fsi->erasesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, offset %llu\n", + peb_id, (u64)offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (result->state == SSDFS_BAD_BLOCK_DETECTED) { + err = fsi->devops->mark_peb_bad(fsi->sb, offset); + if (unlikely(err)) { + SSDFS_ERR("fail to mark PEB as bad: " + "peb %llu, err %d\n", + peb_id, err); + } + err = 0; + } else { + err = fsi->devops->trim(fsi->sb, offset, len); + if (err == -EROFS) { + SSDFS_DBG("file system has READ_ONLY state\n"); + return err; + } else if (err == -EFAULT) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("erase operation failure: peb %llu\n", + peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + result->state = SSDFS_ERASE_FAILURE; + } else if (unlikely(err)) { + SSDFS_ERR("fail to erase: peb %llu, err %d\n", + peb_id, err); + err = 0; + result->state = SSDFS_IGNORE_ERASE; + } else + result->state = SSDFS_ERASE_DONE; + } + + return 0; +} + +/* + * ssdfs_maptbl_erase_pebs_array() - erase PEBs + * @fsi: file system info object + * @array: array of erase operation results [in|out] + * + * This method tries to erase dirty PEBs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EROFS - file system in RO state. + */ +static inline +int ssdfs_maptbl_erase_pebs_array(struct ssdfs_fs_info *fsi, + struct ssdfs_erase_result_array *array) +{ + u32 i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !array || !array->ptr); + BUG_ON(!fsi->devops || !fsi->devops->trim); + BUG_ON(array->capacity == 0); + BUG_ON(array->capacity < array->size); + + SSDFS_DBG("fsi %p, capacity %u, size %u\n", + fsi, array->capacity, array->size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (array->size == 0) + return 0; + + for (i = 0; i < array->size; i++) { + err = ssdfs_maptbl_erase_peb(fsi, &array->ptr[i]); + if (unlikely(err)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to erase PEB: " + "peb_id %llu, err %d\n", + array->ptr[i].peb_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + } + + return 0; +} + +/* + * ssdfs_maptbl_correct_peb_state() - correct state of erased PEB + * @fdesc: fragment descriptor + * @res: result of erase operation + * + * This method corrects PEB state after erasing. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
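+ *
+ * Summary of the state transitions applied below (derived from this
+ * function's code, not an authoritative state diagram):
+ *
+ *	SSDFS_ERASE_DONE        --> SSDFS_MAPTBL_UNKNOWN_PEB_STATE
+ *	SSDFS_ERASE_SB_PEB_DONE --> SSDFS_MAPTBL_USING_PEB_STATE
+ *	SSDFS_ERASE_FAILURE     --> SSDFS_MAPTBL_RECOVERING_STATE
+ *	SSDFS_IGNORE_ERASE      --> PEB is skipped, nothing changes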
+ */ +static +int ssdfs_maptbl_correct_peb_state(struct ssdfs_maptbl_fragment_desc *fdesc, + struct ssdfs_erase_result *res) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *ptr; + pgoff_t page_index; + u16 item_index; + struct page *page; + void *kaddr; + unsigned long *dirty_bmap, *used_bmap, *recover_bmap, *bad_bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !res); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("fdesc %p, res->fragment_index %u, res->peb_index %u, " + "res->peb_id %llu, res->state %#x\n", + fdesc, res->fragment_index, res->peb_index, + res->peb_id, res->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (res->state == SSDFS_IGNORE_ERASE) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore PEB: peb_id %llu\n", res->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + page_index = PEBTBL_PAGE_INDEX(fdesc, res->peb_index); + item_index = res->peb_index % fdesc->pebs_per_page; + + page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + dirty_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0]; + used_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + recover_bmap = + (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_RECOVER_BMAP][0]; + bad_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_BADBLK_BMAP][0]; + + ptr = GET_PEB_DESCRIPTOR(hdr, item_index); + if (IS_ERR_OR_NULL(ptr)) { + err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE; + SSDFS_ERR("fail to get peb_descriptor: " + "peb_index %u, err %d\n", + res->peb_index, err); + goto finish_page_processing; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("peb_id %llu, peb_index %u, state %#x\n", + res->peb_id, res->peb_index, ptr->state); + SSDFS_DBG("erase_cycles %u, type %#x, " + "state %#x, flags %#x, shared_peb_index %u\n", + le32_to_cpu(ptr->erase_cycles), + ptr->type, ptr->state, + ptr->flags, ptr->shared_peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ptr->state != SSDFS_MAPTBL_PRE_ERASE_STATE && + ptr->state != SSDFS_MAPTBL_UNDER_ERASE_STATE && + ptr->state != SSDFS_MAPTBL_RECOVERING_STATE) { + err = -ERANGE; + SSDFS_ERR("invalid PEB state: " + "peb_id %llu, peb_index %u, state %#x\n", + res->peb_id, res->peb_index, ptr->state); + goto finish_page_processing; + } + + le32_add_cpu(&ptr->erase_cycles, 1); + ptr->type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("erase_cycles %u, type %#x, " + "state %#x, flags %#x, shared_peb_index %u\n", + le32_to_cpu(ptr->erase_cycles), + ptr->type, ptr->state, + ptr->flags, ptr->shared_peb_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (res->state) { + case SSDFS_ERASE_DONE: + ptr->state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE; + bitmap_clear(dirty_bmap, item_index, 1); + bitmap_clear(used_bmap, item_index, 1); + le16_add_cpu(&hdr->reserved_pebs, 1); + fdesc->reserved_pebs++; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hdr->reserved_pebs %u\n", + le16_to_cpu(hdr->reserved_pebs)); + BUG_ON(fdesc->pre_erase_pebs == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + fdesc->pre_erase_pebs--; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fdesc->pre_erase_pebs %u\n", + fdesc->pre_erase_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_ERASE_SB_PEB_DONE: + ptr->type = SSDFS_MAPTBL_SBSEG_PEB_TYPE; + ptr->state = 
SSDFS_MAPTBL_USING_PEB_STATE; + bitmap_clear(dirty_bmap, item_index, 1); +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(fdesc->pre_erase_pebs == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + fdesc->pre_erase_pebs--; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fdesc->pre_erase_pebs %u\n", + fdesc->pre_erase_pebs); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_ERASE_FAILURE: + ptr->state = SSDFS_MAPTBL_RECOVERING_STATE; + bitmap_clear(dirty_bmap, item_index, 1); + bitmap_set(recover_bmap, item_index, 1); + fdesc->recovering_pebs++; + if (!(hdr->flags & SSDFS_PEBTBL_UNDER_RECOVERING)) { + hdr->flags |= SSDFS_PEBTBL_UNDER_RECOVERING; + hdr->recover_months = 1; + hdr->recover_threshold = SSDFS_PEBTBL_FIRST_RECOVER_TRY; + } + break; + + default: + BUG(); + }; + + ssdfs_set_page_private(page, 0); + SetPageUptodate(page); + err = ssdfs_page_array_set_page_dirty(&fdesc->array, + page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set page %lu dirty: err %d\n", + page_index, err); + } + +finish_page_processing: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_maptbl_correct_fragment_dirty_pebs() - correct PEBs' state in fragment + * @tbl: mapping table object + * @array: array of erase operation results + * @item_index: pointer on current index in array [in|out] + * + * This method corrects PEBs' state in fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static int +ssdfs_maptbl_correct_fragment_dirty_pebs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_erase_result_array *array, + u32 *item_index) +{ + struct ssdfs_maptbl_fragment_desc *fdesc; + u32 fragment_index; + int state; + int erased_pebs = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !array || !array->ptr || !item_index); + BUG_ON(array->capacity == 0); + BUG_ON(array->capacity < array->size); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + + SSDFS_DBG("tbl %p, capacity %u, size %u, item_index %u\n", + tbl, array->capacity, array->size, *item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*item_index >= array->size) { + SSDFS_ERR("item_index %u >= array->size %u\n", + *item_index, array->size); + return -EINVAL; + } + + fragment_index = array->ptr[*item_index].fragment_index; + + if (fragment_index >= tbl->fragments_count) { + SSDFS_ERR("fragment_index %u >= tbl->fragments_count %u\n", + fragment_index, tbl->fragments_count); + return -ERANGE; + } + + fdesc = &tbl->desc_array[fragment_index]; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED || + state == SSDFS_MAPTBL_FRAG_CREATED) { + SSDFS_ERR("fail to correct fragment: " + "fragment_index %u, state %#x\n", + fragment_index, state); + return -ERANGE; + } + + down_write(&fdesc->lock); + + if (fdesc->pre_erase_pebs == 0) { + SSDFS_ERR("fdesc->pre_erase_pebs == 0\n"); + err = -ERANGE; + goto finish_fragment_correction; + } + + do { + err = ssdfs_maptbl_correct_peb_state(fdesc, + &array->ptr[*item_index]); + if (unlikely(err)) { + SSDFS_ERR("fail to correct PEB state: " + "peb_id %llu, err %d\n", + array->ptr[*item_index].peb_id, + err); + goto finish_fragment_correction; + } + + if (array->ptr[*item_index].state != SSDFS_IGNORE_ERASE) + erased_pebs++; + + ++*item_index; + } while (*item_index < array->size && + fragment_index == 
array->ptr[*item_index].fragment_index);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("erased_pebs %d, pre_erase_pebs %d\n",
+		  erased_pebs,
+		  atomic_read(&tbl->pre_erase_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (atomic_sub_return(erased_pebs, &tbl->pre_erase_pebs) < 0) {
+		SSDFS_WARN("erased_pebs %d, pre_erase_pebs %d\n",
+			   erased_pebs,
+			   atomic_read(&tbl->pre_erase_pebs));
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("tbl->pre_erase_pebs %d\n",
+		  atomic_read(&tbl->pre_erase_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_fragment_correction:
+	up_write(&fdesc->lock);
+
+	if (!err) {
+		if (is_ssdfs_maptbl_going_to_be_destroyed(tbl)) {
+			SSDFS_WARN("maptbl %p, "
+				   "fdesc %p, fragment_index %u, "
+				   "start_leb %llu, lebs_count %u\n",
+				   tbl, fdesc, fragment_index,
+				   fdesc->start_leb, fdesc->lebs_count);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("maptbl %p, "
+				  "fdesc %p, fragment_index %u, "
+				  "start_leb %llu, lebs_count %u\n",
+				  tbl, fdesc, fragment_index,
+				  fdesc->start_leb, fdesc->lebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		mutex_lock(&tbl->bmap_lock);
+		atomic_set(&fdesc->state, SSDFS_MAPTBL_FRAG_DIRTY);
+		bitmap_set(tbl->dirty_bmap, fragment_index, 1);
+		mutex_unlock(&tbl->bmap_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment_index %u, state %#x\n",
+			  fragment_index,
+			  atomic_read(&fdesc->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_correct_dirty_pebs() - correct PEBs' state after erasing
+ * @tbl: mapping table object
+ * @array: array of erase operation results
+ */
+static
+int ssdfs_maptbl_correct_dirty_pebs(struct ssdfs_peb_mapping_table *tbl,
+				    struct ssdfs_erase_result_array *array)
+{
+	u32 item_index = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !array || !array->ptr);
+	BUG_ON(array->capacity == 0);
+	BUG_ON(array->capacity < array->size);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("tbl %p, capacity %u, size %u\n",
+		  tbl, array->capacity, array->size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (array->size == 0)
+		return 0;
+
+	do {
+		err = ssdfs_maptbl_correct_fragment_dirty_pebs(tbl, array,
+							       &item_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to correct fragment: err %d\n",
+				  err);
+			return err;
+		}
+	} while (item_index < array->size);
+
+	if (item_index != array->size) {
+		SSDFS_ERR("item_index %u != array->size %u\n",
+			  item_index, array->size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_correct_dirty_peb() - correct PEB's state in fragment
+ * @tbl: mapping table object
+ * @fdesc: fragment descriptor
+ * @result: erase operation result
+ *
+ * This method corrects PEB's state in fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
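+ *
+ * Hypothetical call site sketch: the BUG_ON() checks below imply that
+ * both locks are already held by the caller (the read/write flavors
+ * shown here are an assumption):
+ *
+ *	down_read(&tbl->tbl_lock);
+ *	down_write(&fdesc->lock);
+ *	err = ssdfs_maptbl_correct_dirty_peb(tbl, fdesc, &result);
+ *	up_write(&fdesc->lock);
+ *	up_read(&tbl->tbl_lock);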
+ */ +int ssdfs_maptbl_correct_dirty_peb(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *fdesc, + struct ssdfs_erase_result *result) +{ + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fdesc || !result); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&fdesc->lock)); + + SSDFS_DBG("peb_id %llu\n", result->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED || + state == SSDFS_MAPTBL_FRAG_CREATED) { + SSDFS_ERR("fail to correct fragment: " + "fragment_id %u, state %#x\n", + fdesc->fragment_id, state); + return -ERANGE; + } + + if (fdesc->pre_erase_pebs == 0) { + SSDFS_ERR("fdesc->pre_erase_pebs == 0\n"); + return -ERANGE; + } + + err = ssdfs_maptbl_correct_peb_state(fdesc, result); + if (unlikely(err)) { + SSDFS_ERR("fail to correct PEB state: " + "peb_id %llu, err %d\n", + result->peb_id, err); + return err; + } + + if (result->state == SSDFS_IGNORE_ERASE) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ignore erase operation: " + "peb_id %llu\n", + result->peb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + if (atomic_dec_return(&tbl->pre_erase_pebs) < 0) { + SSDFS_WARN("pre_erase_pebs %d\n", + atomic_read(&tbl->pre_erase_pebs)); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tbl->pre_erase_pebs %d\n", + atomic_read(&tbl->pre_erase_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_maptbl_going_to_be_destroyed(tbl)) { + SSDFS_WARN("maptbl %p, " + "fdesc %p, fragment_id %u, " + "start_leb %llu, lebs_count %u\n", + tbl, fdesc, fdesc->fragment_id, + fdesc->start_leb, fdesc->lebs_count); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("maptbl %p, " + "fdesc %p, fragment_id %u, " + "start_leb %llu, lebs_count %u\n", + tbl, fdesc, fdesc->fragment_id, + fdesc->start_leb, fdesc->lebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + mutex_lock(&tbl->bmap_lock); + atomic_set(&fdesc->state, SSDFS_MAPTBL_FRAG_DIRTY); + bitmap_set(tbl->dirty_bmap, fdesc->fragment_id, 1); + mutex_unlock(&tbl->bmap_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_id %u, state %#x\n", + fdesc->fragment_id, + atomic_read(&fdesc->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * is_time_to_recover_pebs() - check that it's time to recover PEBs + * @tbl: mapping table object + */ +static inline +bool is_time_to_recover_pebs(struct ssdfs_peb_mapping_table *tbl) +{ +#define BILLION 1000000000L + u64 month_ns = 31 * 24 * 60 * 60 * BILLION; + u64 current_cno, upper_bound_cno; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !tbl->fsi || !tbl->fsi->sb); + + SSDFS_DBG("tbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + upper_bound_cno = atomic64_read(&tbl->last_peb_recover_cno); + upper_bound_cno += month_ns; + + current_cno = ssdfs_current_cno(tbl->fsi->sb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current_cno %llu, upper_bound_cno %llu\n", + current_cno, upper_bound_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + return current_cno >= upper_bound_cno; +} + +/* + * set_last_recovering_cno() - set current checkpoint as last recovering try + * @tbl: mapping table object + */ +static inline +void set_last_recovering_cno(struct ssdfs_peb_mapping_table *tbl) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !tbl->fsi || !tbl->fsi->sb); + + SSDFS_DBG("tbl %p\n", tbl); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic64_set(&tbl->last_peb_recover_cno, + ssdfs_current_cno(tbl->fsi->sb)); +} + +/* + * ssdfs_maptbl_find_page_recovering_pebs() - finds recovering PEBs in a 
page
+ * @fdesc: fragment descriptor
+ * @fragment_index: fragment index
+ * @page_index: page index
+ * @max_erases: upper bound of erase operations for a page
+ * @stage: phase of PEBs recovering
+ * @array: array of erase operation results [out]
+ *
+ * This method tries to find PEBs for recovering.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOSPC - array is full.
+ */
+static int
+ssdfs_maptbl_find_page_recovering_pebs(struct ssdfs_maptbl_fragment_desc *fdesc,
+					u32 fragment_index,
+					pgoff_t page_index,
+					int max_erases,
+					int stage,
+					struct ssdfs_erase_result_array *array)
+{
+	struct ssdfs_peb_table_fragment_header *hdr;
+	bool need_mark_peb_bad = false;
+	unsigned long *recover_bmap;
+	int recovering_pebs;
+	u16 pebs_count;
+	struct page *page;
+	void *kaddr;
+	unsigned long found_item, search_step;
+	unsigned long start = 0;
+	u16 peb_index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc || !array);
+	BUG_ON(!rwsem_is_locked(&fdesc->lock));
+
+	if (stage >= SSDFS_RECOVER_STAGE_MAX) {
+		SSDFS_ERR("invalid recovering stage %#x\n",
+			  stage);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("fdesc %p, fragment_index %u, page_index %lu, "
+		  "max_erases %d, stage %#x\n",
+		  fdesc, fragment_index, page_index,
+		  max_erases, stage);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index);
+	if (IS_ERR_OR_NULL(page)) {
+		err = page == NULL ? -ERANGE : PTR_ERR(page);
+		SSDFS_ERR("fail to find page: page_index %lu\n",
+			  page_index);
+		return err;
+	}
+
+	kaddr = kmap_local_page(page);
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	switch (stage) {
+	case SSDFS_CHECK_RECOVERABILITY:
+		if (!(hdr->flags & SSDFS_PEBTBL_FIND_RECOVERING_PEBS)) {
+			/* no PEBs for recovering */
+			goto finish_page_processing;
+		}
+		break;
+
+	case SSDFS_MAKE_RECOVERING:
+		if (!(hdr->flags & SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN)) {
+			/* no PEBs for recovering */
+			goto finish_page_processing;
+		} else if (!(hdr->flags & SSDFS_PEBTBL_FIND_RECOVERING_PEBS)) {
+			err = -ERANGE;
+			SSDFS_WARN("invalid flags combination: %#x\n",
+				   hdr->flags);
+			goto finish_page_processing;
+		}
+		break;
+
+	default:
+		BUG();
+	};
+
+	if (hdr->recover_months > 0) {
+		hdr->recover_months--;
+		goto finish_page_processing;
+	}
+
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+	recover_bmap =
+		(unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_RECOVER_BMAP][0];
+	recovering_pebs = bitmap_weight(recover_bmap, pebs_count);
+
+	if (unlikely(recovering_pebs == 0)) {
+		err = -ERANGE;
+		SSDFS_ERR("recovering_pebs == 0\n");
+		goto finish_page_processing;
+	} else if (hdr->recover_threshold == SSDFS_PEBTBL_BADBLK_THRESHOLD) {
+		/* simply reserve PEBs for marking as bad */
+		need_mark_peb_bad = true;
+	} else if (((recovering_pebs * 100) / pebs_count) < 20) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("leave page %lu untouched: "
+			  "recovering_pebs %d, pebs_count %u\n",
+			  page_index, recovering_pebs, pebs_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		hdr->recover_months++;
+		goto finish_page_processing;
+	}
+
+	max_erases = min_t(int, max_erases, (int)pebs_count);
+
+	if (need_mark_peb_bad)
+		search_step = 1;
+	else
+		search_step = pebs_count / max_erases;
+
+	while (array->size < array->capacity) {
+		int state;
+
+		found_item = find_next_bit(recover_bmap, pebs_count,
+					   start);
+		if (found_item >= pebs_count) {
+			/* all PEBs were found */
+			goto finish_page_processing;
+		}
+
+		array->ptr[array->size].fragment_index =
fragment_index;
+		peb_index = DEFINE_PEB_INDEX_IN_FRAGMENT(fdesc, page_index,
+							 found_item);
+		array->ptr[array->size].peb_index = peb_index;
+		array->ptr[array->size].peb_id = GET_PEB_ID(kaddr, found_item);
+
+		if (need_mark_peb_bad)
+			state = SSDFS_BAD_BLOCK_DETECTED;
+		else
+			state = SSDFS_ERASE_RESULT_UNKNOWN;
+
+		array->ptr[array->size].state = state;
+		array->size++;
+
+		/* continue from the beginning of the next search window */
+		start = ((found_item / search_step) + 1) * search_step;
+	};
+
+finish_page_processing:
+	kunmap_local(kaddr);
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (array->size >= array->capacity) {
+		err = -ENOSPC;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("array->size %u, max_erases %d\n",
+			  array->size, max_erases);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_collect_recovering_pebs() - collect recovering PEBs in fragment
+ * @tbl: mapping table object
+ * @fragment_index: fragment index
+ * @erases_per_fragment: upper bound of erase operations for fragment
+ * @stage: phase of PEBs recovering
+ * @array: array of erase operation results [out]
+ *
+ * This method tries to find PEBs for recovering in fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_maptbl_collect_recovering_pebs(struct ssdfs_peb_mapping_table *tbl,
+					 u32 fragment_index,
+					 int erases_per_fragment,
+					 int stage,
+					 struct ssdfs_erase_result_array *array)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	pgoff_t index, max_index;
+	int max_erases;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !array);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	if (fragment_index >= tbl->fragments_count) {
+		SSDFS_ERR("fragment_index %u >= tbl->fragments_count %u\n",
+			  fragment_index, tbl->fragments_count);
+		return -EINVAL;
+	}
+
+	if (stage >= SSDFS_RECOVER_STAGE_MAX) {
+		SSDFS_ERR("invalid recovering stage %#x\n",
+			  stage);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("tbl %p, fragment_index %u, "
+		  "erases_per_fragment %d, stage %#x\n",
+		  tbl, fragment_index,
+		  erases_per_fragment, stage);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(array->ptr, 0,
+		array->capacity * sizeof(struct ssdfs_erase_result));
+	array->size = 0;
+
+	fdesc = &tbl->desc_array[fragment_index];
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED ||
+	    state == SSDFS_MAPTBL_FRAG_CREATED) {
+		/* do nothing */
+		return 0;
+	}
+
+	down_read(&fdesc->lock);
+
+	if (fdesc->recovering_pebs == 0) {
+		/* no PEBs for recovering */
+		goto finish_gathering;
+	}
+
+	max_index = fdesc->lebtbl_pages;
+	max_index += tbl->stripes_per_fragment * fdesc->stripe_pages;
+	max_erases = erases_per_fragment / fdesc->stripe_pages;
+
+	for (index = fdesc->lebtbl_pages; index < max_index; index++) {
+		err = ssdfs_maptbl_find_page_recovering_pebs(fdesc,
+							     fragment_index,
+							     index,
+							     max_erases,
+							     stage,
+							     array);
+		if (err == -ENOSPC) {
+			err = 0;
+			goto finish_gathering;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to collect recovering PEBs: "
+				  "fragment_index %u, page_index %lu, "
+				  "err %d\n",
+				  fragment_index, index, err);
+			goto finish_gathering;
+		}
+	}
+
+finish_gathering:
+	up_read(&fdesc->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_increase_threshold() - increase threshold of waiting time
+ * @hdr: PEB table fragment header
+ */
+static inline void
+ssdfs_maptbl_increase_threshold(struct
ssdfs_peb_table_fragment_header *hdr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("hdr %p, recover_threshold %u\n", + hdr, hdr->recover_threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (hdr->recover_threshold) { + case SSDFS_PEBTBL_FIRST_RECOVER_TRY: + hdr->recover_threshold = SSDFS_PEBTBL_SECOND_RECOVER_TRY; + hdr->recover_months = 2; + break; + + case SSDFS_PEBTBL_SECOND_RECOVER_TRY: + hdr->recover_threshold = SSDFS_PEBTBL_THIRD_RECOVER_TRY; + hdr->recover_months = 3; + break; + + case SSDFS_PEBTBL_THIRD_RECOVER_TRY: + hdr->recover_threshold = SSDFS_PEBTBL_FOURTH_RECOVER_TRY; + hdr->recover_months = 4; + break; + + case SSDFS_PEBTBL_FOURTH_RECOVER_TRY: + hdr->recover_threshold = SSDFS_PEBTBL_FIFTH_RECOVER_TRY; + hdr->recover_months = 5; + break; + + case SSDFS_PEBTBL_FIFTH_RECOVER_TRY: + hdr->recover_threshold = SSDFS_PEBTBL_SIX_RECOVER_TRY; + hdr->recover_months = 6; + break; + + case SSDFS_PEBTBL_SIX_RECOVER_TRY: + hdr->recover_threshold = SSDFS_PEBTBL_BADBLK_THRESHOLD; + hdr->recover_months = 0; + break; + + default: + /* do nothing */ + break; + } +} + +/* + * ssdfs_maptbl_define_wait_time() - define time of next waiting iteration + * @hdr: PEB table fragment header + */ +static inline void +ssdfs_maptbl_define_wait_time(struct ssdfs_peb_table_fragment_header *hdr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("hdr %p, recover_threshold %u\n", + hdr, hdr->recover_threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (hdr->recover_threshold) { + case SSDFS_PEBTBL_FIRST_RECOVER_TRY: + hdr->recover_months = 1; + break; + + case SSDFS_PEBTBL_SECOND_RECOVER_TRY: + hdr->recover_months = 2; + break; + + case SSDFS_PEBTBL_THIRD_RECOVER_TRY: + hdr->recover_months = 3; + break; + + case SSDFS_PEBTBL_FOURTH_RECOVER_TRY: + hdr->recover_months = 4; + break; + + case SSDFS_PEBTBL_FIFTH_RECOVER_TRY: + hdr->recover_months = 5; + break; + + case SSDFS_PEBTBL_SIX_RECOVER_TRY: + hdr->recover_months = 6; + break; + + default: + hdr->recover_months = 0; + break; + } +} + +/* + * ssdfs_maptbl_correct_page_recovered_pebs() - correct state of PEBs in page + * @tbl: mapping table object + * @ptr: fragment descriptor + * @array: array of erase operation results + * @item_index: pointer on current index in array [in|out] + * + * This method corrects PEBs state after recovering. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
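+ *
+ * Decision sketch for one processed page (paraphrasing the logic of
+ * this function, illustrative only):
+ *
+ *	bad_pebs > 0             --> -EAGAIN, try to correct PEBs again
+ *	recovered_pebs == 0      --> increase threshold, wait longer
+ *	recovered < failed       --> keep the same waiting duration
+ *	otherwise                --> -EAGAIN, repeat the recovering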
+ */ +static int +ssdfs_maptbl_correct_page_recovered_pebs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_maptbl_fragment_desc *ptr, + struct ssdfs_erase_result_array *array, + u32 *item_index) +{ + struct ssdfs_peb_table_fragment_header *hdr; + struct ssdfs_peb_descriptor *peb_desc; + struct ssdfs_erase_result *res; + pgoff_t page_index, next_page; + struct page *page; + void *kaddr; + unsigned long *dirty_bmap, *used_bmap, *recover_bmap, *bad_bmap; + u32 recovered_pebs = 0, failed_pebs = 0, bad_pebs = 0; + u16 peb_index_offset; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !ptr || !array || !array->ptr || !item_index); + BUG_ON(array->capacity == 0); + BUG_ON(array->capacity < array->size); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + BUG_ON(!rwsem_is_locked(&ptr->lock)); + + SSDFS_DBG("fdesc %p, capacity %u, size %u, item_index %u\n", + ptr, array->capacity, array->size, *item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*item_index >= array->size) { + SSDFS_ERR("item_index %u >= array->size %u\n", + *item_index, array->size); + return -EINVAL; + } + + res = &array->ptr[*item_index]; + page_index = PEBTBL_PAGE_INDEX(ptr, res->peb_index); + + page = ssdfs_page_array_get_page_locked(&ptr->array, page_index); + if (IS_ERR_OR_NULL(page)) { + err = page == NULL ? -ERANGE : PTR_ERR(page); + SSDFS_ERR("fail to find page: page_index %lu\n", + page_index); + return err; + } + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_peb_table_fragment_header *)kaddr; + dirty_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_DIRTY_BMAP][0]; + used_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_USED_BMAP][0]; + recover_bmap = + (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_RECOVER_BMAP][0]; + bad_bmap = (unsigned long *)&hdr->bmaps[SSDFS_PEBTBL_BADBLK_BMAP][0]; + + if (!(hdr->flags & SSDFS_PEBTBL_UNDER_RECOVERING)) { + err = -ERANGE; + SSDFS_ERR("page %lu isn't recovering\n", page_index); + goto finish_page_processing; + } + + do { + res = &array->ptr[*item_index]; + peb_index_offset = res->peb_index % ptr->pebs_per_page; + + peb_desc = GET_PEB_DESCRIPTOR(hdr, peb_index_offset); + if (IS_ERR_OR_NULL(peb_desc)) { + err = IS_ERR(peb_desc) ? 
PTR_ERR(peb_desc) : -ERANGE;
+			SSDFS_ERR("fail to get peb_descriptor: "
+				  "peb_index %u, err %d\n",
+				  res->peb_index, err);
+			goto finish_page_processing;
+		}
+
+		if (peb_desc->state != SSDFS_MAPTBL_RECOVERING_STATE) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid PEB state: "
+				  "peb_id %llu, peb_index %u, state %#x\n",
+				  res->peb_id, res->peb_index, res->state);
+			goto finish_page_processing;
+		}
+
+		if (res->state == SSDFS_BAD_BLOCK_DETECTED) {
+			peb_desc->state = SSDFS_MAPTBL_BAD_PEB_STATE;
+			bitmap_clear(dirty_bmap, peb_index_offset, 1);
+			bitmap_set(bad_bmap, peb_index_offset, 1);
+			ptr->recovering_pebs--;
+
+			bad_pebs++;
+		} else if (res->state != SSDFS_ERASE_DONE) {
+			/* do nothing */
+			failed_pebs++;
+		} else {
+			peb_desc->state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE;
+			bitmap_clear(recover_bmap, peb_index_offset, 1);
+			bitmap_clear(used_bmap, peb_index_offset, 1);
+			le16_add_cpu(&hdr->reserved_pebs, 1);
+			ptr->reserved_pebs++;
+			ptr->recovering_pebs--;
+
+			recovered_pebs++;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("hdr->reserved_pebs %u\n",
+				  le16_to_cpu(hdr->reserved_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		++*item_index;
+
+		if (*item_index >= array->size)
+			break;
+
+		/* peek at the next result without touching the current one */
+		next_page = PEBTBL_PAGE_INDEX(ptr,
+					array->ptr[*item_index].peb_index);
+	} while (page_index == next_page);
+
+	if (bad_pebs > 0) {
+		err = -EAGAIN;
+		hdr->flags |= SSDFS_PEBTBL_BADBLK_EXIST;
+		hdr->flags &= ~SSDFS_PEBTBL_UNDER_RECOVERING;
+		hdr->flags |= SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN;
+		BUG_ON(recovered_pebs > 0);
+	} else if (recovered_pebs == 0) {
+		BUG_ON(failed_pebs == 0);
+		ssdfs_maptbl_increase_threshold(hdr);
+		hdr->flags &= ~SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN;
+	} else if (recovered_pebs < failed_pebs) {
+		/* use the same duration for recovering */
+		ssdfs_maptbl_define_wait_time(hdr);
+		hdr->flags &= ~SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN;
+	} else {
+		err = -EAGAIN;
+		hdr->flags |= SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN;
+	}
+
+	if (!err) {
+		ssdfs_set_page_private(page, 0);
+		SetPageUptodate(page);
+		err = ssdfs_page_array_set_page_dirty(&ptr->array,
+						      page_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to set page %lu dirty: err %d\n",
+				  page_index, err);
+		}
+	}
+
+finish_page_processing:
+	flush_dcache_page(page);
+	kunmap_local(kaddr);
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_correct_fragment_recovered_pebs() - correct state of PEBs in fragment
+ * @tbl: mapping table object
+ * @array: array of erase operation results
+ * @item_index: pointer on current index in array [in|out]
+ *
+ * This method corrects PEBs state after recovering.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EAGAIN - need to repeat recovering.
+ */ +static int +ssdfs_correct_fragment_recovered_pebs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_erase_result_array *array, + u32 *item_index) +{ + struct ssdfs_maptbl_fragment_desc *fdesc; + u32 fragment_index; + int state; + int err = 0, err2 = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !array || !array->ptr || !item_index); + BUG_ON(array->capacity == 0); + BUG_ON(array->capacity < array->size); + BUG_ON(!rwsem_is_locked(&tbl->tbl_lock)); + + SSDFS_DBG("tbl %p, capacity %u, size %u, item_index %u\n", + tbl, array->capacity, array->size, *item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*item_index >= array->size) { + SSDFS_ERR("item_index %u >= array->size %u\n", + *item_index, array->size); + return -EINVAL; + } + + fragment_index = array->ptr[*item_index].fragment_index; + + if (fragment_index >= tbl->fragments_count) { + SSDFS_ERR("fragment_index %u >= tbl->fragments_count %u\n", + fragment_index, tbl->fragments_count); + return -ERANGE; + } + + fdesc = &tbl->desc_array[fragment_index]; + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED || + state == SSDFS_MAPTBL_FRAG_CREATED) { + SSDFS_ERR("fail to correct fragment: " + "fragment_index %u, state %#x\n", + fragment_index, state); + return -ERANGE; + } + + down_write(&fdesc->lock); + + if (fdesc->recovering_pebs == 0) { + SSDFS_ERR("fdesc->recovering_pebs == 0\n"); + err = -ERANGE; + goto finish_fragment_correction; + } + + do { + err = ssdfs_maptbl_correct_page_recovered_pebs(tbl, fdesc, + array, + item_index); + if (err == -EAGAIN) { + err2 = -EAGAIN; + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to correct page's PEB state: " + "item_index %u, err %d\n", + *item_index, err); + goto finish_fragment_correction; + } + } while (*item_index < array->size && + fragment_index == array->ptr[*item_index].fragment_index); + +finish_fragment_correction: + up_write(&fdesc->lock); + + if (!err) { + if (is_ssdfs_maptbl_going_to_be_destroyed(tbl)) { + SSDFS_WARN("maptbl %p, " + "fdesc %p, fragment_index %u, " + "start_leb %llu, lebs_count %u\n", + tbl, fdesc, fragment_index, + fdesc->start_leb, fdesc->lebs_count); + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("maptbl %p, " + "fdesc %p, fragment_index %u, " + "start_leb %llu, lebs_count %u\n", + tbl, fdesc, fragment_index, + fdesc->start_leb, fdesc->lebs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + mutex_lock(&tbl->bmap_lock); + atomic_set(&fdesc->state, SSDFS_MAPTBL_FRAG_DIRTY); + bitmap_set(tbl->dirty_bmap, fragment_index, 1); + mutex_unlock(&tbl->bmap_lock); + err = err2; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment_index %u, state %#x\n", + fragment_index, + atomic_read(&fdesc->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +/* + * ssdfs_maptbl_correct_recovered_pebs() - correct state of PEBs + * @tbl: mapping table object + * @array: array of erase operation results + * + * This method corrects PEBs state after recovering. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - need to repeat recovering. 
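+ *
+ * The -EAGAIN status is sticky across fragments; a sketch of the
+ * aggregation pattern used in this function:
+ *
+ *	err = ssdfs_correct_fragment_recovered_pebs(tbl, array,
+ *						    &item_index);
+ *	if (err == -EAGAIN) {
+ *		err2 = err;
+ *		err = 0;
+ *	}
+ *	...
+ *	return !err ? err2 : err;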
+ */
+static
+int ssdfs_maptbl_correct_recovered_pebs(struct ssdfs_peb_mapping_table *tbl,
+					struct ssdfs_erase_result_array *array)
+{
+	u32 item_index = 0;
+	int err = 0, err2 = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !array || !array->ptr);
+	BUG_ON(array->capacity == 0);
+	BUG_ON(array->capacity < array->size);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("tbl %p, capacity %u, size %u\n",
+		  tbl, array->capacity, array->size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (array->size == 0)
+		return 0;
+
+	do {
+		err = ssdfs_correct_fragment_recovered_pebs(tbl, array,
+							    &item_index);
+		if (err == -EAGAIN) {
+			err2 = err;
+			err = 0;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to correct fragment: err %d\n",
+				  err);
+			return err;
+		}
+	} while (item_index < array->size);
+
+	if (item_index != array->size) {
+		SSDFS_ERR("item_index %u != array->size %u\n",
+			  item_index, array->size);
+		return -ERANGE;
+	}
+
+	return !err ? err2 : err;
+}
+
+#define SSDFS_MAPTBL_IO_RANGE		(10)
+
+/*
+ * ssdfs_maptbl_correct_max_erase_ops() - correct max erase operations
+ * @fsi: file system info object
+ * @max_erase_ops: max number of erase operations
+ */
+static
+int ssdfs_maptbl_correct_max_erase_ops(struct ssdfs_fs_info *fsi,
+					int max_erase_ops)
+{
+	s64 reqs_count;
+	s64 factor;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, max_erase_ops %d\n",
+		  fsi, max_erase_ops);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (max_erase_ops <= 0)
+		return 0;
+
+	reqs_count = atomic64_read(&fsi->flush_reqs);
+	reqs_count += atomic_read(&fsi->pending_bios);
+
+	if (reqs_count <= SSDFS_MAPTBL_IO_RANGE)
+		return max_erase_ops;
+
+	factor = reqs_count / SSDFS_MAPTBL_IO_RANGE;
+	max_erase_ops /= factor;
+
+	if (max_erase_ops == 0)
+		max_erase_ops = 1;
+
+	return max_erase_ops;
+}
+
+/*
+ * ssdfs_maptbl_process_dirty_pebs() - process dirty PEBs
+ * @tbl: mapping table object
+ * @array: array of erase operation results
+ */
+int ssdfs_maptbl_process_dirty_pebs(struct ssdfs_peb_mapping_table *tbl,
+				    struct ssdfs_erase_result_array *array)
+{
+	struct ssdfs_fs_info *fsi;
+	u32 fragments_count;
+	int max_erase_ops;
+	int erases_per_fragment;
+	int state = SSDFS_MAPTBL_NO_ERASE;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !array || !array->ptr);
+	BUG_ON(array->capacity == 0);
+
+	SSDFS_DBG("tbl %p, capacity %u\n",
+		  tbl, array->capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tbl->fsi;
+
+	max_erase_ops = atomic_read(&tbl->max_erase_ops);
+	max_erase_ops = min_t(int, max_erase_ops, array->capacity);
+	max_erase_ops = ssdfs_maptbl_correct_max_erase_ops(fsi, max_erase_ops);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("max_erase_ops %d\n", max_erase_ops);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (max_erase_ops == 0) {
+		SSDFS_WARN("max_erase_ops == 0\n");
+		return 0;
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	if (is_ssdfs_maptbl_under_flush(fsi)) {
+		err = -EBUSY;
+		SSDFS_DBG("mapping table is under flush\n");
+		goto finish_collect_dirty_pebs;
+	}
+
+	state = atomic_cmpxchg(&tbl->erase_op_state,
+				SSDFS_MAPTBL_NO_ERASE,
+				SSDFS_MAPTBL_ERASE_IN_PROGRESS);
+	if (state != SSDFS_MAPTBL_NO_ERASE) {
+		err = -EBUSY;
+		SSDFS_DBG("erase operation is in progress\n");
+		goto finish_collect_dirty_pebs;
+	} else
+		state = SSDFS_MAPTBL_ERASE_IN_PROGRESS;
+
+	fragments_count = tbl->fragments_count;
+	erases_per_fragment = max_erase_ops / fragments_count;
+	if (erases_per_fragment == 0)
+		erases_per_fragment = 1;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("erases_per_fragment %d\n",
erases_per_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < fragments_count; i++) { + err = ssdfs_maptbl_collect_dirty_pebs(tbl, i, + erases_per_fragment, array); + if (err == -ENOENT) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %d has no dirty PEBs\n", + i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } else if (unlikely(err)) { + SSDFS_ERR("fail to collect dirty pebs: " + "fragment_index %d, err %d\n", + i, err); + goto finish_collect_dirty_pebs; + } + + up_read(&tbl->tbl_lock); + + if (is_ssdfs_maptbl_under_flush(fsi)) { + err = -EBUSY; + SSDFS_DBG("mapping table is under flush\n"); + goto finish_dirty_pebs_processing; + } + + err = ssdfs_maptbl_erase_pebs_array(tbl->fsi, array); + if (err == -EROFS) { + err = 0; + SSDFS_DBG("file system has READ-ONLY state\n"); + goto finish_dirty_pebs_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to erase PEBs in array: err %d\n", err); + goto finish_dirty_pebs_processing; + } + + wake_up_all(&tbl->erase_ops_end_wq); + + down_read(&tbl->tbl_lock); + + err = ssdfs_maptbl_correct_dirty_pebs(tbl, array); + if (unlikely(err)) { + SSDFS_ERR("fail to correct erased PEBs state: err %d\n", + err); + goto finish_collect_dirty_pebs; + } + } + +finish_collect_dirty_pebs: + up_read(&tbl->tbl_lock); + +finish_dirty_pebs_processing: + if (state == SSDFS_MAPTBL_ERASE_IN_PROGRESS) { + state = SSDFS_MAPTBL_NO_ERASE; + atomic_set(&tbl->erase_op_state, SSDFS_MAPTBL_NO_ERASE); + } + + wake_up_all(&tbl->erase_ops_end_wq); + + return err; +} + +/* + * __ssdfs_maptbl_recover_pebs() - try to recover PEBs + * @tbl: mapping table object + * @array: array of erase operation results + * @stage: phase of PEBs recovering + * + * This method tries to recover PEBs. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - need to repeat recovering. + * %-EBUSY - mapping table is under flush. 
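+ *
+ * Sketch of the assumed two-stage usage by the mapping table thread
+ * (hypothetical; the thread loop itself is outside this hunk):
+ *
+ *	err = ssdfs_maptbl_check_pebs_recoverability(tbl, &array);
+ *	if (err == -EAGAIN)
+ *		err = ssdfs_maptbl_recover_pebs(tbl, &array);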
+ */
+static
+int __ssdfs_maptbl_recover_pebs(struct ssdfs_peb_mapping_table *tbl,
+				struct ssdfs_erase_result_array *array,
+				int stage)
+{
+	struct ssdfs_fs_info *fsi;
+	u32 fragments_count;
+	int max_erase_ops;
+	int erases_per_fragment;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !array || !array->ptr);
+	BUG_ON(array->capacity == 0);
+
+	if (stage >= SSDFS_RECOVER_STAGE_MAX) {
+		SSDFS_ERR("invalid recovering stage %#x\n",
+			  stage);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("tbl %p, capacity %u, stage %#x\n",
+		  tbl, array->capacity, stage);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tbl->fsi;
+
+	max_erase_ops = atomic_read(&tbl->max_erase_ops);
+	max_erase_ops = min_t(int, max_erase_ops, array->capacity);
+	max_erase_ops = ssdfs_maptbl_correct_max_erase_ops(fsi, max_erase_ops);
+
+	if (max_erase_ops == 0) {
+		SSDFS_WARN("max_erase_ops == 0\n");
+		return 0;
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	if (is_ssdfs_maptbl_under_flush(fsi)) {
+		err = -EBUSY;
+		SSDFS_DBG("mapping table is under flush\n");
+		goto finish_collect_recovering_pebs;
+	}
+
+	fragments_count = tbl->fragments_count;
+	erases_per_fragment = max_erase_ops / fragments_count;
+	if (erases_per_fragment == 0)
+		erases_per_fragment = 1;
+
+	for (i = 0; i < fragments_count; i++) {
+		err = ssdfs_maptbl_collect_recovering_pebs(tbl, i,
+							   erases_per_fragment,
+							   stage,
+							   array);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to collect recovering pebs: "
+				  "fragment_index %d, err %d\n",
+				  i, err);
+			goto finish_collect_recovering_pebs;
+		}
+
+		if (kthread_should_stop()) {
+			err = -EAGAIN;
+			goto finish_collect_recovering_pebs;
+		}
+	}
+
+finish_collect_recovering_pebs:
+	up_read(&tbl->tbl_lock);
+
+	if (err)
+		goto finish_pebs_recovering;
+
+	if (is_ssdfs_maptbl_under_flush(fsi)) {
+		err = -EBUSY;
+		SSDFS_DBG("mapping table is under flush\n");
+		goto finish_pebs_recovering;
+	}
+
+	if (kthread_should_stop()) {
+		err = -EAGAIN;
+		goto finish_pebs_recovering;
+	}
+
+	err = ssdfs_maptbl_erase_pebs_array(tbl->fsi, array);
+	if (err == -EROFS) {
+		err = 0;
+		SSDFS_DBG("file system has READ-ONLY state\n");
+		goto finish_pebs_recovering;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to erase PEBs in array: err %d\n", err);
+		goto finish_pebs_recovering;
+	}
+
+	down_read(&tbl->tbl_lock);
+	err = ssdfs_maptbl_correct_recovered_pebs(tbl, array);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to correct recovered PEBs state: err %d\n",
+			  err);
+	}
+	up_read(&tbl->tbl_lock);
+
+finish_pebs_recovering:
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_check_pebs_recoverability() - check PEBs recoverability
+ * @tbl: mapping table object
+ * @array: array of erase operation results
+ *
+ * This method checks that PEBs are ready for recovering.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EAGAIN - need to repeat recovering.
+ * %-EBUSY - mapping table is under flush.
+ */ +static inline int +ssdfs_maptbl_check_pebs_recoverability(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_erase_result_array *array) +{ + int stage = SSDFS_CHECK_RECOVERABILITY; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !array || !array->ptr); + BUG_ON(array->capacity == 0); + + SSDFS_DBG("tbl %p, capacity %u\n", + tbl, array->capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_maptbl_recover_pebs(tbl, array, stage); +} + +/* + * ssdfs_maptbl_recover_pebs() - recover as many PEBs as possible + * @tbl: mapping table object + * @array: array of erase operation results + * + * This method tries to recover as many PEBs as possible. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - need to repeat recovering. + * %-EBUSY - mapping table is under flush. + */ +static +int ssdfs_maptbl_recover_pebs(struct ssdfs_peb_mapping_table *tbl, + struct ssdfs_erase_result_array *array) +{ + int stage = SSDFS_MAKE_RECOVERING; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl || !array || !array->ptr); + BUG_ON(array->capacity == 0); + + SSDFS_DBG("tbl %p, capacity %u\n", + tbl, array->capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_maptbl_recover_pebs(tbl, array, stage); +} + +/* + * ssdfs_maptbl_resolve_peb_mapping() - resolve inconsistency + * @tbl: mapping table object + * @cache: mapping table cache + * @pmi: PEB mapping info + * + * This method tries to resolve inconsistency of states between + * mapping table and cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EFAULT - unable to do resolving. + * %-ENODATA - PEB ID is not found. + * %-EAGAIN - repeat resolving again. + * %-EBUSY - mapping table is under flush. 
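+ *
+ * A caller is expected to re-queue the mapping request and retry later
+ * on -EBUSY or -EAGAIN, as ssdfs_maptbl_thread_func() does below:
+ *
+ *	err = ssdfs_maptbl_resolve_peb_mapping(tbl, cache, pmi);
+ *	if (err == -EBUSY || err == -EAGAIN)
+ *		ssdfs_peb_mapping_queue_add_tail(&cache->pm_queue, pmi);
+ *	else if (!err)
+ *		ssdfs_peb_mapping_info_free(pmi);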
+ */
+static
+int ssdfs_maptbl_resolve_peb_mapping(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_cache *cache,
+				     struct ssdfs_peb_mapping_info *pmi)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_maptbl_peb_relation pebr;
+	int consistency = SSDFS_PEB_STATE_UNKNOWN;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	u64 peb_id;
+	u8 peb_state;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !cache || !pmi);
+
+	SSDFS_DBG("leb_id %llu, peb_id %llu, consistency %#x\n",
+		  pmi->leb_id, pmi->peb_id, pmi->consistency);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tbl->fsi;
+
+	if (pmi->leb_id >= U64_MAX) {
+		SSDFS_ERR("invalid leb_id %llu\n", pmi->leb_id);
+		return -EINVAL;
+	}
+
+	if (pmi->peb_id >= U64_MAX) {
+		SSDFS_ERR("invalid peb_id %llu\n", pmi->peb_id);
+		return -EINVAL;
+	}
+
+	switch (pmi->consistency) {
+	case SSDFS_PEB_STATE_CONSISTENT:
+		SSDFS_WARN("unexpected consistency %#x\n",
+			   pmi->consistency);
+		return -EINVAL;
+
+	case SSDFS_PEB_STATE_INCONSISTENT:
+	case SSDFS_PEB_STATE_PRE_DELETED:
+		/* expected consistency */
+		break;
+
+	default:
+		SSDFS_ERR("invalid consistency %#x\n",
+			  pmi->consistency);
+		return -ERANGE;
+	}
+
+	err = __ssdfs_maptbl_cache_convert_leb2peb(cache,
+						   pmi->leb_id,
+						   &pebr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert LEB to PEB: "
+			  "leb_id %llu, err %d\n",
+			  pmi->leb_id, err);
+		return err;
+	}
+
+	for (i = SSDFS_MAPTBL_MAIN_INDEX; i < SSDFS_MAPTBL_RELATION_MAX; i++) {
+		peb_id = pebr.pebs[i].peb_id;
+
+		if (peb_id == pmi->peb_id) {
+			consistency = pebr.pebs[i].consistency;
+			break;
+		}
+	}
+
+	if (consistency == SSDFS_PEB_STATE_UNKNOWN) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb_id %llu wasn't found\n", pmi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	switch (consistency) {
+	case SSDFS_PEB_STATE_CONSISTENT:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("peb_id %llu has consistent state already\n",
+			  pmi->peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+
+	default:
+		if (consistency != pmi->consistency) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("consistency1 %#x != consistency2 %#x\n",
+				  consistency, pmi->consistency);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		break;
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	if (is_ssdfs_maptbl_under_flush(fsi)) {
+		err = -EBUSY;
+		SSDFS_DBG("mapping table is under flush\n");
+		goto finish_resolving;
+	}
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, pmi->leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ?
PTR_ERR(fdesc) : -ERANGE; + SSDFS_ERR("fail to get fragment descriptor: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_resolving; + } + + state = atomic_read(&fdesc->state); + if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) { + err = -EFAULT; + SSDFS_ERR("fragment is corrupted: leb_id %llu\n", + pmi->leb_id); + goto finish_resolving; + } else if (state == SSDFS_MAPTBL_FRAG_CREATED) { + struct completion *end = &fdesc->init_end; + + up_read(&tbl->tbl_lock); + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("maptbl's fragment init failed: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_resolving_no_lock; + } + down_read(&tbl->tbl_lock); + } + + if (is_ssdfs_maptbl_under_flush(fsi)) { + err = -EBUSY; + SSDFS_DBG("mapping table is under flush\n"); + goto finish_resolving; + } + + switch (consistency) { + case SSDFS_PEB_STATE_INCONSISTENT: + down_write(&cache->lock); + down_write(&fdesc->lock); + + err = ssdfs_maptbl_cache_convert_leb2peb_nolock(cache, + pmi->leb_id, + &pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_inconsistent_case; + } + + err = ssdfs_maptbl_solve_inconsistency(tbl, fdesc, + pmi->leb_id, + &pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to resolve inconsistency: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_inconsistent_case; + } + + peb_id = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id; + peb_state = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].state; + if (peb_id != U64_MAX) { + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_change_peb_state_nolock(cache, + pmi->leb_id, + peb_state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change PEB state: " + "leb_id %llu, peb_state %#x, " + "err %d\n", + pmi->leb_id, peb_state, err); + goto finish_inconsistent_case; + } + } + + peb_id = pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id; + peb_state = pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].state; + if (peb_id != U64_MAX) { + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_change_peb_state_nolock(cache, + pmi->leb_id, + peb_state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change PEB state: " + "leb_id %llu, peb_state %#x, " + "err %d\n", + pmi->leb_id, peb_state, err); + goto finish_inconsistent_case; + } + } + +finish_inconsistent_case: + up_write(&fdesc->lock); + up_write(&cache->lock); + + if (!err) { + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, + pmi->leb_id); + } + break; + + case SSDFS_PEB_STATE_PRE_DELETED: + down_write(&cache->lock); + down_write(&fdesc->lock); + + err = ssdfs_maptbl_cache_convert_leb2peb_nolock(cache, + pmi->leb_id, + &pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to convert LEB to PEB: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_pre_deleted_case; + } + + err = ssdfs_maptbl_solve_pre_deleted_state(tbl, fdesc, + pmi->leb_id, + &pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to resolve pre-deleted state: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_pre_deleted_case; + } + + consistency = SSDFS_PEB_STATE_CONSISTENT; + err = ssdfs_maptbl_cache_forget_leb2peb_nolock(cache, + pmi->leb_id, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to exclude migration PEB: " + "leb_id %llu, err %d\n", + pmi->leb_id, err); + goto finish_pre_deleted_case; + } + +finish_pre_deleted_case: + up_write(&fdesc->lock); + up_write(&cache->lock); + + if (!err) { + ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, + pmi->leb_id); + 
} + break; + + default: + err = -EFAULT; + SSDFS_ERR("invalid consistency %#x\n", + consistency); + goto finish_resolving; + } + +finish_resolving: + up_read(&tbl->tbl_lock); + +finish_resolving_no_lock: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * has_maptbl_pre_erase_pebs() - check that maptbl contains pre-erased PEBs + * @tbl: mapping table object + */ +static inline +bool has_maptbl_pre_erase_pebs(struct ssdfs_peb_mapping_table *tbl) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tbl); + + SSDFS_DBG("pre_erase_pebs %d\n", + atomic_read(&tbl->pre_erase_pebs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return atomic_read(&tbl->pre_erase_pebs) > 0; +} + +#ifdef CONFIG_SSDFS_TESTING +int ssdfs_maptbl_erase_dirty_pebs_now(struct ssdfs_peb_mapping_table *tbl) +{ + struct ssdfs_erase_result_array array = {NULL, 0, 0}; + int err = 0; + + down_read(&tbl->tbl_lock); + array.capacity = (u32)tbl->fragments_count * + SSDFS_ERASE_RESULTS_PER_FRAGMENT; + up_read(&tbl->tbl_lock); + + array.size = 0; + array.ptr = ssdfs_map_thread_kcalloc(array.capacity, + sizeof(struct ssdfs_erase_result), + GFP_KERNEL); + if (!array.ptr) { + SSDFS_ERR("fail to allocate erase_results array\n"); + return -ENOMEM; + } + + if (has_maptbl_pre_erase_pebs(tbl)) { + err = ssdfs_maptbl_process_dirty_pebs(tbl, &array); + if (err == -EBUSY || err == -EAGAIN) { + err = 0; + goto finish_erase_dirty_pebs; + } else if (unlikely(err)) { + SSDFS_ERR("fail to process dirty PEBs: err %d\n", + err); + goto finish_erase_dirty_pebs; + } + } + +finish_erase_dirty_pebs: + if (array.ptr) + ssdfs_map_thread_kfree(array.ptr); + + return err; +} +#endif /* CONFIG_SSDFS_TESTING */ + +#define MAPTBL_PTR(tbl) \ + ((struct ssdfs_peb_mapping_table *)(tbl)) +#define MAPTBL_THREAD_WAKE_CONDITION(tbl, cache) \ + (kthread_should_stop() || \ + has_maptbl_pre_erase_pebs(MAPTBL_PTR(tbl)) || \ + !is_ssdfs_peb_mapping_queue_empty(&cache->pm_queue)) +#define MAPTBL_FAILED_THREAD_WAKE_CONDITION() \ + (kthread_should_stop()) + +/* + * ssdfs_maptbl_thread_func() - maptbl object's thread's function + */ +static +int ssdfs_maptbl_thread_func(void *data) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_peb_mapping_table *tbl = data; + struct ssdfs_maptbl_cache *cache; + struct ssdfs_peb_mapping_info *pmi; + wait_queue_head_t *wait_queue; + struct ssdfs_erase_result_array array = {NULL, 0, 0}; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + if (!tbl) { + SSDFS_ERR("pointer on mapping table object is NULL\n"); + BUG(); + } + + SSDFS_DBG("MAPTBL thread\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tbl->fsi; + cache = &fsi->maptbl_cache; + wait_queue = &tbl->wait_queue; + + down_read(&tbl->tbl_lock); + array.capacity = (u32)tbl->fragments_count * + SSDFS_ERASE_RESULTS_PER_FRAGMENT; + up_read(&tbl->tbl_lock); + + array.size = 0; + array.ptr = ssdfs_map_thread_kcalloc(array.capacity, + sizeof(struct ssdfs_erase_result), + GFP_KERNEL); + if (!array.ptr) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate erase_results array\n"); + goto sleep_maptbl_thread; + } + + down_read(&tbl->tbl_lock); + for (i = 0; i < tbl->fragments_count; i++) { + struct completion *init_end = &tbl->desc_array[i].init_end; + + up_read(&tbl->tbl_lock); + + wait_for_completion_timeout(init_end, HZ); + if (kthread_should_stop()) + goto repeat; + + down_read(&tbl->tbl_lock); + } + up_read(&tbl->tbl_lock); + +repeat: + if (kthread_should_stop()) { + wake_up_all(&tbl->erase_ops_end_wq); + complete_all(&tbl->thread.full_stop); + if 
(array.ptr) + ssdfs_map_thread_kfree(array.ptr); + + if (unlikely(err)) { + SSDFS_ERR("thread function had some issue: err %d\n", + err); + } + + return err; + } + + if (unlikely(err)) + goto sleep_failed_maptbl_thread; + + if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) + err = -EFAULT; + + if (unlikely(err)) { + SSDFS_ERR("fail to continue activity: err %d\n", err); + goto sleep_failed_maptbl_thread; + } + + if (!has_maptbl_pre_erase_pebs(tbl) && + is_ssdfs_peb_mapping_queue_empty(&cache->pm_queue)) { + /* go to sleep */ + goto sleep_maptbl_thread; + } + + while (!is_ssdfs_peb_mapping_queue_empty(&cache->pm_queue)) { + err = ssdfs_peb_mapping_queue_remove_first(&cache->pm_queue, + &pmi); + if (err == -ENODATA) { + /* empty queue */ + err = 0; + break; + } else if (err == -ENOENT) { + SSDFS_WARN("request queue contains NULL request\n"); + err = 0; + continue; + } else if (unlikely(err < 0)) { + SSDFS_CRIT("fail to get request from the queue: " + "err %d\n", + err); + goto check_next_step; + } + + err = ssdfs_maptbl_resolve_peb_mapping(tbl, cache, pmi); + if (err == -EBUSY) { + err = 0; + ssdfs_peb_mapping_queue_add_tail(&cache->pm_queue, + pmi); + goto sleep_maptbl_thread; + } else if (err == -EAGAIN) { + ssdfs_peb_mapping_queue_add_tail(&cache->pm_queue, + pmi); + continue; + } else if (unlikely(err)) { + ssdfs_peb_mapping_queue_add_tail(&cache->pm_queue, + pmi); + SSDFS_ERR("failed to resolve inconsistency: " + "leb_id %llu, peb_id %llu, err %d\n", + pmi->leb_id, pmi->peb_id, err); + goto check_next_step; + } + + ssdfs_peb_mapping_info_free(pmi); + + if (kthread_should_stop()) + goto repeat; + } + + if (has_maptbl_pre_erase_pebs(tbl)) { + err = ssdfs_maptbl_process_dirty_pebs(tbl, &array); + if (err == -EBUSY || err == -EAGAIN) { + err = 0; + wait_event_interruptible_timeout(*wait_queue, + kthread_should_stop(), HZ); + goto sleep_maptbl_thread; + } else if (unlikely(err)) { + SSDFS_ERR("fail to process dirty PEBs: err %d\n", + err); + } + + wait_event_interruptible_timeout(*wait_queue, + kthread_should_stop(), HZ); + } + +check_next_step: + if (kthread_should_stop()) + goto repeat; + + if (unlikely(err)) + goto sleep_failed_maptbl_thread; + + if (is_time_to_recover_pebs(tbl)) { + err = ssdfs_maptbl_check_pebs_recoverability(tbl, &array); + if (err == -EBUSY) { + err = 0; + goto sleep_maptbl_thread; + } else if (err && err != -EAGAIN) { + SSDFS_ERR("fail to check PEBs recoverability: " + "err %d\n", + err); + goto sleep_failed_maptbl_thread; + } + + set_last_recovering_cno(tbl); + + wait_event_interruptible_timeout(*wait_queue, + kthread_should_stop(), HZ); + } else + goto sleep_maptbl_thread; + + if (kthread_should_stop()) + goto repeat; + + while (err == -EAGAIN) { + err = ssdfs_maptbl_recover_pebs(tbl, &array); + if (err == -EBUSY) { + err = 0; + goto sleep_maptbl_thread; + } else if (err && err != -EAGAIN) { + SSDFS_ERR("fail to recover PEBs: err %d\n", + err); + goto sleep_failed_maptbl_thread; + } + + set_last_recovering_cno(tbl); + + wait_event_interruptible_timeout(*wait_queue, + kthread_should_stop(), HZ); + + if (kthread_should_stop()) + goto repeat; + } + +sleep_maptbl_thread: + wait_event_interruptible(*wait_queue, + MAPTBL_THREAD_WAKE_CONDITION(tbl, cache)); + goto repeat; + +sleep_failed_maptbl_thread: + wake_up_all(&tbl->erase_ops_end_wq); + wait_event_interruptible(*wait_queue, + MAPTBL_FAILED_THREAD_WAKE_CONDITION()); + goto repeat; +} + +static +struct ssdfs_thread_descriptor maptbl_thread = { + .threadfn = ssdfs_maptbl_thread_func, + .fmt = "ssdfs-maptbl", +}; + +/* 
+ * ssdfs_maptbl_start_thread() - start mapping table's thread
+ * @tbl: mapping table object
+ */
+int ssdfs_maptbl_start_thread(struct ssdfs_peb_mapping_table *tbl)
+{
+	ssdfs_threadfn threadfn;
+	const char *fmt;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+
+	SSDFS_DBG("tbl %p\n", tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	threadfn = maptbl_thread.threadfn;
+	fmt = maptbl_thread.fmt;
+
+	tbl->thread.task = kthread_create(threadfn, tbl, fmt);
+	if (IS_ERR_OR_NULL(tbl->thread.task)) {
+		err = PTR_ERR(tbl->thread.task);
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+		} else {
+			if (err == 0)
+				err = -ERANGE;
+			SSDFS_ERR("fail to start mapping table's thread: "
+				  "err %d\n", err);
+		}
+
+		return err;
+	}
+
+	init_waitqueue_entry(&tbl->thread.wait, tbl->thread.task);
+	add_wait_queue(&tbl->wait_queue, &tbl->thread.wait);
+	init_completion(&tbl->thread.full_stop);
+
+	wake_up_process(tbl->thread.task);
+
+	return 0;
+}
+
+/*
+ * ssdfs_maptbl_stop_thread() - stop mapping table's thread
+ * @tbl: mapping table object
+ */
+int ssdfs_maptbl_stop_thread(struct ssdfs_peb_mapping_table *tbl)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!tbl->thread.task)
+		return 0;
+
+	err = kthread_stop(tbl->thread.task);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 * The wake_up_process() was never called.
+		 */
+		return 0;
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread function had some issue: err %d\n",
+			   err);
+		return err;
+	}
+
+	finish_wait(&tbl->wait_queue, &tbl->thread.wait);
+	tbl->thread.task = NULL;
+
+	err = SSDFS_WAIT_COMPLETION(&tbl->thread.full_stop);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to stop thread: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}

From patchwork Sat Feb 25 01:08:54 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151950
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 43/76] ssdfs: introduce PEB mapping table cache
Date: Fri, 24 Feb 2023 17:08:54 -0800
Message-Id: <20230225010927.813929-44-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

The "physical" erase block (PEB) mapping table is complemented by a
special cache that is stored in the payload of the superblock segment's
log. Generally speaking, the cache stores a copy of the PEB state
records. The goal of the PEB mapping table's cache is to resolve the
case when a PEB's descriptor is associated with a LEB of the PEB
mapping table itself, for example. If an unmount operation triggers the
flush of the PEB mapping table, there are cases when the PEB mapping
table could be modified during the flush operation. As a result, the
actual PEB state is stored only in the PEB mapping table's cache. Such
a record is marked as inconsistent, and the inconsistency has to be
resolved during the next mount operation by a specialized thread that
stores the actual PEB state back into the PEB mapping table.

Moreover, the cache plays another very important role. Namely, the PEB
mapping table's cache is used to convert a LEB ID into a PEB ID for the
basic metadata structures (the PEB mapping table and segment bitmap,
for example) before the PEB mapping table has finished initializing
during the mount operation.

The PEB mapping table's cache starts with a header that precedes:
(1) LEB ID / PEB ID pairs,
(2) PEB state records.

The pairs' area associates LEB IDs with PEB IDs. Additionally, the PEB
state records' area contains the last actual state of the PEB for every
record in the pairs' area. The most important fields in the PEB state
area are: (1) consistency, (2) PEB state, and (3) PEB flags. The sketch
below illustrates this layout.
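To make the layout concrete, here is how a reader could walk one cache
fragment (a single memory page at kaddr) with the helpers introduced by
this patch, LEB2PEB_PAIR_AREA() and FIRST_PEB_STATE(). It is an
illustration only: the pair field names (leb_id/peb_id) are assumed
from the struct's name, and error handling is elided.

	/* Illustration only: inspect one cache fragment. */
	struct ssdfs_maptbl_cache_header *hdr;
	struct ssdfs_leb2peb_pair *pairs;
	struct ssdfs_maptbl_cache_peb_state *states;
	u32 area_offset;
	u16 i, items_count;

	hdr = (struct ssdfs_maptbl_cache_header *)kaddr;
	items_count = le16_to_cpu(hdr->items_count);

	/* the LEB ID / PEB ID pairs follow the header... */
	pairs = LEB2PEB_PAIR_AREA(kaddr);

	/* ...and the PEB state records follow the pairs' area */
	states = FIRST_PEB_STATE(kaddr, &area_offset);

	for (i = 0; i < items_count; i++) {
		/* states[i] keeps consistency, PEB state, PEB flags */
		SSDFS_DBG("leb_id %llu -> peb_id %llu\n",
			  le64_to_cpu(pairs[i].leb_id),
			  le64_to_cpu(pairs[i].peb_id));
	}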
Generally speaking, the consistency field simply shows whether a record
in the cache and in the mapping table are identical. If a record in the
cache is marked as inconsistent, the PEB mapping table has to be
modified to reflect the actual value kept in the cache. As a result,
the value in the table and in the cache finally become consistent.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_mapping_table.c       | 1154 +++++++++++++++++++++
 fs/ssdfs/peb_mapping_table_cache.c | 1497 ++++++++++++++++++++++++++++
 fs/ssdfs/peb_mapping_table_cache.h |  119 +++
 3 files changed, 2770 insertions(+)
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.c
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.h

diff --git a/fs/ssdfs/peb_mapping_table.c b/fs/ssdfs/peb_mapping_table.c
index 738de2d62c9f..aabaa1dc8a5d 100644
--- a/fs/ssdfs/peb_mapping_table.c
+++ b/fs/ssdfs/peb_mapping_table.c
@@ -11550,3 +11550,1157 @@ ssdfs_maptbl_clear_shared_destination_peb(struct ssdfs_peb_mapping_table *tbl,
 	return err;
 }
+
+/*
+ * ssdfs_maptbl_break_external_peb_ptr() - forget PEB as external pointer
+ * @fdesc: fragment descriptor
+ * @index: PEB index in the fragment
+ * @peb_type: PEB type
+ * @peb_state: pointer on PEB state value [out]
+ *
+ * This method tries to forget index of destination PEB and to clear
+ * SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static int
+ssdfs_maptbl_break_external_peb_ptr(struct ssdfs_maptbl_fragment_desc *fdesc,
+				    u16 index, u8 peb_type,
+				    u8 *peb_state)
+{
+	struct ssdfs_peb_descriptor *ptr;
+	pgoff_t page_index;
+	u16 item_index;
+	struct page *page;
+	void *kaddr;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc || !peb_state);
+
+	SSDFS_DBG("fdesc %p, index %u\n",
+		  fdesc, index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE;
+
+	page_index = PEBTBL_PAGE_INDEX(fdesc, index);
+	item_index = index % fdesc->pebs_per_page;
+
+	page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index);
+	if (IS_ERR_OR_NULL(page)) {
+		err = page == NULL ? -ERANGE : PTR_ERR(page);
+		SSDFS_ERR("fail to find page: page_index %lu\n",
+			  page_index);
+		return err;
+	}
+
+	kaddr = kmap_local_page(page);
+
+	ptr = GET_PEB_DESCRIPTOR(kaddr, item_index);
+	if (IS_ERR_OR_NULL(ptr)) {
+		err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE;
+		SSDFS_ERR("fail to get peb_descriptor: "
+			  "page_index %lu, item_index %u, err %d\n",
+			  page_index, item_index, err);
+		goto finish_page_processing;
+	}
+
+	if (peb_type != ptr->type) {
+		err = -ERANGE;
+		SSDFS_ERR("peb_type %#x != ptr->type %#x\n",
+			  peb_type, ptr->type);
+		goto finish_page_processing;
+	}
+
+	if (ptr->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB) {
+		err = -ERANGE;
+		SSDFS_ERR("corrupted PEB descriptor\n");
+		goto finish_page_processing;
+	}
+
+	if (!(ptr->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR))
+		SSDFS_WARN("PEB doesn't have an indirect relation\n");
+
+	switch (ptr->state) {
+	case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE:
+		ptr->state = SSDFS_MAPTBL_USED_PEB_STATE;
+		*peb_state = SSDFS_MAPTBL_USED_PEB_STATE;
+		break;
+
+	case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE:
+		ptr->state = SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE;
+		*peb_state = SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE;
+		break;
+
+	case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE:
+		ptr->state = SSDFS_MAPTBL_DIRTY_PEB_STATE;
+		*peb_state = SSDFS_MAPTBL_DIRTY_PEB_STATE;
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid PEB state %#x\n",
+			  ptr->state);
+		goto finish_page_processing;
+	}
+
+	ptr->shared_peb_index = U8_MAX;
+	ptr->flags &= ~SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR;
+
+finish_page_processing:
+	kunmap_local(kaddr);
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * __ssdfs_maptbl_break_indirect_relation() - forget destination PEB as shared
+ * @tbl: pointer on mapping table object
+ * @leb_id: LEB ID number
+ * @peb_type: PEB type
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to forget index of destination PEB and to clear
+ * SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-ERANGE - internal error.
+ */
+static
+int __ssdfs_maptbl_break_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					   u64 leb_id, u8 peb_type,
+					   struct completion **end)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	struct ssdfs_leb_descriptor leb_desc;
+	u16 physical_index;
+	u8 peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x\n",
+		  tbl, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id);
+		return err;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu isn't mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu has direct relation\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+
+	if (physical_index == U16_MAX) {
+		err = -ENODATA;
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_break_external_peb_ptr(fdesc, physical_index,
+						  peb_type, &peb_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to break external PEB pointer: "
+			  "physical_index %u, err %d\n",
+			  physical_index, err);
+		goto finish_fragment_change;
+	}
+
+	if (peb_state == SSDFS_MAPTBL_DIRTY_PEB_STATE) {
+		err = ssdfs_maptbl_set_pre_erase_state(fdesc, physical_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to move PEB into pre-erase state: "
+				  "index %u, err %d\n",
+				  physical_index, err);
+			goto finish_fragment_change;
+		}
+
+		fdesc->pre_erase_pebs++;
+		atomic_inc(&tbl->pre_erase_pebs);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fdesc->pre_erase_pebs %u, tbl->pre_erase_pebs %d\n",
+			  fdesc->pre_erase_pebs,
+			  atomic_read(&tbl->pre_erase_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		wake_up(&tbl->wait_queue);
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_break_indirect_relation() - break PEBs indirect relation
+ * @tbl: pointer on mapping table object
+ * @leb_id: source LEB ID number
+ * @peb_type: PEB type
+ * @dst_leb_id: destination LEB ID number
+ * @dst_peb_refs: destination PEB reference counter
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to clear SSDFS_MAPTBL_SHARED_DESTINATION_PEB flag
+ * in destination PEB. Then it tries to forget index of destination PEB
+ * and to clear SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-ERANGE - internal error.
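+ *
+ * Illustrative call pattern (assumed caller, not code from this patch):
+ * on -EAGAIN the caller waits for the fragment initialization through
+ * the returned completion and then repeats the call:
+ *
+ *	struct completion *end = NULL;
+ *
+ *	err = ssdfs_maptbl_break_indirect_relation(tbl, leb_id, peb_type,
+ *						   dst_leb_id, dst_peb_refs,
+ *						   &end);
+ *	if (err == -EAGAIN)
+ *		err = SSDFS_WAIT_COMPLETION(end);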
+ */
+int ssdfs_maptbl_break_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					 u64 leb_id, u8 peb_type,
+					 u64 dst_leb_id, int dst_peb_refs,
+					 struct completion **end)
+{
+	struct ssdfs_fs_info *fsi;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, "
+		  "peb_type %#x, dst_leb_id %llu, "
+		  "dst_peb_refs %d\n",
+		  tbl, leb_id, peb_type,
+		  dst_leb_id, dst_peb_refs);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tbl->fsi;
+	*end = NULL;
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	if (dst_peb_refs <= 0) {
+		SSDFS_ERR("invalid dst_peb_refs\n");
+		return -ERANGE;
+	}
+
+	if (should_cache_peb_info(peb_type)) {
+		struct ssdfs_maptbl_peb_relation prev_pebr;
+
+		/* resolve potential inconsistency */
+		err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type,
+						   &prev_pebr, end);
+		if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment is under initialization: "
+				  "leb_id %llu\n",
+				  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to resolve inconsistency: "
+				  "leb_id %llu, err %d\n",
+				  leb_id, err);
+			return err;
+		}
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	if (dst_peb_refs > 1)
+		goto break_indirect_relation;
+
+	err = ssdfs_maptbl_clear_shared_destination_peb(tbl, dst_leb_id,
+							peb_type, end);
+	if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  dst_leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_break_indirect_relation;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to clear shared destination PEB: "
+			  "dst_leb_id %llu, err %d\n",
+			  dst_leb_id, err);
+		goto finish_break_indirect_relation;
+	}
+
+break_indirect_relation:
+	err = __ssdfs_maptbl_break_indirect_relation(tbl, leb_id,
+						     peb_type, end);
+	if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_break_indirect_relation;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to break indirect relation: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_break_indirect_relation;
+	}
+
+finish_break_indirect_relation:
+	up_read(&tbl->tbl_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_break_zns_external_peb_ptr() - forget shared zone
+ * @fdesc: fragment descriptor
+ * @index: PEB index in the fragment
+ * @peb_type: PEB type
+ * @peb_state: pointer on PEB state value [out]
+ *
+ * This method tries to clear SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static int
+ssdfs_maptbl_break_zns_external_peb_ptr(struct ssdfs_maptbl_fragment_desc *fdesc,
+					u16 index, u8 peb_type,
+					u8 *peb_state)
+{
+	struct ssdfs_peb_descriptor *ptr;
+	pgoff_t page_index;
+	u16 item_index;
+	struct page *page;
+	void *kaddr;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc || !peb_state);
+
+	SSDFS_DBG("fdesc %p, index %u\n",
+		  fdesc, index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE;
+
+	page_index = PEBTBL_PAGE_INDEX(fdesc, index);
+	item_index = index % fdesc->pebs_per_page;
+
+	page = ssdfs_page_array_get_page_locked(&fdesc->array, page_index);
+	if (IS_ERR_OR_NULL(page)) {
+		err = page == NULL ? -ERANGE : PTR_ERR(page);
+		SSDFS_ERR("fail to find page: page_index %lu\n",
+			  page_index);
+		return err;
+	}
+
+	kaddr = kmap_local_page(page);
+
+	ptr = GET_PEB_DESCRIPTOR(kaddr, item_index);
+	if (IS_ERR_OR_NULL(ptr)) {
+		err = IS_ERR(ptr) ? PTR_ERR(ptr) : -ERANGE;
+		SSDFS_ERR("fail to get peb_descriptor: "
+			  "page_index %lu, item_index %u, err %d\n",
+			  page_index, item_index, err);
+		goto finish_page_processing;
+	}
+
+	if (peb_type != ptr->type) {
+		err = -ERANGE;
+		SSDFS_ERR("peb_type %#x != ptr->type %#x\n",
+			  peb_type, ptr->type);
+		goto finish_page_processing;
+	}
+
+	if (ptr->flags & SSDFS_MAPTBL_SHARED_DESTINATION_PEB) {
+		err = -ERANGE;
+		SSDFS_ERR("corrupted PEB descriptor\n");
+		goto finish_page_processing;
+	}
+
+	if (!(ptr->flags & SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR))
+		SSDFS_WARN("PEB doesn't have an indirect relation\n");
+
+	switch (ptr->state) {
+	case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE:
+		ptr->state = SSDFS_MAPTBL_USED_PEB_STATE;
+		*peb_state = SSDFS_MAPTBL_USED_PEB_STATE;
+		break;
+
+	case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE:
+		ptr->state = SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE;
+		*peb_state = SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE;
+		break;
+
+	case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE:
+		ptr->state = SSDFS_MAPTBL_DIRTY_PEB_STATE;
+		*peb_state = SSDFS_MAPTBL_DIRTY_PEB_STATE;
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid PEB state %#x\n",
+			  ptr->state);
+		goto finish_page_processing;
+	}
+
+	ptr->shared_peb_index = U8_MAX;
+	ptr->flags &= ~SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR;
+
+finish_page_processing:
+	kunmap_local(kaddr);
+	ssdfs_unlock_page(page);
+	ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("page %p, count %d\n",
+		  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * __ssdfs_maptbl_break_zns_indirect_relation() - forget shared zone
+ * @tbl: pointer on mapping table object
+ * @leb_id: LEB ID number
+ * @peb_type: PEB type
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to clear SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-ERANGE - internal error.
+ */
+static int
+__ssdfs_maptbl_break_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					   u64 leb_id, u8 peb_type,
+					   struct completion **end)
+{
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	int state;
+	struct ssdfs_leb_descriptor leb_desc;
+	u16 physical_index;
+	u8 peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+	BUG_ON(!rwsem_is_locked(&tbl->tbl_lock));
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, peb_type %#x\n",
+		  tbl, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fdesc = ssdfs_maptbl_get_fragment_descriptor(tbl, leb_id);
+	if (IS_ERR_OR_NULL(fdesc)) {
+		err = IS_ERR(fdesc) ? PTR_ERR(fdesc) : -ERANGE;
+		SSDFS_ERR("fail to get fragment descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		return err;
+	}
+
+	*end = &fdesc->init_end;
+
+	state = atomic_read(&fdesc->state);
+	if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+		err = -EFAULT;
+		SSDFS_ERR("fragment is corrupted: leb_id %llu\n", leb_id);
+		return err;
+	} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+		err = -EAGAIN;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	down_write(&fdesc->lock);
+
+	err = ssdfs_maptbl_get_leb_descriptor(fdesc, leb_id, &leb_desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get leb descriptor: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_fragment_change;
+	}
+
+	if (!__is_mapped_leb2peb(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu isn't mapped yet\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	if (is_leb_migrating(&leb_desc)) {
+		err = -ERANGE;
+		SSDFS_ERR("leb %llu has direct relation\n",
+			  leb_id);
+		goto finish_fragment_change;
+	}
+
+	physical_index = le16_to_cpu(leb_desc.physical_index);
+
+	if (physical_index == U16_MAX) {
+		err = -ENODATA;
+		SSDFS_DBG("uninitialized leb descriptor\n");
+		goto finish_fragment_change;
+	}
+
+	err = ssdfs_maptbl_break_zns_external_peb_ptr(fdesc, physical_index,
+						      peb_type, &peb_state);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to break external PEB pointer: "
+			  "physical_index %u, err %d\n",
+			  physical_index, err);
+		goto finish_fragment_change;
+	}
+
+	if (peb_state == SSDFS_MAPTBL_DIRTY_PEB_STATE) {
+		err = ssdfs_maptbl_set_pre_erase_state(fdesc, physical_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to move PEB into pre-erase state: "
+				  "index %u, err %d\n",
+				  physical_index, err);
+			goto finish_fragment_change;
+		}
+
+		fdesc->pre_erase_pebs++;
+		atomic_inc(&tbl->pre_erase_pebs);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fdesc->pre_erase_pebs %u, tbl->pre_erase_pebs %d\n",
+			  fdesc->pre_erase_pebs,
+			  atomic_read(&tbl->pre_erase_pebs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		wake_up(&tbl->wait_queue);
+	}
+
+finish_fragment_change:
+	up_write(&fdesc->lock);
+
+	if (!err)
+		ssdfs_maptbl_set_fragment_dirty(tbl, fdesc, leb_id);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_break_zns_indirect_relation() - break PEBs indirect relation
+ * @tbl: pointer on mapping table object
+ * @leb_id: source LEB ID number
+ * @peb_type: PEB type
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to clear SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR flag.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EFAULT - maptbl has inconsistent state.
+ * %-EAGAIN - fragment is under initialization yet.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_maptbl_break_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					     u64 leb_id, u8 peb_type,
+					     struct completion **end)
+{
+	struct ssdfs_fs_info *fsi;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tbl || !end);
+
+	SSDFS_DBG("maptbl %p, leb_id %llu, "
+		  "peb_type %#x\n",
+		  tbl, leb_id, peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tbl->fsi;
+	*end = NULL;
+
+	if (atomic_read(&tbl->flags) & SSDFS_MAPTBL_ERROR) {
+		ssdfs_fs_error(tbl->fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"maptbl has corrupted state\n");
+		return -EFAULT;
+	}
+
+	if (should_cache_peb_info(peb_type)) {
+		struct ssdfs_maptbl_peb_relation prev_pebr;
+
+		/* resolve potential inconsistency */
+		err = ssdfs_maptbl_convert_leb2peb(fsi, leb_id, peb_type,
+						   &prev_pebr, end);
+		if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fragment is under initialization: "
+				  "leb_id %llu\n",
+				  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to resolve inconsistency: "
+				  "leb_id %llu, err %d\n",
+				  leb_id, err);
+			return err;
+		}
+	}
+
+	down_read(&tbl->tbl_lock);
+
+	err = __ssdfs_maptbl_break_zns_indirect_relation(tbl, leb_id,
+							 peb_type, end);
+	if (err == -EAGAIN) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fragment is under initialization: leb_id %llu\n",
+			  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_break_indirect_relation;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to break indirect relation: "
+			  "leb_id %llu, err %d\n",
+			  leb_id, err);
+		goto finish_break_indirect_relation;
+	}
+
+finish_break_indirect_relation:
+	up_read(&tbl->tbl_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static inline
+int __ssdfs_reserve_free_pages(struct ssdfs_fs_info *fsi, u32 count,
+			       int type, u64 *free_pages)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 reserved = 0;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(type <= SSDFS_UNKNOWN_PAGE_TYPE || type >= SSDFS_PAGES_TYPE_MAX);
+
+	SSDFS_DBG("fsi %p, count %u, type %#x\n",
+		  fsi, count, type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*free_pages = 0;
+
+	spin_lock(&fsi->volume_state_lock);
+	*free_pages = fsi->free_pages;
+	if (fsi->free_pages >= count) {
+		err = -EEXIST;
+		fsi->free_pages -= count;
+		switch (type) {
+		case SSDFS_USER_DATA_PAGES:
+			fsi->reserved_new_user_data_pages += count;
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		};
+#ifdef CONFIG_SSDFS_DEBUG
+		reserved = fsi->reserved_new_user_data_pages;
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else
+		err = -ENOSPC;
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("reserved %llu\n", reserved);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static
+int ssdfs_try2increase_free_pages(struct ssdfs_fs_info *fsi)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	u32 fragments_count;
+	int state;
+	u32 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p\n", fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tbl = fsi->maptbl;
+
+	fragments_count = tbl->fragments_count;
+
+	down_read(&tbl->tbl_lock);
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		state = atomic_read(&fdesc->state);
+		if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+			err = -EFAULT;
+			SSDFS_ERR("fragment is corrupted: index %u\n",
+				  i);
+			goto finish_fragment_check;
+		} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+			struct completion *end = &fdesc->init_end;
+
+			up_read(&tbl->tbl_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("wait fragment initialization end: "
+				  "index %u, state %#x\n",
+				  i, state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("fragment init failed: "
+					  "index %u\n", i);
+				err = -EFAULT;
+				goto finish_try2increase_free_pages;
+			}
+
+			down_read(&tbl->tbl_lock);
+		}
+
+		down_read(&fdesc->lock);
+		err = ssdfs_maptbl_try_decrease_reserved_pebs(tbl, fdesc);
+		up_read(&fdesc->lock);
+
+		if (err == -ENOENT) {
+			err = -ENOSPC;
+			SSDFS_DBG("unable to decrease reserved pebs\n");
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to decrease reserved pebs: "
+				  "err %d\n", err);
+			goto finish_fragment_check;
+		}
+	}
+
+finish_fragment_check:
+	up_read(&tbl->tbl_lock);
+
+finish_try2increase_free_pages:
+	return err;
+}
+
+static
+int ssdfs_wait_maptbl_init_ending(struct ssdfs_fs_info *fsi, u32 count)
+{
+	struct ssdfs_peb_mapping_table *tbl;
+	struct ssdfs_maptbl_fragment_desc *fdesc;
+	u32 fragments_count;
+	int state;
+	u64 free_pages;
+	u32 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p\n", fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tbl = fsi->maptbl;
+
+	fragments_count = tbl->fragments_count;
+
+	down_read(&tbl->tbl_lock);
+
+	for (i = 0; i < fragments_count; i++) {
+		fdesc = &tbl->desc_array[i];
+
+		state = atomic_read(&fdesc->state);
+		if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+			err = -EFAULT;
+			SSDFS_ERR("fragment is corrupted: index %u\n",
+				  i);
+			goto finish_fragment_check;
+		} else if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+			struct completion *end = &fdesc->init_end;
+
+			up_read(&tbl->tbl_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("wait fragment initialization end: "
+				  "index %u, state %#x\n",
+				  i, state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("fragment init failed: "
+					  "index %u\n", i);
+				err = -EFAULT;
+				goto finish_wait_init;
+			}
+
+			spin_lock(&fsi->volume_state_lock);
+			free_pages = fsi->free_pages;
+			spin_unlock(&fsi->volume_state_lock);
+
+			if (free_pages >= count)
+				goto finish_wait_init;
+
+			down_read(&tbl->tbl_lock);
+		}
+	}
+
+finish_fragment_check:
+	up_read(&tbl->tbl_lock);
+
+finish_wait_init:
+	return err;
+}
+
+int ssdfs_reserve_free_pages(struct ssdfs_fs_info *fsi, u32 count, int type)
+{
+	u64 free_pages = 0;
+	int state;
+	u32 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+	BUG_ON(type <= SSDFS_UNKNOWN_PAGE_TYPE || type >= SSDFS_PAGES_TYPE_MAX);
+
+	SSDFS_DBG("fsi %p, count %u, type %#x\n",
+		  fsi, count, type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state = atomic_read(&fsi->global_fs_state);
+
+	err = __ssdfs_reserve_free_pages(fsi, count, type, &free_pages);
+	if (err == -EEXIST) {
+		err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free pages %u have been reserved, free_pages %llu\n",
+			  count, free_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else if (err == -ENOSPC && state == SSDFS_UNKNOWN_GLOBAL_FS_STATE) {
+		err = ssdfs_wait_maptbl_init_ending(fsi, count);
+		if (unlikely(err)) {
+			SSDFS_ERR("initialization has failed: "
+				  "err %d\n", err);
+			goto finish_reserve_free_pages;
+		}
+
+		err = __ssdfs_reserve_free_pages(fsi, count,
+						 type, &free_pages);
+		if (err == -EEXIST) {
+			/* successful reservation */
+			err = 0;
+			goto finish_reserve_free_pages;
+		} else {
+			/*
+			 * finish logic
+			 */
+			goto finish_reserve_free_pages;
+		}
+	} else if (err == -ENOSPC) {
+		DEFINE_WAIT(wait);
+		err = 0;
+
+		wake_up_all(&fsi->shextree->wait_queue);
+		wake_up_all(&fsi->maptbl->wait_queue);
+
+		for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) {
+			wake_up_all(&fsi->gc_wait_queue[i]);
+		}
+
+		prepare_to_wait(&fsi->maptbl->erase_ops_end_wq, &wait,
+				TASK_UNINTERRUPTIBLE);
+		schedule();
+		finish_wait(&fsi->maptbl->erase_ops_end_wq, &wait);
+
+		err = ssdfs_try2increase_free_pages(fsi);
+		if (err == -ENOSPC) {
+			/*
+			 * try to collect the dirty segments
+			 */
+			err = 0;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to increase the free pages: "
+				  "err %d\n", err);
+			goto finish_reserve_free_pages;
+		} else {
+			err = __ssdfs_reserve_free_pages(fsi, count,
+							 type, &free_pages);
+			if (err == -EEXIST) {
+				/* successful reservation */
+				err = 0;
+				goto finish_reserve_free_pages;
+			} else {
+				/*
+				 * try to collect the dirty segments
+				 */
+				err = 0;
+			}
+		}
+
+		err = ssdfs_collect_dirty_segments_now(fsi);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to collect the dirty segments: "
+				  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_reserve_free_pages;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to collect the dirty segments: "
+				  "err %d\n", err);
+			goto finish_reserve_free_pages;
+		}
+
+		err = ssdfs_try2increase_free_pages(fsi);
+		if (err == -ENOSPC) {
+			/*
+			 * finish logic
+			 */
+			goto finish_reserve_free_pages;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to increase the free pages: "
+				  "err %d\n", err);
+			goto finish_reserve_free_pages;
+		} else {
+			err = __ssdfs_reserve_free_pages(fsi, count,
+							 type, &free_pages);
+			if (err == -EEXIST) {
+				/* successful reservation */
+				err = 0;
+				goto finish_reserve_free_pages;
+			} else {
+				/*
+				 * finish logic
+				 */
+				goto finish_reserve_free_pages;
+			}
+		}
+	} else
+		BUG();
+
+finish_reserve_free_pages:
+	if (err) {
+		err = -ENOSPC;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to reserve, free_pages %llu\n",
+			  free_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+	} else {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free pages %u have been reserved, free_pages %llu\n",
+			  count, free_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+void ssdfs_debug_maptbl_object(struct ssdfs_peb_mapping_table *tbl)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int i, j;
+	size_t bytes_count;
+
+	BUG_ON(!tbl);
+
+	SSDFS_DBG("fragments_count %u, fragments_per_seg %u, "
+		  "fragments_per_peb %u, fragment_bytes %u, "
+		  "flags %#x, lebs_count %llu, pebs_count %llu, "
+		  "lebs_per_fragment %u, pebs_per_fragment %u, "
+		  "pebs_per_stripe %u, stripes_per_fragment %u\n",
+		  tbl->fragments_count, tbl->fragments_per_seg,
+		  tbl->fragments_per_peb, tbl->fragment_bytes,
+		  atomic_read(&tbl->flags), tbl->lebs_count,
+		  tbl->pebs_count, tbl->lebs_per_fragment,
+		  tbl->pebs_per_fragment, tbl->pebs_per_stripe,
+		  tbl->stripes_per_fragment);
+
+	for (i = 0; i < MAPTBL_LIMIT1; i++) {
+		for (j = 0; j < MAPTBL_LIMIT2; j++) {
+			struct ssdfs_meta_area_extent *extent;
+			extent = &tbl->extents[i][j];
+			SSDFS_DBG("extent[%d][%d]: "
+				  "start_id %llu, len %u, "
+				  "type %#x, flags %#x\n",
+				  i, j,
+				  le64_to_cpu(extent->start_id),
+				  le32_to_cpu(extent->len),
+				  le16_to_cpu(extent->type),
+				  le16_to_cpu(extent->flags));
+		}
+	}
+
+	SSDFS_DBG("segs_count %u\n", tbl->segs_count);
+
+	for (i = 0; i < SSDFS_MAPTBL_SEG_COPY_MAX; i++) {
+		if (!tbl->segs[i])
+			continue;
+
+		for (j = 0; j < tbl->segs_count; j++)
+			SSDFS_DBG("seg[%d][%d] %p\n", i, j, tbl->segs[i][j]);
+	}
+
+	SSDFS_DBG("pre_erase_pebs %u, max_erase_ops %u, "
+		  "last_peb_recover_cno %llu\n",
+		  atomic_read(&tbl->pre_erase_pebs),
+		  atomic_read(&tbl->max_erase_ops),
+		  (u64)atomic64_read(&tbl->last_peb_recover_cno));
+
+	bytes_count = tbl->fragments_count + BITS_PER_LONG - 1;
+	bytes_count /= BITS_PER_BYTE;
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     tbl->dirty_bmap, bytes_count);
+
+	for (i = 0; i < tbl->fragments_count; i++) {
+		struct ssdfs_maptbl_fragment_desc *desc;
+		struct page *page;
+		u32 pages_count;
+		int state;
+
+		desc = &tbl->desc_array[i];
+
+		state = atomic_read(&desc->state);
+		SSDFS_DBG("fragment #%d: "
+			  "state %#x, start_leb %llu, lebs_count %u, "
+			  "lebs_per_page %u, lebtbl_pages %u, "
+			  "pebs_per_page %u, stripe_pages %u, "
+			  "mapped_lebs %u, migrating_lebs %u, "
+			  "pre_erase_pebs %u, recovering_pebs %u\n",
+			  i, state,
+			  desc->start_leb, desc->lebs_count,
+			  desc->lebs_per_page, desc->lebtbl_pages,
+			  desc->pebs_per_page, desc->stripe_pages,
+			  desc->mapped_lebs, desc->migrating_lebs,
+			  desc->pre_erase_pebs, desc->recovering_pebs);
+
+		if (state == SSDFS_MAPTBL_FRAG_CREATED) {
+			SSDFS_DBG("fragment #%d isn't initialized\n", i);
+			continue;
+		} else if (state == SSDFS_MAPTBL_FRAG_INIT_FAILED) {
+			SSDFS_DBG("fragment #%d init failed\n", i);
+			continue;
+		}
+
+		pages_count = desc->lebtbl_pages +
+			(desc->stripe_pages * tbl->stripes_per_fragment);
+
+		for (j = 0; j < pages_count; j++) {
+			void *kaddr;
+
+			page = ssdfs_page_array_get_page_locked(&desc->array,
+								j);
+
+			SSDFS_DBG("page[%d] %p\n", j, page);
+			if (IS_ERR_OR_NULL(page))
+				continue;
+
+			SSDFS_DBG("page_index %llu, flags %#lx\n",
+				  (u64)page_index(page), page->flags);
+
+			kaddr = kmap_local_page(page);
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     kaddr, PAGE_SIZE);
+			kunmap_local(kaddr);
+
+			ssdfs_unlock_page(page);
+			ssdfs_put_page(page);
+
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+		}
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+}
diff --git a/fs/ssdfs/peb_mapping_table_cache.c b/fs/ssdfs/peb_mapping_table_cache.c
new file mode 100644
index 000000000000..e83e07947743
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_table_cache.c
@@ -0,0 +1,1497 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_table_cache.c - PEB mapping table cache functionality.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "page_array.h"
+#include "peb_mapping_table.h"
+
+#include
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_map_cache_page_leaks;
+atomic64_t ssdfs_map_cache_memory_leaks;
+atomic64_t ssdfs_map_cache_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_map_cache_cache_leaks_increment(void *kaddr)
+ * void ssdfs_map_cache_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_map_cache_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_cache_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_map_cache_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_map_cache_kfree(void *kaddr)
+ * struct page *ssdfs_map_cache_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_map_cache_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_map_cache_free_page(struct page *page)
+ * void ssdfs_map_cache_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(map_cache)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(map_cache)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_map_cache_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_map_cache_page_leaks, 0);
+	atomic64_set(&ssdfs_map_cache_memory_leaks, 0);
+	atomic64_set(&ssdfs_map_cache_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_map_cache_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_map_cache_page_leaks) != 0) {
+		SSDFS_ERR("MAPPING CACHE: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_map_cache_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_cache_memory_leaks) != 0) {
+		SSDFS_ERR("MAPPING CACHE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_cache_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_map_cache_cache_leaks) != 0) {
+		SSDFS_ERR("MAPPING CACHE: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_map_cache_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_maptbl_cache_init() - init mapping table cache
+ */
+void ssdfs_maptbl_cache_init(struct ssdfs_maptbl_cache *cache)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!cache);
+
+	SSDFS_DBG("cache %p\n", cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	init_rwsem(&cache->lock);
+	pagevec_init(&cache->pvec);
+	atomic_set(&cache->bytes_count, 0);
+	ssdfs_peb_mapping_queue_init(&cache->pm_queue);
+}
+
+/*
+ * ssdfs_maptbl_cache_destroy() - destroy mapping table cache
+ */
+void ssdfs_maptbl_cache_destroy(struct ssdfs_maptbl_cache *cache)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!cache);
+
+	SSDFS_DBG("cache %p\n", cache);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_map_cache_pagevec_release(&cache->pvec);
+	ssdfs_peb_mapping_queue_remove_all(&cache->pm_queue);
+}
+
+/*
+ * __ssdfs_maptbl_cache_area_size() - calculate areas' size in fragment
+ * @hdr: fragment's header
+ * @leb2peb_area_size: LEB2PEB area size [out]
+ * @peb_state_area_size: PEB state area size [out]
+ *
+ * This method calculates size in bytes of LEB2PEB area and
+ * PEB state area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - fragment is empty. + */ +static inline +int __ssdfs_maptbl_cache_area_size(struct ssdfs_maptbl_cache_header *hdr, + size_t *leb2peb_area_size, + size_t *peb_state_area_size) +{ + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t magic_size = peb_state_size; + u16 bytes_count; + u16 items_count; + size_t threshold_size; + size_t capacity; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr || !leb2peb_area_size || !peb_state_area_size); + + SSDFS_DBG("hdr %p\n", hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + *leb2peb_area_size = 0; + *peb_state_area_size = magic_size; + + bytes_count = le16_to_cpu(hdr->bytes_count); + items_count = le16_to_cpu(hdr->items_count); + + threshold_size = hdr_size + magic_size; + + if (bytes_count < threshold_size) { + SSDFS_ERR("fragment is corrupted: " + "hdr_size %zu, bytes_count %u\n", + hdr_size, bytes_count); + return -ERANGE; + } else if (bytes_count == threshold_size) { + SSDFS_DBG("fragment is empty\n"); + return -ENODATA; + } + + capacity = + (bytes_count - threshold_size) / (pair_size + peb_state_size); + + if (items_count > capacity) { + SSDFS_ERR("items_count %u > capacity %zu\n", + items_count, capacity); + return -ERANGE; + } + + *leb2peb_area_size = capacity * pair_size; + *peb_state_area_size = magic_size + (capacity * peb_state_size); + + return 0; +} + +/* + * ssdfs_leb2peb_pair_area_size() - calculate LEB2PEB area size + * @hdr: fragment's header + * + * This method calculates size in bytes of LEB2PEB area. + * + * RETURN: + * [success] - LEB2PEB area size in bytes. + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - fragment is empty. + */ +static inline +int ssdfs_leb2peb_pair_area_size(struct ssdfs_maptbl_cache_header *hdr) +{ + size_t leb2peb_area_size; + size_t peb_state_area_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("hdr %p\n", hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_maptbl_cache_area_size(hdr, + &leb2peb_area_size, + &peb_state_area_size); + if (unlikely(err)) { + SSDFS_ERR("fail to define leb2peb area size: " + "err %d\n", + err); + return err; + } + + return (int)leb2peb_area_size; +} + +/* + * ssdfs_maptbl_cache_fragment_capacity() - calculate fragment capacity + * + * This method calculates the capacity (maximum number of items) + * of fragment. + */ +static inline +size_t ssdfs_maptbl_cache_fragment_capacity(void) +{ + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t magic_size = peb_state_size; + size_t size = PAGE_SIZE; + size_t count; + + size -= hdr_size + magic_size; + count = size / (pair_size + peb_state_size); + + return count; +} + +/* + * LEB2PEB_PAIR_AREA() - get pointer on first LEB2PEB pair + * @kaddr: pointer on fragment's beginning + */ +static inline +struct ssdfs_leb2peb_pair *LEB2PEB_PAIR_AREA(void *kaddr) +{ + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + + return (struct ssdfs_leb2peb_pair *)((u8 *)kaddr + hdr_size); +} + +/* + * ssdfs_peb_state_area_size() - calculate PEB state area size + * @hdr: fragment's header + * + * This method calculates size in bytes of PEB state area. + * + * RETURN: + * [success] - PEB state area size in bytes. + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ * %-ENODATA - fragment is empty. + */ +static inline +int ssdfs_peb_state_area_size(struct ssdfs_maptbl_cache_header *hdr) +{ + size_t leb2peb_area_size; + size_t peb_state_area_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr); + + SSDFS_DBG("hdr %p\n", hdr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_maptbl_cache_area_size(hdr, + &leb2peb_area_size, + &peb_state_area_size); + if (err == -ENODATA) { + /* empty area */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define peb state area size: " + "err %d\n", + err); + return err; + } + + return (int)peb_state_area_size; +} + +/* + * PEB_STATE_AREA() - get pointer on PEB state area + * @kaddr: pointer on fragment's beginning + * @area_offset: PEB state area's offset + * + * This method tries to prepare pointer on the + * PEB state area in the fragment. + * + * RETURN: + * [success] - pointer on the PEB state area. + * [failure] - error code: + * + * %-ERANGE - corrupted PEB state area. + */ +static inline +void *PEB_STATE_AREA(void *kaddr, u32 *area_offset) +{ + struct ssdfs_maptbl_cache_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t leb2peb_area_size; + size_t peb_state_area_size; + void *start = NULL; + __le32 *magic = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !area_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + *area_offset = U32_MAX; + + err = __ssdfs_maptbl_cache_area_size(hdr, + &leb2peb_area_size, + &peb_state_area_size); + if (err == -ENODATA) { + /* empty area */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get area size: err %d\n", err); + return ERR_PTR(err); + } + + if ((hdr_size + leb2peb_area_size + peb_state_area_size) > PAGE_SIZE) { + err = -ERANGE; + SSDFS_ERR("invalid state: " + "hdr_size %zu, leb2peb_area_size %zu, " + "peb_state_area_size %zu\n", + hdr_size, leb2peb_area_size, + peb_state_area_size); + return ERR_PTR(err); + } + + *area_offset = hdr_size + leb2peb_area_size; + start = (u8 *)kaddr + hdr_size + leb2peb_area_size; + magic = (__le32 *)start; + + if (le32_to_cpu(*magic) != SSDFS_MAPTBL_CACHE_PEB_STATE_MAGIC) { + SSDFS_ERR("invalid magic %#x\n", + le32_to_cpu(*magic)); + return ERR_PTR(-ERANGE); + } + + return start; +} + +/* + * FIRST_PEB_STATE() - get pointer on first PEB state + * @kaddr: pointer on fragment's beginning + * @area_offset: PEB state area's offset + * + * This method tries to prepare pointer on the first + * PEB state in the fragment. + * + * RETURN: + * [success] - pointer on first PEB state. + * [failure] - error code: + * + * %-ERANGE - corrupted PEB state area. + */ +static inline +struct ssdfs_maptbl_cache_peb_state *FIRST_PEB_STATE(void *kaddr, + u32 *area_offset) +{ + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t magic_size = peb_state_size; + void *start = PEB_STATE_AREA(kaddr, area_offset); + + if (IS_ERR_OR_NULL(start)) + return (struct ssdfs_maptbl_cache_peb_state *)start; + + return (struct ssdfs_maptbl_cache_peb_state *)((u8 *)start + + magic_size); +} + +/* + * ssdfs_find_range_lower_limit() - find the first item of range + * @hdr: mapping table cache's header + * @leb_id: LEB ID + * @start_index: starting index + * @start_pair: pointer on starting LEB2PEB pair + * @found_index: pointer on found index [out] + * + * This method tries to find position of the first item + * for the same @leb_id in the range. 
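+ *
+ * For illustration only (hypothetical fragment content): if the
+ * LEB2PEB pairs in the fragment are
+ *
+ *	index:  0   1   2   3   4
+ *	leb_id: 10  12  12  15  16
+ *
+ * then, for @leb_id == 12 and @start_index == 2, the method walks
+ * backward and sets @found_index to 1 - the first pair of the range
+ * (a range holds at most two pairs: the main and the relation PEB).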
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_find_range_lower_limit(struct ssdfs_maptbl_cache_header *hdr,
+				 u64 leb_id, int start_index,
+				 struct ssdfs_leb2peb_pair *start_pair,
+				 int *found_index)
+{
+	struct ssdfs_leb2peb_pair *cur_pair = NULL;
+	u16 items_count;
+	u64 cur_leb_id;
+	int i = 0, j = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr || !start_pair || !found_index);
+
+	SSDFS_DBG("hdr %p, leb_id %llu, start_index %d, "
+		  "start_pair %p, found_index %p\n",
+		  hdr, leb_id, start_index, start_pair, found_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_count = le16_to_cpu(hdr->items_count);
+
+	if (items_count == 0) {
+		SSDFS_ERR("empty maptbl cache\n");
+		return -ERANGE;
+	}
+
+	if (start_index < 0 || start_index >= items_count) {
+		SSDFS_ERR("invalid index: "
+			  "start_index %d, items_count %u\n",
+			  start_index, items_count);
+		return -EINVAL;
+	}
+
+	if (leb_id != le64_to_cpu(start_pair->leb_id)) {
+		SSDFS_ERR("invalid ID: "
+			  "leb_id1 %llu, leb_id2 %llu\n",
+			  leb_id,
+			  le64_to_cpu(start_pair->leb_id));
+		return -EINVAL;
+	}
+
+	*found_index = start_index;
+
+	for (i = start_index - 1, j = 1; i >= 0; i--, j++) {
+		cur_pair = start_pair - j;
+		cur_leb_id = le64_to_cpu(cur_pair->leb_id);
+
+		if (cur_leb_id != leb_id)
+			return 0;
+
+		/* a range cannot contain more than two items */
+		if ((start_index - i) >= 2) {
+			SSDFS_ERR("corrupted cache\n");
+			return -ERANGE;
+		}
+
+		*found_index = i;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_find_range_upper_limit() - find the last item of range
+ * @hdr: mapping table cache's header
+ * @leb_id: LEB ID
+ * @start_index: starting index
+ * @start_pair: pointer on starting LEB2PEB pair
+ * @found_index: pointer on found index [out]
+ *
+ * This method tries to find position of the last item
+ * for the same @leb_id in the range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_find_range_upper_limit(struct ssdfs_maptbl_cache_header *hdr,
+				 u64 leb_id, int start_index,
+				 struct ssdfs_leb2peb_pair *start_pair,
+				 int *found_index)
+{
+	struct ssdfs_leb2peb_pair *cur_pair = NULL;
+	u16 items_count;
+	u64 cur_leb_id;
+	int i = 0, j = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr || !start_pair || !found_index);
+
+	SSDFS_DBG("hdr %p, leb_id %llu, start_index %d, "
+		  "start_pair %p, found_index %p\n",
+		  hdr, leb_id, start_index, start_pair, found_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_count = le16_to_cpu(hdr->items_count);
+
+	if (items_count == 0) {
+		SSDFS_ERR("empty maptbl cache\n");
+		return -ERANGE;
+	}
+
+	if (start_index < 0 || start_index >= items_count) {
+		SSDFS_ERR("invalid index: "
+			  "start_index %d, items_count %u\n",
+			  start_index, items_count);
+		return -EINVAL;
+	}
+
+	if (leb_id != le64_to_cpu(start_pair->leb_id)) {
+		SSDFS_ERR("invalid ID: "
+			  "leb_id1 %llu, leb_id2 %llu\n",
+			  leb_id,
+			  le64_to_cpu(start_pair->leb_id));
+		return -EINVAL;
+	}
+
+	*found_index = start_index;
+
+	for (i = start_index + 1, j = 1; i < items_count; i++, j++) {
+		cur_pair = start_pair + j;
+		cur_leb_id = le64_to_cpu(cur_pair->leb_id);
+
+		if (cur_leb_id != leb_id)
+			return 0;
+
+		/* a range cannot contain more than two items */
+		if ((i - start_index) >= 2) {
+			SSDFS_ERR("corrupted cache\n");
+			return -ERANGE;
+		}
+
+		*found_index = i;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_find_result_pair() - extract pair of descriptors
+ * @hdr: mapping table cache's header
+ * @sequence_id: fragment ID
+ * @leb_id: LEB ID
+ * @peb_index: main/relation PEB index
+ * @cur_index: current index of item in cache
+ * @start_pair: pointer on starting pair in cache
+ * @cur_pair: pointer on current pair for @cur_index
+ * @res: pointer on the extracted pair of descriptors [out]
+ *
+ * This method tries to extract the pair of descriptors for
+ * main and relation LEB2PEB pairs.
+ *
+ * RETURN:
+ * [success] - error code:
+ * %-EAGAIN - repeat the search for the next memory page.
+ * %-EEXIST - @leb_id is found.
+ *
+ * [failure] - error code:
+ * %-ERANGE - internal error.
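+ *
+ * For example (hypothetical fragment content): if LEB 12 occupies
+ * items 1 and 2 of the fragment, the method saves item 1 as the main
+ * pair, item 2 as the relation pair, and returns -EEXIST. If LEB 12
+ * occupied only the last item of the fragment, it would return
+ * -EAGAIN instead, because the relation pair could reside in the
+ * next memory page.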
+ */ +static +int ssdfs_find_result_pair(struct ssdfs_maptbl_cache_header *hdr, + unsigned sequence_id, + u64 leb_id, + int peb_index, + int cur_index, + struct ssdfs_leb2peb_pair *start_pair, + struct ssdfs_leb2peb_pair *cur_pair, + struct ssdfs_maptbl_cache_search_result *res) +{ + struct ssdfs_maptbl_cache_item *cur_item; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + int lo_limit = -1; + int up_limit = -1; + u16 items_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr || !start_pair || !cur_pair || !res); + + SSDFS_DBG("sequence_id %u, leb_id %llu, " + "peb_index %#x, cur_index %d\n", + sequence_id, leb_id, peb_index, cur_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_item = &res->pebs[peb_index]; + cur_item->state = SSDFS_MAPTBL_CACHE_SEARCH_ERROR; + + items_count = le16_to_cpu(hdr->items_count); + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache\n"); + return -ERANGE; + } + + err = ssdfs_find_range_lower_limit(hdr, leb_id, cur_index, + cur_pair, &lo_limit); + if (unlikely(err)) { + SSDFS_ERR("fail to find lower_limit: " + "leb_id %llu, cur_index %d, " + "err %d\n", + leb_id, cur_index, err); + return err; + } + + err = ssdfs_find_range_upper_limit(hdr, leb_id, cur_index, + cur_pair, &up_limit); + if (unlikely(err)) { + SSDFS_ERR("fail to find upper_limit: " + "leb_id %llu, cur_index %d, " + "err %d\n", + leb_id, cur_index, err); + return err; + } + + switch (peb_index) { + case SSDFS_MAPTBL_MAIN_INDEX: + /* save main item */ + cur_item->state = SSDFS_MAPTBL_CACHE_ITEM_FOUND; + cur_item->page_index = sequence_id; + cur_item->item_index = lo_limit; + cur_pair = start_pair + lo_limit; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + cur_pair, 0, pair_size, + pair_size); + peb_index = SSDFS_MAPTBL_RELATION_INDEX; + cur_item = &res->pebs[peb_index]; + cur_item->state = SSDFS_MAPTBL_CACHE_ITEM_ABSENT; + + if (lo_limit == up_limit && (up_limit + 1) == items_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, peb_index %d, cur_index %d, " + "lo_limit %d, up_limit %d, items_count %u\n", + leb_id, peb_index, cur_index, + lo_limit, up_limit, items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EAGAIN; + } else if (lo_limit == up_limit) + return -EEXIST; + + /* save relation item */ + cur_item->state = SSDFS_MAPTBL_CACHE_ITEM_FOUND; + cur_item->page_index = sequence_id; + cur_item->item_index = up_limit; + cur_pair = start_pair + up_limit; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + cur_pair, 0, pair_size, + pair_size); + break; + + case SSDFS_MAPTBL_RELATION_INDEX: + if (lo_limit != up_limit && lo_limit != 0) { + SSDFS_ERR("corrupted cache\n"); + return -ERANGE; + } + + cur_item->state = SSDFS_MAPTBL_CACHE_ITEM_FOUND; + cur_item->page_index = sequence_id; + cur_item->item_index = lo_limit; + cur_pair = start_pair + lo_limit; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + cur_pair, 0, pair_size, + pair_size); + break; + + default: + SSDFS_ERR("invalid index %d\n", peb_index); + return -ERANGE; + } + + return -EEXIST; +} + +static +void ssdfs_maptbl_cache_show_items(void *kaddr) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *start_pair, *cur_pair; + struct ssdfs_maptbl_cache_peb_state *start_state = NULL; + struct ssdfs_maptbl_cache_peb_state *state_ptr; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int i; + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = le16_to_cpu(hdr->items_count); + start_pair = LEB2PEB_PAIR_AREA(kaddr); + 
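+	/* Dump every cached LEB2PEB pair and PEB state via SSDFS_ERR() */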
+ SSDFS_ERR("MAPTBL CACHE:\n"); + + SSDFS_ERR("LEB2PEB pairs:\n"); + for (i = 0; i < items_count; i++) { + cur_pair = start_pair + i; + SSDFS_ERR("item %d, leb_id %llu, peb_id %llu\n", + i, + le64_to_cpu(cur_pair->leb_id), + le64_to_cpu(cur_pair->peb_id)); + } + + start_state = FIRST_PEB_STATE(kaddr, &area_offset); + + SSDFS_ERR("PEB states:\n"); + for (i = 0; i < items_count; i++) { + state_ptr = + (struct ssdfs_maptbl_cache_peb_state *)((u8 *)start_state + + (peb_state_size * i)); + SSDFS_ERR("item %d, consistency %#x, " + "state %#x, flags %#x, " + "shared_peb_index %u\n", + i, state_ptr->consistency, + state_ptr->state, state_ptr->flags, + state_ptr->shared_peb_index); + } +} + +/* + * __ssdfs_maptbl_cache_find_leb() - find position of LEB + * @kaddr: pointer on maptbl cache's fragment + * @sequence_id: fragment ID + * @leb_id: LEB ID + * @res: pointer on the extracted pair of descriptors [out] + * + * This method tries to find position of LEB for extracting + * or inserting a LEB/PEB pair. + * + * RETURN: + * [success] - error code: + * %-EAGAIN - repeat the search for the next memory page + * %-EFAULT - @leb_id doesn't found; position can be used for inserting. + * %-E2BIG - page is full; @leb_id is greater than ending LEB number. + * %-ENODATA - @leb_id is greater than ending LEB number. + * %-EEXIST - @leb_id is found. + * + * [failure] - error code: + * %-ERANGE - internal error. + */ +static +int __ssdfs_maptbl_cache_find_leb(void *kaddr, + unsigned sequence_id, + u64 leb_id, + struct ssdfs_maptbl_cache_search_result *res) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_maptbl_cache_item *cur_item; + int cur_item_index = SSDFS_MAPTBL_MAIN_INDEX; + struct ssdfs_leb2peb_pair *start_pair, *cur_pair; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + u64 start_leb, end_leb; + u64 start_diff, end_diff; + u64 cur_leb_id; + u16 items_count; + int i = 0; + int step, cur_index; + bool disable_step = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !res); + + SSDFS_DBG("kaddr %p, sequence_id %u, " + "leb_id %llu, res %p\n", + kaddr, sequence_id, leb_id, res); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + if (le16_to_cpu(hdr->sequence_id) != sequence_id) { + SSDFS_ERR("invalid sequence_id %u\n", sequence_id); + return -ERANGE; + } + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("maptbl cache fragment %u is empty\n", + sequence_id); + return -ERANGE; + } + + start_pair = LEB2PEB_PAIR_AREA(kaddr); + start_leb = le64_to_cpu(hdr->start_leb); + end_leb = le64_to_cpu(hdr->end_leb); + + cur_item = &res->pebs[cur_item_index]; + + switch (cur_item->state) { + case SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN: + /* + * Continue the search for main item + */ + break; + + case SSDFS_MAPTBL_CACHE_ITEM_FOUND: + cur_item_index = SSDFS_MAPTBL_RELATION_INDEX; + cur_item = &res->pebs[cur_item_index]; + + switch (cur_item->state) { + case SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN: + /* + * Continue the search for relation item + */ + break; + + default: + SSDFS_ERR("invalid search result's state %#x\n", + cur_item->state); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid search result's state %#x\n", + cur_item->state); + return -ERANGE; + } + + if (leb_id < start_leb) { + cur_item->state = SSDFS_MAPTBL_CACHE_ITEM_ABSENT; + cur_item->page_index = sequence_id; + cur_item->item_index = 0; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + start_pair, 0, pair_size, + pair_size); +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("leb_id %llu, start_leb %llu, end_leb %llu\n", + leb_id, start_leb, end_leb); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EFAULT; + } + + if (end_leb < leb_id) { + size_t capacity = ssdfs_maptbl_cache_fragment_capacity(); + + if ((items_count + 1) > capacity) + return -E2BIG; + else { + cur_item->state = SSDFS_MAPTBL_CACHE_ITEM_ABSENT; + cur_item->page_index = sequence_id; + cur_item->item_index = items_count; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + start_pair + items_count, 0, pair_size, + pair_size); + return -ENODATA; + } + } + + start_diff = leb_id - start_leb; + end_diff = end_leb - leb_id; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_diff %llu, end_diff %llu\n", + start_diff, end_diff); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_diff <= end_diff) { + /* straight search */ + SSDFS_DBG("straight search\n"); + + i = 0; + cur_index = 0; + step = 1; + while (i < items_count) { + cur_pair = start_pair + cur_index; + cur_leb_id = le64_to_cpu(cur_pair->leb_id); + + if (leb_id < cur_leb_id) { + disable_step = true; + cur_index = i; + cur_pair = start_pair + cur_index; + cur_leb_id = le64_to_cpu(cur_pair->leb_id); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_index %d, step %d, " + "cur_leb_id %llu, leb_id %llu\n", + cur_index, step, cur_leb_id, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (leb_id > cur_leb_id) + goto continue_straight_search; + else if (cur_leb_id == leb_id) { + return ssdfs_find_result_pair(hdr, sequence_id, + leb_id, + cur_item_index, + cur_index, + start_pair, + cur_pair, + res); + } else { + cur_item->state = + SSDFS_MAPTBL_CACHE_ITEM_ABSENT; + cur_item->page_index = sequence_id; + cur_item->item_index = cur_index; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + cur_pair, 0, pair_size, + pair_size); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, start_leb %llu, end_leb %llu, " + "cur_leb_id %llu, cur_index %d, step %d\n", + leb_id, start_leb, end_leb, + cur_leb_id, cur_index, step); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EFAULT; + } + +continue_straight_search: + if (!disable_step) + step *= 2; + + i = cur_index + 1; + + if (disable_step) + cur_index = i; + else if ((i + step) < items_count) { + cur_index = i + step; + } else { + disable_step = true; + cur_index = i; + } + } + } else { + /* reverse search */ + SSDFS_DBG("reverse search\n"); + + i = items_count - 1; + cur_index = i; + step = 1; + while (i >= 0) { + cur_pair = start_pair + cur_index; + cur_leb_id = le64_to_cpu(cur_pair->leb_id); + + if (leb_id > cur_leb_id) { + disable_step = true; + cur_index = i; + cur_pair = start_pair + cur_index; + cur_leb_id = le64_to_cpu(cur_pair->leb_id); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_index %d, step %d, " + "cur_leb_id %llu, leb_id %llu\n", + cur_index, step, cur_leb_id, leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (leb_id < cur_leb_id) + goto continue_reverse_search; + else if (cur_leb_id == leb_id) { + return ssdfs_find_result_pair(hdr, sequence_id, + leb_id, + cur_item_index, + cur_index, + start_pair, + cur_pair, + res); + } else { + cur_item->state = + SSDFS_MAPTBL_CACHE_ITEM_ABSENT; + cur_item->page_index = sequence_id; + cur_index++; + cur_pair = start_pair + cur_index; + cur_item->item_index = cur_index; + ssdfs_memcpy(&cur_item->found, 0, pair_size, + cur_pair, 0, pair_size, + pair_size); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, start_leb %llu, end_leb %llu, " + "cur_leb_id %llu, cur_index %d, step %d\n", + leb_id, start_leb, end_leb, + cur_leb_id, cur_index, step); +#endif /* CONFIG_SSDFS_DEBUG 
*/ + return -EFAULT; + } + +continue_reverse_search: + if (!disable_step) + step *= 2; + + i = cur_index - 1; + + if (disable_step) + cur_index = i; + else if (i >= step && ((i - step) >= 0)) + cur_index = i - step; + else { + disable_step = true; + cur_index = i; + } + }; + } + + return -ERANGE; +} + +/* + * ssdfs_maptbl_cache_get_leb2peb_pair() - get LEB2PEB pair + * @kaddr: pointer on fragment's beginning + * @item_index: index of item in the fragment + * @pair: pointer on requested LEB2PEB pair [out] + * + * This method tries to prepare pointer on the requested + * LEB2PEB pair in the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + */ +static +int ssdfs_maptbl_cache_get_leb2peb_pair(void *kaddr, u16 item_index, + struct ssdfs_leb2peb_pair **pair) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *start = NULL; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + u16 items_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !pair); + + SSDFS_DBG("kaddr %p, item_index %u, pair %p\n", + kaddr, item_index, pair); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = le16_to_cpu(hdr->items_count); + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u >= items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + start = LEB2PEB_PAIR_AREA(kaddr); + + *pair = (struct ssdfs_leb2peb_pair *)((u8 *)start + + (pair_size * item_index)); + + return 0; +} + +/* + * ssdfs_maptbl_cache_get_peb_state() - get PEB state + * @kaddr: pointer on fragment's beginning + * @item_index: index of item in the fragment + * @ptr: pointer on requested PEB state [out] + * + * This method tries to prepare pointer on the requested + * PEB state in the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_cache_get_peb_state(void *kaddr, u16 item_index, + struct ssdfs_maptbl_cache_peb_state **ptr) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_maptbl_cache_peb_state *start = NULL; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !ptr); + + SSDFS_DBG("kaddr %p, item_index %u, ptr %p\n", + kaddr, item_index, ptr); + + SSDFS_DBG("PAGE DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = le16_to_cpu(hdr->items_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("CACHE HEADER\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, 32); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u >= items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + start = FIRST_PEB_STATE(kaddr, &area_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB STATE START\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + start, 32); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (IS_ERR_OR_NULL(start)) { + err = start == NULL ? 
-ERANGE : PTR_ERR(start); + SSDFS_ERR("fail to get area's start pointer: " + "err %d\n", err); + return err; + } + + *ptr = (struct ssdfs_maptbl_cache_peb_state *)((u8 *)start + + (peb_state_size * item_index)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MODIFIED ITEM\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + *ptr, 32); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_maptbl_cache_find_leb() - find LEB ID inside maptbl cache's fragment + * @cache: maptbl cache object + * @leb_id: LEB ID + * @res: pointer on the extracted pair of descriptors [out] + * @pebr: description of PEBs relation [out] + * + * This method tries to find LEB/PEB pair for requested LEB ID + * inside of fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - cache doesn't contain LEB/PEB pair. + * %-ENODATA - try to search in the next fragment. + * %-EAGAIN - try to search the relation LEB/PEB pair in the next page. + */ +static +int ssdfs_maptbl_cache_find_leb(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + struct ssdfs_maptbl_cache_search_result *res, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct ssdfs_maptbl_cache_peb_state *peb_state = NULL; + struct page *page; + unsigned page_index; + u16 item_index; + struct ssdfs_leb2peb_pair *found; + void *kaddr; + u64 peb_id = U64_MAX; + unsigned i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !res || !pebr); + BUG_ON(!rwsem_is_locked(&cache->lock)); + + SSDFS_DBG("cache %p, leb_id %llu, res %p, pebr %p\n", + cache, leb_id, res, pebr); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(res, 0xFF, sizeof(struct ssdfs_maptbl_cache_search_result)); + res->pebs[SSDFS_MAPTBL_MAIN_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + res->pebs[SSDFS_MAPTBL_RELATION_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + + memset(pebr, 0xFF, sizeof(struct ssdfs_maptbl_peb_relation)); + + for (i = 0; i < pagevec_count(&cache->pvec); i++) { + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = __ssdfs_maptbl_cache_find_leb(kaddr, i, leb_id, res); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("leb_id %llu, page_index %u, err %d\n", + leb_id, i, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -ENODATA || err == -E2BIG) + continue; + else if (err == -EAGAIN) + continue; + else if (err == -EFAULT) { + err = -ENODATA; + goto finish_leb_id_search; + } else if (err == -EEXIST) { + err = 0; + break; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find LEB: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_leb_id_search; + } + } + + if (err == -ENODATA || err == -E2BIG) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find: leb_id %llu\n", leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + err = -ENODATA; + goto finish_leb_id_search; + } + + for (i = SSDFS_MAPTBL_MAIN_INDEX; i < SSDFS_MAPTBL_RELATION_MAX; i++) { + switch (res->pebs[i].state) { + case SSDFS_MAPTBL_CACHE_ITEM_FOUND: + page_index = res->pebs[i].page_index; + item_index = res->pebs[i].item_index; + found = &res->pebs[i].found; + + if (page_index >= pagevec_count(&cache->pvec)) { + err = -ERANGE; + SSDFS_ERR("invalid page index %u\n", + page_index); + goto finish_leb_id_search; + } + + page = cache->pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
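+			/*
+			 * Read the found item's PEB state under the page
+			 * lock; the mapping is only valid between
+			 * kmap_local_page() and kunmap_local().
+			 */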
ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = ssdfs_maptbl_cache_get_peb_state(kaddr, + item_index, + &peb_state); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to get peb state: " + "item_index %u, err %d\n", + item_index, err); + goto finish_leb_id_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!peb_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (le64_to_cpu(found->leb_id) != leb_id) { + err = -ERANGE; + SSDFS_ERR("leb_id1 %llu != leb_id2 %llu\n", + le64_to_cpu(found->leb_id), + leb_id); + goto finish_leb_id_search; + } + + peb_id = le64_to_cpu(found->peb_id); + + pebr->pebs[i].peb_id = peb_id; + pebr->pebs[i].shared_peb_index = + peb_state->shared_peb_index; + pebr->pebs[i].type = + SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + pebr->pebs[i].state = peb_state->state; + pebr->pebs[i].flags = peb_state->flags; + pebr->pebs[i].consistency = peb_state->consistency; + break; + + case SSDFS_MAPTBL_CACHE_ITEM_ABSENT: + continue; + + default: + err = -ERANGE; + SSDFS_ERR("search failure: " + "leb_id %llu, index %u, state %#x\n", + leb_id, i, res->pebs[i].state); + goto finish_leb_id_search; + } + } + +finish_leb_id_search: + return err; +} + +/* + * ssdfs_maptbl_cache_convert_leb2peb_nolock() - cache-based LEB/PEB conversion + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @pebr: description of PEBs relation [out] + * + * This method tries to convert LEB ID into PEB ID on the basis of + * mapping table's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - LEB doesn't mapped to PEB yet. + */ +int ssdfs_maptbl_cache_convert_leb2peb_nolock(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr) +{ + struct ssdfs_maptbl_cache_search_result res; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pebr); + BUG_ON(atomic_read(&cache->bytes_count) == 0); + BUG_ON(pagevec_count(&cache->pvec) == 0); + BUG_ON(atomic_read(&cache->bytes_count) > + (pagevec_count(&cache->pvec) * PAGE_SIZE)); + BUG_ON(atomic_read(&cache->bytes_count) <= + ((pagevec_count(&cache->pvec) - 1) * PAGE_SIZE)); + BUG_ON(!rwsem_is_locked(&cache->lock)); + + SSDFS_DBG("cache %p, leb_id %llu, pebr %p\n", + cache, leb_id, pebr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_cache_find_leb(cache, leb_id, &res, pebr); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to convert leb %llu to peb\n", + leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to convert leb %llu to peb: " + "err %d\n", + leb_id, err); + return err; + } + + return 0; +} + +/* + * __ssdfs_maptbl_cache_convert_leb2peb() - cache-based LEB/PEB conversion + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @pebr: description of PEBs relation [out] + * + * This method tries to convert LEB ID into PEB ID on the basis of + * mapping table's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - LEB doesn't mapped to PEB yet. 
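+ *
+ * A typical call looks like this (illustrative only):
+ *
+ *	struct ssdfs_maptbl_peb_relation pebr;
+ *	u64 peb_id;
+ *	int err;
+ *
+ *	err = __ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id, &pebr);
+ *	if (!err)
+ *		peb_id = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id;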
+ */ +int __ssdfs_maptbl_cache_convert_leb2peb(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pebr); + BUG_ON(atomic_read(&cache->bytes_count) == 0); + BUG_ON(pagevec_count(&cache->pvec) == 0); + BUG_ON(atomic_read(&cache->bytes_count) > + (pagevec_count(&cache->pvec) * PAGE_SIZE)); + BUG_ON(atomic_read(&cache->bytes_count) <= + ((pagevec_count(&cache->pvec) - 1) * PAGE_SIZE)); + + SSDFS_DBG("cache %p, leb_id %llu, pebr %p\n", + cache, leb_id, pebr); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&cache->lock); + err = ssdfs_maptbl_cache_convert_leb2peb_nolock(cache, leb_id, pebr); + up_read(&cache->lock); + + return err; +} + +/* + * ssdfs_maptbl_cache_convert_leb2peb() - maptbl cache-based LEB/PEB conversion + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @pebr: description of PEBs relation [out] + * + * This method tries to convert LEB ID into PEB ID on the basis of + * mapping table's cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - LEB doesn't mapped to PEB yet. + */ +int ssdfs_maptbl_cache_convert_leb2peb(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr) +{ + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pebr); + BUG_ON(atomic_read(&cache->bytes_count) == 0); + BUG_ON(pagevec_count(&cache->pvec) == 0); + BUG_ON(atomic_read(&cache->bytes_count) > + (pagevec_count(&cache->pvec) * PAGE_SIZE)); + BUG_ON(atomic_read(&cache->bytes_count) <= + ((pagevec_count(&cache->pvec) - 1) * PAGE_SIZE)); + + SSDFS_DBG("cache %p, leb_id %llu, pebr %p\n", + cache, leb_id, pebr); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_maptbl_cache_convert_leb2peb(cache, leb_id, pebr); + if (unlikely(err)) + goto finish_leb2peb_conversion; + + for (i = SSDFS_MAPTBL_MAIN_INDEX; i < SSDFS_MAPTBL_RELATION_MAX; i++) { + struct ssdfs_peb_mapping_info *pmi = NULL; + int consistency = pebr->pebs[i].consistency; + u64 peb_id = pebr->pebs[i].peb_id; + + switch (consistency) { + case SSDFS_PEB_STATE_INCONSISTENT: + case SSDFS_PEB_STATE_PRE_DELETED: + pmi = ssdfs_peb_mapping_info_alloc(); + if (IS_ERR_OR_NULL(pmi)) { + err = !pmi ? 
-ENOMEM : PTR_ERR(pmi); + SSDFS_ERR("fail to alloc PEB mapping info: " + "leb_id %llu, err %d\n", + leb_id, err); + goto finish_leb2peb_conversion; + } + + ssdfs_peb_mapping_info_init(leb_id, peb_id, + consistency, pmi); + ssdfs_peb_mapping_queue_add_tail(&cache->pm_queue, pmi); + break; + } + } + + switch (pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency) { + case SSDFS_PEB_STATE_PRE_DELETED: + ssdfs_memcpy(&pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX], + 0, sizeof(struct ssdfs_maptbl_peb_descriptor), + &pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX], + 0, sizeof(struct ssdfs_maptbl_peb_descriptor), + sizeof(struct ssdfs_maptbl_peb_descriptor)); + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id = U64_MAX; + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].shared_peb_index = + U8_MAX; + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type = + SSDFS_MAPTBL_UNKNOWN_PEB_TYPE; + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state = U8_MAX; + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].flags = 0; + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency = + SSDFS_PEB_STATE_UNKNOWN; + break; + + default: + /* do nothing */ + break; + } + +finish_leb2peb_conversion: + return err; +} diff --git a/fs/ssdfs/peb_mapping_table_cache.h b/fs/ssdfs/peb_mapping_table_cache.h new file mode 100644 index 000000000000..803cfb8447a5 --- /dev/null +++ b/fs/ssdfs/peb_mapping_table_cache.h @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/peb_mapping_table_cache.h - PEB mapping table cache declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_PEB_MAPPING_TABLE_CACHE_H +#define _SSDFS_PEB_MAPPING_TABLE_CACHE_H + +#include + +/* + * struct ssdfs_maptbl_cache - maptbl cache + * @lock: lock of maptbl cache + * @pvec: memory pages of maptbl cache + * @bytes_count: count of bytes in maptbl cache + * @pm_queue: PEB mappings queue + */ +struct ssdfs_maptbl_cache { + struct rw_semaphore lock; + struct pagevec pvec; + atomic_t bytes_count; + + struct ssdfs_peb_mapping_queue pm_queue; +}; + +/* + * struct ssdfs_maptbl_cache_item - cache item descriptor + * @page_index: index of the found memory page + * @item_index: item of found index + * @found: found LEB2PEB pair + */ +struct ssdfs_maptbl_cache_item { +#define SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN (0) +#define SSDFS_MAPTBL_CACHE_ITEM_FOUND (1) +#define SSDFS_MAPTBL_CACHE_ITEM_ABSENT (2) +#define SSDFS_MAPTBL_CACHE_SEARCH_ERROR (3) +#define SSDFS_MAPTBL_CACHE_SEARCH_MAX (4) + int state; + unsigned page_index; + u16 item_index; + struct ssdfs_leb2peb_pair found; +}; + +#define SSDFS_MAPTBL_MAIN_INDEX (0) +#define SSDFS_MAPTBL_RELATION_INDEX (1) +#define SSDFS_MAPTBL_RELATION_MAX (2) + +/* + * struct ssdfs_maptbl_cache_search_result - PEBs association + * @pebs: array of PEB descriptors + */ +struct ssdfs_maptbl_cache_search_result { + struct ssdfs_maptbl_cache_item pebs[SSDFS_MAPTBL_RELATION_MAX]; +}; + +struct ssdfs_maptbl_peb_relation; + +/* + * PEB mapping table cache's API + */ +void ssdfs_maptbl_cache_init(struct ssdfs_maptbl_cache *cache); +void ssdfs_maptbl_cache_destroy(struct ssdfs_maptbl_cache *cache); + +int ssdfs_maptbl_cache_convert_leb2peb(struct 
ssdfs_maptbl_cache *cache,
+				       u64 leb_id,
+				       struct ssdfs_maptbl_peb_relation *pebr);
+int ssdfs_maptbl_cache_map_leb2peb(struct ssdfs_maptbl_cache *cache,
+				   u64 leb_id,
+				   struct ssdfs_maptbl_peb_relation *pebr,
+				   int consistency);
+int ssdfs_maptbl_cache_forget_leb2peb(struct ssdfs_maptbl_cache *cache,
+				      u64 leb_id,
+				      int consistency);
+int ssdfs_maptbl_cache_change_peb_state(struct ssdfs_maptbl_cache *cache,
+					u64 leb_id, int peb_state,
+					int consistency);
+int ssdfs_maptbl_cache_add_migration_peb(struct ssdfs_maptbl_cache *cache,
+					 u64 leb_id,
+					 struct ssdfs_maptbl_peb_relation *pebr,
+					 int consistency);
+int ssdfs_maptbl_cache_exclude_migration_peb(struct ssdfs_maptbl_cache *cache,
+					     u64 leb_id,
+					     int consistency);
+
+/*
+ * PEB mapping table cache's internal API
+ */
+struct page *
+ssdfs_maptbl_cache_add_pagevec_page(struct ssdfs_maptbl_cache *cache);
+int ssdfs_maptbl_cache_convert_leb2peb_nolock(struct ssdfs_maptbl_cache *cache,
+					      u64 leb_id,
+					      struct ssdfs_maptbl_peb_relation *pebr);
+int __ssdfs_maptbl_cache_convert_leb2peb(struct ssdfs_maptbl_cache *cache,
+					 u64 leb_id,
+					 struct ssdfs_maptbl_peb_relation *pebr);
+int ssdfs_maptbl_cache_change_peb_state_nolock(struct ssdfs_maptbl_cache *cache,
+					       u64 leb_id, int peb_state,
+					       int consistency);
+int ssdfs_maptbl_cache_forget_leb2peb_nolock(struct ssdfs_maptbl_cache *cache,
+					     u64 leb_id,
+					     int consistency);
+
+#endif /* _SSDFS_PEB_MAPPING_TABLE_CACHE_H */

From patchwork Sat Feb 25 01:08:55 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151948
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 44/76] ssdfs: PEB mapping table cache's modification operations
Date: Fri, 24 Feb 2023 17:08:55 -0800
Message-Id: <20230225010927.813929-45-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

"Physical" Erase Block (PEB) mapping table cache supports the following
operations:

(1) convert LEB to PEB - convert LEB ID into PEB ID while the mapping
    table is not initialized yet
(2) map LEB to PEB - cache information about a LEB to PEB mapping
(3) forget LEB to PEB - exclude information about a LEB to PEB mapping
    from the cache
(4) change PEB state - update cached information about a LEB to PEB
    mapping
(5) add migration PEB - cache information about a migration destination
(6) exclude migration PEB - exclude information about a migration
    destination

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/peb_mapping_table_cache.c | 3205 ++++++++++++++++++++++++++++
 1 file changed, 3205 insertions(+)

diff --git a/fs/ssdfs/peb_mapping_table_cache.c b/fs/ssdfs/peb_mapping_table_cache.c
index e83e07947743..4242d4a5f9ac 100644
--- a/fs/ssdfs/peb_mapping_table_cache.c
+++ b/fs/ssdfs/peb_mapping_table_cache.c
@@ -1495,3 +1495,3208 @@ int ssdfs_maptbl_cache_convert_leb2peb(struct ssdfs_maptbl_cache *cache,
 finish_leb2peb_conversion:
 	return err;
 }
+
+/*
+ * ssdfs_maptbl_cache_init_page() - init page of maptbl cache
+ * @kaddr: pointer on maptbl cache's fragment
+ * @sequence_id: fragment's sequence ID number
+ *
+ * This method initializes an empty maptbl cache fragment's page.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
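+ *
+ * After initialization the fragment's layout is (illustrative sketch,
+ * derived from the structure sizes used below):
+ *
+ *	[header][PEB state magic][free space ... PAGE_SIZE]
+ *
+ * with bytes_count == hdr_size + magic_size, items_count == 0, and
+ * start_leb/end_leb set to U64_MAX until the first pair is inserted.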
+ */ +static +int ssdfs_maptbl_cache_init_page(void *kaddr, unsigned sequence_id) +{ + struct ssdfs_maptbl_cache_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t magic_size = peb_state_size; + size_t threshold_size = hdr_size + magic_size; + __le32 *magic; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, sequence_id %u\n", + kaddr, sequence_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sequence_id >= PAGEVEC_SIZE) { + SSDFS_ERR("invalid sequence_id %u\n", + sequence_id); + return -EINVAL; + } + + memset(kaddr, 0, PAGE_SIZE); + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + hdr->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + hdr->magic.key = cpu_to_le16(SSDFS_MAPTBL_CACHE_MAGIC); + hdr->magic.version.major = SSDFS_MAJOR_REVISION; + hdr->magic.version.minor = SSDFS_MINOR_REVISION; + + hdr->sequence_id = cpu_to_le16((u16)sequence_id); + hdr->items_count = 0; + hdr->bytes_count = cpu_to_le16((u16)threshold_size); + + hdr->start_leb = cpu_to_le64(U64_MAX); + hdr->end_leb = cpu_to_le64(U64_MAX); + + magic = (__le32 *)((u8 *)kaddr + hdr_size); + *magic = cpu_to_le32(SSDFS_MAPTBL_CACHE_PEB_STATE_MAGIC); + + return 0; +} + +/* + * ssdfs_shift_right_peb_state_area() - shift the whole PEB state area + * @kaddr: pointer on maptbl cache's fragment + * @shift: size of shift in bytes + * + * This method tries to shift the whole PEB state area + * to the right in the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static inline +int ssdfs_shift_right_peb_state_area(void *kaddr, size_t shift) +{ + struct ssdfs_maptbl_cache_header *hdr; + void *area = NULL; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t diff_count; + int area_size; + u32 area_offset = U32_MAX; + size_t bytes_count, new_bytes_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, shift %zu\n", kaddr, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (shift % pair_size) { + SSDFS_ERR("invalid request: " + "shift %zu, pair_size %zu\n", + shift, pair_size); + return -ERANGE; + } + + diff_count = shift / pair_size; + + if (diff_count == 0) { + SSDFS_ERR("invalid diff_count %zu\n", diff_count); + return -ERANGE; + } + + area = PEB_STATE_AREA(kaddr, &area_offset); + if (IS_ERR_OR_NULL(area)) { + err = !area ? 
PTR_ERR(area) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + bytes_count = le16_to_cpu(hdr->bytes_count); + + area_size = ssdfs_peb_state_area_size(hdr); + if (area_size < 0) { + err = area_size; + SSDFS_ERR("fail to calculate PEB state area's size: " + "err %d\n", err); + return err; + } else if (area_size == 0) { + SSDFS_ERR("invalid PEB state area's size %d\n", + area_size); + return -ERANGE; + } + + new_bytes_count = bytes_count; + new_bytes_count += diff_count * pair_size; + new_bytes_count += diff_count * peb_state_size; + + if (new_bytes_count > PAGE_SIZE) { + SSDFS_ERR("shift is out of memory page: " + "new_bytes_count %zu, shift %zu\n", + new_bytes_count, shift); + return -ERANGE; + } + + err = ssdfs_memmove(area, shift, PAGE_SIZE - area_offset, + area, 0, PAGE_SIZE - area_offset, + area_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + hdr->bytes_count = cpu_to_le16((u16)new_bytes_count); + + return 0; +} + +/* + * ssdfs_shift_left_peb_state_area() - shift the whole PEB state area + * @kaddr: pointer on maptbl cache's fragment + * @shift: size of shift in bytes + * + * This method tries to shift the whole PEB state area + * to the left in the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static inline +int ssdfs_shift_left_peb_state_area(void *kaddr, size_t shift) +{ + struct ssdfs_maptbl_cache_header *hdr; + void *area = NULL; + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t magic_size = peb_state_size; + size_t threshold_size = hdr_size + magic_size; + size_t diff_count; + int area_size; + u32 area_offset = U32_MAX; + size_t bytes_count; + size_t calculated; + size_t new_bytes_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, shift %zu\n", kaddr, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (shift % pair_size) { + SSDFS_ERR("invalid request: " + "shift %zu, pair_size %zu\n", + shift, pair_size); + return -ERANGE; + } + + diff_count = shift / pair_size; + + if (diff_count == 0) { + SSDFS_ERR("invalid diff_count %zu\n", diff_count); + return -ERANGE; + } + + area = PEB_STATE_AREA(kaddr, &area_offset); + + if (IS_ERR_OR_NULL(area)) { + err = !area ? 
PTR_ERR(area) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + bytes_count = le16_to_cpu(hdr->bytes_count); + + area_size = ssdfs_peb_state_area_size(hdr); + if (area_size < 0) { + err = area_size; + SSDFS_ERR("fail to calculate PEB state area's size: " + "err %d\n", err); + return err; + } else if (area_size == 0) { + SSDFS_ERR("invalid PEB state area's size %d\n", + area_size); + return -ERANGE; + } + + new_bytes_count = bytes_count; + + calculated = diff_count * pair_size; + if (new_bytes_count <= calculated) { + SSDFS_ERR("invalid diff_count %zu\n", + diff_count); + return -ERANGE; + } + + new_bytes_count -= calculated; + + calculated = diff_count * peb_state_size; + + if (new_bytes_count <= calculated) { + SSDFS_ERR("invalid diff_count %zu\n", + diff_count); + return -ERANGE; + } + + new_bytes_count -= calculated; + + if (new_bytes_count < threshold_size) { + SSDFS_ERR("shift is inside of header: " + "new_bytes_count %zu, threshold_size %zu\n", + new_bytes_count, threshold_size); + return -ERANGE; + } + + if ((threshold_size + shift) >= area_offset) { + SSDFS_ERR("invalid shift: " + "threshold_size %zu, shift %zu, " + "area_offset %u\n", + threshold_size, shift, area_offset); + return -ERANGE; + } + + err = ssdfs_memmove((u8 *)area - shift, 0, PAGE_SIZE - area_offset, + area, 0, PAGE_SIZE - area_offset, + area_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + hdr->bytes_count = cpu_to_le16((u16)new_bytes_count); + + return 0; +} + +/* + * ssdfs_maptbl_cache_add_leb() - add LEB/PEB pair into maptbl cache + * @kaddr: pointer on maptbl cache's fragment + * @item_index: index of item in the fragment + * @src_pair: inserting LEB/PEB pair + * @src_state: inserting PEB state + * + * This method tries to insert LEB/PEB pair and PEB state + * into the maptbl cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_cache_add_leb(void *kaddr, u16 item_index, + struct ssdfs_leb2peb_pair *src_pair, + struct ssdfs_maptbl_cache_peb_state *src_state) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *dest_pair; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + struct ssdfs_maptbl_cache_peb_state *dest_state; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !src_pair || !src_state); + + SSDFS_DBG("kaddr %p, item_index %u, " + "leb_id %llu, peb_id %llu\n", + kaddr, item_index, + le64_to_cpu(src_pair->leb_id), + le64_to_cpu(src_pair->peb_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (item_index != items_count) { + SSDFS_ERR("item_index %u != items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + err = ssdfs_shift_right_peb_state_area(kaddr, pair_size); + if (unlikely(err)) { + SSDFS_ERR("fail to shift the PEB state area: " + "err %d\n", err); + return err; + } + + dest_pair = LEB2PEB_PAIR_AREA(kaddr); + dest_pair += item_index; + + ssdfs_memcpy(dest_pair, 0, pair_size, + src_pair, 0, pair_size, + pair_size); + + dest_state = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(dest_state)) { + err = !dest_state ? 
PTR_ERR(dest_state) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + dest_state += item_index; + + ssdfs_memcpy(dest_state, 0, peb_state_size, + src_state, 0, peb_state_size, + peb_state_size); + + items_count++; + hdr->items_count = cpu_to_le16(items_count); + + if (item_index == 0) + hdr->start_leb = src_pair->leb_id; + + if ((item_index + 1) == items_count) + hdr->end_leb = src_pair->leb_id; + + return 0; +} + +struct page * +ssdfs_maptbl_cache_add_pagevec_page(struct ssdfs_maptbl_cache *cache) +{ + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache); + + SSDFS_DBG("cache %p\n", cache); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = ssdfs_map_cache_add_pagevec_page(&cache->pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? -ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + } + + return page; +} + +/* + * ssdfs_maptbl_cache_add_page() - add fragment into maptbl cache + * @cache: maptbl cache object + * @pair: adding LEB/PEB pair + * @state: adding PEB state + * + * This method tries to add fragment into maptbl cache, + * initialize it and insert LEB/PEB pair + PEB state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to add empty page into maptbl cache. + */ +static +int ssdfs_maptbl_cache_add_page(struct ssdfs_maptbl_cache *cache, + struct ssdfs_leb2peb_pair *pair, + struct ssdfs_maptbl_cache_peb_state *state) +{ + struct page *page; + void *kaddr; + u16 item_index; + unsigned page_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pair || !state); + + SSDFS_DBG("cache %p, leb_id %llu, peb_id %llu\n", + cache, le64_to_cpu(pair->leb_id), + le64_to_cpu(pair->peb_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = 0; + page_index = pagevec_count(&cache->pvec); + + page = ssdfs_map_cache_add_pagevec_page(&cache->pvec); + if (unlikely(IS_ERR_OR_NULL(page))) { + err = !page ? 
-ENOMEM : PTR_ERR(page); + SSDFS_ERR("fail to add pagevec page: err %d\n", + err); + return err; + } + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + err = ssdfs_maptbl_cache_init_page(kaddr, page_index); + if (unlikely(err)) { + SSDFS_ERR("fail to init maptbl cache's page: " + "page_index %u, err %d\n", + page_index, err); + goto finish_add_page; + } + + atomic_add(PAGE_SIZE, &cache->bytes_count); + + err = ssdfs_maptbl_cache_add_leb(kaddr, item_index, pair, state); + if (unlikely(err)) { + SSDFS_ERR("fail to add leb_id: " + "page_index %u, item_index %u, err %d\n", + page_index, item_index, err); + goto finish_add_page; + } + +finish_add_page: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + return err; +} + +/* + * is_fragment_full() - check that fragment is full + * @kaddr: pointer on maptbl cache's fragment + */ +static inline +bool is_fragment_full(void *kaddr) +{ + struct ssdfs_maptbl_cache_header *hdr; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + size_t bytes_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + bytes_count = le16_to_cpu(hdr->bytes_count); + bytes_count += pair_size + peb_state_size; + + return bytes_count > PAGE_SIZE; +} + +/* + * ssdfs_maptbl_cache_get_last_item() - get last item of the fragment + * @kaddr: pointer on maptbl cache's fragment + * @pair: pointer on LEB2PEB pair's buffer [out] + * @state: pointer on PEB state's buffer [out] + * + * This method tries to extract the last item + * (LEB2PEB pair + PEB state) from the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - empty maptbl cache's page. + */ +static +int ssdfs_maptbl_cache_get_last_item(void *kaddr, + struct ssdfs_leb2peb_pair *pair, + struct ssdfs_maptbl_cache_peb_state *state) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *found_pair = NULL; + struct ssdfs_maptbl_cache_peb_state *found_state = NULL; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !pair || !state); + + SSDFS_DBG("kaddr %p, pair %p, peb_state %p\n", + kaddr, pair, state); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache's page\n"); + return -ENODATA; + } + + found_pair = LEB2PEB_PAIR_AREA(kaddr); + found_pair += items_count - 1; + + ssdfs_memcpy(pair, 0, pair_size, + found_pair, 0, pair_size, + pair_size); + + found_state = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(found_state)) { + err = !found_state ? PTR_ERR(found_state) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + found_state += items_count - 1; + ssdfs_memcpy(state, 0, peb_state_size, + found_state, 0, peb_state_size, + peb_state_size); + + return 0; +} + +/* + * ssdfs_maptbl_cache_move_right_leb2peb_pairs() - move LEB2PEB pairs + * @kaddr: pointer on maptbl cache's fragment + * @item_index: starting index + * + * This method tries to move LEB2PEB pairs to the right + * starting from @item_index. 
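+ *
+ * For example (hypothetical fragment content): with pairs for LEBs
+ * {10, 12, 16} and @item_index == 1, the pairs for LEBs 12 and 16
+ * are moved one slot to the right, freeing item 1 for a new pair.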
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_cache_move_right_leb2peb_pairs(void *kaddr, + u16 item_index) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *area; + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + u16 items_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, item_index %u\n", + kaddr, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache page\n"); + return -ERANGE; + } + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u > items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + area = LEB2PEB_PAIR_AREA(kaddr); + err = ssdfs_memmove(area, + (item_index + 1) * pair_size, + PAGE_SIZE - hdr_size, + area, + item_index * pair_size, + PAGE_SIZE - hdr_size, + (items_count - item_index) * pair_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_maptbl_cache_move_right_peb_states() - move PEB states + * @kaddr: pointer on maptbl cache's fragment + * @item_index: starting index + * + * This method tries to move PEB states to the right + * starting from @item_index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_maptbl_cache_move_right_peb_states(void *kaddr, + u16 item_index) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_maptbl_cache_peb_state *area; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, item_index %u\n", + kaddr, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache page\n"); + return -ERANGE; + } + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u > items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + area = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(area)) { + err = !area ? PTR_ERR(area) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + err = ssdfs_memmove(area, + (item_index + 1) * peb_state_size, + PAGE_SIZE - area_offset, + area, + item_index * peb_state_size, + PAGE_SIZE - area_offset, + (items_count - item_index) * peb_state_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + return 0; +} + +/* + * __ssdfs_maptbl_cache_insert_leb() - insert item into the fragment + * @kaddr: pointer on maptbl cache's fragment + * @item_index: starting index + * @pair: adding LEB2PEB pair + * @state: adding PEB state + * + * This method tries to insert the item (LEB2PEB pair + PEB state) + * into the fragment in @item_index position. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
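+ *
+ * Note: the caller is expected to have made room at @item_index
+ * beforehand (see ssdfs_shift_right_peb_state_area() and the
+ * move-right helpers above); this method only writes the pair and
+ * PEB state and updates items_count, start_leb and end_leb.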
+ */ +static +int __ssdfs_maptbl_cache_insert_leb(void *kaddr, u16 item_index, + struct ssdfs_leb2peb_pair *pair, + struct ssdfs_maptbl_cache_peb_state *state) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *dst_pair = NULL; + struct ssdfs_maptbl_cache_peb_state *dst_state = NULL; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !pair || !state); + + SSDFS_DBG("kaddr %p, item_index %u, pair %p, state %p\n", + kaddr, item_index, pair, state); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache page\n"); + return -ERANGE; + } + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u > items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + dst_pair = LEB2PEB_PAIR_AREA(kaddr); + dst_pair += item_index; + + ssdfs_memcpy(dst_pair, 0, pair_size, + pair, 0, pair_size, + pair_size); + + dst_state = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(dst_state)) { + err = !dst_state ? PTR_ERR(dst_state) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + dst_state += item_index; + + ssdfs_memcpy(dst_state, 0, peb_state_size, + state, 0, peb_state_size, + peb_state_size); + + items_count++; + hdr->items_count = cpu_to_le16(items_count); + + if (item_index == 0) + hdr->start_leb = pair->leb_id; + + if ((item_index + 1) == items_count) + hdr->end_leb = pair->leb_id; + + return 0; +} + +/* + * ssdfs_maptbl_cache_remove_leb() - remove item from the fragment + * @cache: maptbl cache object + * @page_index: index of the page + * @item_index: index of the item + * + * This method tries to remove the item (LEB/PEB pair + PEB state) + * from the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
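
[Illustrative aside, not part of the patch: __ssdfs_maptbl_cache_insert_leb() finishes by refreshing the fragment header: items_count grows, and start_leb/end_leb track the first and last slots so later lookups can route a leb_id to the right page. A compact sketch of just that bookkeeping; struct frag_hdr is a simplified stand-in for ssdfs_maptbl_cache_header.]

#include <stdint.h>
#include <stdio.h>

struct frag_hdr { uint16_t items_count; uint64_t start_leb; uint64_t end_leb; };

/* the first and last slots define the [start_leb, end_leb] lookup range */
static void account_insert(struct frag_hdr *hdr, uint16_t item_index,
			   uint64_t leb_id)
{
	hdr->items_count++;
	if (item_index == 0)
		hdr->start_leb = leb_id;
	if (item_index + 1 == hdr->items_count)
		hdr->end_leb = leb_id;
}

int main(void)
{
	struct frag_hdr hdr = { 0, UINT64_MAX, UINT64_MAX };

	account_insert(&hdr, 0, 100);	/* first item sets both bounds */
	account_insert(&hdr, 1, 200);	/* an append moves end_leb only */
	printf("count %u, range [%llu, %llu]\n",
	       (unsigned)hdr.items_count,
	       (unsigned long long)hdr.start_leb,
	       (unsigned long long)hdr.end_leb);
	return 0;
}
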
+ */ +static +int ssdfs_maptbl_cache_remove_leb(struct ssdfs_maptbl_cache *cache, + unsigned page_index, + u16 item_index) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *cur_pair; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + struct ssdfs_maptbl_cache_peb_state *cur_state; + struct page *page; + void *kaddr; + u16 items_count; + size_t size; + u32 area_offset = U32_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache); + BUG_ON(page_index >= pagevec_count(&cache->pvec)); + + SSDFS_DBG("cache %p, page_index %u, item_index %u\n", + cache, page_index, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = cache->pvec.pages[page_index]; + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (item_index >= items_count) { + err = -ERANGE; + SSDFS_ERR("item_index %u >= items_count %u\n", + item_index, items_count); + goto finish_remove_item; + } else if (items_count == 0) { + err = -ERANGE; + SSDFS_ERR("items_count %u\n", items_count); + goto finish_remove_item; + } + + cur_pair = LEB2PEB_PAIR_AREA(kaddr); + cur_pair += item_index; + + if ((item_index + 1) < items_count) { + size = items_count - item_index; + size *= pair_size; + + memmove(cur_pair, cur_pair + 1, size); + } + + cur_pair = LEB2PEB_PAIR_AREA(kaddr); + cur_pair += items_count - 1; + memset(cur_pair, 0xFF, pair_size); + + cur_state = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(cur_state)) { + err = !cur_state ? PTR_ERR(cur_state) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + goto finish_remove_item; + } + + cur_state += item_index; + + if ((item_index + 1) < items_count) { + size = items_count - item_index; + size *= sizeof(struct ssdfs_maptbl_cache_peb_state); + + memmove(cur_state, cur_state + 1, size); + } + + cur_state = FIRST_PEB_STATE(kaddr, &area_offset); + cur_state += items_count - 1; + memset(cur_state, 0xFF, sizeof(struct ssdfs_maptbl_cache_peb_state)); + + items_count--; + hdr->items_count = cpu_to_le16(items_count); + + err = ssdfs_shift_left_peb_state_area(kaddr, pair_size); + if (unlikely(err)) { + SSDFS_ERR("fail to shift PEB state area: " + "err %d\n", err); + goto finish_remove_item; + } + + if (items_count == 0) { + hdr->start_leb = U64_MAX; + hdr->end_leb = U64_MAX; + } else { + cur_pair = LEB2PEB_PAIR_AREA(kaddr); + hdr->start_leb = cur_pair->leb_id; + + cur_pair += items_count - 1; + hdr->end_leb = cur_pair->leb_id; + } + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_remove_item: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + return err; +} + +/* + * ssdfs_check_pre_deleted_peb_state() - check pre-deleted state of the item + * @cache: maptbl cache object + * @page_index: index of the page + * @item_index: index of the item + * @pair: adding LEB2PEB pair + * + * This method tries to check that requested item for @item_index + * has the PRE-DELETED consistency. If it's true then this item + * has to be deleted. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested LEB is absent. + * %-ENOENT - requested LEB exists and should be saved. 
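
[Illustrative aside, not part of the patch: ssdfs_maptbl_cache_remove_leb() compacts the two parallel arrays with memmove() and then fills the vacated tail slot with 0xFF bytes, so an "empty" record reads back as the U64_MAX markers used elsewhere in the cache. A user-space sketch of the same delete-and-poison pattern for the pair array; the sketch moves exactly the surviving items.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct pair { uint64_t leb_id; uint64_t peb_id; };

static uint16_t remove_item(struct pair *area, uint16_t count,
			    uint16_t item_index)
{
	if (item_index + 1 < count)
		memmove(&area[item_index], &area[item_index + 1],
			(size_t)(count - item_index - 1) * sizeof(*area));
	/* unused slots stay all-0xFF, i.e. leb_id/peb_id == U64_MAX */
	memset(&area[count - 1], 0xFF, sizeof(*area));
	return count - 1;
}

int main(void)
{
	struct pair items[3] = { {1, 10}, {2, 20}, {3, 30} };
	uint16_t count = remove_item(items, 3, 1);

	printf("count %u, items[1].leb_id %llu\n", (unsigned)count,
	       (unsigned long long)items[1].leb_id);	/* count 2, leb 3 */
	return 0;
}
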
+ */ +static +int ssdfs_check_pre_deleted_peb_state(struct ssdfs_maptbl_cache *cache, + unsigned page_index, + u16 item_index, + struct ssdfs_leb2peb_pair *pair) +{ + struct ssdfs_leb2peb_pair *cur_pair = NULL; + struct ssdfs_maptbl_cache_peb_state *cur_state = NULL; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pair); + BUG_ON(le64_to_cpu(pair->leb_id) == U64_MAX); + BUG_ON(le64_to_cpu(pair->peb_id) == U64_MAX); + + SSDFS_DBG("cache %p, start_page %u, item_index %u\n", + cache, page_index, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = cache->pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + err = ssdfs_maptbl_cache_get_leb2peb_pair(kaddr, item_index, &cur_pair); + if (unlikely(err)) { + SSDFS_ERR("fail to get LEB2PEB pair: err %d\n", err); + goto finish_check_pre_deleted_state; + } + + if (le64_to_cpu(pair->leb_id) != le64_to_cpu(cur_pair->leb_id)) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pair->leb_id %llu != cur_pair->leb_id %llu\n", + le64_to_cpu(pair->leb_id), + le64_to_cpu(cur_pair->leb_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_check_pre_deleted_state; + } + + err = ssdfs_maptbl_cache_get_peb_state(kaddr, item_index, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to get PEB state: err %d\n", err); + goto finish_check_pre_deleted_state; + } + + switch (cur_state->consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + err = -ENOENT; + goto finish_check_pre_deleted_state; + + case SSDFS_PEB_STATE_PRE_DELETED: + /* continue to delete */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected PEB's state %#x\n", + cur_state->state); + goto finish_check_pre_deleted_state; + } + +finish_check_pre_deleted_state: + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (err) + return err; + + err = ssdfs_maptbl_cache_remove_leb(cache, + page_index, + item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to delete LEB: " + "page_index %d, item_index %u, err %d\n", + page_index, item_index, err); + return err; + } + + return 0; +} + +/* + * ssdfs_maptbl_cache_insert_leb() - insert item into the fragment + * @cache: maptbl cache object + * @start_page: page index + * @item_index: index of the item + * @pair: adding LEB/PEB pair + * @state: adding PEB state + * + * This method tries to insert the item (LEB2PEB pair + PEB state) + * into the mapping table cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
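
[Illustrative aside, not part of the patch: the insertion path that follows has to cope with a full page. The last record is saved, everything from item_index is shifted right, the new record takes its slot, and the displaced record is re-inserted at the head of the next page, cascading until a page has room or a fresh page is appended. A toy integer model of that ripple, with ints standing in for (pair + state) records:]

#include <stdio.h>

#define CAP 4

static int pages[3][CAP];
static int counts[3] = { 4, 2, 0 };

static void insert(int page, int idx, int value, int npages)
{
	for (int p = page; p < npages; p++) {
		int full = (counts[p] == CAP);
		int carry = full ? pages[p][CAP - 1] : 0; /* save last item */
		int last = full ? counts[p] - 1 : counts[p];

		for (int i = last; i > idx; i--)	/* shift right */
			pages[p][i] = pages[p][i - 1];
		pages[p][idx] = value;

		if (!full) {
			counts[p]++;
			return;
		}
		value = carry;	/* displaced item re-enters the next page... */
		idx = 0;	/* ...as its new smallest element */
	}
}

int main(void)
{
	for (int i = 0; i < CAP; i++)
		pages[0][i] = (i + 1) * 10;	/* page 0: 10 20 30 40 (full) */
	pages[1][0] = 50;
	pages[1][1] = 60;

	insert(0, 2, 25, 3);	/* 25 lands in page 0; 40 ripples to page 1 */

	printf("page0:");
	for (int i = 0; i < counts[0]; i++)
		printf(" %d", pages[0][i]);	/* 10 20 25 30 */
	printf("\npage1:");
	for (int i = 0; i < counts[1]; i++)
		printf(" %d", pages[1][i]);	/* 40 50 60 */
	printf("\n");
	return 0;
}
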
+ */ +static +int ssdfs_maptbl_cache_insert_leb(struct ssdfs_maptbl_cache *cache, + unsigned start_page, + u16 item_index, + struct ssdfs_leb2peb_pair *pair, + struct ssdfs_maptbl_cache_peb_state *state) +{ + struct ssdfs_leb2peb_pair cur_pair, saved_pair; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + struct ssdfs_maptbl_cache_peb_state cur_state, saved_state; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pair || !state); + BUG_ON(le64_to_cpu(pair->leb_id) == U64_MAX); + BUG_ON(le64_to_cpu(pair->peb_id) == U64_MAX); + + SSDFS_DBG("cache %p, start_page %u, item_index %u, " + "leb_id %llu, peb_id %llu\n", + cache, start_page, item_index, + le64_to_cpu(pair->leb_id), + le64_to_cpu(pair->peb_id)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_check_pre_deleted_peb_state(cache, start_page, + item_index, pair); + if (err == -ENODATA) { + err = 0; + /* + * No pre-deleted item was found. + * Continue the logic. + */ + } else if (err == -ENOENT) { + /* + * Valid item was found. + */ + err = 0; + item_index++; + } else if (unlikely(err)) { + SSDFS_ERR("fail to check the pre-deleted state: " + "err %d\n", err); + return err; + } + + ssdfs_memcpy(&cur_pair, 0, pair_size, + pair, 0, pair_size, + pair_size); + ssdfs_memcpy(&cur_state, 0, peb_state_size, + state, 0, peb_state_size, + peb_state_size); + + memset(&saved_pair, 0xFF, pair_size); + memset(&saved_state, 0xFF, peb_state_size); + + for (; start_page < pagevec_count(&cache->pvec); start_page++) { + bool need_move_item = false; + + page = cache->pvec.pages[start_page]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + need_move_item = is_fragment_full(kaddr); + + if (need_move_item) { + err = ssdfs_maptbl_cache_get_last_item(kaddr, + &saved_pair, + &saved_state); + if (unlikely(err)) { + SSDFS_ERR("fail to get last item: " + "err %d\n", err); + goto finish_page_modification; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_shift_right_peb_state_area(kaddr, pair_size); + if (unlikely(err)) { + SSDFS_ERR("fail to shift the PEB state area: " + "err %d\n", err); + goto finish_page_modification; + } + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_cache_move_right_leb2peb_pairs(kaddr, + item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to move LEB2PEB pairs: " + "page_index %u, item_index %u, " + "err %d\n", + start_page, item_index, err); + goto finish_page_modification; + } + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_cache_move_right_peb_states(kaddr, + item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to move PEB states: " + "page_index %u, item_index %u, " + "err %d\n", + start_page, item_index, err); + goto finish_page_modification; + } + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_maptbl_cache_insert_leb(kaddr, item_index, + &cur_pair, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to insert leb descriptor: " + "page_index %u, item_index %u, err %d\n", + start_page, item_index, err); + goto 
finish_page_modification; + } + +#ifdef CONFIG_SSDFS_DEBUG + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_page_modification: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (err || !need_move_item) + goto finish_insert_leb; + + item_index = 0; + + if (need_move_item) { + ssdfs_memcpy(&cur_pair, 0, pair_size, + &saved_pair, 0, pair_size, + pair_size); + ssdfs_memcpy(&cur_state, 0, peb_state_size, + &saved_state, 0, peb_state_size, + peb_state_size); + } + } + + err = ssdfs_maptbl_cache_add_page(cache, &cur_pair, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into maptbl cache: " + "err %d\n", + err); + } + +finish_insert_leb: + return err; +} + +/* + * ssdfs_maptbl_cache_map_leb2peb() - save LEB/PEB pair into maptbl cache + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @pebr: descriptor of mapped LEB/PEB pair + * @consistency: consistency of the item + * + * This method tries to save the item (LEB/PEB pair + PEB state) + * into maptbl cache. If the item is consistent then it means that + * as mapping table cache as mapping table contain the same + * information about the item. Otherwise, for the case of inconsistent + * state, the mapping table cache contains the actual info about + * the item. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - LEB/PEB pair is cached already. + */ +int ssdfs_maptbl_cache_map_leb2peb(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr, + int consistency) +{ + struct ssdfs_maptbl_cache_search_result res; + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *tmp_pair = NULL; + u16 item_index = U16_MAX; + struct ssdfs_leb2peb_pair cur_pair; + struct ssdfs_maptbl_cache_peb_state cur_state; + struct page *page; + void *kaddr; + unsigned i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pebr); + BUG_ON(leb_id == U64_MAX); + + SSDFS_DBG("cache %p, leb_id %llu, pebr %p, consistency %#x\n", + cache, leb_id, pebr, consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&res, 0xFF, sizeof(struct ssdfs_maptbl_cache_search_result)); + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + + cur_pair.leb_id = cpu_to_le64(leb_id); + cur_pair.peb_id = + cpu_to_le64(pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id); + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected consistency %#x\n", + consistency); + return -EINVAL; + } + + cur_state.consistency = (u8)consistency; + cur_state.state = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state; + cur_state.flags = pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].flags; + cur_state.shared_peb_index = + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].shared_peb_index; + + down_write(&cache->lock); + + for (i = 0; i < pagevec_count(&cache->pvec); i++) { + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = __ssdfs_maptbl_cache_find_leb(kaddr, i, leb_id, &res); + item_index = res.pebs[SSDFS_MAPTBL_MAIN_INDEX].item_index; + tmp_pair = &res.pebs[SSDFS_MAPTBL_MAIN_INDEX].found; + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (err == 
-EEXIST) { + SSDFS_ERR("maptbl cache contains leb_id %llu\n", + leb_id); + break; + } else if (err == -EFAULT) { + /* we've found place */ + break; + } else if (!err) + BUG(); + } + + if (i >= pagevec_count(&cache->pvec)) { + if (err == -ENODATA) { + /* correct page index */ + i = pagevec_count(&cache->pvec) - 1; + } else { + err = -ERANGE; + SSDFS_ERR("i %u >= pages_count %u\n", + i, pagevec_count(&cache->pvec)); + goto finish_leb_caching; + } + } + + if (err == -EEXIST) + goto finish_leb_caching; + else if (err == -E2BIG) { + err = ssdfs_maptbl_cache_add_page(cache, &cur_pair, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into maptbl cache: " + "err %d\n", + err); + goto finish_leb_caching; + } + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(i >= pagevec_count(&cache->pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + item_index = le16_to_cpu(hdr->items_count); + err = ssdfs_maptbl_cache_add_leb(kaddr, item_index, + &cur_pair, &cur_state); + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to add leb_id: " + "page_index %u, item_index %u, err %d\n", + i, item_index, err); + } + } else if (err == -EFAULT) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(i >= pagevec_count(&cache->pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_cache_insert_leb(cache, i, item_index, + &cur_pair, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to add LEB with shift: " + "page_index %u, item_index %u, err %d\n", + i, item_index, err); + goto finish_leb_caching; + } + } else + BUG(); + +finish_leb_caching: + up_write(&cache->lock); + + return err; +} + +/* + * __ssdfs_maptbl_cache_change_peb_state() - change PEB state of the item + * @cache: maptbl cache object + * @page_index: index of memory page + * @item_index: index of the item in the page + * @peb_state: new state of the PEB + * @consistency: consistency of the item + * + * This method tries to change the PEB state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - unable to get peb state. + * %-ERANGE - internal error. 
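
[Illustrative aside, not part of the patch: stripped of its error plumbing, the search loop in ssdfs_maptbl_cache_map_leb2peb() probes page fragments until one reports "already cached" (-EEXIST), an insert position (-EFAULT), a full page (-E2BIG), or end of data (-ENODATA). A deliberately simplified sketch of the routing decision, using only the per-fragment [start_leb, end_leb] ranges kept in each header:]

#include <stdint.h>
#include <stdio.h>

struct frag { uint64_t start_leb, end_leb; };

/* route leb_id to a page by the [start_leb, end_leb] range in its header */
static int find_fragment(const struct frag *f, int count, uint64_t leb_id)
{
	for (int i = 0; i < count; i++) {
		if (leb_id <= f[i].end_leb)
			return i;  /* found or insert position in page i */
	}
	return count - 1;	   /* beyond every range: append to last page */
}

int main(void)
{
	struct frag frags[2] = { { 0, 99 }, { 100, 199 } };

	printf("leb 42  -> page %d\n", find_fragment(frags, 2, 42));	/* 0 */
	printf("leb 150 -> page %d\n", find_fragment(frags, 2, 150));	/* 1 */
	printf("leb 500 -> page %d\n", find_fragment(frags, 2, 500));	/* 1 */
	return 0;
}
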
+ */ +static inline +int __ssdfs_maptbl_cache_change_peb_state(struct ssdfs_maptbl_cache *cache, + unsigned page_index, + u16 item_index, + int peb_state, + int consistency) +{ + struct ssdfs_maptbl_cache_peb_state *found_state = NULL; + struct page *page; + void *kaddr; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache); + BUG_ON(!rwsem_is_locked(&cache->lock)); + + SSDFS_DBG("cache %p, page_index %u, item_index %u, " + "peb_state %#x, consistency %#x\n", + cache, page_index, item_index, + peb_state, consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&cache->pvec)) { + SSDFS_ERR("invalid page index %u\n", page_index); + return -ERANGE; + } + + page = cache->pvec.pages[page_index]; + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + err = ssdfs_maptbl_cache_get_peb_state(kaddr, item_index, + &found_state); + if (err == -EINVAL) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to get peb state: " + "item_index %u\n", + item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_page_modification; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get peb state: " + "item_index %u, err %d\n", + item_index, err); + goto finish_page_modification; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!found_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + found_state->consistency = (u8)consistency; + found_state->state = (u8)peb_state; + break; + + case SSDFS_PEB_STATE_INCONSISTENT: + if (found_state->state != (u8)peb_state) { + found_state->consistency = (u8)consistency; + found_state->state = (u8)peb_state; + } + break; + + case SSDFS_PEB_STATE_PRE_DELETED: + found_state->consistency = (u8)consistency; + found_state->state = (u8)peb_state; + break; + + default: + SSDFS_ERR("unexpected consistency %#x\n", + consistency); + return -EINVAL; + } + +finish_page_modification: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + return err; +} + +/* + * ssdfs_maptbl_cache_define_relation_index() - define relation index + * @pebr: descriptor of mapped LEB/PEB pair + * @peb_state: new state of the PEB + * @relation_index: index of the item in relation [out] + */ +static int +ssdfs_maptbl_cache_define_relation_index(struct ssdfs_maptbl_peb_relation *pebr, + int peb_state, + int *relation_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebr || !relation_index); + + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + *relation_index = SSDFS_MAPTBL_RELATION_MAX; + + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + switch (pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + *relation_index = SSDFS_MAPTBL_MAIN_INDEX; + break; + + case SSDFS_PEB_STATE_PRE_DELETED: + *relation_index = SSDFS_MAPTBL_RELATION_INDEX; + break; + + default: 
+ BUG(); + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + switch (pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + *relation_index = SSDFS_MAPTBL_MAIN_INDEX; + break; + + case SSDFS_PEB_STATE_PRE_DELETED: + SSDFS_ERR("main index is pre-deleted\n"); + break; + + default: + BUG(); + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + switch (pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + *relation_index = SSDFS_MAPTBL_RELATION_INDEX; + break; + + case SSDFS_PEB_STATE_PRE_DELETED: + SSDFS_ERR("main index is pre-deleted\n"); + break; + + default: + BUG(); + } + break; + + default: + SSDFS_ERR("unexpected peb_state %#x\n", peb_state); + return -EINVAL; + } + + if (*relation_index == SSDFS_MAPTBL_RELATION_MAX) { + SSDFS_ERR("fail to define relation index\n"); + return -ERANGE; + } + + return 0; +} + +/* + * can_peb_state_be_changed() - check that PEB state can be changed + * @pebr: descriptor of mapped LEB/PEB pair + * @peb_state: new state of the PEB + * @consistency: consistency of the item + * @relation_index: index of the item in relation + */ +static +bool can_peb_state_be_changed(struct ssdfs_maptbl_peb_relation *pebr, + int peb_state, + int consistency, + int relation_index) +{ + int old_consistency = SSDFS_PEB_STATE_UNKNOWN; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebr); + + SSDFS_DBG("peb_state %#x, consistency %#x, relation_index %d\n", + peb_state, consistency, relation_index); + + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (relation_index) { + case SSDFS_MAPTBL_MAIN_INDEX: + old_consistency = + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency; + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + switch (old_consistency) { + case SSDFS_PEB_STATE_PRE_DELETED: + SSDFS_WARN("invalid consistency: " + "peb_state %#x, consistency %#x, " + "relation_index %d\n", + peb_state, + consistency, + relation_index); + return false; + + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + /* valid consistency */ + break; + + default: + SSDFS_WARN("invalid old consistency %#x\n", + old_consistency); + return false; + } + + case SSDFS_PEB_STATE_PRE_DELETED: + /* valid consistency */ + break; + + default: + SSDFS_WARN("invalid consistency: " + "peb_state %#x, consistency %#x, " + "relation_index %d\n", + peb_state, + consistency, + relation_index); + return false; + } + + switch (pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case 
SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_USING_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_USED_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + 
pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_USING_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_USED_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + peb_state); + return false; + } + break; + + default: + BUG(); + } + break; + + case SSDFS_MAPTBL_RELATION_INDEX: + old_consistency = + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency; + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + switch (old_consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + /* valid consistency */ + break; + + default: + SSDFS_WARN("invalid old consistency %#x\n", + old_consistency); + return false; + } + break; + + default: + SSDFS_WARN("invalid consistency: " + "peb_state %#x, consistency %#x, " + "relation_index %d\n", + peb_state, + consistency, + relation_index); + return false; + } + + switch (pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case 
SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_USING_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_USED_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old 
peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + goto finish_check; + + default: + SSDFS_ERR("invalid change: " + "old peb_state %#x, " + "new peb_state %#x\n", + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + peb_state); + return false; + } + break; + + default: + BUG(); + } + break; + + default: + BUG(); + } + +finish_check: + return true; +} + +/* + * ssdfs_maptbl_cache_change_peb_state_nolock() - change PEB state of the item + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @peb_state: new state of the PEB + * @consistency: consistency of the item + * + * This method tries to change the PEB state. If the item is consistent + * then it means that as mapping table cache as mapping table + * contain the same information about the item. Otherwise, + * for the case of inconsistent state, the mapping table cache contains + * the actual info about the item. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_maptbl_cache_change_peb_state_nolock(struct ssdfs_maptbl_cache *cache, + u64 leb_id, int peb_state, + int consistency) +{ + struct ssdfs_maptbl_cache_search_result res; + struct ssdfs_maptbl_peb_relation pebr; + int relation_index = SSDFS_MAPTBL_RELATION_MAX; + int state; + unsigned page_index; + u16 item_index = U16_MAX; + unsigned i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache); + BUG_ON(leb_id == U64_MAX); + BUG_ON(!rwsem_is_locked(&cache->lock)); + + SSDFS_DBG("cache %p, leb_id %llu, peb_state %#x, consistency %#x\n", + cache, leb_id, peb_state, consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + case SSDFS_PEB_STATE_PRE_DELETED: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected consistency %#x\n", + consistency); + return -EINVAL; + } + + switch (peb_state) { + case SSDFS_MAPTBL_CLEAN_PEB_STATE: + case SSDFS_MAPTBL_USING_PEB_STATE: + case SSDFS_MAPTBL_USED_PEB_STATE: + case SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_DIRTY_PEB_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected peb_state %#x\n", peb_state); + return -EINVAL; + } + + err = ssdfs_maptbl_cache_find_leb(cache, leb_id, &res, &pebr); + if (unlikely(err)) { + SSDFS_ERR("fail to find: leb_id %llu, err %d\n", + leb_id, err); + goto finish_peb_state_change; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN_INDEX: state %#x, page_index %u, item_index %u; " + "RELATION_INDEX: state %#x, page_index %u, item_index %u\n", + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].page_index, + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].item_index, + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].page_index, + 
res.pebs[SSDFS_MAPTBL_RELATION_INDEX].item_index); + + SSDFS_DBG("MAIN_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x; " + "RELATION_INDEX: peb_id %llu, type %#x, " + "state %#x, consistency %#x\n", + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id, + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].type, + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].consistency, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].type, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX].consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_maptbl_cache_define_relation_index(&pebr, peb_state, + &relation_index); + if (unlikely(err)) { + SSDFS_ERR("fail to define relation index: " + "leb_id %llu, peb_state %#x, err %d\n", + leb_id, peb_state, err); + goto finish_peb_state_change; + } + + if (!can_peb_state_be_changed(&pebr, peb_state, + consistency, relation_index)) { + err = -ERANGE; + SSDFS_ERR("PEB state cannot be changed: " + "leb_id %llu, peb_state %#x, " + "consistency %#x, relation_index %d\n", + leb_id, peb_state, consistency, relation_index); + goto finish_peb_state_change; + } + + state = res.pebs[relation_index].state; + if (state != SSDFS_MAPTBL_CACHE_ITEM_FOUND) { + err = -ERANGE; + SSDFS_ERR("fail to change peb state: " + "state %#x\n", + state); + goto finish_peb_state_change; + } + + page_index = res.pebs[relation_index].page_index; + item_index = res.pebs[relation_index].item_index; + + err = __ssdfs_maptbl_cache_change_peb_state(cache, + page_index, + item_index, + peb_state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change peb state: " + "page_index %u, item_index %u, " + "err %d\n", + page_index, item_index, err); + goto finish_peb_state_change; + } + +finish_peb_state_change: + if (unlikely(err)) { + struct page *page; + void *kaddr; + + for (i = 0; i < pagevec_count(&cache->pvec); i++) { + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + ssdfs_maptbl_cache_show_items(kaddr); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + } + } + + return err; +} + +/* + * ssdfs_maptbl_cache_change_peb_state() - change PEB state of the item + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @peb_state: new state of the PEB + * @consistency: consistency of the item + * + * This method tries to change the PEB state. If the item is consistent + * then it means that as mapping table cache as mapping table + * contain the same information about the item. Otherwise, + * for the case of inconsistent state, the mapping table cache contains + * the actual info about the item. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
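
[Illustrative aside, not part of the patch: can_peb_state_be_changed() encodes its transition policy as nested switches, which is verbose but keeps every legal edge explicit and greppable. The same policy could be phrased as a lookup table. The sketch below shows only the shape of that alternative; the state names and the allowed rows are a made-up subset, not the exact matrix spelled out in the switches above.]

#include <stdbool.h>
#include <stdio.h>

enum { CLEAN, USING, USED, PRE_DIRTY, DIRTY, NR_STATES };

/* one row per old state; illustrative policy only */
static const bool allowed[NR_STATES][NR_STATES] = {
	/*            CLEAN  USING  USED  PRE_D  DIRTY */
	[CLEAN]     = {  1,    1,    1,    1,     1 },
	[USING]     = {  1,    1,    1,    1,     1 },
	[USED]      = {  0,    0,    1,    1,     1 },
	[PRE_DIRTY] = {  0,    0,    0,    1,     1 },
	[DIRTY]     = {  0,    0,    0,    0,     1 },
};

static bool can_change(int old_state, int new_state)
{
	return allowed[old_state][new_state];
}

int main(void)
{
	printf("USED -> DIRTY: %d\n", can_change(USED, DIRTY));	 /* 1 */
	printf("DIRTY -> CLEAN: %d\n", can_change(DIRTY, CLEAN)); /* 0 */
	return 0;
}
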
+ */ +int ssdfs_maptbl_cache_change_peb_state(struct ssdfs_maptbl_cache *cache, + u64 leb_id, int peb_state, + int consistency) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache); + BUG_ON(leb_id == U64_MAX); + + SSDFS_DBG("cache %p, leb_id %llu, peb_state %#x, consistency %#x\n", + cache, leb_id, peb_state, consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&cache->lock); + err = ssdfs_maptbl_cache_change_peb_state_nolock(cache, + leb_id, + peb_state, + consistency); + up_write(&cache->lock); + + return err; +} + +/* + * ssdfs_maptbl_cache_add_migration_peb() - add item for migration PEB + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @pebr: descriptor of mapped LEB/PEB pair + * @consistency: consistency of the item + * + * This method tries to add the item (LEB2PEB pair + PEB state) + * for the migration PEB. If the item is consistent + * then it means that as mapping table cache as mapping table + * contain the same information about the item. Otherwise, + * for the case of inconsistent state, the mapping table cache contains + * the actual info about the item. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_maptbl_cache_add_migration_peb(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + struct ssdfs_maptbl_peb_relation *pebr, + int consistency) +{ + struct ssdfs_maptbl_cache_search_result res; + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *tmp_pair = NULL; + u16 item_index = U16_MAX, items_count = U16_MAX; + struct ssdfs_leb2peb_pair cur_pair; + struct ssdfs_maptbl_cache_peb_state cur_state; + struct page *page; + void *kaddr; + unsigned i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache || !pebr); + BUG_ON(leb_id == U64_MAX); + + SSDFS_DBG("cache %p, leb_id %llu, pebr %p, consistency %#x\n", + cache, leb_id, pebr, consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&res, 0xFF, sizeof(struct ssdfs_maptbl_cache_search_result)); + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + + cur_pair.leb_id = cpu_to_le64(leb_id); + cur_pair.peb_id = + cpu_to_le64(pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].peb_id); + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_INCONSISTENT: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected consistency %#x\n", + consistency); + return -EINVAL; + } + + cur_state.consistency = (u8)consistency; + cur_state.state = pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].state; + cur_state.flags = pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].flags; + cur_state.shared_peb_index = + pebr->pebs[SSDFS_MAPTBL_RELATION_INDEX].shared_peb_index; + + down_write(&cache->lock); + + for (i = 0; i < pagevec_count(&cache->pvec); i++) { + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = le16_to_cpu(hdr->items_count); + err = __ssdfs_maptbl_cache_find_leb(kaddr, i, leb_id, &res); + item_index = res.pebs[SSDFS_MAPTBL_MAIN_INDEX].item_index; + tmp_pair = &res.pebs[SSDFS_MAPTBL_MAIN_INDEX].found; + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (err == -EEXIST || err == -EFAULT) + break; + else if (err != -E2BIG && err != -ENODATA) + break; + else if (err == 
-EAGAIN) + continue; + else if (!err) + BUG(); + } + + if (err != -EEXIST && err != -EAGAIN) { + SSDFS_ERR("maptbl cache hasn't item for leb_id %llu, err %d\n", + leb_id, err); + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + ssdfs_maptbl_cache_show_items(kaddr); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + goto finish_add_migration_peb; + } + + if ((item_index + 1) >= ssdfs_maptbl_cache_fragment_capacity()) { + err = ssdfs_maptbl_cache_add_page(cache, &cur_pair, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into maptbl cache: " + "err %d\n", + err); + goto finish_add_migration_peb; + } + } else if ((item_index + 1) < items_count) { + err = ssdfs_maptbl_cache_insert_leb(cache, i, item_index, + &cur_pair, &cur_state); + if (unlikely(err)) { + SSDFS_ERR("fail to insert LEB: " + "page_index %u, item_index %u, err %d\n", + i, item_index, err); + goto finish_add_migration_peb; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(i >= pagevec_count(&cache->pvec)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + item_index = le16_to_cpu(hdr->items_count); + err = ssdfs_maptbl_cache_add_leb(kaddr, item_index, + &cur_pair, &cur_state); + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to add leb_id: " + "page_index %u, item_index %u, err %d\n", + i, item_index, err); + goto finish_add_migration_peb; + } + } + +finish_add_migration_peb: + up_write(&cache->lock); + + return err; +} + +/* + * ssdfs_maptbl_cache_get_first_item() - get first item of the fragment + * @kaddr: pointer on maptbl cache's fragment + * @pair: pointer on LEB2PEB pair's buffer [out] + * @state: pointer on PEB state's buffer [out] + * + * This method tries to retrieve the first item of the fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - empty maptbl cache page. + */ +static +int ssdfs_maptbl_cache_get_first_item(void *kaddr, + struct ssdfs_leb2peb_pair *pair, + struct ssdfs_maptbl_cache_peb_state *state) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *found_pair = NULL; + struct ssdfs_maptbl_cache_peb_state *found_state = NULL; + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !pair || !state); + + SSDFS_DBG("kaddr %p, pair %p, peb_state %p\n", + kaddr, pair, state); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache page\n"); + return -ENODATA; + } + + found_pair = LEB2PEB_PAIR_AREA(kaddr); + ssdfs_memcpy(pair, 0, pair_size, + found_pair, 0, pair_size, + pair_size); + + found_state = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(found_state)) { + err = !found_state ? 
PTR_ERR(found_state) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + ssdfs_memcpy(state, 0, peb_state_size, + found_state, 0, peb_state_size, + peb_state_size); + + return 0; +} + +/* + * ssdfs_maptbl_cache_move_left_leb2peb_pairs() - move LEB2PEB pairs + * @kaddr: pointer on maptbl cache's fragment + * @item_index: starting index + * + * This method tries to move the LEB2PEB pairs on one position + * to the left starting from @item_index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +#ifdef CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC +static +int ssdfs_maptbl_cache_move_left_leb2peb_pairs(void *kaddr, + u16 item_index) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *area; + size_t hdr_size = sizeof(struct ssdfs_maptbl_cache_header); + size_t pair_size = sizeof(struct ssdfs_leb2peb_pair); + u16 items_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, item_index %u\n", + kaddr, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (item_index == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do nothing: item_index %u\n", + item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache page\n"); + return -ERANGE; + } + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u > items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + area = LEB2PEB_PAIR_AREA(kaddr); + err = ssdfs_memmove(area, + (item_index - 1) * pair_size, + PAGE_SIZE - hdr_size, + area, + item_index * pair_size, + PAGE_SIZE - hdr_size, + (items_count - item_index) * pair_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + return 0; +} +#endif /* CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC */ + +/* + * ssdfs_maptbl_cache_move_left_peb_states() - move PEB states + * @kaddr: pointer on maptbl cache's fragment + * @item_index: starting index + * + * This method tries to move the PEB states on one position + * to the left starting from @item_index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +#ifdef CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC +static +int ssdfs_maptbl_cache_move_left_peb_states(void *kaddr, + u16 item_index) +{ + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_maptbl_cache_peb_state *area; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + u16 items_count; + u32 area_offset = U32_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr); + + SSDFS_DBG("kaddr %p, item_index %u\n", + kaddr, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (item_index == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do nothing: item_index %u\n", + item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + + items_count = le16_to_cpu(hdr->items_count); + + if (items_count == 0) { + SSDFS_ERR("empty maptbl cache page\n"); + return -ERANGE; + } + + if (item_index >= items_count) { + SSDFS_ERR("item_index %u > items_count %u\n", + item_index, items_count); + return -EINVAL; + } + + area = FIRST_PEB_STATE(kaddr, &area_offset); + if (IS_ERR_OR_NULL(area)) { + err = !area ? 
PTR_ERR(area) : -ERANGE; + SSDFS_ERR("fail to get the PEB state area: " + "err %d\n", err); + return err; + } + + err = ssdfs_memmove(area, + (item_index - 1) * peb_state_size, + PAGE_SIZE - area_offset, + area, + item_index * peb_state_size, + PAGE_SIZE - area_offset, + (items_count - item_index) * peb_state_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + return 0; +} +#endif /* CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC */ + +/* + * ssdfs_maptbl_cache_forget_leb2peb_nolock() - exclude LEB/PEB pair from cache + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @consistency: consistency of the item + * + * This method tries to exclude LEB/PEB pair from the cache. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_maptbl_cache_forget_leb2peb_nolock(struct ssdfs_maptbl_cache *cache, + u64 leb_id, + int consistency) +{ + struct ssdfs_maptbl_cache_search_result res; + struct ssdfs_maptbl_cache_header *hdr; + struct ssdfs_leb2peb_pair *found_pair = NULL; + struct ssdfs_leb2peb_pair saved_pair; + struct ssdfs_maptbl_cache_peb_state *found_state = NULL; + struct ssdfs_maptbl_cache_peb_state saved_state; + size_t peb_state_size = sizeof(struct ssdfs_maptbl_cache_peb_state); + struct page *page; + void *kaddr; + u16 item_index, items_count; + unsigned i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!cache); + BUG_ON(leb_id == U64_MAX); + BUG_ON(!rwsem_is_locked(&cache->lock)); + + SSDFS_DBG("cache %p, leb_id %llu, consistency %#x\n", + cache, leb_id, consistency); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&res, 0xFF, sizeof(struct ssdfs_maptbl_cache_search_result)); + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].state = + SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN; + + switch (consistency) { + case SSDFS_PEB_STATE_CONSISTENT: + case SSDFS_PEB_STATE_PRE_DELETED: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected consistency %#x\n", + consistency); + return -EINVAL; + } + + for (i = 0; i < pagevec_count(&cache->pvec); i++) { + struct ssdfs_maptbl_cache_header *hdr; + int search_state; + + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = le16_to_cpu(hdr->items_count); + + err = __ssdfs_maptbl_cache_find_leb(kaddr, i, leb_id, &res); + item_index = res.pebs[SSDFS_MAPTBL_MAIN_INDEX].item_index; + found_pair = &res.pebs[SSDFS_MAPTBL_MAIN_INDEX].found; + search_state = res.pebs[SSDFS_MAPTBL_RELATION_INDEX].state; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MAIN_INDEX: state %#x, " + "page_index %u, item_index %u; " + "RELATION_INDEX: state %#x, " + "page_index %u, item_index %u\n", + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].state, + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].page_index, + res.pebs[SSDFS_MAPTBL_MAIN_INDEX].item_index, + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].state, + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].page_index, + res.pebs[SSDFS_MAPTBL_RELATION_INDEX].item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -EEXIST || err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(le64_to_cpu(found_pair->leb_id) != leb_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search_state) { + case SSDFS_MAPTBL_CACHE_ITEM_FOUND: + if ((item_index + 1) >= items_count) { + err = -ERANGE; + SSDFS_ERR("invalid 
position found: " + "item_index %u, " + "items_count %u\n", + item_index, items_count); + } + break; + + case SSDFS_MAPTBL_CACHE_ITEM_ABSENT: + if ((item_index + 1) > items_count) { + err = -ERANGE; + SSDFS_ERR("invalid position found: " + "item_index %u, " + "items_count %u\n", + item_index, items_count); + } + break; + + default: + SSDFS_ERR("unexpected state %#x\n", + search_state); + break; + } + + err = ssdfs_maptbl_cache_get_peb_state(kaddr, + item_index, + &found_state); + if (unlikely(err)) { + SSDFS_ERR("fail to get peb state: " + "item_index %u, err %d\n", + item_index, err); + } else { + ssdfs_memcpy(&saved_state, 0, peb_state_size, + found_state, 0, peb_state_size, + peb_state_size); + } + + /* it is expected existence of the item */ + err = -EEXIST; + } + + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (err == -EEXIST || err == -EFAULT) + break; + else if (err != -E2BIG && err != -ENODATA) + break; + else if (!err) + BUG(); + } + + if (err != -EEXIST) + goto finish_exclude_migration_peb; + + if (consistency == SSDFS_PEB_STATE_PRE_DELETED) { + /* simply change the state */ + goto finish_exclude_migration_peb; + } else { + unsigned page_index = i; + u16 deleted_item = item_index; + u8 new_peb_state = SSDFS_MAPTBL_UNKNOWN_PEB_STATE; + + err = ssdfs_maptbl_cache_remove_leb(cache, i, item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to remove LEB: " + "page_index %u, item_index %u, err %d\n", + i, item_index, err); + goto finish_exclude_migration_peb; + } + + for (++i; i < pagevec_count(&cache->pvec); i++) { + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = ssdfs_maptbl_cache_get_first_item(kaddr, + &saved_pair, + &saved_state); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to get first item: " + "err %d\n", err); + goto finish_exclude_migration_peb; + } + + page = cache->pvec.pages[i - 1]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = le16_to_cpu(hdr->items_count); + if (items_count == 0) + item_index = 0; + else + item_index = items_count; + + err = ssdfs_maptbl_cache_add_leb(kaddr, item_index, + &saved_pair, + &saved_state); + + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to add leb_id: " + "page_index %u, item_index %u, " + "err %d\n", + i, item_index, err); + goto finish_exclude_migration_peb; + } + + item_index = 0; + err = ssdfs_maptbl_cache_remove_leb(cache, i, + item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to remove LEB: " + "page_index %u, item_index %u, " + "err %d\n", + i, item_index, err); + goto finish_exclude_migration_peb; + } + } + + i = pagevec_count(&cache->pvec); + if (i == 0) { + err = -ERANGE; + SSDFS_ERR("invalid number of fragments %u\n", i); + goto finish_exclude_migration_peb; + } else + i--; + + if (i < page_index) { + err = -ERANGE; + SSDFS_ERR("invalid page index: " + "i %u, page_index %u\n", + i, page_index); + goto finish_exclude_migration_peb; + } + + page = cache->pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + hdr = (struct ssdfs_maptbl_cache_header *)kaddr; + items_count = 
le16_to_cpu(hdr->items_count); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (items_count == 0) { + cache->pvec.pages[i] = NULL; + cache->pvec.nr--; + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_map_cache_free_page(page); + atomic_sub(PAGE_SIZE, &cache->bytes_count); + + if (i == page_index) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do nothing: " + "page %u was deleted\n", + page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_exclude_migration_peb; + } + } + + switch (saved_state.state) { + case SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE: + case SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE: + /* continue logic */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do not change PEB state: " + "page_index %u, deleted_item %u, " + "state %#x\n", + page_index, deleted_item, + saved_state.state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_exclude_migration_peb; + } + + if (deleted_item >= items_count) { + err = -ERANGE; + SSDFS_ERR("deleted_item %u >= items_count %u\n", + deleted_item, items_count); + goto finish_exclude_migration_peb; + } + + page = cache->pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + err = ssdfs_maptbl_cache_get_peb_state(kaddr, + deleted_item, + &found_state); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to get peb state: " + "item_index %u, err %d\n", + deleted_item, err); + goto finish_exclude_migration_peb; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_state->state %#x\n", + found_state->state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (found_state->state) { + case SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE: + new_peb_state = SSDFS_MAPTBL_CLEAN_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USING_STATE: + new_peb_state = SSDFS_MAPTBL_USING_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_USED_STATE: + new_peb_state = SSDFS_MAPTBL_USED_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE: + new_peb_state = SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE; + break; + + case SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE: + new_peb_state = SSDFS_MAPTBL_DIRTY_PEB_STATE; + break; + + default: + /* do nothing */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB not under migration: " + "state %#x\n", + found_state->state); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_exclude_migration_peb; + } + + err = __ssdfs_maptbl_cache_change_peb_state(cache, + page_index, + deleted_item, + new_peb_state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change peb state: " + "page_index %u, item_index %u, " + "err %d\n", + page_index, deleted_item, err); + goto finish_exclude_migration_peb; + } + } + +finish_exclude_migration_peb: + if (consistency == SSDFS_PEB_STATE_PRE_DELETED) { + err = ssdfs_maptbl_cache_change_peb_state_nolock(cache, leb_id, + saved_state.state, + consistency); + if (unlikely(err)) { + SSDFS_ERR("fail to change PEB state: err %d\n", err); + return err; + } + } + + return err; +} + +/* + * ssdfs_maptbl_cache_exclude_migration_peb() - exclude migration PEB + * @cache: maptbl cache object + * @leb_id: LEB ID number + * @consistency: consistency of the item + * + * This method tries to exclude LEB/PEB pair after + * finishing the migration. 
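+ *
+ * NOTE: when @consistency is SSDFS_PEB_STATE_PRE_DELETED the pair is
+ * only marked as pre-deleted in the cache; a follow-up call with
+ * SSDFS_PEB_STATE_CONSISTENT removes the pair and compacts the
+ * remaining fragments (see ssdfs_maptbl_cache_forget_leb2peb_nolock()).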
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_maptbl_cache_exclude_migration_peb(struct ssdfs_maptbl_cache *cache,
+					     u64 leb_id,
+					     int consistency)
+{
+	int err;
+
+	down_write(&cache->lock);
+	err = ssdfs_maptbl_cache_forget_leb2peb_nolock(cache, leb_id,
+							consistency);
+	up_write(&cache->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_maptbl_cache_forget_leb2peb() - exclude LEB/PEB pair from cache
+ * @cache: maptbl cache object
+ * @leb_id: LEB ID number
+ * @consistency: consistency of the item
+ *
+ * This method tries to exclude LEB/PEB pair from the cache.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_maptbl_cache_forget_leb2peb(struct ssdfs_maptbl_cache *cache,
+				      u64 leb_id,
+				      int consistency)
+{
+	int err;
+
+	down_write(&cache->lock);
+	err = ssdfs_maptbl_cache_forget_leb2peb_nolock(cache, leb_id,
+							consistency);
+	up_write(&cache->lock);
+
+	return err;
+}

From patchwork Sat Feb 25 01:08:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151951
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 45/76] ssdfs: introduce segment bitmap
Date: Fri, 24 Feb 2023 17:08:56 -0800
Message-Id: <20230225010927.813929-46-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The segment bitmap is a critical metadata structure of the SSDFS file
system that serves two goals: (1) finding a candidate for a current
segment capable of storing new data, and (2) letting the GC subsystem
find the most suitable segment (in dirty state, for example) so that it
can be prepared in the background for storing new data (converted into
clean state).

The segment bitmap represents the following set of states: (1) clean
state means that a segment contains free logical blocks only, (2) using
state means that a segment can contain valid, invalid, and free logical
blocks, (3) used state means that a segment contains valid logical
blocks only, (4) pre-dirty state means that a segment contains both
valid and invalid logical blocks, (5) dirty state means that a segment
contains invalid blocks only, (6) reserved state reserves segment
numbers for some metadata structures (for example, the superblock
segment).

The PEB migration scheme implies that segments can migrate from one
state into another without explicit involvement of the GC subsystem.
For example, if a segment receives enough truncate operations (data
invalidation), it can change from used state into pre-dirty state.
Likewise, a segment can migrate from pre-dirty into using state by
means of PEB migration if it receives enough update requests. As a
result, a segment in using state can be selected as the current segment
without any GC-related activity. However, a segment can get stuck in
pre-dirty state in the absence of update requests; such a situation is
finally resolved by the GC subsystem, which migrates the valid blocks
in the background so that the pre-dirty segment can be converted into
using state.

The segment bitmap is implemented as a bitmap metadata structure that
is split into several fragments. Every fragment is stored into a log of
a specialized PEB. As a result, the full size of the segment bitmap and
the PEB's capacity define the number of fragments. The mkfs utility
reserves the necessary number of segments for storing the segment
bitmap's fragments during SSDFS volume creation. Finally, the numbers
of the reserved segments are stored into the superblock metadata
structure.
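To give a feeling for the sizes involved: with this on-disk encoding
(4 bits per segment state, i.e. two segment states per byte, and a
fragment header at the start of every PAGE_SIZE fragment), the fragment
count follows directly from the number of segments. A rough sketch of
the calculation, mirroring SEG_BMAP_BYTES()/SEG_BMAP_FRAGMENTS() from
segment_bitmap.h below (the volume size here is purely illustrative):

	u64 nsegs = 65536;	/* hypothetical volume: 64Ki segments */
	u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header);
	u32 bytes, pages, fragments;

	bytes = (nsegs + 1) / 2;		/* two 4-bit states per byte */
	pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
	bytes += pages * hdr_size;		/* one header per fragment */
	fragments = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;

So a 64Ki-segment volume needs 32 KiB of state bits and, with 4 KiB
pages and a small per-fragment header, ends up with nine fragments.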
The segment bitmap "lives" in the same set of reserved segments during
the whole lifetime of the volume. However, updates of the segment
bitmap can trigger PEB migration when any PEB that keeps the segment
bitmap's content becomes exhausted.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/segment_bitmap.c        | 1807 ++++++++++++++++++++++++++++++
 fs/ssdfs/segment_bitmap.h        |  459 ++++++++
 fs/ssdfs/segment_bitmap_tables.c |  814 ++++++++++++++
 3 files changed, 3080 insertions(+)
 create mode 100644 fs/ssdfs/segment_bitmap.c
 create mode 100644 fs/ssdfs/segment_bitmap.h
 create mode 100644 fs/ssdfs/segment_bitmap_tables.c

diff --git a/fs/ssdfs/segment_bitmap.c b/fs/ssdfs/segment_bitmap.c
new file mode 100644
index 000000000000..633cd4cfca0a
--- /dev/null
+++ b/fs/ssdfs/segment_bitmap.c
@@ -0,0 +1,1807 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment_bitmap.c - segment bitmap implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "extents_tree.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_seg_bmap_page_leaks;
+atomic64_t ssdfs_seg_bmap_memory_leaks;
+atomic64_t ssdfs_seg_bmap_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_seg_bmap_cache_leaks_increment(void *kaddr)
+ * void ssdfs_seg_bmap_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_seg_bmap_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_seg_bmap_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_seg_bmap_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_seg_bmap_kfree(void *kaddr)
+ * struct page *ssdfs_seg_bmap_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_seg_bmap_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_seg_bmap_free_page(struct page *page)
+ * void ssdfs_seg_bmap_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(seg_bmap)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(seg_bmap)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_seg_bmap_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_seg_bmap_page_leaks, 0);
+	atomic64_set(&ssdfs_seg_bmap_memory_leaks, 0);
+	atomic64_set(&ssdfs_seg_bmap_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_seg_bmap_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_seg_bmap_page_leaks) != 0) {
+		SSDFS_ERR("SEGMENT BITMAP: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_seg_bmap_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_seg_bmap_memory_leaks) != 0) {
+		SSDFS_ERR("SEGMENT BITMAP: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_seg_bmap_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_seg_bmap_cache_leaks) != 0) {
+		SSDFS_ERR("SEGMENT BITMAP: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_seg_bmap_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+extern const bool detect_clean_seg[U8_MAX + 1];
+extern const bool detect_data_using_seg[U8_MAX + 1];
+extern const bool detect_lnode_using_seg[U8_MAX + 1];
+extern const bool detect_hnode_using_seg[U8_MAX + 1];
+extern const bool detect_idxnode_using_seg[U8_MAX + 1];
+extern const bool detect_used_seg[U8_MAX + 1];
+extern const bool detect_pre_dirty_seg[U8_MAX + 1];
+extern const bool detect_dirty_seg[U8_MAX + 1];
+extern const bool detect_bad_seg[U8_MAX + 1];
+extern const bool detect_clean_using_mask[U8_MAX + 1];
+extern const bool detect_used_dirty_mask[U8_MAX + 1];
+
+static
+void ssdfs_segbmap_invalidate_folio(struct folio *folio, size_t offset,
+				    size_t length)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("do nothing: offset %zu, length %zu\n",
+		  offset, length);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_segbmap_release_folio() - Release fs-specific metadata on a folio.
+ * @folio: The folio which the kernel is trying to free.
+ * @gfp: Memory allocation flags (and I/O mode).
+ *
+ * The address_space is trying to release any data attached to a folio
+ * (presumably at folio->private).
+ *
+ * This will also be called if the private_2 flag is set on a page,
+ * indicating that the folio has other metadata associated with it.
+ *
+ * The @gfp argument specifies whether I/O may be performed to release
+ * this page (__GFP_IO), and whether the call may block
+ * (__GFP_RECLAIM & __GFP_FS).
+ *
+ * Return: %true if the release was successful, otherwise %false.
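+ *
+ * The segment bitmap populates and reclaims these folios itself (see
+ * ssdfs_segbmap_fragment_init() and ssdfs_segbmap_destroy()), so this
+ * callback unconditionally refuses to let the VM release them.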
+ */ +static +bool ssdfs_segbmap_release_folio(struct folio *folio, gfp_t gfp) +{ + return false; +} + +static +bool ssdfs_segbmap_noop_dirty_folio(struct address_space *mapping, + struct folio *folio) +{ + return true; +} + +const struct address_space_operations ssdfs_segbmap_aops = { + .invalidate_folio = ssdfs_segbmap_invalidate_folio, + .release_folio = ssdfs_segbmap_release_folio, + .dirty_folio = ssdfs_segbmap_noop_dirty_folio, +}; + +/* + * ssdfs_segbmap_mapping_init() - segment bitmap's mapping init + */ +static inline +void ssdfs_segbmap_mapping_init(struct address_space *mapping, + struct inode *inode) +{ + address_space_init_once(mapping); + mapping->a_ops = &ssdfs_segbmap_aops; + mapping->host = inode; + mapping->flags = 0; + atomic_set(&mapping->i_mmap_writable, 0); + mapping_set_gfp_mask(mapping, GFP_KERNEL); + mapping->private_data = NULL; + mapping->writeback_index = 0; + inode->i_mapping = mapping; +} + +static const struct inode_operations def_segbmap_ino_iops; +static const struct file_operations def_segbmap_ino_fops; +static const struct address_space_operations def_segbmap_ino_aops; + +/* + * ssdfs_segbmap_get_inode() - create segment bitmap's inode object + * @fsi: file system info object + */ +static +int ssdfs_segbmap_get_inode(struct ssdfs_fs_info *fsi) +{ + struct inode *inode; + struct ssdfs_inode_info *ii; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = iget_locked(fsi->sb, SSDFS_SEG_BMAP_INO); + if (unlikely(!inode)) { + err = -ENOMEM; + SSDFS_ERR("unable to allocate segment bitmap inode: " + "err %d\n", + err); + return err; + } + + BUG_ON(!(inode->i_state & I_NEW)); + + inode->i_mode = S_IFREG; + mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL); + + inode->i_op = &def_segbmap_ino_iops; + inode->i_fop = &def_segbmap_ino_fops; + inode->i_mapping->a_ops = &def_segbmap_ino_aops; + + ii = SSDFS_I(inode); + ii->birthtime = current_time(inode); + ii->parent_ino = U64_MAX; + + down_write(&ii->lock); + err = ssdfs_extents_tree_create(fsi, ii); + up_write(&ii->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to create the extents tree: " + "err %d\n", err); + unlock_new_inode(inode); + iput(inode); + return -ERANGE; + } + + unlock_new_inode(inode); + + fsi->segbmap_inode = inode; + + return 0; +} + +/* + * ssdfs_segbmap_define_segments() - determine segment bitmap segment numbers + * @fsi: file system info object + * @array_type: array type (main or copy) + * @segbmap: pointer on segment bitmap object [out] + * + * The method tries to retrieve segment numbers from volume header. + * + * RETURN: + * [success] - count of valid segment numbers in the array. + * [failure] - error code: + * + * %-EIO - volume header is corrupted. 
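+ *
+ * NOTE: the segs[] array in the on-disk volume header is terminated by
+ * U64_MAX entries, so the returned count can be smaller than
+ * SSDFS_SEGBMAP_SEGS (e.g. a bitmap occupying two segments yields
+ * count == 2 here; the number is illustrative only).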
+ */ +static +int ssdfs_segbmap_define_segments(struct ssdfs_fs_info *fsi, + int array_type, + struct ssdfs_segment_bmap *segbmap) +{ + u64 seg; + u8 count = 0; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !segbmap); + BUG_ON(array_type >= SSDFS_SEGBMAP_SEG_COPY_MAX); + + SSDFS_DBG("fsi %p, array_type %#x, segbmap %p\n", + fsi, array_type, segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_SEGBMAP_SEGS; i++) + segbmap->seg_numbers[i][array_type] = U64_MAX; + + for (i = 0; i < SSDFS_SEGBMAP_SEGS; i++) { + seg = le64_to_cpu(fsi->vh->segbmap.segs[i][array_type]); + + if (seg == U64_MAX) + break; + else if (seg >= fsi->nsegs) { + SSDFS_ERR("invalid segment %llu, nsegs %llu\n", + seg, fsi->nsegs); + return -EIO; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segbmap: seg[%d][%d] = %llu\n", + i, array_type, seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap->seg_numbers[i][array_type] = seg; + count++; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segbmap segments count %u\n", count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return count; +} + +/* + * ssdfs_segbmap_create_segments() - create segbmap's segment objects + * @fsi: file system info object + * @array_type: array type (main or copy) + * @segbmap: pointer on segment bitmap object [out] + */ +static +int ssdfs_segbmap_create_segments(struct ssdfs_fs_info *fsi, + int array_type, + struct ssdfs_segment_bmap *segbmap) +{ + u64 seg; + struct ssdfs_segment_info **kaddr; + u16 log_pages; + u16 create_threads; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !segbmap); + BUG_ON(array_type >= SSDFS_SEGBMAP_SEG_COPY_MAX); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p, array_type %#x, segbmap %p\n", + fsi, array_type, segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + log_pages = le16_to_cpu(fsi->vh->segbmap_log_pages); + create_threads = fsi->create_threads_per_seg; + + for (i = 0; i < segbmap->segs_count; i++) { + seg = segbmap->seg_numbers[i][array_type]; + kaddr = &segbmap->segs[i][array_type]; + BUG_ON(*kaddr != NULL); + + *kaddr = ssdfs_segment_allocate_object(seg); + if (IS_ERR_OR_NULL(*kaddr)) { + err = !*kaddr ? -ENOMEM : PTR_ERR(*kaddr); + *kaddr = NULL; + SSDFS_ERR("fail to allocate segment object: " + "seg %llu, err %d\n", + seg, err); + return err; + } + + err = ssdfs_segment_create_object(fsi, seg, + SSDFS_SEG_LEAF_NODE_USING, + SSDFS_SEGBMAP_SEG_TYPE, + log_pages, + create_threads, + *kaddr); + if (err == -EINTR) { + /* + * Ignore this error. 
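+		 * -EINTR only means that the mount process was
+		 * interrupted; return it without reporting a
+		 * failure so that the caller can unwind quietly.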
+ */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to create segment: " + "seg %llu, err %d\n", + seg, err); + return err; + } + + ssdfs_segment_get_object(*kaddr); + } + + return 0; +} + +/* + * ssdfs_segbmap_destroy_segments() - destroy segbmap's segment objects + * @segbmap: pointer on segment bitmap object + */ +static +void ssdfs_segbmap_destroy_segments(struct ssdfs_segment_bmap *segbmap) +{ + struct ssdfs_segment_info *si; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + + SSDFS_DBG("segbmap %p\n", segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < segbmap->segs_count; i++) { + for (j = 0; j < SSDFS_SEGBMAP_SEG_COPY_MAX; j++) { + si = segbmap->segs[i][j]; + + if (!si) + continue; + + ssdfs_segment_put_object(si); + err = ssdfs_segment_destroy_object(si); + if (unlikely(err == -EBUSY)) + BUG(); + else if (unlikely(err)) { + SSDFS_WARN("issue during segment destroy: " + "err %d\n", + err); + } + } + } +} + +/* + * ssdfs_segbmap_segment_init() - issue segbmap init command for PEBs + * @segbmap: pointer on segment bitmap object + * @si: segment object + * @seg_index: index of segment in the sequence + */ +static +int ssdfs_segbmap_segment_init(struct ssdfs_segment_bmap *segbmap, + struct ssdfs_segment_info *si, + int seg_index) +{ + u64 logical_offset; + u32 fragment_bytes = segbmap->fragment_size; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + + SSDFS_DBG("si %p, seg %llu, seg_index %d\n", + si, si->seg_id, seg_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_peb_container *pebc = &si->peb_array[i]; + struct ssdfs_segment_request *req; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, pebc %p\n", i, pebc); + + BUG_ON(!pebc); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_peb_container_empty(pebc)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PEB container empty: " + "seg %llu, peb_index %d\n", + si->seg_id, i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + req = NULL; + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + logical_offset = + (u64)segbmap->fragments_per_peb * fragment_bytes; + logical_offset *= si->pebs_count; + logical_offset *= seg_index; + ssdfs_request_prepare_logical_extent(SSDFS_MAPTBL_INO, + logical_offset, + fragment_bytes, + 0, 0, req); + ssdfs_request_define_segment(si->seg_id, req); + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_INIT_SEGBMAP, + SSDFS_REQ_ASYNC, + req); + ssdfs_peb_read_request_cno(pebc); + ssdfs_requests_queue_add_tail(&pebc->read_rq, req); + } + + wake_up_all(&si->wait_queue[SSDFS_PEB_READ_THREAD]); + + return 0; +} + +/* + * ssdfs_segbmap_init() - issue segbmap init command for all segments + * @segbmap: pointer on segment bitmap object + */ +static +int ssdfs_segbmap_init(struct ssdfs_segment_bmap *segbmap) +{ + struct ssdfs_segment_info *si; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + + SSDFS_DBG("segbmap %p, segbmap->segs_count %u\n", + segbmap, segbmap->segs_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < segbmap->segs_count; i++) { + for (j = 0; j < SSDFS_SEGBMAP_SEG_COPY_MAX; j++) { + si = segbmap->segs[i][j]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, j %d, si %p\n", i, j, si); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!si) + continue; + + err = ssdfs_segbmap_segment_init(segbmap, si, i); + if (unlikely(err)) { + SSDFS_ERR("fail to init segment: " + "seg %llu, err %d\n", + si->seg_id, err); + return err; + } + } + } + + return 0; +} + +/* + * ssdfs_segbmap_create_fragment_bitmaps() - create fragment bitmaps + * @segbmap: pointer on segment bitmap object + */ +static +int ssdfs_segbmap_create_fragment_bitmaps(struct ssdfs_segment_bmap *segbmap) +{ + size_t bmap_bytes; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(segbmap->fragments_count == 0); + + SSDFS_DBG("segbmap %p\n", segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + bmap_bytes = segbmap->fragments_count + BITS_PER_LONG - 1; + bmap_bytes /= BITS_PER_BYTE; + + for (i = 0; i < SSDFS_SEGBMAP_FBMAP_TYPE_MAX; i++) { + unsigned long **ptr = &segbmap->fbmap[i]; + + BUG_ON(*ptr); + + *ptr = ssdfs_seg_bmap_kzalloc(bmap_bytes, GFP_KERNEL); + if (!*ptr) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate fbmap: " + "index %d\n", i); + goto free_fbmaps; + } + } + + return 0; + +free_fbmaps: + for (; i >= 0; i--) + ssdfs_seg_bmap_kfree(segbmap->fbmap[i]); + + return err; +} + +/* + * ssdfs_segbmap_destroy_fragment_bitmaps() - destroy fragment bitmaps + * @segbmap: pointer on segment bitmap object + */ +static inline +void ssdfs_segbmap_destroy_fragment_bitmaps(struct ssdfs_segment_bmap *segbmap) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + + SSDFS_DBG("segbmap %p\n", segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_SEGBMAP_FBMAP_TYPE_MAX; i++) + ssdfs_seg_bmap_kfree(segbmap->fbmap[i]); +} + +/* + * ssdfs_segbmap_create() - create segment bitmap object + * @fsi: file system info object + * + * This method tries to create segment bitmap object. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - volume header is corrupted. + * %-EROFS - segbmap's flags contain error field. + * %-EOPNOTSUPP - fragment size isn't supported. + * %-ENOMEM - fail to allocate memory. + * %-ERANGE - internal error. 
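+ *
+ * Creation sequence: validate the segbmap fields of the volume header,
+ * allocate the fragment bitmaps and the descriptor array, instantiate
+ * the backing inode plus the main (and, optionally, copy) segment
+ * objects, and finally issue asynchronous init requests for every
+ * fragment.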
+ */ +int ssdfs_segbmap_create(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_segment_bmap *ptr; + size_t segbmap_obj_size = sizeof(struct ssdfs_segment_bmap); + size_t frag_desc_size = sizeof(struct ssdfs_segbmap_fragment_desc); + int count; + u32 calculated; + void *kaddr; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, segs_count %llu\n", fsi, fsi->nsegs); +#else + SSDFS_DBG("fsi %p, segs_count %llu\n", fsi, fsi->nsegs); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + kaddr = ssdfs_seg_bmap_kzalloc(segbmap_obj_size, GFP_KERNEL); + if (!kaddr) { + SSDFS_ERR("fail to allocate segment bitmap object\n"); + return -ENOMEM; + } + + fsi->segbmap = ptr = (struct ssdfs_segment_bmap *)kaddr; + + ptr->fsi = fsi; + + init_rwsem(&fsi->segbmap->resize_lock); + + ptr->flags = le16_to_cpu(fsi->vh->segbmap.flags); + if (ptr->flags & ~SSDFS_SEGBMAP_FLAGS_MASK) { + err = -EIO; + SSDFS_CRIT("segbmap header corrupted: " + "unknown flags %#x\n", + ptr->flags); + goto free_segbmap_object; + } + + if (ptr->flags & SSDFS_SEGBMAP_ERROR) { + err = -EROFS; + SSDFS_NOTICE("segment bitmap has corrupted state: " + "Please, run fsck utility\n"); + goto free_segbmap_object; + } + + ptr->items_count = fsi->nsegs; + + ptr->bytes_count = le32_to_cpu(fsi->vh->segbmap.bytes_count); + if (ptr->bytes_count != SEG_BMAP_BYTES(ptr->items_count)) { + err = -EIO; + SSDFS_CRIT("segbmap header corrupted: " + "bytes_count %u != calculated %u\n", + ptr->bytes_count, + SEG_BMAP_BYTES(ptr->items_count)); + goto free_segbmap_object; + } + + ptr->fragment_size = le16_to_cpu(fsi->vh->segbmap.fragment_size); + if (ptr->fragment_size != PAGE_SIZE) { + err = -EOPNOTSUPP; + SSDFS_ERR("fragment size %u isn't supported\n", + ptr->fragment_size); + goto free_segbmap_object; + } + + ptr->fragments_count = le16_to_cpu(fsi->vh->segbmap.fragments_count); + if (ptr->fragments_count != SEG_BMAP_FRAGMENTS(ptr->items_count)) { + err = -EIO; + SSDFS_CRIT("segbmap header corrupted: " + "fragments_count %u != calculated %u\n", + ptr->fragments_count, + SEG_BMAP_FRAGMENTS(ptr->items_count)); + goto free_segbmap_object; + } + + ptr->fragments_per_seg = + le16_to_cpu(fsi->vh->segbmap.fragments_per_seg); + calculated = (u32)ptr->fragments_per_seg * ptr->fragment_size; + if (fsi->segsize < calculated) { + err = -EIO; + SSDFS_CRIT("segbmap header corrupted: " + "fragments_per_seg %u is invalid\n", + ptr->fragments_per_seg); + goto free_segbmap_object; + } + + ptr->fragments_per_peb = + le16_to_cpu(fsi->vh->segbmap.fragments_per_peb); + calculated = (u32)ptr->fragments_per_peb * ptr->fragment_size; + if (fsi->erasesize < calculated) { + err = -EIO; + SSDFS_CRIT("segbmap header corrupted: " + "fragments_per_peb %u is invalid\n", + ptr->fragments_per_peb); + goto free_segbmap_object; + } + + init_rwsem(&ptr->search_lock); + + err = ssdfs_segbmap_create_fragment_bitmaps(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to create fragment bitmaps\n"); + goto free_segbmap_object; + } + + kaddr = ssdfs_seg_bmap_kcalloc(ptr->fragments_count, + frag_desc_size, GFP_KERNEL); + if (!kaddr) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate fragment descriptors array\n"); + goto free_fragment_bmaps; + } + + ptr->desc_array = (struct ssdfs_segbmap_fragment_desc *)kaddr; + + for (i = 0; i < ptr->fragments_count; i++) + init_completion(&ptr->desc_array[i].init_end); + + err = ssdfs_segbmap_get_inode(fsi); + if (unlikely(err)) { + SSDFS_ERR("fail to create segment bitmap's 
inode: "
+			  "err %d\n",
+			  err);
+		goto free_desc_array;
+	}
+
+	ssdfs_segbmap_mapping_init(&ptr->pages, fsi->segbmap_inode);
+
+	count = ssdfs_segbmap_define_segments(fsi, SSDFS_MAIN_SEGBMAP_SEG,
+						ptr);
+	if (count < 0) {
+		err = count;
+		SSDFS_ERR("fail to get segbmap segment numbers: err %d\n",
+			  err);
+		goto forget_inode;
+	} else if (count == 0 || count > SSDFS_SEGBMAP_SEGS) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid segbmap segment numbers count %d\n",
+			  count);
+		goto forget_inode;
+	}
+
+	ptr->segs_count = le16_to_cpu(fsi->vh->segbmap.segs_count);
+	if (ptr->segs_count != count) {
+		err = -EIO;
+		SSDFS_CRIT("segbmap header corrupted: "
+			   "segs_count %u != calculated %u\n",
+			   ptr->segs_count, count);
+		goto forget_inode;
+	}
+
+	count = ssdfs_segbmap_define_segments(fsi, SSDFS_COPY_SEGBMAP_SEG,
+						ptr);
+	if (count < 0) {
+		err = count;
+		SSDFS_ERR("fail to get segbmap segment numbers: err %d\n",
+			  err);
+		goto forget_inode;
+	} else if (count > SSDFS_SEGBMAP_SEGS) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid segbmap segment numbers count %d\n",
+			  count);
+		goto forget_inode;
+	}
+
+	if (ptr->flags & SSDFS_SEGBMAP_HAS_COPY) {
+		if (count == 0) {
+			err = -EIO;
+			SSDFS_CRIT("segbmap header corrupted: "
+				   "copy segments' chain is absent\n");
+			goto forget_inode;
+		} else if (count != ptr->segs_count) {
+			err = -EIO;
+			SSDFS_ERR("count %u != ptr->segs_count %u\n",
+				  count, ptr->segs_count);
+			goto forget_inode;
+		}
+	} else {
+		if (count != 0) {
+			err = -EIO;
+			SSDFS_CRIT("segbmap header corrupted: "
+				   "copy segments' chain is present\n");
+			goto forget_inode;
+		}
+	}
+
+	err = ssdfs_segbmap_create_segments(fsi, SSDFS_MAIN_SEGBMAP_SEG, ptr);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		goto destroy_seg_objects;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to create segbmap's segment objects: "
+			  "err %d\n",
+			  err);
+		goto destroy_seg_objects;
+	}
+
+	if (ptr->flags & SSDFS_SEGBMAP_HAS_COPY) {
+		err = ssdfs_segbmap_create_segments(fsi,
+						    SSDFS_COPY_SEGBMAP_SEG,
+						    ptr);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create segbmap's segment objects: "
+				  "err %d\n",
+				  err);
+			goto destroy_seg_objects;
+		}
+	}
+
+	err = ssdfs_segbmap_init(ptr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to init segment bitmap: err %d\n",
+			  err);
+		goto destroy_seg_objects;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("DONE: create segment bitmap\n");
+#else
+	SSDFS_DBG("DONE: create segment bitmap\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+destroy_seg_objects:
+	ssdfs_segbmap_destroy_segments(fsi->segbmap);
+
+forget_inode:
+	iput(fsi->segbmap_inode);
+
+free_desc_array:
+	ssdfs_seg_bmap_kfree(fsi->segbmap->desc_array);
+
+free_fragment_bmaps:
+	ssdfs_segbmap_destroy_fragment_bitmaps(fsi->segbmap);
+
+free_segbmap_object:
+	ssdfs_seg_bmap_kfree(fsi->segbmap);
+
+	fsi->segbmap = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(err == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_segbmap_destroy() - destroy segment bitmap object
+ * @fsi: file system info object
+ *
+ * This method destroys the segment bitmap object.
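+ *
+ * NOTE: all dirty fragments are expected to be flushed before this
+ * point; a still-dirty page cache at destruction time is reported
+ * through ssdfs_fs_error().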
+ */ +void ssdfs_segbmap_destroy(struct ssdfs_fs_info *fsi) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("segbmap %p\n", fsi->segbmap); +#else + SSDFS_DBG("segbmap %p\n", fsi->segbmap); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!fsi->segbmap) + return; + + inode_lock(fsi->segbmap_inode); + down_write(&fsi->segbmap->resize_lock); + down_write(&fsi->segbmap->search_lock); + + ssdfs_segbmap_destroy_segments(fsi->segbmap); + + if (mapping_tagged(&fsi->segbmap->pages, PAGECACHE_TAG_DIRTY)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "segment bitmap is dirty on destruction\n"); + } + + for (i = 0; i < fsi->segbmap->fragments_count; i++) { + struct page *page; + + xa_lock_irq(&fsi->segbmap->pages.i_pages); + page = __xa_erase(&fsi->segbmap->pages.i_pages, i); + xa_unlock_irq(&fsi->segbmap->pages.i_pages); + + if (!page) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %d is NULL\n", i); +#endif /* CONFIG_SSDFS_DEBUG */ + continue; + } + + page->mapping = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_put_page(page); + ssdfs_seg_bmap_free_page(page); + } + + if (fsi->segbmap->pages.nrpages != 0) + truncate_inode_pages(&fsi->segbmap->pages, 0); + + ssdfs_segbmap_destroy_fragment_bitmaps(fsi->segbmap); + + ssdfs_seg_bmap_kfree(fsi->segbmap->desc_array); + + up_write(&fsi->segbmap->resize_lock); + up_write(&fsi->segbmap->search_lock); + inode_unlock(fsi->segbmap_inode); + + iput(fsi->segbmap_inode); + ssdfs_seg_bmap_kfree(fsi->segbmap); + fsi->segbmap = NULL; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_segbmap_get_state_from_byte() - retrieve state of item from byte + * @byte_ptr: pointer on byte + * @byte_item: index of item in byte + */ +static inline +int ssdfs_segbmap_get_state_from_byte(u8 *byte_ptr, u32 byte_item) +{ + u32 shift; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_ptr %p, byte_item %u\n", + byte_ptr, byte_item); + + BUG_ON(!byte_ptr); + BUG_ON(byte_item >= SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS)); +#endif /* CONFIG_SSDFS_DEBUG */ + + shift = byte_item * SSDFS_SEG_STATE_BITS; + return (int)((*byte_ptr >> shift) & SSDFS_SEG_STATE_MASK); +} + +/* + * ssdfs_segbmap_check_fragment_header() - check fragment's header + * @pebc: pointer on PEB container + * @seg_index: index of segment in segbmap's segments sequence + * @sequence_id: sequence ID of fragment + * @page: page contains fragment + * + * This method tries to check fragment's header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - fragment is corrupted. 
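+ *
+ * The header is validated field by field: magic, CRC32 checksum
+ * (computed with the checksum field zeroed), seg_index, peb_index,
+ * seg_type, sequence_id, and the per-state segment counters, whose
+ * sum must match total_segs.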
+ */ +int ssdfs_segbmap_check_fragment_header(struct ssdfs_peb_container *pebc, + u16 seg_index, + u16 sequence_id, + struct page *page) +{ + struct ssdfs_segment_bmap *segbmap; + struct ssdfs_segbmap_fragment_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_segbmap_fragment_header); + void *kaddr; + u16 fragment_bytes; + __le32 old_csum, csum; + u16 total_segs, calculated_segs; + u16 clean_or_using_segs, used_or_dirty_segs, bad_segs; +#ifdef CONFIG_SSDFS_DEBUG + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + u32 byte_offset; + u8 *byte_ptr; + u32 byte_item; + int state = SSDFS_SEG_STATE_MAX; + u16 clean_or_using_segs_calculated; + u16 used_or_dirty_segs_calculated; + u16 bad_segs_calculated; + int i; +#endif /* CONFIG_SSDFS_DEBUG */ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!page); + + SSDFS_DBG("seg %llu, peb_index %u, page %p\n", + pebc->parent_si->seg_id, pebc->peb_index, + page); +#endif /* CONFIG_SSDFS_DEBUG */ + + segbmap = pebc->parent_si->fsi->segbmap; + + kaddr = kmap_local_page(page); + + hdr = SSDFS_SBMP_FRAG_HDR(kaddr); + + if (le32_to_cpu(hdr->magic) != SSDFS_SEGBMAP_HDR_MAGIC) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "invalid magic\n"); + goto fragment_hdr_corrupted; + } + + fragment_bytes = le16_to_cpu(hdr->fragment_bytes); + if (fragment_bytes > segbmap->fragment_size) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "invalid fragment size %u\n", + fragment_bytes); + goto fragment_hdr_corrupted; + } + + old_csum = hdr->checksum; + hdr->checksum = 0; + csum = ssdfs_crc32_le(kaddr, fragment_bytes); + hdr->checksum = old_csum; + + if (old_csum != csum) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "old_csum %u != csum %u\n", + le32_to_cpu(old_csum), + le32_to_cpu(csum)); + goto fragment_hdr_corrupted; + } + + if (seg_index != le16_to_cpu(hdr->seg_index)) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "seg_index %u != hdr->seg_index %u\n", + seg_index, le16_to_cpu(hdr->seg_index)); + goto fragment_hdr_corrupted; + } + + if (pebc->peb_index != le16_to_cpu(hdr->peb_index)) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "peb_index %u != hdr->peb_index %u\n", + pebc->peb_index, + le16_to_cpu(hdr->peb_index)); + goto fragment_hdr_corrupted; + } + + if (hdr->seg_type >= SSDFS_SEGBMAP_SEG_COPY_MAX) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "invalid seg_type %u\n", + hdr->seg_type); + goto fragment_hdr_corrupted; + } + + if (sequence_id != le16_to_cpu(hdr->sequence_id)) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "sequence_id %u != hdr->sequence_id %u\n", + sequence_id, + le16_to_cpu(hdr->sequence_id)); + goto fragment_hdr_corrupted; + } + + total_segs = le16_to_cpu(hdr->total_segs); + if (fragment_bytes != (SEG_BMAP_BYTES(total_segs) + hdr_size)) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "invalid fragment's items count %u\n", + total_segs); + goto fragment_hdr_corrupted; + } + + clean_or_using_segs = le16_to_cpu(hdr->clean_or_using_segs); + used_or_dirty_segs = le16_to_cpu(hdr->used_or_dirty_segs); + bad_segs = le16_to_cpu(hdr->bad_segs); + calculated_segs = clean_or_using_segs + used_or_dirty_segs + bad_segs; + + if (total_segs != calculated_segs) { + err = -EIO; + SSDFS_ERR("segbmap header is corrupted: " + "clean_or_using_segs %u, " + "used_or_dirty_segs %u, " + "bad_segs %u, total_segs %u\n", + clean_or_using_segs, used_or_dirty_segs, + bad_segs, 
total_segs); + goto fragment_hdr_corrupted; + } + +#ifdef CONFIG_SSDFS_DEBUG + clean_or_using_segs_calculated = 0; + used_or_dirty_segs_calculated = 0; + bad_segs_calculated = 0; + + for (i = 0; i < total_segs; i++) { + byte_offset = ssdfs_segbmap_get_item_byte_offset(i); + + if (byte_offset >= PAGE_SIZE) { + err = -ERANGE; + SSDFS_ERR("invalid byte_offset %u\n", + byte_offset); + goto fragment_hdr_corrupted; + } + + byte_item = i - ((byte_offset - hdr_size) * items_per_byte); + + byte_ptr = (u8 *)kaddr + byte_offset; + state = ssdfs_segbmap_get_state_from_byte(byte_ptr, byte_item); + + switch (state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_RESERVED: + clean_or_using_segs_calculated++; + break; + + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + used_or_dirty_segs_calculated++; + break; + + case SSDFS_SEG_BAD: + bad_segs_calculated++; + break; + + default: + err = -EIO; + SSDFS_ERR("unexpected state %#x\n", + state); + goto fragment_hdr_corrupted; + } + } + + if (clean_or_using_segs_calculated != clean_or_using_segs) { + err = -EIO; + SSDFS_ERR("calculated %u != clean_or_using_segs %u\n", + clean_or_using_segs_calculated, + clean_or_using_segs); + } + + if (used_or_dirty_segs_calculated != used_or_dirty_segs) { + err = -EIO; + SSDFS_ERR("calculated %u != used_or_dirty_segs %u\n", + used_or_dirty_segs_calculated, + used_or_dirty_segs); + } + + if (bad_segs_calculated != bad_segs) { + err = -EIO; + SSDFS_ERR("calculated %u != bad_segs %u\n", + bad_segs_calculated, + bad_segs); + } + + if (err) + goto fragment_hdr_corrupted; +#endif /* CONFIG_SSDFS_DEBUG */ + +fragment_hdr_corrupted: + kunmap_local(kaddr); + + return err; +} + +/* + * ssdfs_segbmap_fragment_init() - init segbmap's fragment + * @pebc: pointer on PEB container + * @sequence_id: sequence ID of fragment + * @page: page contains fragment + * @state: state of fragment + */ +int ssdfs_segbmap_fragment_init(struct ssdfs_peb_container *pebc, + u16 sequence_id, + struct page *page, + int state) +{ + struct ssdfs_segment_bmap *segbmap; + struct ssdfs_segbmap_fragment_header *hdr; + struct ssdfs_segbmap_fragment_desc *desc; + unsigned long *fbmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pebc || !pebc->parent_si || !pebc->parent_si->fsi); + BUG_ON(!pebc->parent_si->fsi->segbmap || !page); + BUG_ON(state <= SSDFS_SEGBMAP_FRAG_CREATED || + state >= SSDFS_SEGBMAP_FRAG_DIRTY); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("seg %llu, peb_index %u, " + "sequence_id %u, page %p, " + "state %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + sequence_id, page, state); +#else + SSDFS_DBG("seg %llu, peb_index %u, " + "sequence_id %u, page %p, " + "state %#x\n", + pebc->parent_si->seg_id, pebc->peb_index, + sequence_id, page, state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + segbmap = pebc->parent_si->fsi->segbmap; + + inode_lock_shared(pebc->parent_si->fsi->segbmap_inode); + + ssdfs_get_page(page); + page->index = sequence_id; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&segbmap->search_lock); + + desc = &segbmap->desc_array[sequence_id]; + + xa_lock_irq(&segbmap->pages.i_pages); + err = __xa_insert(&segbmap->pages.i_pages, + sequence_id, page, GFP_NOFS); + if (unlikely(err < 0)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fail to add 
page %u into address space: err %d\n", + sequence_id, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + page->mapping = NULL; + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + page->mapping = &segbmap->pages; + segbmap->pages.nrpages++; + } + xa_unlock_irq(&segbmap->pages.i_pages); + + if (unlikely(err)) + goto unlock_search_lock; + + if (desc->state != SSDFS_SEGBMAP_FRAG_CREATED) { + err = -ERANGE; + SSDFS_ERR("fail to initialize segbmap fragment\n"); + } else { + hdr = SSDFS_SBMP_FRAG_HDR(kmap_local_page(page)); + desc->total_segs = le16_to_cpu(hdr->total_segs); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("total_segs %u, clean_or_using_segs %u, " + "used_or_dirty_segs %u, bad_segs %u\n", + le16_to_cpu(hdr->total_segs), + le16_to_cpu(hdr->clean_or_using_segs), + le16_to_cpu(hdr->used_or_dirty_segs), + le16_to_cpu(hdr->bad_segs)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_CLEAN_USING_FBMAP]; + desc->clean_or_using_segs = + le16_to_cpu(hdr->clean_or_using_segs); + if (desc->clean_or_using_segs == 0) + bitmap_clear(fbmap, sequence_id, 1); + else + bitmap_set(fbmap, sequence_id, 1); + + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_USED_DIRTY_FBMAP]; + desc->used_or_dirty_segs = + le16_to_cpu(hdr->used_or_dirty_segs); + if (desc->used_or_dirty_segs == 0) + bitmap_clear(fbmap, sequence_id, 1); + else + bitmap_set(fbmap, sequence_id, 1); + + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_BAD_FBMAP]; + desc->bad_segs = le16_to_cpu(hdr->bad_segs); + if (desc->bad_segs == 0) + bitmap_clear(fbmap, sequence_id, 1); + else + bitmap_set(fbmap, sequence_id, 1); + + desc->state = state; + kunmap_local(hdr); + } + + ssdfs_seg_bmap_account_page(page); + +unlock_search_lock: + complete_all(&desc->init_end); + up_write(&segbmap->search_lock); + inode_unlock_shared(pebc->parent_si->fsi->segbmap_inode); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_sb_segbmap_header_correct_state() - save segbmap's state in superblock + * @segbmap: pointer on segment bitmap object + */ +static +void ssdfs_sb_segbmap_header_correct_state(struct ssdfs_segment_bmap *segbmap) +{ + struct ssdfs_segbmap_sb_header *hdr; + __le64 seg; + int i, j; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->resize_lock)); + BUG_ON(!rwsem_is_locked(&segbmap->fsi->volume_sem)); + + SSDFS_DBG("segbmap %p\n", + segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = &segbmap->fsi->vh->segbmap; + + hdr->fragments_count = cpu_to_le16(segbmap->fragments_count); + hdr->fragments_per_seg = cpu_to_le16(segbmap->fragments_per_seg); + hdr->fragments_per_peb = cpu_to_le16(segbmap->fragments_per_peb); + hdr->fragment_size = cpu_to_le16(segbmap->fragment_size); + + hdr->bytes_count = cpu_to_le32(segbmap->bytes_count); + hdr->flags = cpu_to_le16(segbmap->flags); + hdr->segs_count = cpu_to_le16(segbmap->segs_count); + + for (i = 0; i < segbmap->segs_count; i++) { + j = SSDFS_MAIN_SEGBMAP_SEG; + seg = cpu_to_le64(segbmap->seg_numbers[i][j]); + hdr->segs[i][j] = seg; + + j = SSDFS_COPY_SEGBMAP_SEG; + seg = cpu_to_le64(segbmap->seg_numbers[i][j]); + hdr->segs[i][j] = seg; + } +} + +/* + * ssdfs_segbmap_copy_dirty_fragment() - copy dirty fragment into request + * @segbmap: pointer on segment bitmap object + * @fragment_index: index of fragment + * @page_index: index of page in request + * @req: segment request + * + * This 
method tries to copy dirty fragment into request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_segbmap_copy_dirty_fragment(struct ssdfs_segment_bmap *segbmap, + u16 fragment_index, + u16 page_index, + struct ssdfs_segment_request *req) +{ + struct ssdfs_segbmap_fragment_desc *desc; + struct ssdfs_segbmap_fragment_header *hdr; + struct page *dpage, *spage; + void *kaddr; + u16 fragment_bytes; + __le32 old_csum, csum; + u16 total_segs; + u16 clean_or_using_segs; + u16 used_or_dirty_segs; + u16 bad_segs; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap || !req); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + BUG_ON(page_index >= PAGEVEC_SIZE); + + SSDFS_DBG("segbmap %p, fragment_index %u, " + "page_index %u, req %p\n", + segbmap, fragment_index, page_index, req); +#endif /* CONFIG_SSDFS_DEBUG */ + + desc = &segbmap->desc_array[fragment_index]; + + if (desc->state != SSDFS_SEGBMAP_FRAG_DIRTY) { + SSDFS_ERR("fragment %u isn't dirty\n", + fragment_index); + return -ERANGE; + } + + spage = find_lock_page(&segbmap->pages, fragment_index); + if (!spage) { + SSDFS_ERR("fail to find page: fragment_index %u\n", + fragment_index); + return -ERANGE; + } + + ssdfs_account_locked_page(spage); + + kaddr = kmap_local_page(spage); + hdr = SSDFS_SBMP_FRAG_HDR(kaddr); + + if (le32_to_cpu(hdr->magic) != SSDFS_SEGBMAP_HDR_MAGIC) { + err = -ERANGE; + SSDFS_ERR("segbmap header is corrupted: " + "invalid magic\n"); + goto fail_copy_fragment; + } + + fragment_bytes = le16_to_cpu(hdr->fragment_bytes); + + old_csum = hdr->checksum; + hdr->checksum = 0; + csum = ssdfs_crc32_le(kaddr, fragment_bytes); + hdr->checksum = old_csum; + + if (old_csum != csum) { + err = -ERANGE; + SSDFS_ERR("segbmap header is corrupted: " + "old_csum %u != csum %u\n", + le32_to_cpu(old_csum), + le32_to_cpu(csum)); + goto fail_copy_fragment; + } + + total_segs = desc->total_segs; + if (total_segs != le16_to_cpu(hdr->total_segs)) { + err = -ERANGE; + SSDFS_ERR("segbmap header is corrupted: " + "desc->total_segs %u != hdr->total_segs %u\n", + desc->total_segs, + le16_to_cpu(hdr->total_segs)); + goto fail_copy_fragment; + } + + clean_or_using_segs = desc->clean_or_using_segs; + if (clean_or_using_segs != le16_to_cpu(hdr->clean_or_using_segs)) { + err = -ERANGE; + SSDFS_ERR("segbmap header is corrupted: " + "desc->clean_or_using_segs %u != " + "hdr->clean_or_using_segs %u\n", + desc->clean_or_using_segs, + le16_to_cpu(hdr->clean_or_using_segs)); + goto fail_copy_fragment; + } + + used_or_dirty_segs = desc->used_or_dirty_segs; + if (used_or_dirty_segs != le16_to_cpu(hdr->used_or_dirty_segs)) { + err = -ERANGE; + SSDFS_ERR("segbmap header is corrupted: " + "desc->used_or_dirty_segs %u != " + "hdr->used_or_dirty_segs %u\n", + desc->used_or_dirty_segs, + le16_to_cpu(hdr->used_or_dirty_segs)); + goto fail_copy_fragment; + } + + bad_segs = desc->bad_segs; + if (bad_segs != le16_to_cpu(hdr->bad_segs)) { + err = -ERANGE; + SSDFS_ERR("segbmap header is corrupted: " + "desc->bad_segs %u != " + "hdr->bad_segs %u\n", + desc->bad_segs, + le16_to_cpu(hdr->bad_segs)); + goto fail_copy_fragment; + } + + dpage = req->result.pvec.pages[page_index]; + + if (!dpage) { + err = -ERANGE; + SSDFS_ERR("invalid page: page_index %u\n", + page_index); + goto fail_copy_fragment; + } + + ssdfs_memcpy_to_page(dpage, 0, PAGE_SIZE, + kaddr, 0, PAGE_SIZE, + PAGE_SIZE); + + SetPageUptodate(dpage); + if (!PageDirty(dpage)) + ssdfs_set_page_dirty(dpage); + set_page_writeback(dpage); + + 
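/*
+	 * The request page now carries the dirty payload, so clear
+	 * the source page's dirty state to avoid flushing the same
+	 * fragment twice.
+	 */
+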
__ssdfs_clear_dirty_page(spage); + + desc->state = SSDFS_SEGBMAP_FRAG_TOWRITE; + +fail_copy_fragment: + kunmap_local(kaddr); + ssdfs_unlock_page(spage); + ssdfs_put_page(spage); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + spage, page_ref_count(spage)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_segbmap_replicate_fragment() - replicate fragment between requests + * @req1: source request + * @page_index: index of replicated page in @req1 + * @req2: destination request + */ +static +void ssdfs_segbmap_replicate_fragment(struct ssdfs_segment_request *req1, + u16 page_index, + struct ssdfs_segment_request *req2) +{ + struct ssdfs_segbmap_fragment_header *hdr; + u16 fragment_bytes; + struct page *spage, *dpage; + void *kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req1 || !req2); + BUG_ON(page_index >= pagevec_count(&req1->result.pvec)); + BUG_ON(page_index >= pagevec_count(&req2->result.pvec)); + + SSDFS_DBG("req1 %p, req2 %p, page_index %u\n", + req1, req2, page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + spage = req1->result.pvec.pages[page_index]; + dpage = req2->result.pvec.pages[page_index]; + + ssdfs_memcpy_page(dpage, 0, PAGE_SIZE, + spage, 0, PAGE_SIZE, + PAGE_SIZE); + + kaddr = kmap_local_page(dpage); + hdr = SSDFS_SBMP_FRAG_HDR(kaddr); + hdr->seg_type = SSDFS_COPY_SEGBMAP_SEG; + fragment_bytes = le16_to_cpu(hdr->fragment_bytes); + hdr->checksum = 0; + hdr->checksum = ssdfs_crc32_le(kaddr, fragment_bytes); + flush_dcache_page(dpage); + kunmap_local(kaddr); + + SetPageUptodate(dpage); + if (!PageDirty(dpage)) + ssdfs_set_page_dirty(dpage); + set_page_writeback(dpage); +} + +/* + * ssdfs_segbmap_define_volume_extent() - define volume extent for request + * @segbmap: pointer on segment bitmap object + * @req: segment request + * @hdr: fragment's header + * @fragments_count: count of fragments in the chunk + * @seg_index: index of segment in segbmap's array [out] + */ +static +int ssdfs_segbmap_define_volume_extent(struct ssdfs_segment_bmap *segbmap, + struct ssdfs_segment_request *req, + struct ssdfs_segbmap_fragment_header *hdr, + u16 fragments_count, + u16 *seg_index) +{ + u16 sequence_id; + u16 fragment_index; + u32 pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap || !req || !hdr || !seg_index); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + BUG_ON(!rwsem_is_locked(&segbmap->resize_lock)); + + SSDFS_DBG("segbmap %p, req %p\n", + segbmap, req); +#endif /* CONFIG_SSDFS_DEBUG */ + + *seg_index = le16_to_cpu(hdr->seg_index); + sequence_id = le16_to_cpu(hdr->sequence_id); + + if (*seg_index != (sequence_id / segbmap->fragments_per_seg)) { + SSDFS_ERR("invalid seg_index %u or sequence_id %u\n", + *seg_index, sequence_id); + return -ERANGE; + } + + fragment_index = sequence_id % segbmap->fragments_per_seg; + pagesize = segbmap->fsi->pagesize; + + if (pagesize < segbmap->fragment_size) { + u32 pages_per_item; + + pages_per_item = segbmap->fragment_size + pagesize - 1; + pages_per_item /= pagesize; + req->place.start.blk_index = fragment_index * pages_per_item; + req->place.len = fragments_count * pages_per_item; + } else if (pagesize > segbmap->fragment_size) { + u32 items_per_page; + + items_per_page = pagesize + segbmap->fragment_size - 1; + items_per_page /= segbmap->fragment_size; + req->place.start.blk_index = fragment_index / items_per_page; + req->place.len = fragments_count + items_per_page - 1; + req->place.len /= items_per_page; + } else { + req->place.start.blk_index = fragment_index; + req->place.len = fragments_count; + 
} + + return 0; +} + +/* + * ssdfs_segbmap_issue_fragments_update() - issue fragment updates + * @segbmap: pointer on segment bitmap object + * @start_fragment: start fragment number for dirty bitmap + * @fragment_size: size of fragment in bytes + * @dirty_bmap: bitmap for dirty states searching + * + * This method tries to issue updates for all dirty fragments + * in @dirty_bmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENODATA - @dirty_bmap hasn't dirty fragments. + * %-ENOMEM - fail to allocate memory. + * %-ERANGE - internal error. + */ +static +int ssdfs_segbmap_issue_fragments_update(struct ssdfs_segment_bmap *segbmap, + u16 start_fragment, + u16 fragment_size, + unsigned long dirty_bmap) +{ + struct ssdfs_segment_request *req1 = NULL, *req2 = NULL; + struct ssdfs_segbmap_fragment_desc *fragment; + struct ssdfs_segbmap_fragment_header *hdr; + struct ssdfs_segment_info *si; + void *kaddr; + bool is_bit_found; + bool has_backup; + u64 ino = SSDFS_SEG_BMAP_INO; + u64 offset; + u32 size; + u16 fragments_count; + u16 seg_index; + int i = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + BUG_ON(!rwsem_is_locked(&segbmap->resize_lock)); + + SSDFS_DBG("segbmap %p, start_fragment %u, dirty_bmap %#lx\n", + segbmap, start_fragment, dirty_bmap); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dirty_bmap == 0) { + SSDFS_DBG("bmap doesn't contain dirty bits\n"); + return -ENODATA; + } + + has_backup = segbmap->flags & SSDFS_SEGBMAP_HAS_COPY; + + do { + is_bit_found = test_bit(i, &dirty_bmap); + + if (!is_bit_found) { + i++; + continue; + } + + fragment = &segbmap->desc_array[start_fragment + i]; + + if (fragment->state != SSDFS_SEGBMAP_FRAG_DIRTY) { + SSDFS_ERR("invalid fragment's state %#x\n", + fragment->state); + return -ERANGE; + } + + req1 = &fragment->flush_req1; + req2 = &fragment->flush_req2; + + ssdfs_request_init(req1); + ssdfs_get_request(req1); + + if (has_backup) { + ssdfs_request_init(req2); + ssdfs_get_request(req2); + } + + err = ssdfs_request_add_allocated_page_locked(req1); + if (!err && has_backup) + err = ssdfs_request_add_allocated_page_locked(req2); + + if (unlikely(err)) { + SSDFS_ERR("fail allocate memory page: err %d\n", err); + goto fail_issue_fragment_updates; + } + + err = ssdfs_segbmap_copy_dirty_fragment(segbmap, + start_fragment + i, + 0, req1); + if (unlikely(err)) { + SSDFS_ERR("fail to copy dirty fragment: " + "fragment %u, err %d\n", + start_fragment + i, err); + goto fail_issue_fragment_updates; + } + + if (has_backup) + ssdfs_segbmap_replicate_fragment(req1, 0, req2); + + offset = (u64)start_fragment + i; + offset *= fragment_size; + size = fragment_size; + + ssdfs_request_prepare_logical_extent(ino, offset, size, + 0, 0, req1); + + if (has_backup) { + ssdfs_request_prepare_logical_extent(ino, + offset, + size, + 0, 0, + req2); + } + + fragments_count = (u16)pagevec_count(&req1->result.pvec); + kaddr = kmap_local_page(req1->result.pvec.pages[0]); + hdr = SSDFS_SBMP_FRAG_HDR(kaddr); + err = ssdfs_segbmap_define_volume_extent(segbmap, req1, + hdr, + fragments_count, + &seg_index); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to define volume extent: " + "err %d\n", + err); + goto fail_issue_fragment_updates; + } + + if (has_backup) { + ssdfs_memcpy(&req2->place, + 0, sizeof(struct ssdfs_volume_extent), + &req1->place, + 0, sizeof(struct ssdfs_volume_extent), + sizeof(struct ssdfs_volume_extent)); + } + + si = 
+		err = ssdfs_segment_update_extent_async(si,
+						SSDFS_REQ_ASYNC_NO_FREE,
+						req1);
+		si = segbmap->segs[seg_index][SSDFS_COPY_SEGBMAP_SEG];
+		if (!err && has_backup) {
+			err = ssdfs_segment_update_extent_async(si,
+						SSDFS_REQ_ASYNC_NO_FREE,
+						req2);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to update extent: "
+				  "seg_index %u, err %d\n",
+				  seg_index, err);
+			goto fail_issue_fragment_updates;
+		}
+
+		i++;
+	} while (i < BITS_PER_LONG);
+
+	return 0;
+
+fail_issue_fragment_updates:
+	ssdfs_request_unlock_and_remove_pages(req1);
+	ssdfs_put_request(req1);
+
+	if (has_backup) {
+		ssdfs_request_unlock_and_remove_pages(req2);
+		ssdfs_put_request(req2);
+	}
+
+	return err;
+}
diff --git a/fs/ssdfs/segment_bitmap.h b/fs/ssdfs/segment_bitmap.h
new file mode 100644
index 000000000000..ddf2d8a15897
--- /dev/null
+++ b/fs/ssdfs/segment_bitmap.h
@@ -0,0 +1,459 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment_bitmap.h - segment bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SEGMENT_BITMAP_H
+#define _SSDFS_SEGMENT_BITMAP_H
+
+#include "common_bitmap.h"
+#include "request_queue.h"
+
+/* Segment states */
+enum {
+	SSDFS_SEG_CLEAN			= 0x0,
+	SSDFS_SEG_DATA_USING		= 0x1,
+	SSDFS_SEG_LEAF_NODE_USING	= 0x2,
+	SSDFS_SEG_HYBRID_NODE_USING	= 0x5,
+	SSDFS_SEG_INDEX_NODE_USING	= 0x3,
+	SSDFS_SEG_USED			= 0x7,
+	SSDFS_SEG_PRE_DIRTY		= 0x6,
+	SSDFS_SEG_DIRTY			= 0x4,
+	SSDFS_SEG_BAD			= 0x8,
+	SSDFS_SEG_RESERVED		= 0x9,
+	SSDFS_SEG_STATE_MAX		= SSDFS_SEG_RESERVED + 1,
+};
+
+/* Segment state flags */
+#define SSDFS_SEG_CLEAN_STATE_FLAG			(1 << 0)
+#define SSDFS_SEG_DATA_USING_STATE_FLAG			(1 << 1)
+#define SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG		(1 << 2)
+#define SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG		(1 << 3)
+#define SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG		(1 << 4)
+#define SSDFS_SEG_USED_STATE_FLAG			(1 << 5)
+#define SSDFS_SEG_PRE_DIRTY_STATE_FLAG			(1 << 6)
+#define SSDFS_SEG_DIRTY_STATE_FLAG			(1 << 7)
+#define SSDFS_SEG_BAD_STATE_FLAG			(1 << 8)
+#define SSDFS_SEG_RESERVED_STATE_FLAG			(1 << 9)
+
+/* Segment state masks */
+#define SSDFS_SEG_CLEAN_USING_MASK \
+	(SSDFS_SEG_CLEAN_STATE_FLAG | \
+	 SSDFS_SEG_DATA_USING_STATE_FLAG | \
+	 SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG | \
+	 SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG | \
+	 SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG)
+#define SSDFS_SEG_USED_DIRTY_MASK \
+	(SSDFS_SEG_USED_STATE_FLAG | \
+	 SSDFS_SEG_PRE_DIRTY_STATE_FLAG | \
+	 SSDFS_SEG_DIRTY_STATE_FLAG)
+#define SSDFS_SEG_BAD_STATE_MASK \
+	(SSDFS_SEG_BAD_STATE_FLAG)
+
+#define SSDFS_SEG_STATE_BITS	4
+#define SSDFS_SEG_STATE_MASK	0xF
+
+/*
+ * struct ssdfs_segbmap_fragment_desc - fragment descriptor
+ * @state: fragment's state
+ * @total_segs: total count of segments in fragment
+ * @clean_or_using_segs: count of clean or using segments in fragment
+ * @used_or_dirty_segs: count of used, pre-dirty, dirty or reserved segments
+ * @bad_segs: count of bad segments in fragment
+ * @init_end: wait for the end of initialization
+ * @flush_req1: main flush request
+ * @flush_req2: backup flush request
+ */
+struct ssdfs_segbmap_fragment_desc {
+	int state;
+
u16 total_segs; + u16 clean_or_using_segs; + u16 used_or_dirty_segs; + u16 bad_segs; + struct completion init_end; + struct ssdfs_segment_request flush_req1; + struct ssdfs_segment_request flush_req2; +}; + +/* Fragment's state */ +enum { + SSDFS_SEGBMAP_FRAG_CREATED = 0, + SSDFS_SEGBMAP_FRAG_INIT_FAILED = 1, + SSDFS_SEGBMAP_FRAG_INITIALIZED = 2, + SSDFS_SEGBMAP_FRAG_DIRTY = 3, + SSDFS_SEGBMAP_FRAG_TOWRITE = 4, + SSDFS_SEGBMAP_FRAG_STATE_MAX = 5, +}; + +/* Fragments bitmap types */ +enum { + SSDFS_SEGBMAP_CLEAN_USING_FBMAP, + SSDFS_SEGBMAP_USED_DIRTY_FBMAP, + SSDFS_SEGBMAP_BAD_FBMAP, + SSDFS_SEGBMAP_MODIFICATION_FBMAP, + SSDFS_SEGBMAP_FBMAP_TYPE_MAX, +}; + +/* + * struct ssdfs_segment_bmap - segments bitmap + * @resize_lock: lock for possible resize operation + * @flags: bitmap flags + * @bytes_count: count of bytes in the whole segment bitmap + * @items_count: count of volume's segments + * @fragments_count: count of fragments in the whole segment bitmap + * @fragments_per_seg: segbmap's fragments per segment + * @fragments_per_peb: segbmap's fragments per PEB + * @fragment_size: size of fragment in bytes + * @seg_numbers: array of segment bitmap's segment numbers + * @segs_count: count of segment objects are used for segment bitmap + * @segs: array of pointers on segment objects + * @search_lock: lock for search and change state operations + * @fbmap: array of fragment bitmaps + * @desc_array: array of fragments' descriptors + * @pages: memory pages of the whole segment bitmap + * @fsi: pointer on shared file system object + */ +struct ssdfs_segment_bmap { + struct rw_semaphore resize_lock; + u16 flags; + u32 bytes_count; + u64 items_count; + u16 fragments_count; + u16 fragments_per_seg; + u16 fragments_per_peb; + u16 fragment_size; +#define SEGS_LIMIT1 SSDFS_SEGBMAP_SEGS +#define SEGS_LIMIT2 SSDFS_SEGBMAP_SEG_COPY_MAX + u64 seg_numbers[SEGS_LIMIT1][SEGS_LIMIT2]; + u16 segs_count; + struct ssdfs_segment_info *segs[SEGS_LIMIT1][SEGS_LIMIT2]; + + struct rw_semaphore search_lock; + unsigned long *fbmap[SSDFS_SEGBMAP_FBMAP_TYPE_MAX]; + struct ssdfs_segbmap_fragment_desc *desc_array; + struct address_space pages; + + struct ssdfs_fs_info *fsi; +}; + +/* + * Inline functions + */ +static inline +u32 SEG_BMAP_BYTES(u64 items_count) +{ + u64 bytes; + + bytes = items_count + SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS) - 1; + bytes /= SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + + BUG_ON(bytes >= U32_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %llu, bytes %llu\n", + items_count, bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u32)bytes; +} + +static inline +u16 SEG_BMAP_FRAGMENTS(u64 items_count) +{ + u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header); + u32 bytes = SEG_BMAP_BYTES(items_count); + u32 pages, fragments; + + pages = (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT; + bytes += pages * hdr_size; + + fragments = (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT; + BUG_ON(fragments >= U16_MAX); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %llu, pages %u, " + "bytes %u, fragments %u\n", + items_count, pages, + bytes, fragments); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u16)fragments; +} + +static inline +u16 ssdfs_segbmap_seg_2_fragment_index(u64 seg) +{ + u16 fragments_count = SEG_BMAP_FRAGMENTS(seg + 1); + + BUG_ON(fragments_count == 0); + return fragments_count - 1; +} + +static inline +u32 ssdfs_segbmap_items_per_fragment(size_t fragment_size) +{ + u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header); + u32 payload_bytes; + u64 items; + + BUG_ON(hdr_size >= 
fragment_size); + + payload_bytes = fragment_size - hdr_size; + items = payload_bytes * SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + + BUG_ON(items >= U32_MAX); + + return (u32)items; +} + +static inline +u64 ssdfs_segbmap_define_first_fragment_item(pgoff_t fragment_index, + size_t fragment_size) +{ + return fragment_index * ssdfs_segbmap_items_per_fragment(fragment_size); +} + +static inline +u32 ssdfs_segbmap_get_item_byte_offset(u32 fragment_item) +{ + u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header); + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + return hdr_size + (fragment_item / items_per_byte); +} + +static inline +int ssdfs_segbmap_seg_id_2_seg_index(struct ssdfs_segment_bmap *segbmap, + u64 seg_id) +{ + int i; + + if (seg_id == U64_MAX) + return -ENODATA; + + for (i = 0; i < segbmap->segs_count; i++) { + if (seg_id == segbmap->seg_numbers[i][SSDFS_MAIN_SEGBMAP_SEG]) + return i; + if (seg_id == segbmap->seg_numbers[i][SSDFS_COPY_SEGBMAP_SEG]) + return i; + } + + return -ENODATA; +} + +static inline +bool ssdfs_segbmap_fragment_has_content(struct page *page) +{ + bool has_content = false; + void *kaddr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); + + SSDFS_DBG("page %p\n", page); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + if (memchr_inv(kaddr, 0xff, PAGE_SIZE) != NULL) + has_content = true; + kunmap_local(kaddr); + + return has_content; +} + +static inline +bool IS_STATE_GOOD_FOR_MASK(int mask, int state) +{ + bool is_good = false; + + switch (state) { + case SSDFS_SEG_CLEAN: + is_good = mask & SSDFS_SEG_CLEAN_STATE_FLAG; + break; + + case SSDFS_SEG_DATA_USING: + is_good = mask & SSDFS_SEG_DATA_USING_STATE_FLAG; + break; + + case SSDFS_SEG_LEAF_NODE_USING: + is_good = mask & SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG; + break; + + case SSDFS_SEG_HYBRID_NODE_USING: + is_good = mask & SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG; + break; + + case SSDFS_SEG_INDEX_NODE_USING: + is_good = mask & SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG; + break; + + case SSDFS_SEG_USED: + is_good = mask & SSDFS_SEG_USED_STATE_FLAG; + break; + + case SSDFS_SEG_PRE_DIRTY: + is_good = mask & SSDFS_SEG_PRE_DIRTY_STATE_FLAG; + break; + + case SSDFS_SEG_DIRTY: + is_good = mask & SSDFS_SEG_DIRTY_STATE_FLAG; + break; + + case SSDFS_SEG_BAD: + is_good = mask & SSDFS_SEG_BAD_STATE_FLAG; + break; + + case SSDFS_SEG_RESERVED: + is_good = mask & SSDFS_SEG_RESERVED_STATE_FLAG; + break; + + default: + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mask %#x, state %#x, is_good %#x\n", + mask, state, is_good); +#endif /* CONFIG_SSDFS_DEBUG */ + + return is_good; +} + +static inline +void ssdfs_debug_segbmap_object(struct ssdfs_segment_bmap *bmap) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i, j; + size_t bytes; + + BUG_ON(!bmap); + + SSDFS_DBG("flags %#x, bytes_count %u, items_count %llu, " + "fragments_count %u, fragments_per_seg %u, " + "fragments_per_peb %u, fragment_size %u\n", + bmap->flags, bmap->bytes_count, bmap->items_count, + bmap->fragments_count, bmap->fragments_per_seg, + bmap->fragments_per_peb, bmap->fragment_size); + + for (i = 0; i < SSDFS_SEGBMAP_SEGS; i++) { + for (j = 0; j < SSDFS_SEGBMAP_SEG_COPY_MAX; j++) { + SSDFS_DBG("seg_numbers[%d][%d] = %llu\n", + i, j, bmap->seg_numbers[i][j]); + } + } + + SSDFS_DBG("segs_count %u\n", bmap->segs_count); + + for (i = 0; i < SSDFS_SEGBMAP_SEGS; i++) { + for (j = 0; j < SSDFS_SEGBMAP_SEG_COPY_MAX; j++) { + SSDFS_DBG("segs[%d][%d] = %p\n", + i, j, bmap->segs[i][j]); + } + } + + bytes = bmap->fragments_count + 
BITS_PER_LONG - 1; + bytes /= BITS_PER_BYTE; + + for (i = 0; i < SSDFS_SEGBMAP_FBMAP_TYPE_MAX; i++) { + SSDFS_DBG("fbmap[%d]\n", i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + bmap->fbmap[i], bytes); + } + + for (i = 0; i < bmap->fragments_count; i++) { + struct ssdfs_segbmap_fragment_desc *desc; + + desc = &bmap->desc_array[i]; + + SSDFS_DBG("state %#x, total_segs %u, " + "clean_or_using_segs %u, used_or_dirty_segs %u, " + "bad_segs %u\n", + desc->state, desc->total_segs, + desc->clean_or_using_segs, + desc->used_or_dirty_segs, + desc->bad_segs); + } + + for (i = 0; i < bmap->fragments_count; i++) { + struct page *page; + void *kaddr; + + page = find_lock_page(&bmap->pages, i); + + SSDFS_DBG("page[%d] %p\n", i, page); + if (!page) + continue; + + ssdfs_account_locked_page(page); + + SSDFS_DBG("page_index %llu, flags %#lx\n", + (u64)page_index(page), page->flags); + + kaddr = kmap_local_page(page); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, PAGE_SIZE); + kunmap_local(kaddr); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); + } +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * Segment bitmap's API + */ +int ssdfs_segbmap_create(struct ssdfs_fs_info *fsi); +void ssdfs_segbmap_destroy(struct ssdfs_fs_info *fsi); +int ssdfs_segbmap_check_fragment_header(struct ssdfs_peb_container *pebc, + u16 seg_index, + u16 sequence_id, + struct page *page); +int ssdfs_segbmap_fragment_init(struct ssdfs_peb_container *pebc, + u16 sequence_id, + struct page *page, + int state); +int ssdfs_segbmap_flush(struct ssdfs_segment_bmap *segbmap); +int ssdfs_segbmap_resize(struct ssdfs_segment_bmap *segbmap, + u64 new_items_count); + +int ssdfs_segbmap_check_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, int state, + struct completion **end); +int ssdfs_segbmap_get_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, struct completion **end); +int ssdfs_segbmap_change_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, int new_state, + struct completion **end); +int ssdfs_segbmap_find(struct ssdfs_segment_bmap *segbmap, + u64 start, u64 max, + int state, int mask, + u64 *seg, struct completion **end); +int ssdfs_segbmap_find_and_set(struct ssdfs_segment_bmap *segbmap, + u64 start, u64 max, + int state, int mask, + int new_state, + u64 *seg, struct completion **end); +int ssdfs_segbmap_reserve_clean_segment(struct ssdfs_segment_bmap *segbmap, + u64 start, u64 max, + u64 *seg, struct completion **end); + +#endif /* _SSDFS_SEGMENT_BITMAP_H */ diff --git a/fs/ssdfs/segment_bitmap_tables.c b/fs/ssdfs/segment_bitmap_tables.c new file mode 100644 index 000000000000..a1eeba918a12 --- /dev/null +++ b/fs/ssdfs/segment_bitmap_tables.c @@ -0,0 +1,814 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/segment_bitmap_tables.c - declaration of segbmap's search tables. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include + +/* + * Table for determination presence of clean segment + * state in provided byte. Checking byte is used + * as index in array. 
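+ *
+ * Every byte of a fragment's payload packs two 4-bit segment states
+ * (SSDFS_SEG_STATE_BITS), so the table entry is true only when at
+ * least one nibble of the index equals SSDFS_SEG_CLEAN (0x0).
+ * A sketch of the intended lookup, assuming bmap points to a
+ * fragment's payload and item is a fragment-relative segment index:
+ *
+ *   u8 byte = ((u8 *)bmap)[item /
+ *		SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS)];
+ *   bool has_clean = detect_clean_seg[byte];
+ *
+ * For instance, detect_clean_seg[0x40] is true (the low nibble is
+ * 0x0), while detect_clean_seg[0x44] is false (no clean nibble).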
+ */ +const bool detect_clean_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ true, true, true, true, +/* 01 - 0x04 */ true, true, true, true, +/* 02 - 0x08 */ true, true, true, true, +/* 03 - 0x0C */ true, true, true, true, +/* 04 - 0x10 */ true, false, false, false, +/* 05 - 0x14 */ false, false, false, false, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ true, false, false, false, +/* 09 - 0x24 */ false, false, false, false, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ true, false, false, false, +/* 13 - 0x34 */ false, false, false, false, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ true, false, false, false, +/* 17 - 0x44 */ false, false, false, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ true, false, false, false, +/* 21 - 0x54 */ false, false, false, false, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ true, false, false, false, +/* 25 - 0x64 */ false, false, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ true, false, false, false, +/* 29 - 0x74 */ false, false, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ true, false, false, false, +/* 33 - 0x84 */ false, false, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ true, false, false, false, +/* 37 - 0x94 */ false, false, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ true, false, false, false, +/* 41 - 0xA4 */ false, false, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ true, false, false, false, +/* 45 - 0xB4 */ false, false, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ true, false, false, false, +/* 49 - 0xC4 */ false, false, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ true, false, false, false, +/* 53 - 0xD4 */ false, false, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ true, false, false, false, +/* 57 - 0xE4 */ false, false, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ true, false, false, false, +/* 61 - 0xF4 */ false, false, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of data using segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_data_using_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, true, false, false, +/* 01 - 0x04 */ false, false, false, false, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ true, true, true, true, +/* 05 - 0x14 */ true, true, true, true, +/* 06 - 0x18 */ true, true, true, true, +/* 07 - 0x1C */ true, true, true, true, +/* 08 - 0x20 */ false, true, false, false, +/* 09 - 0x24 */ false, false, false, false, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, true, false, false, +/* 13 - 0x34 */ false, false, false, false, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ false, true, false, false, +/* 17 - 0x44 */ false, false, false, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ false, true, false, false, +/* 21 - 0x54 */ false, false, false, false, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ false, true, false, false, +/* 25 - 0x64 */ false, false, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ false, true, false, false, +/* 29 - 0x74 */ false, false, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ false, true, false, false, +/* 33 - 0x84 */ false, false, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, true, false, false, +/* 37 - 0x94 */ false, false, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, true, false, false, +/* 41 - 0xA4 */ false, false, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, true, false, false, +/* 45 - 0xB4 */ false, false, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, true, false, false, +/* 49 - 0xC4 */ false, false, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, true, false, false, +/* 53 - 0xD4 */ false, false, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, true, false, false, +/* 57 - 0xE4 */ false, false, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, true, false, false, +/* 61 - 0xF4 */ false, false, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of leaf node segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_lnode_using_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, true, false, +/* 01 - 0x04 */ false, false, false, false, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, true, false, +/* 05 - 0x14 */ false, false, false, false, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ true, true, true, true, +/* 09 - 0x24 */ true, true, true, true, +/* 10 - 0x28 */ true, true, true, true, +/* 11 - 0x2C */ true, true, true, true, +/* 12 - 0x30 */ false, false, true, false, +/* 13 - 0x34 */ false, false, false, false, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ false, false, true, false, +/* 17 - 0x44 */ false, false, false, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ false, false, true, false, +/* 21 - 0x54 */ false, false, false, false, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ false, false, true, false, +/* 25 - 0x64 */ false, false, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ false, false, true, false, +/* 29 - 0x74 */ false, false, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ false, false, true, false, +/* 33 - 0x84 */ false, false, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, true, false, +/* 37 - 0x94 */ false, false, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, true, false, +/* 41 - 0xA4 */ false, false, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, true, false, +/* 45 - 0xB4 */ false, false, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, true, false, +/* 49 - 0xC4 */ false, false, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, true, false, +/* 53 - 0xD4 */ false, false, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, true, false, +/* 57 - 0xE4 */ false, false, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, true, false, +/* 61 - 0xF4 */ false, false, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of hybrid node segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_hnode_using_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, false, +/* 01 - 0x04 */ false, true, false, false, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, false, +/* 05 - 0x14 */ false, true, false, false, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, false, +/* 09 - 0x24 */ false, true, false, false, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, false, false, false, +/* 13 - 0x34 */ false, true, false, false, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ false, false, false, false, +/* 17 - 0x44 */ false, true, false, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ true, true, true, true, +/* 21 - 0x54 */ true, true, true, true, +/* 22 - 0x58 */ true, true, true, true, +/* 23 - 0x5C */ true, true, true, true, +/* 24 - 0x60 */ false, false, false, false, +/* 25 - 0x64 */ false, true, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ false, false, false, false, +/* 29 - 0x74 */ false, true, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ false, false, false, false, +/* 33 - 0x84 */ false, true, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, false, false, +/* 37 - 0x94 */ false, true, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, false, +/* 41 - 0xA4 */ false, true, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, false, +/* 45 - 0xB4 */ false, true, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, false, +/* 49 - 0xC4 */ false, true, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, false, +/* 53 - 0xD4 */ false, true, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, false, +/* 57 - 0xE4 */ false, true, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, false, +/* 61 - 0xF4 */ false, true, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of index node segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_idxnode_using_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, true, +/* 01 - 0x04 */ false, false, false, false, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, true, +/* 05 - 0x14 */ false, false, false, false, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, true, +/* 09 - 0x24 */ false, false, false, false, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ true, true, true, true, +/* 13 - 0x34 */ true, true, true, true, +/* 14 - 0x38 */ true, true, true, true, +/* 15 - 0x3C */ true, true, true, true, +/* 16 - 0x40 */ false, false, false, true, +/* 17 - 0x44 */ false, false, false, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ false, false, false, true, +/* 21 - 0x54 */ false, false, false, false, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ false, false, false, true, +/* 25 - 0x64 */ false, false, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ false, false, false, true, +/* 29 - 0x74 */ false, false, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ false, false, false, true, +/* 33 - 0x84 */ false, false, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, false, true, +/* 37 - 0x94 */ false, false, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, true, +/* 41 - 0xA4 */ false, false, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, true, +/* 45 - 0xB4 */ false, false, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, true, +/* 49 - 0xC4 */ false, false, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, true, +/* 53 - 0xD4 */ false, false, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, true, +/* 57 - 0xE4 */ false, false, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, true, +/* 61 - 0xF4 */ false, false, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of used segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_used_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, false, +/* 01 - 0x04 */ false, false, false, true, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, false, +/* 05 - 0x14 */ false, false, false, true, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, false, +/* 09 - 0x24 */ false, false, false, true, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, false, false, false, +/* 13 - 0x34 */ false, false, false, true, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ false, false, false, false, +/* 17 - 0x44 */ false, false, false, true, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ false, false, false, false, +/* 21 - 0x54 */ false, false, false, true, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ false, false, false, false, +/* 25 - 0x64 */ false, false, false, true, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ true, true, true, true, +/* 29 - 0x74 */ true, true, true, true, +/* 30 - 0x78 */ true, true, true, true, +/* 31 - 0x7C */ true, true, true, true, +/* 32 - 0x80 */ false, false, false, false, +/* 33 - 0x84 */ false, false, false, true, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, false, false, +/* 37 - 0x94 */ false, false, false, true, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, false, +/* 41 - 0xA4 */ false, false, false, true, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, false, +/* 45 - 0xB4 */ false, false, false, true, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, false, +/* 49 - 0xC4 */ false, false, false, true, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, false, +/* 53 - 0xD4 */ false, false, false, true, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, false, +/* 57 - 0xE4 */ false, false, false, true, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, false, +/* 61 - 0xF4 */ false, false, false, true, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of pre-dirty segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_pre_dirty_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, false, +/* 01 - 0x04 */ false, false, true, false, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, false, +/* 05 - 0x14 */ false, false, true, false, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, false, +/* 09 - 0x24 */ false, false, true, false, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, false, false, false, +/* 13 - 0x34 */ false, false, true, false, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ false, false, false, false, +/* 17 - 0x44 */ false, false, true, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ false, false, false, false, +/* 21 - 0x54 */ false, false, true, false, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ true, true, true, true, +/* 25 - 0x64 */ true, true, true, true, +/* 26 - 0x68 */ true, true, true, true, +/* 27 - 0x6C */ true, true, true, true, +/* 28 - 0x70 */ false, false, false, false, +/* 29 - 0x74 */ false, false, true, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ false, false, false, false, +/* 33 - 0x84 */ false, false, true, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, false, false, +/* 37 - 0x94 */ false, false, true, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, false, +/* 41 - 0xA4 */ false, false, true, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, false, +/* 45 - 0xB4 */ false, false, true, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, false, +/* 49 - 0xC4 */ false, false, true, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, false, +/* 53 - 0xD4 */ false, false, true, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, false, +/* 57 - 0xE4 */ false, false, true, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, false, +/* 61 - 0xF4 */ false, false, true, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of dirty segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_dirty_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, false, +/* 01 - 0x04 */ true, false, false, false, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, false, +/* 05 - 0x14 */ true, false, false, false, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, false, +/* 09 - 0x24 */ true, false, false, false, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, false, false, false, +/* 13 - 0x34 */ true, false, false, false, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ true, true, true, true, +/* 17 - 0x44 */ true, true, true, true, +/* 18 - 0x48 */ true, true, true, true, +/* 19 - 0x4C */ true, true, true, true, +/* 20 - 0x50 */ false, false, false, false, +/* 21 - 0x54 */ true, false, false, false, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ false, false, false, false, +/* 25 - 0x64 */ true, false, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ false, false, false, false, +/* 29 - 0x74 */ true, false, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ false, false, false, false, +/* 33 - 0x84 */ true, false, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, false, false, +/* 37 - 0x94 */ true, false, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, false, +/* 41 - 0xA4 */ true, false, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, false, +/* 45 - 0xB4 */ true, false, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, false, +/* 49 - 0xC4 */ true, false, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, false, +/* 53 - 0xD4 */ true, false, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, false, +/* 57 - 0xE4 */ true, false, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, false, +/* 61 - 0xF4 */ true, false, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of bad segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_bad_seg[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, false, +/* 01 - 0x04 */ false, false, false, false, +/* 02 - 0x08 */ true, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, false, +/* 05 - 0x14 */ false, false, false, false, +/* 06 - 0x18 */ true, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, false, +/* 09 - 0x24 */ false, false, false, false, +/* 10 - 0x28 */ true, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, false, false, false, +/* 13 - 0x34 */ false, false, false, false, +/* 14 - 0x38 */ true, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ false, false, false, false, +/* 17 - 0x44 */ false, false, false, false, +/* 18 - 0x48 */ true, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ false, false, false, false, +/* 21 - 0x54 */ false, false, false, false, +/* 22 - 0x58 */ true, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ false, false, false, false, +/* 25 - 0x64 */ false, false, false, false, +/* 26 - 0x68 */ true, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ false, false, false, false, +/* 29 - 0x74 */ false, false, false, false, +/* 30 - 0x78 */ true, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ true, true, true, true, +/* 33 - 0x84 */ true, true, true, true, +/* 34 - 0x88 */ true, true, true, true, +/* 35 - 0x8C */ true, true, true, true, +/* 36 - 0x90 */ false, false, false, false, +/* 37 - 0x94 */ false, false, false, false, +/* 38 - 0x98 */ true, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, false, +/* 41 - 0xA4 */ false, false, false, false, +/* 42 - 0xA8 */ true, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, false, +/* 45 - 0xB4 */ false, false, false, false, +/* 46 - 0xB8 */ true, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, false, +/* 49 - 0xC4 */ false, false, false, false, +/* 50 - 0xC8 */ true, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, false, +/* 53 - 0xD4 */ false, false, false, false, +/* 54 - 0xD8 */ true, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, false, +/* 57 - 0xE4 */ false, false, false, false, +/* 58 - 0xE8 */ true, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, false, +/* 61 - 0xF4 */ false, false, false, false, +/* 62 - 0xF8 */ true, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of clean or using segment + * state in provided byte. Checking byte is used + * as index in array. 
+ */ +const bool detect_clean_using_mask[U8_MAX + 1] = { +/* 00 - 0x00 */ true, true, true, true, +/* 01 - 0x04 */ true, true, true, true, +/* 02 - 0x08 */ true, true, true, true, +/* 03 - 0x0C */ true, true, true, true, +/* 04 - 0x10 */ true, true, true, true, +/* 05 - 0x14 */ true, true, true, true, +/* 06 - 0x18 */ true, true, true, true, +/* 07 - 0x1C */ true, true, true, true, +/* 08 - 0x20 */ true, true, true, true, +/* 09 - 0x24 */ true, true, true, true, +/* 10 - 0x28 */ true, true, true, true, +/* 11 - 0x2C */ true, true, true, true, +/* 12 - 0x30 */ true, true, true, true, +/* 13 - 0x34 */ true, true, true, true, +/* 14 - 0x38 */ true, true, true, true, +/* 15 - 0x3C */ true, true, true, true, +/* 16 - 0x40 */ true, true, true, true, +/* 17 - 0x44 */ false, true, false, false, +/* 18 - 0x48 */ false, false, false, false, +/* 19 - 0x4C */ false, false, false, false, +/* 20 - 0x50 */ true, true, true, true, +/* 21 - 0x54 */ true, true, true, true, +/* 22 - 0x58 */ true, true, true, true, +/* 23 - 0x5C */ true, true, true, true, +/* 24 - 0x60 */ true, true, true, true, +/* 25 - 0x64 */ false, true, false, false, +/* 26 - 0x68 */ false, false, false, false, +/* 27 - 0x6C */ false, false, false, false, +/* 28 - 0x70 */ true, true, true, true, +/* 29 - 0x74 */ false, true, false, false, +/* 30 - 0x78 */ false, false, false, false, +/* 31 - 0x7C */ false, false, false, false, +/* 32 - 0x80 */ true, true, true, true, +/* 33 - 0x84 */ false, true, false, false, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ true, true, true, true, +/* 37 - 0x94 */ false, true, false, false, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ true, true, true, true, +/* 41 - 0xA4 */ false, true, false, false, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ true, true, true, true, +/* 45 - 0xB4 */ false, true, false, false, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ true, true, true, true, +/* 49 - 0xC4 */ false, true, false, false, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ true, true, true, true, +/* 53 - 0xD4 */ false, true, false, false, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ true, true, true, true, +/* 57 - 0xE4 */ false, true, false, false, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ true, true, true, true, +/* 61 - 0xF4 */ false, true, false, false, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; + +/* + * Table for determination presence of used, pre-dirty or + * dirty segment state in provided byte. + * Checking byte is used as index in array. 
+ */ +const bool detect_used_dirty_mask[U8_MAX + 1] = { +/* 00 - 0x00 */ false, false, false, false, +/* 01 - 0x04 */ true, false, true, true, +/* 02 - 0x08 */ false, false, false, false, +/* 03 - 0x0C */ false, false, false, false, +/* 04 - 0x10 */ false, false, false, false, +/* 05 - 0x14 */ true, false, true, true, +/* 06 - 0x18 */ false, false, false, false, +/* 07 - 0x1C */ false, false, false, false, +/* 08 - 0x20 */ false, false, false, false, +/* 09 - 0x24 */ true, false, true, true, +/* 10 - 0x28 */ false, false, false, false, +/* 11 - 0x2C */ false, false, false, false, +/* 12 - 0x30 */ false, false, false, false, +/* 13 - 0x34 */ true, false, true, true, +/* 14 - 0x38 */ false, false, false, false, +/* 15 - 0x3C */ false, false, false, false, +/* 16 - 0x40 */ true, true, true, true, +/* 17 - 0x44 */ true, true, true, true, +/* 18 - 0x48 */ true, true, true, true, +/* 19 - 0x4C */ true, true, true, true, +/* 20 - 0x50 */ false, false, false, false, +/* 21 - 0x54 */ true, false, true, true, +/* 22 - 0x58 */ false, false, false, false, +/* 23 - 0x5C */ false, false, false, false, +/* 24 - 0x60 */ true, true, true, true, +/* 25 - 0x64 */ true, true, true, true, +/* 26 - 0x68 */ true, true, true, true, +/* 27 - 0x6C */ true, true, true, true, +/* 28 - 0x70 */ true, true, true, true, +/* 29 - 0x74 */ true, true, true, true, +/* 30 - 0x78 */ true, true, true, true, +/* 31 - 0x7C */ true, true, true, true, +/* 32 - 0x80 */ false, false, false, false, +/* 33 - 0x84 */ true, false, true, true, +/* 34 - 0x88 */ false, false, false, false, +/* 35 - 0x8C */ false, false, false, false, +/* 36 - 0x90 */ false, false, false, false, +/* 37 - 0x94 */ true, false, true, true, +/* 38 - 0x98 */ false, false, false, false, +/* 39 - 0x9C */ false, false, false, false, +/* 40 - 0xA0 */ false, false, false, false, +/* 41 - 0xA4 */ true, false, true, true, +/* 42 - 0xA8 */ false, false, false, false, +/* 43 - 0xAC */ false, false, false, false, +/* 44 - 0xB0 */ false, false, false, false, +/* 45 - 0xB4 */ true, false, true, true, +/* 46 - 0xB8 */ false, false, false, false, +/* 47 - 0xBC */ false, false, false, false, +/* 48 - 0xC0 */ false, false, false, false, +/* 49 - 0xC4 */ true, false, true, true, +/* 50 - 0xC8 */ false, false, false, false, +/* 51 - 0xCC */ false, false, false, false, +/* 52 - 0xD0 */ false, false, false, false, +/* 53 - 0xD4 */ true, false, true, true, +/* 54 - 0xD8 */ false, false, false, false, +/* 55 - 0xDC */ false, false, false, false, +/* 56 - 0xE0 */ false, false, false, false, +/* 57 - 0xE4 */ true, false, true, true, +/* 58 - 0xE8 */ false, false, false, false, +/* 59 - 0xEC */ false, false, false, false, +/* 60 - 0xF0 */ false, false, false, false, +/* 61 - 0xF4 */ true, false, true, true, +/* 62 - 0xF8 */ false, false, false, false, +/* 63 - 0xFC */ false, false, false, false +}; From patchwork Sat Feb 25 01:08:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2793EC7EE32 for ; Sat, 25 Feb 2023 01:19:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229622AbjBYBTQ (ORCPT ); Fri, 24 Feb 2023 20:19:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49188 "EHLO 
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 46/76] ssdfs: segment bitmap API implementation
Date: Fri, 24 Feb 2023 17:08:57 -0800
Message-Id: <20230225010927.813929-47-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

The segment bitmap implements the following API:
(1) create - create empty segment bitmap object
(2) destroy - destroy segment bitmap object
(3) fragment_init - init fragment of segment bitmap
(4) flush - flush dirty segment bitmap
(5) check_state - check that segment has particular state
(6) get_state - get current state of particular segment
(7) change_state - change state of segment
(8) find - find segment for requested state or state mask
(9) find_and_set - find segment for requested state and change state

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/segment_bitmap.c | 3014 +++++++++++++++++++++++++++++++++++++
 1 file changed, 3014 insertions(+)

diff --git a/fs/ssdfs/segment_bitmap.c b/fs/ssdfs/segment_bitmap.c
index 633cd4cfca0a..50a7cc692fe3 100644
--- a/fs/ssdfs/segment_bitmap.c
+++ b/fs/ssdfs/segment_bitmap.c
@@ -1805,3 +1805,3017 @@ int ssdfs_segbmap_issue_fragments_update(struct ssdfs_segment_bmap *segbmap,
 	return err;
 }
+
+/*
+ * ssdfs_segbmap_flush_dirty_fragments() - flush dirty fragments
+ * @segbmap: pointer on segment bitmap object
+ * @fragments_count: count of fragments in segbmap
+ * @fragment_size: size of fragment in bytes
+ *
+ * This method tries to flush all dirty fragments.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA - segbmap has no dirty fragments.
+ * %-ERANGE - internal error.
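+ *
+ * Dirty fragments are handled in BITS_PER_LONG-sized windows of the
+ * modification bitmap: every word that contains dirty bits is passed
+ * to ssdfs_segbmap_issue_fragments_update() and cleared afterwards.
+ * A sketch of one window (names as defined in this patch):
+ *
+ *   err = ssdfs_find_first_dirty_fragment(fbmap, size, &found);
+ *   start_fragment = (u16)((found - fbmap) * BITS_PER_LONG);
+ *   err = ssdfs_segbmap_issue_fragments_update(segbmap, start_fragment,
+ *						fragment_size, *found);
+ *   err = ssdfs_clear_dirty_state(found);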
+ */ +static +int ssdfs_segbmap_flush_dirty_fragments(struct ssdfs_segment_bmap *segbmap, + u16 fragments_count, + u16 fragment_size) +{ + unsigned long *fbmap; + int size; + unsigned long *found; + u16 start_fragment; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, fragments_count %u, fragment_size %u\n", + segbmap, fragments_count, fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_MODIFICATION_FBMAP]; + + size = fragments_count; + err = ssdfs_find_first_dirty_fragment(fbmap, size, &found); + if (err == -ENODATA) { + SSDFS_DBG("segbmap hasn't dirty fragments\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find dirty fragments: " + "err %d\n", + err); + return err; + } else if (!found) { + SSDFS_ERR("invalid bitmap pointer\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(((found - fbmap) * BITS_PER_LONG) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_fragment = (u16)((found - fbmap) * BITS_PER_LONG); + + err = ssdfs_segbmap_issue_fragments_update(segbmap, start_fragment, + fragment_size, *found); + if (unlikely(err)) { + SSDFS_ERR("fail to issue fragments update: " + "start_fragment %u, found %#lx, err %d\n", + start_fragment, *found, err); + return err; + } + + err = ssdfs_clear_dirty_state(found); + if (unlikely(err)) { + SSDFS_ERR("fail to clear dirty state: " + "err %d\n", + err); + return err; + } + + size = fragments_count - (start_fragment + BITS_PER_LONG); + while (size > 0) { + err = ssdfs_find_first_dirty_fragment(++found, size, + &found); + if (err == -ENODATA) + return 0; + else if (unlikely(err)) { + SSDFS_ERR("fail to find dirty fragments: " + "err %d\n", + err); + return err; + } else if (!found) { + SSDFS_ERR("invalid bitmap pointer\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(((found - fbmap) * BITS_PER_LONG) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_fragment = (u16)((found - fbmap) * BITS_PER_LONG); + + err = ssdfs_segbmap_issue_fragments_update(segbmap, + start_fragment, + fragment_size, + *found); + if (unlikely(err)) { + SSDFS_ERR("fail to issue fragments update: " + "start_fragment %u, found %#lx, err %d\n", + start_fragment, *found, err); + return err; + } + + err = ssdfs_clear_dirty_state(found); + if (unlikely(err)) { + SSDFS_ERR("fail to clear dirty state: " + "err %d\n", + err); + return err; + } + + size = fragments_count - (start_fragment + BITS_PER_LONG); + } + + return 0; +} + +/* + * ssdfs_segbmap_wait_flush_end() - wait flush ending + * @segbmap: pointer on segment bitmap object + * @fragments_count: count of fragments in segbmap + * + * This method is waiting the end of flush operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
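+ *
+ * For every fragment in the SSDFS_SEGBMAP_FRAG_TOWRITE state the
+ * flush request is polled until it leaves the CREATED/STARTED
+ * states, as done in the body below:
+ *
+ *   wq = &req1->private.wait_queue;
+ *   err = wait_event_killable_timeout(*wq,
+ *				has_request_been_executed(req1),
+ *				SSDFS_DEFAULT_TIMEOUT);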
+ */ +static +int ssdfs_segbmap_wait_flush_end(struct ssdfs_segment_bmap *segbmap, + u16 fragments_count) +{ + struct ssdfs_segbmap_fragment_desc *fragment; + struct ssdfs_segment_request *req1 = NULL, *req2 = NULL; + bool has_backup; + wait_queue_head_t *wq = NULL; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, fragments_count %u\n", + segbmap, fragments_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + has_backup = segbmap->flags & SSDFS_SEGBMAP_HAS_COPY; + + for (i = 0; i < fragments_count; i++) { + fragment = &segbmap->desc_array[i]; + + switch (fragment->state) { + case SSDFS_SEGBMAP_FRAG_DIRTY: + SSDFS_ERR("found unprocessed dirty fragment: " + "index %d\n", i); + return -ERANGE; + + case SSDFS_SEGBMAP_FRAG_TOWRITE: + req1 = &fragment->flush_req1; + req2 = &fragment->flush_req2; + +check_req1_state: + switch (atomic_read(&req1->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req1->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req1), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req1_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req1->result.err; + + if (!err) { + err = -ERANGE; + SSDFS_ERR("error code is absent\n"); + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req1->result.state)); + return -ERANGE; + } + + if (!has_backup) + goto finish_fragment_check; + +check_req2_state: + switch (atomic_read(&req2->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req2->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req2), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req2_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req2->result.err; + + if (!err) { + err = -ERANGE; + SSDFS_ERR("error code is absent\n"); + } + + SSDFS_ERR("flush request failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req2->result.state)); + return -ERANGE; + } + +finish_fragment_check: + break; + + default: + /* do nothing */ + break; + } + } + + return 0; +} + +/* + * ssdfs_segbmap_issue_commit_logs() - request logs commit + * @segbmap: pointer on segment bitmap object + * @fragments_count: count of fragments in segbmap + * @fragment_size: size of fragment in bytes + * + * This method tries to issue the commit logs operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
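+ *
+ * For every fragment in the SSDFS_SEGBMAP_FRAG_TOWRITE state a log
+ * commit is requested asynchronously against the main copy of the
+ * segment and, when SSDFS_SEGBMAP_HAS_COPY is set, against the
+ * backup copy too:
+ *
+ *   si = segbmap->segs[seg_index][SSDFS_MAIN_SEGBMAP_SEG];
+ *   err = ssdfs_segment_commit_log_async(si,
+ *				SSDFS_REQ_ASYNC_NO_FREE, req1);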
+ */
+static
+int ssdfs_segbmap_issue_commit_logs(struct ssdfs_segment_bmap *segbmap,
+				    u16 fragments_count,
+				    u16 fragment_size)
+{
+	struct ssdfs_segbmap_fragment_desc *fragment;
+	struct ssdfs_segbmap_fragment_header *hdr;
+	struct ssdfs_segment_request *req1 = NULL, *req2 = NULL;
+	struct ssdfs_segment_info *si;
+	struct page *page;
+	void *kaddr;
+	size_t extent_size = sizeof(struct ssdfs_volume_extent);
+	u64 ino = SSDFS_SEG_BMAP_INO;
+	bool has_backup;
+	u64 offset;
+	u16 seg_index;
+	int copy_id;
+	u16 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!segbmap);
+	BUG_ON(!rwsem_is_locked(&segbmap->search_lock));
+
+	SSDFS_DBG("segbmap %p, fragments_count %u, fragment_size %u\n",
+		  segbmap, fragments_count, fragment_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	has_backup = segbmap->flags & SSDFS_SEGBMAP_HAS_COPY;
+
+	for (i = 0; i < fragments_count; i++) {
+		fragment = &segbmap->desc_array[i];
+
+		switch (fragment->state) {
+		case SSDFS_SEGBMAP_FRAG_DIRTY:
+			SSDFS_ERR("found unprocessed dirty fragment: "
+				  "index %d\n", i);
+			return -ERANGE;
+
+		case SSDFS_SEGBMAP_FRAG_TOWRITE:
+			req1 = &fragment->flush_req1;
+			req2 = &fragment->flush_req2;
+
+			ssdfs_request_init(req1);
+			ssdfs_get_request(req1);
+
+			offset = (u64)i;
+			offset *= fragment_size;
+
+			ssdfs_request_prepare_logical_extent(ino, offset,
+							     0, 0, 0, req1);
+
+			page = find_lock_page(&segbmap->pages, i);
+			if (!page) {
+				err = -ERANGE;
+				SSDFS_ERR("fail to find page: "
+					  "fragment_index %u\n",
+					  i);
+				goto fail_issue_commit_logs;
+			}
+
+			ssdfs_account_locked_page(page);
+			kaddr = kmap_local_page(page);
+
+			hdr = SSDFS_SBMP_FRAG_HDR(kaddr);
+
+			err = ssdfs_segbmap_define_volume_extent(segbmap, req1,
+								 hdr, 1,
+								 &seg_index);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to define volume extent: "
+					  "err %d\n",
+					  err);
+			}
+
+			kunmap_local(kaddr);
+			ssdfs_unlock_page(page);
+			ssdfs_put_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, count %d\n",
+				  page, page_ref_count(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (unlikely(err))
+				goto fail_issue_commit_logs;
+
+			copy_id = SSDFS_MAIN_SEGBMAP_SEG;
+			si = segbmap->segs[seg_index][copy_id];
+
+			err = ssdfs_segment_commit_log_async(si,
+						SSDFS_REQ_ASYNC_NO_FREE,
+						req1);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to issue the commit log: "
+					  "seg_index %u, err %d\n",
+					  seg_index, err);
+				goto fail_issue_commit_logs;
+			}
+
+			if (has_backup) {
+				ssdfs_request_init(req2);
+				ssdfs_get_request(req2);
+
+				ssdfs_request_prepare_logical_extent(ino,
+								     offset,
+								     0, 0, 0,
+								     req2);
+
+				ssdfs_memcpy(&req2->place, 0, extent_size,
+					     &req1->place, 0, extent_size,
+					     extent_size);
+
+				copy_id = SSDFS_COPY_SEGBMAP_SEG;
+				si = segbmap->segs[seg_index][copy_id];
+
+				err = ssdfs_segment_commit_log_async(si,
+						SSDFS_REQ_ASYNC_NO_FREE,
+						req2);
+				if (unlikely(err)) {
+					SSDFS_ERR("fail to issue the commit log: "
+						  "seg_index %u, err %d\n",
+						  seg_index, err);
+					goto fail_issue_commit_logs;
+				}
+			}
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+	}
+
+	return 0;
+
+fail_issue_commit_logs:
+	ssdfs_put_request(req1);
+
+	if (has_backup)
+		ssdfs_put_request(req2);
+
+	return err;
+}
+
+/*
+ * ssdfs_segbmap_wait_finish_commit_logs() - wait for commit logs ending
+ * @segbmap: pointer on segment bitmap object
+ * @fragments_count: count of fragments in segbmap
+ *
+ * This method waits for the commit logs operation to finish.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
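+ *
+ * A fragment's state across the whole flush path (the DIRTY ->
+ * TOWRITE transition is assumed to happen inside
+ * ssdfs_segbmap_issue_fragments_update(), which is not shown here):
+ *
+ *	SSDFS_SEGBMAP_FRAG_DIRTY -> SSDFS_SEGBMAP_FRAG_TOWRITE ->
+ *	SSDFS_SEGBMAP_FRAG_INITIALIZED (restored by this method on success)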
+ */ +static +int ssdfs_segbmap_wait_finish_commit_logs(struct ssdfs_segment_bmap *segbmap, + u16 fragments_count) +{ + struct ssdfs_segbmap_fragment_desc *fragment; + struct ssdfs_segment_request *req1 = NULL, *req2 = NULL; + bool has_backup; + wait_queue_head_t *wq = NULL; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, fragments_count %u\n", + segbmap, fragments_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + has_backup = segbmap->flags & SSDFS_SEGBMAP_HAS_COPY; + + for (i = 0; i < fragments_count; i++) { + fragment = &segbmap->desc_array[i]; + + switch (fragment->state) { + case SSDFS_SEGBMAP_FRAG_DIRTY: + SSDFS_ERR("found unprocessed dirty fragment: " + "index %d\n", i); + return -ERANGE; + + case SSDFS_SEGBMAP_FRAG_TOWRITE: + req1 = &fragment->flush_req1; + req2 = &fragment->flush_req2; + +check_req1_state: + switch (atomic_read(&req1->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req1->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req1), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req1_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req1->result.err; + + if (!err) { + err = -ERANGE; + SSDFS_ERR("error code is absent\n"); + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req1->result.state)); + return -ERANGE; + } + + if (!has_backup) + goto finish_fragment_check; + +check_req2_state: + switch (atomic_read(&req2->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req2->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req2), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req2_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req2->result.err; + + if (!err) { + err = -ERANGE; + SSDFS_ERR("error code is absent\n"); + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req2->result.state)); + return -ERANGE; + } + +finish_fragment_check: + fragment->state = SSDFS_SEGBMAP_FRAG_INITIALIZED; + break; + + default: + /* do nothing */ + break; + } + } + + return 0; +} + +/* TODO: copy all fragments' headers into checkpoint */ +/* TODO: mark superblock as dirty */ +/* TODO: new checkpoint should be stored into superblock segment */ +static +int ssdfs_segbmap_create_checkpoint(struct ssdfs_segment_bmap *segbmap) +{ +#ifdef CONFIG_SSDFS_DEBUG + /* TODO: implement */ + SSDFS_DBG("TODO: implement %s\n", __func__); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_segbmap_flush() - flush segbmap current state + * @segbmap: pointer on segment bitmap object + * + * This method tries to flush current state of segbmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EFAULT - segbmap has corrupted state. + * %-ERANGE - internal error. 
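+ *
+ * The flush pipeline, in call order (a summary of the body below):
+ *
+ *	ssdfs_segbmap_flush_dirty_fragments()   - issue update requests
+ *	ssdfs_segbmap_wait_flush_end()          - wait for update requests
+ *	ssdfs_segbmap_issue_commit_logs()       - request logs commit
+ *	ssdfs_segbmap_wait_finish_commit_logs() - wait for commit logs
+ *	ssdfs_segbmap_create_checkpoint()       - checkpoint (TODO stub)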
+ */ +int ssdfs_segbmap_flush(struct ssdfs_segment_bmap *segbmap) +{ + u16 fragments_count; + u16 fragment_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("segbmap %p\n", + segbmap); +#else + SSDFS_DBG("segbmap %p\n", + segbmap); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + inode_lock_shared(segbmap->fsi->segbmap_inode); + down_read(&segbmap->resize_lock); + + if (segbmap->flags & SSDFS_SEGBMAP_ERROR) { + err = -EFAULT; + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "segbmap has corrupted state\n"); + goto finish_segbmap_flush; + } + + fragments_count = segbmap->fragments_count; + fragment_size = segbmap->fragment_size; + + ssdfs_sb_segbmap_header_correct_state(segbmap); + + down_write(&segbmap->search_lock); + + err = ssdfs_segbmap_flush_dirty_fragments(segbmap, + fragments_count, + fragment_size); + if (err == -ENODATA) { + err = 0; + up_write(&segbmap->search_lock); + SSDFS_DBG("segbmap hasn't dirty fragments\n"); + goto finish_segbmap_flush; + } else if (unlikely(err)) { + up_write(&segbmap->search_lock); + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush segbmap: err %d\n", + err); + goto finish_segbmap_flush; + } + + err = ssdfs_segbmap_wait_flush_end(segbmap, fragments_count); + if (unlikely(err)) { + up_write(&segbmap->search_lock); + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush segbmap: err %d\n", + err); + goto finish_segbmap_flush; + } + + err = ssdfs_segbmap_issue_commit_logs(segbmap, + fragments_count, + fragment_size); + if (unlikely(err)) { + up_write(&segbmap->search_lock); + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush segbmap: err %d\n", + err); + goto finish_segbmap_flush; + } + + err = ssdfs_segbmap_wait_finish_commit_logs(segbmap, + fragments_count); + if (unlikely(err)) { + up_write(&segbmap->search_lock); + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to flush segbmap: err %d\n", + err); + goto finish_segbmap_flush; + } + + downgrade_write(&segbmap->search_lock); + + err = ssdfs_segbmap_create_checkpoint(segbmap); + if (unlikely(err)) { + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fail to create segbmap's checkpoint: " + "err %d\n", + err); + } + + up_read(&segbmap->search_lock); + +finish_segbmap_flush: + up_read(&segbmap->resize_lock); + inode_unlock_shared(segbmap->fsi->segbmap_inode); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +int ssdfs_segbmap_resize(struct ssdfs_segment_bmap *segbmap, + u64 new_items_count) +{ +#ifdef CONFIG_SSDFS_DEBUG + /* TODO: implement */ + SSDFS_DBG("TODO: implement %s\n", __func__); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENOSYS; +} + +/* + * ssdfs_segbmap_check_fragment_validity() - check fragment validity + * @segbmap: pointer on segment bitmap object + * @fragment_index: fragment index + * + * This method checks that fragment is ready for operations. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - fragment initialization has failed. 
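+ *
+ * A hypothetical caller sketch for the -EAGAIN case (the public
+ * methods below export the fragment's init_end completion via @end):
+ *
+ *	res = ssdfs_segbmap_get_state(segbmap, seg, &end);
+ *	if (res == -EAGAIN) {
+ *		if (!wait_for_completion_timeout(end,
+ *						 SSDFS_DEFAULT_TIMEOUT))
+ *			return -ERANGE;
+ *		res = ssdfs_segbmap_get_state(segbmap, seg, &end);
+ *	}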
+ */ +static +int ssdfs_segbmap_check_fragment_validity(struct ssdfs_segment_bmap *segbmap, + pgoff_t fragment_index) +{ + struct ssdfs_segbmap_fragment_desc *fragment; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, fragment_index %lu\n", + segbmap, fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fragment = &segbmap->desc_array[fragment_index]; + + switch (fragment->state) { + case SSDFS_SEGBMAP_FRAG_CREATED: + return -EAGAIN; + + case SSDFS_SEGBMAP_FRAG_INIT_FAILED: + return -EFAULT; + + case SSDFS_SEGBMAP_FRAG_INITIALIZED: + case SSDFS_SEGBMAP_FRAG_DIRTY: + /* do nothing */ + break; + + default: + BUG(); + } + + return 0; +} + +/* + * ssdfs_segbmap_get_state() - get segment state + * @segbmap: pointer on segment bitmap object + * @seg: segment number + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to get state of @seg. + * + * RETURN: + * [success] - segment state + * [failure] - error code: + * + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - segbmap has inconsistent state. + * %-ERANGE - internal error. + */ +int ssdfs_segbmap_get_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, struct completion **end) +{ + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header); + u64 items_count; + u16 fragments_count; + u16 fragment_size; + pgoff_t fragment_index; + struct page *page; + u64 page_item; + u32 byte_offset; + void *kaddr; + u8 *byte_ptr; + u32 byte_item; + int state = SSDFS_SEG_STATE_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + + SSDFS_DBG("segbmap %p, seg %llu\n", + segbmap, seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + *end = NULL; + + inode_lock_shared(segbmap->fsi->segbmap_inode); + down_read(&segbmap->resize_lock); + + items_count = segbmap->items_count; + fragments_count = segbmap->fragments_count; + fragment_size = segbmap->fragment_size; + + if (segbmap->flags & SSDFS_SEGBMAP_ERROR) { + err = -EFAULT; + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "segbmap has corrupted state\n"); + goto finish_segment_check; + } + + if (seg >= items_count) { + err = -ERANGE; + SSDFS_ERR("seg %llu >= items_count %llu\n", + seg, items_count); + goto finish_segment_check; + } + + fragment_index = ssdfs_segbmap_seg_2_fragment_index(seg); + if (fragment_index >= fragments_count) { + err = -EFAULT; + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fragment_index %lu >= fragments_count %u\n", + fragment_index, fragments_count); + goto finish_segment_check; + } + + down_read(&segbmap->search_lock); + + *end = &segbmap->desc_array[fragment_index].init_end; + + err = ssdfs_segbmap_check_fragment_validity(segbmap, fragment_index); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %lu is not initialized yet\n", + fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_get_state; + } else if (unlikely(err)) { + SSDFS_ERR("fragment %lu init has failed\n", + fragment_index); + goto finish_get_state; + } + + page = find_lock_page(&segbmap->pages, fragment_index); + if (!page) { + err = -ERANGE; + SSDFS_ERR("fail to get fragment %lu page\n", + fragment_index); + goto finish_get_state; + } + + ssdfs_account_locked_page(page); + + page_item = ssdfs_segbmap_define_first_fragment_item(fragment_index, + fragment_size); + if (seg < page_item) { + err = -ERANGE; + SSDFS_ERR("seg %llu < page_item %llu\n", + 
seg, page_item); + goto free_page; + } + + page_item = seg - page_item; + + if (page_item >= ssdfs_segbmap_items_per_fragment(fragment_size)) { + err = -ERANGE; + SSDFS_ERR("invalid page_item %llu\n", + page_item); + goto free_page; + } + + byte_offset = ssdfs_segbmap_get_item_byte_offset(page_item); + + if (byte_offset >= PAGE_SIZE) { + err = -ERANGE; + SSDFS_ERR("invalid byte_offset %u\n", + byte_offset); + goto free_page; + } + + byte_item = page_item - ((byte_offset - hdr_size) * items_per_byte); + + kaddr = kmap_local_page(page); + byte_ptr = (u8 *)kaddr + byte_offset; + state = ssdfs_segbmap_get_state_from_byte(byte_ptr, byte_item); + kunmap_local(kaddr); + +free_page: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_get_state: + up_read(&segbmap->search_lock); + +finish_segment_check: + up_read(&segbmap->resize_lock); + inode_unlock_shared(segbmap->fsi->segbmap_inode); + + if (unlikely(err)) + return err; + + return state; +} + +/* + * ssdfs_segbmap_check_state() - check segment state + * @segbmap: pointer on segment bitmap object + * @seg: segment number + * @state: checking state + * @end: pointer on completion for waiting init ending [out] + * + * This method checks that @seg has @state. + * + * RETURN: + * [success] - segment has (1) or hasn't (0) requested @state + * [failure] - error code: + * + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - segbmap has inconsistent state. + * %-ERANGE - internal error. + */ +int ssdfs_segbmap_check_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, int state, + struct completion **end) +{ + int res; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(state < SSDFS_SEG_CLEAN || + state >= SSDFS_SEG_STATE_MAX); + + SSDFS_DBG("segbmap %p, seg %llu, state %#x\n", + segbmap, seg, state); +#endif /* CONFIG_SSDFS_DEBUG */ + + res = ssdfs_segbmap_get_state(segbmap, seg, end); + if (res == -EAGAIN) { + SSDFS_DBG("fragment is not initialized yet\n"); + return res; + } else if (unlikely(res < 0)) { + SSDFS_WARN("fail to get segment %llu state: err %d\n", + seg, res); + return res; + } else if (res != state) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("res %#x != state %#x\n", + res, state); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + return 1; +} + +/* + * ssdfs_segbmap_set_state_in_byte() - set state of item in byte + * @byte_ptr: pointer on byte + * @byte_item: index of item in byte + * @old_state: pointer on old state value [in|out] + * @new_state: new state value + */ +static inline +int ssdfs_segbmap_set_state_in_byte(u8 *byte_ptr, u32 byte_item, + int *old_state, int new_state) +{ + u8 value; + int shift = byte_item * SSDFS_SEG_STATE_BITS; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("byte_ptr %p, byte_item %u, " + "old_state %p, new_state %#x\n", + byte_ptr, byte_item, + old_state, new_state); + + BUG_ON(!byte_ptr || !old_state); + BUG_ON(byte_item >= SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *old_state = (int)((*byte_ptr >> shift) & SSDFS_SEG_STATE_MASK); + + if (*old_state < SSDFS_SEG_CLEAN || + *old_state >= SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("invalid old_state %#x\n", + *old_state); + return -ERANGE; + } + + if (*old_state == new_state) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_state %#x == new_state %#x\n", + *old_state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EEXIST; + } + + value = new_state & SSDFS_SEG_STATE_MASK; 
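+	/*
+	 * Shift the new state into its SSDFS_SEG_STATE_BITS wide slot,
+	 * clear the old state from that slot, and store the new one.
+	 */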
+ value <<= shift; + + *byte_ptr &= ~(SSDFS_SEG_STATE_MASK << shift); + *byte_ptr |= value; + + return 0; +} + +/* + * ssdfs_segbmap_correct_fragment_header() - correct fragment's header + * @segbmap: pointer on segment bitmap object + * @fragment_index: fragment index + * @old_state: old state value + * @new_state: new state value + * @kaddr: pointer on fragment's buffer + */ +static +void ssdfs_segbmap_correct_fragment_header(struct ssdfs_segment_bmap *segbmap, + pgoff_t fragment_index, + int old_state, int new_state, + void *kaddr) +{ + struct ssdfs_segbmap_fragment_desc *fragment; + struct ssdfs_segbmap_fragment_header *hdr; + unsigned long *fbmap; + u16 fragment_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap || !kaddr); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, fragment_index %lu, " + "old_state %#x, new_state %#x, kaddr %p\n", + segbmap, fragment_index, + old_state, new_state, kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (old_state == new_state) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_state %#x == new_state %#x\n", + old_state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return; + } + + fragment = &segbmap->desc_array[fragment_index]; + hdr = SSDFS_SBMP_FRAG_HDR(kaddr); + fragment_bytes = le16_to_cpu(hdr->fragment_bytes); + + fragment->state = SSDFS_SEGBMAP_FRAG_DIRTY; + + switch (old_state) { + case SSDFS_SEG_CLEAN: + switch (new_state) { + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + case SSDFS_SEG_RESERVED: + case SSDFS_SEG_BAD: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + switch (new_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + case SSDFS_SEG_USED: + switch (new_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + case SSDFS_SEG_PRE_DIRTY: + switch (new_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_USED: + case SSDFS_SEG_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + case SSDFS_SEG_RESERVED: + switch (new_state) { + case SSDFS_SEG_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + case SSDFS_SEG_DIRTY: + switch (new_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + 
case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + case SSDFS_SEG_BAD: + switch (new_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_BAD: + /* expected state */ + break; + + default: + SSDFS_WARN("unexpected change: " + "old_state %#x, new_state %#x\n", + old_state, new_state); + break; + } + break; + + + default: + SSDFS_WARN("unexpected state: " + "old_state %#x\n", + old_state); + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BEFORE: total_segs %u, " + "clean_or_using_segs %u, " + "used_or_dirty_segs %u, " + "bad_segs %u\n", + fragment->total_segs, + fragment->clean_or_using_segs, + fragment->used_or_dirty_segs, + fragment->bad_segs); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (old_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_RESERVED: + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_CLEAN_USING_FBMAP]; + BUG_ON(fragment->clean_or_using_segs == 0); + fragment->clean_or_using_segs--; + if (fragment->clean_or_using_segs == 0) + bitmap_clear(fbmap, fragment_index, 1); + break; + + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_USED_DIRTY_FBMAP]; + BUG_ON(fragment->used_or_dirty_segs == 0); + fragment->used_or_dirty_segs--; + if (fragment->used_or_dirty_segs == 0) + bitmap_clear(fbmap, fragment_index, 1); + break; + + case SSDFS_SEG_BAD: + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_BAD_FBMAP]; + BUG_ON(fragment->bad_segs == 0); + fragment->bad_segs--; + if (fragment->bad_segs == 0) + bitmap_clear(fbmap, fragment_index, 1); + break; + + default: + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("OLD_STATE: total_segs %u, " + "clean_or_using_segs %u, " + "used_or_dirty_segs %u, " + "bad_segs %u\n", + fragment->total_segs, + fragment->clean_or_using_segs, + fragment->used_or_dirty_segs, + fragment->bad_segs); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (new_state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + case SSDFS_SEG_RESERVED: + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_CLEAN_USING_FBMAP]; + if (fragment->clean_or_using_segs == 0) + bitmap_set(fbmap, fragment_index, 1); + BUG_ON((fragment->clean_or_using_segs + 1) == U16_MAX); + fragment->clean_or_using_segs++; + break; + + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_USED_DIRTY_FBMAP]; + if (fragment->used_or_dirty_segs == 0) + bitmap_set(fbmap, fragment_index, 1); + BUG_ON((fragment->used_or_dirty_segs + 1) == U16_MAX); + fragment->used_or_dirty_segs++; + break; + + case SSDFS_SEG_BAD: + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_BAD_FBMAP]; + if (fragment->bad_segs == 0) + bitmap_set(fbmap, fragment_index, 1); + BUG_ON((fragment->bad_segs + 1) == U16_MAX); + fragment->bad_segs++; + break; + + default: + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NEW_STATE: total_segs %u, " + "clean_or_using_segs %u, " + "used_or_dirty_segs %u, " + "bad_segs %u\n", + fragment->total_segs, + fragment->clean_or_using_segs, + fragment->used_or_dirty_segs, + fragment->bad_segs); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr->clean_or_using_segs = cpu_to_le16(fragment->clean_or_using_segs); + hdr->used_or_dirty_segs = 
cpu_to_le16(fragment->used_or_dirty_segs); + hdr->bad_segs = cpu_to_le16(fragment->bad_segs); + + hdr->checksum = 0; + hdr->checksum = ssdfs_crc32_le(kaddr, fragment_bytes); + + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_MODIFICATION_FBMAP]; + bitmap_set(fbmap, fragment_index, 1); +} + +/* + * __ssdfs_segbmap_change_state() - change segment state + * @segbmap: pointer on segment bitmap object + * @seg: segment number + * @new_state: new state + * @fragment_index: index of fragment + * @fragment_size: size of fragment in bytes + * + * This method tries to change state of @seg. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - segbmap has inconsistent state. + * %-ERANGE - internal error. + */ +static +int __ssdfs_segbmap_change_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, int new_state, + pgoff_t fragment_index, + u16 fragment_size) +{ + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + struct page *page; + u64 page_item; + u32 byte_offset; + u32 byte_item; + void *kaddr; + u8 *byte_ptr; + int old_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, seg %llu, new_state %#x, " + "fragment_index %lu, fragment_size %u\n", + segbmap, seg, new_state, + fragment_index, fragment_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_segbmap_check_fragment_validity(segbmap, fragment_index); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %lu is not initialized yet\n", + fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_set_state; + } else if (unlikely(err)) { + SSDFS_ERR("fragment %lu init has failed\n", + fragment_index); + goto finish_set_state; + } + + page = find_lock_page(&segbmap->pages, fragment_index); + if (!page) { + err = -ERANGE; + SSDFS_ERR("fail to get fragment %lu page\n", + fragment_index); + goto finish_set_state; + } + + ssdfs_account_locked_page(page); + + page_item = ssdfs_segbmap_define_first_fragment_item(fragment_index, + fragment_size); + if (seg < page_item) { + err = -ERANGE; + SSDFS_ERR("seg %llu < page_item %llu\n", + seg, page_item); + goto free_page; + } + + page_item = seg - page_item; + + if (page_item >= ssdfs_segbmap_items_per_fragment(fragment_size)) { + err = -ERANGE; + SSDFS_ERR("invalid page_item %llu\n", + page_item); + goto free_page; + } + + byte_offset = ssdfs_segbmap_get_item_byte_offset(page_item); + + if (byte_offset >= PAGE_SIZE) { + err = -ERANGE; + SSDFS_ERR("invalid byte_offset %u\n", + byte_offset); + goto free_page; + } + + div_u64_rem(page_item, items_per_byte, &byte_item); + + kaddr = kmap_local_page(page); + byte_ptr = (u8 *)kaddr + byte_offset; + err = ssdfs_segbmap_set_state_in_byte(byte_ptr, byte_item, + &old_state, new_state); + if (!err) { + ssdfs_segbmap_correct_fragment_header(segbmap, fragment_index, + old_state, new_state, + kaddr); + } + kunmap_local(kaddr); + + if (err == -EEXIST) { + err = 0; + SetPageUptodate(page); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_state %#x == new_state %#x\n", + old_state, new_state); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to set state: " + "seg %llu, new_state %#x, err %d\n", + seg, new_state, err); + goto free_page; + } else { + SetPageUptodate(page); + if (!PageDirty(page)) + ssdfs_set_page_dirty(page); + } + +free_page: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count 
%d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_set_state: + return err; +} + +/* + * ssdfs_segbmap_change_state() - change segment state + * @segbmap: pointer on segment bitmap object + * @seg: segment number + * @new_state: new state + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to change state of @seg. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - segbmap has inconsistent state. + * %-ERANGE - internal error. + */ +int ssdfs_segbmap_change_state(struct ssdfs_segment_bmap *segbmap, + u64 seg, int new_state, + struct completion **end) +{ + u64 items_count; + u16 fragments_count; + u16 fragment_size; + pgoff_t fragment_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("segbmap %p, seg %llu, new_state %#x\n", + segbmap, seg, new_state); +#else + SSDFS_DBG("segbmap %p, seg %llu, new_state %#x\n", + segbmap, seg, new_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + *end = NULL; + + inode_lock_shared(segbmap->fsi->segbmap_inode); + down_read(&segbmap->resize_lock); + + items_count = segbmap->items_count; + fragments_count = segbmap->fragments_count; + fragment_size = segbmap->fragment_size; + + if (segbmap->flags & SSDFS_SEGBMAP_ERROR) { + err = -EFAULT; + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "segbmap has corrupted state\n"); + goto finish_segment_check; + } + + if (seg >= items_count) { + err = -ERANGE; + SSDFS_ERR("seg %llu >= items_count %llu\n", + seg, items_count); + goto finish_segment_check; + } + + fragment_index = ssdfs_segbmap_seg_2_fragment_index(seg); + if (fragment_index >= fragments_count) { + err = -EFAULT; + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "fragment_index %lu >= fragments_count %u\n", + fragment_index, fragments_count); + goto finish_segment_check; + } + + down_write(&segbmap->search_lock); + *end = &segbmap->desc_array[fragment_index].init_end; + err = __ssdfs_segbmap_change_state(segbmap, seg, new_state, + fragment_index, fragment_size); + up_write(&segbmap->search_lock); + +finish_segment_check: + up_read(&segbmap->resize_lock); + inode_unlock_shared(segbmap->fsi->segbmap_inode); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_segbmap_choose_fbmap() - choose fragment bitmap + * @segbmap: pointer on segment bitmap object + * @state: requested state + * @mask: requested mask + * + * RETURN: + * [success] - pointer on fragment bitmap + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EOPNOTSUPP - operation is not supported. 
+ */ +static +unsigned long *ssdfs_segbmap_choose_fbmap(struct ssdfs_segment_bmap *segbmap, + int state, int mask) +{ + unsigned long *fbmap; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + if (state < SSDFS_SEG_CLEAN || state >= SSDFS_SEG_STATE_MAX) { + SSDFS_ERR("unknown segment state %#x\n", state); + return ERR_PTR(-EINVAL); + } + + if ((mask & SSDFS_SEG_CLEAN_USING_MASK) != mask && + (mask & SSDFS_SEG_USED_DIRTY_MASK) != mask && + (mask & SSDFS_SEG_BAD_STATE_MASK) != mask) { + SSDFS_ERR("unsupported set of flags %#x\n", + mask); + return ERR_PTR(-EOPNOTSUPP); + } + + SSDFS_DBG("segbmap %p, state %#x, mask %#x\n", + segbmap, state, mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (mask & SSDFS_SEG_CLEAN_USING_MASK) { + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_CLEAN_USING_FBMAP]; + + switch (state) { + case SSDFS_SEG_CLEAN: + case SSDFS_SEG_DATA_USING: + case SSDFS_SEG_LEAF_NODE_USING: + case SSDFS_SEG_HYBRID_NODE_USING: + case SSDFS_SEG_INDEX_NODE_USING: + return fbmap; + + default: + return ERR_PTR(-EOPNOTSUPP); + } + } else if (mask & SSDFS_SEG_USED_DIRTY_MASK) { + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_USED_DIRTY_FBMAP]; + + switch (state) { + case SSDFS_SEG_USED: + case SSDFS_SEG_PRE_DIRTY: + case SSDFS_SEG_DIRTY: + return fbmap; + + default: + return ERR_PTR(-EOPNOTSUPP); + } + } else if (mask & SSDFS_SEG_BAD_STATE_MASK) { + fbmap = segbmap->fbmap[SSDFS_SEGBMAP_BAD_FBMAP]; + + switch (state) { + case SSDFS_SEG_BAD: + return fbmap; + + default: + return ERR_PTR(-EOPNOTSUPP); + } + } + + return ERR_PTR(-EOPNOTSUPP); +} + +/* + * ssdfs_segbmap_find_fragment() - find fragment + * @segbmap: pointer on segment bitmap object + * @fbmap: bitmap of fragments + * @start_fragment: start fragment for search + * @max_fragment: upper bound for fragment search + * @found_fragment: found fragment index [out] + * + * This method tries to find fragment in bitmap of + * fragments. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - segbmap has inconsistent state. + * %-ENODATA - bitmap hasn't any valid fragment. 
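+ *
+ * Condensed usage (see __ssdfs_segbmap_find() below; on -EAGAIN the
+ * caller waits on the found fragment's init_end completion):
+ *
+ *	err = ssdfs_segbmap_find_fragment(segbmap, fbmap,
+ *					  start_fragment, max_fragment,
+ *					  &found_fragment);
+ *	if (err == -EAGAIN)
+ *		end = &segbmap->desc_array[found_fragment].init_end;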
+ */ +static +int ssdfs_segbmap_find_fragment(struct ssdfs_segment_bmap *segbmap, + unsigned long *fbmap, + u16 start_fragment, u16 max_fragment, + int *found_fragment) +{ + unsigned long *addr; + u16 long_offset; + u16 first_fragment; + u16 checking_fragment; + u16 size, requested_size, checked_size; + unsigned long found; + u16 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap || !fbmap || !found_fragment); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("fbmap %p, start_fragment %u, max_fragment %u\n", + fbmap, start_fragment, max_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_fragment = U16_MAX; + + if (start_fragment >= max_fragment) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_fragment %u >= max_fragment %u\n", + start_fragment, max_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + long_offset = (start_fragment + BITS_PER_LONG - 1) / BITS_PER_LONG; + first_fragment = long_offset * BITS_PER_LONG; + + checking_fragment = min_t(u16, start_fragment, first_fragment); + checked_size = max_fragment - checking_fragment; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_fragment %u, max_fragment %u, " + "long_offset %u, first_fragment %u, " + "checking_fragment %u, checked_size %u\n", + start_fragment, max_fragment, + long_offset, first_fragment, + checking_fragment, checked_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < checked_size; i++) { + struct ssdfs_segbmap_fragment_desc *desc; + u16 index = checking_fragment + i; + + desc = &segbmap->desc_array[index]; + + switch (desc->state) { + case SSDFS_SEGBMAP_FRAG_INITIALIZED: + case SSDFS_SEGBMAP_FRAG_DIRTY: + /* + * We can use this fragment. + * Simply go ahead. + */ + break; + + case SSDFS_SEGBMAP_FRAG_CREATED: + /* It needs to wait the fragment's init */ + err = -EAGAIN; + checked_size = index - checking_fragment; + goto check_presence_valid_fragments; + break; + + case SSDFS_SEGBMAP_FRAG_INIT_FAILED: + err = -EFAULT; + *found_fragment = index; + SSDFS_ERR("fragment %u is corrupted\n", + index); + checked_size = 0; + goto check_presence_valid_fragments; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid fragment's state %#x\n", + desc->state); + goto check_presence_valid_fragments; + break; + } + } + +check_presence_valid_fragments: + if (err == -ERANGE || err == -EFAULT) { + /* Simply return the error */ + return err; + } else if (err == -EAGAIN) { + if (checked_size == 0) { + SSDFS_DBG("no valid fragments yet\n"); + return err; + } else + err = 0; + } + + if (start_fragment < first_fragment) { + unsigned long value = *(fbmap + (long_offset - 1)); + + size = start_fragment - ((long_offset - 1) * BITS_PER_LONG); + size = min_t(u16, size, checked_size); + bitmap_clear(&value, 0, size); + + if (value != 0) { + found = __ffs(value); + *found_fragment = start_fragment + (u16)(found - size); + return 0; + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find fragment: " + "value %#lx\n", + value); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + } else { + /* first_fragment <= start_fragment */ + addr = fbmap + long_offset; + requested_size = max_fragment - first_fragment; + size = min_t(u16, requested_size, checked_size); + + if (size == 0) { + SSDFS_DBG("no valid fragments yet\n"); + return -EAGAIN; + } + + found = find_first_bit(addr, size); + + found += first_fragment; + BUG_ON(found >= U16_MAX); + *found_fragment = found; + + if (found >= size) { + if (size < requested_size) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Wait init of fragment 
%lu\n", + found); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EAGAIN; + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find fragment: " + "found %lu, size %u\n", + found, size); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + } + + return 0; + } + + return -ERANGE; +} + +/* + * ssdfs_segbmap_correct_search_start() - correct start item for search + * @fragment_index: index of fragment + * @old_start: old start value + * @max: upper bound for search + * @fragment_size: size of fragment in bytes + */ +static +u64 ssdfs_segbmap_correct_search_start(u16 fragment_index, + u64 old_start, u64 max, + u16 fragment_size) +{ + u64 first_item, corrected_value; + +#ifdef CONFIG_SSDFS_DEBUG + if (old_start >= max) { + SSDFS_ERR("old_start %llu >= max %llu\n", + old_start, max); + return U64_MAX; + } + + SSDFS_DBG("fragment_index %u, old_start %llu, max %llu\n", + fragment_index, old_start, max); +#endif /* CONFIG_SSDFS_DEBUG */ + + first_item = ssdfs_segbmap_define_first_fragment_item(fragment_index, + fragment_size); + + if (first_item >= max) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("first_item %llu >= max %llu\n", + first_item, max); +#endif /* CONFIG_SSDFS_DEBUG */ + return U64_MAX; + } + + corrected_value = first_item > old_start ? first_item : old_start; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("corrected_value %llu\n", corrected_value); +#endif /* CONFIG_SSDFS_DEBUG */ + + return corrected_value; +} + +/* + * ssdfs_segbmap_define_items_count() - define items count for state/mask + * @desc: fragment descriptor + * @state: requested state + * @mask: requested mask + */ +static inline +u16 ssdfs_segbmap_define_items_count(struct ssdfs_segbmap_fragment_desc *desc, + int state, int mask) +{ + int complex_mask; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc); + BUG_ON(!mask); + + SSDFS_DBG("desc %p, state %#x, mask %#x\n", + desc, state, mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (state) { + case SSDFS_SEG_CLEAN: + complex_mask = SSDFS_SEG_CLEAN_STATE_FLAG | mask; + break; + + case SSDFS_SEG_DATA_USING: + complex_mask = SSDFS_SEG_DATA_USING_STATE_FLAG | mask; + break; + + case SSDFS_SEG_LEAF_NODE_USING: + complex_mask = SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG | mask; + break; + + case SSDFS_SEG_HYBRID_NODE_USING: + complex_mask = SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG | mask; + break; + + case SSDFS_SEG_INDEX_NODE_USING: + complex_mask = SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG | mask; + break; + + case SSDFS_SEG_USED: + complex_mask = SSDFS_SEG_USED_STATE_FLAG | mask; + break; + + case SSDFS_SEG_PRE_DIRTY: + complex_mask = SSDFS_SEG_PRE_DIRTY_STATE_FLAG | mask; + break; + + case SSDFS_SEG_DIRTY: + complex_mask = SSDFS_SEG_DIRTY_STATE_FLAG | mask; + break; + + case SSDFS_SEG_BAD: + complex_mask = SSDFS_SEG_BAD_STATE_FLAG | mask; + break; + + default: + BUG(); + } + + if ((complex_mask & SSDFS_SEG_CLEAN_USING_MASK) != complex_mask && + (complex_mask & SSDFS_SEG_USED_DIRTY_MASK) != complex_mask && + (complex_mask & SSDFS_SEG_BAD_STATE_MASK) != complex_mask) { + SSDFS_ERR("unsupported set of flags %#x\n", + complex_mask); + return U16_MAX; + } + + if (complex_mask & SSDFS_SEG_CLEAN_USING_MASK) + return desc->clean_or_using_segs; + else if (complex_mask & SSDFS_SEG_USED_DIRTY_MASK) + return desc->used_or_dirty_segs; + else if (complex_mask & SSDFS_SEG_BAD_STATE_MASK) + return desc->bad_segs; + + return U16_MAX; +} + +/* + * BYTE_CONTAINS_STATE() - check that byte contains requested state + * @value: pointer on byte + * @state: requested state + */ +static inline +bool 
BYTE_CONTAINS_STATE(u8 *value, int state)
+{
+	switch (state) {
+	case SSDFS_SEG_CLEAN:
+		return detect_clean_seg[*value];
+
+	case SSDFS_SEG_DATA_USING:
+		return detect_data_using_seg[*value];
+
+	case SSDFS_SEG_LEAF_NODE_USING:
+		return detect_lnode_using_seg[*value];
+
+	case SSDFS_SEG_HYBRID_NODE_USING:
+		return detect_hnode_using_seg[*value];
+
+	case SSDFS_SEG_INDEX_NODE_USING:
+		return detect_idxnode_using_seg[*value];
+
+	case SSDFS_SEG_USED:
+		return detect_used_seg[*value];
+
+	case SSDFS_SEG_PRE_DIRTY:
+		return detect_pre_dirty_seg[*value];
+
+	case SSDFS_SEG_DIRTY:
+		return detect_dirty_seg[*value];
+
+	case SSDFS_SEG_BAD:
+		return detect_bad_seg[*value];
+	}
+
+	return false;
+}
+
+/*
+ * BYTE_CONTAINS_MASK() - check that byte contains any state under mask
+ * @value: pointer on byte
+ * @mask: requested mask
+ */
+static inline
+bool BYTE_CONTAINS_MASK(u8 *value, int mask)
+{
+	if (mask & SSDFS_SEG_CLEAN_USING_MASK)
+		return detect_clean_using_mask[*value];
+	else if (mask & SSDFS_SEG_USED_DIRTY_MASK)
+		return detect_used_dirty_mask[*value];
+	else if (mask & SSDFS_SEG_BAD_STATE_MASK)
+		return detect_bad_seg[*value];
+
+	return false;
+}
+
+/*
+ * FIRST_MASK_IN_BYTE() - determine first item's offset for requested mask
+ * @value: pointer on analysed byte
+ * @mask: requested mask
+ * @start_offset: item offset to start the analysis from
+ * @state_bits: bits per state
+ * @state_mask: mask of a bitmap's state
+ *
+ * This function tries to determine an item for @mask in
+ * @value starting from @start_offset.
+ *
+ * RETURN:
+ * [success] - found item's offset.
+ * [failure] - SSDFS_ITEMS_PER_BYTE(@state_bits).
+ */
+static inline
+u8 FIRST_MASK_IN_BYTE(u8 *value, int mask,
+		      u8 start_offset, u8 state_bits,
+		      int state_mask)
+{
+	u8 i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!value);
+	BUG_ON(state_bits > BITS_PER_BYTE);
+	BUG_ON((state_bits % 2) != 0);
+	BUG_ON(start_offset > SSDFS_ITEMS_PER_BYTE(state_bits));
+
+	SSDFS_DBG("value %#x, mask %#x, "
+		  "start_offset %u, state_bits %u\n",
+		  *value, mask, start_offset, state_bits);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	i = start_offset * state_bits;
+	for (; i < BITS_PER_BYTE; i += state_bits) {
+		if (IS_STATE_GOOD_FOR_MASK(mask, (*value >> i) & state_mask)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found bit %u, found item %u\n",
+				  i, i / state_bits);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return i / state_bits;
+		}
+	}
+
+	return SSDFS_ITEMS_PER_BYTE(state_bits);
+}
+
+/*
+ * FIND_FIRST_ITEM_IN_FRAGMENT() - find first item in fragment
+ * @hdr: pointer on segbmap fragment's header
+ * @fragment: pointer on bitmap in fragment
+ * @start_item: start segment number for search
+ * @max_item: upper bound of segment number for search
+ * @state: primary state for search
+ * @mask: mask of additional states that can be retrieved too
+ * @found_seg: found segment number [out]
+ * @found_for_mask: found segment number for mask [out]
+ * @found_state_for_mask: found state for mask [out]
+ *
+ * This method tries to find the first item with the requested
+ * state in the fragment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOENT - found segment number for the mask.
+ * %-ENODATA - fragment doesn't include segment with requested state/mask.
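+ *
+ * Search order, condensed (a summary of the loop below): every byte
+ * is probed for @state first and, failing that, for @mask; the first
+ * hit of either kind ends the search. A @state hit is returned via
+ * @found_seg, a @mask hit via @found_for_mask with the error code
+ * %-ENOENT, and the found segment number is computed as:
+ *
+ *	found = le64_to_cpu(hdr->start_item) +
+ *		byte_index * SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS) +
+ *		found_offset;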
+ */ +static +int FIND_FIRST_ITEM_IN_FRAGMENT(struct ssdfs_segbmap_fragment_header *hdr, + u8 *fragment, u64 start_item, u64 max_item, + int state, int mask, + u64 *found_seg, u64 *found_for_mask, + int *found_state_for_mask) +{ + u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS); + u64 fragment_start_item; + u64 aligned_start, aligned_end; + u32 byte_index, search_bytes; + u64 byte_range; + u8 start_offset; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hdr || !fragment || !found_seg || !found_for_mask); + + if (start_item >= max_item) { + SSDFS_ERR("start_item %llu >= max_item %llu\n", + start_item, max_item); + return -EINVAL; + } + + SSDFS_DBG("hdr %p, fragment %p, " + "start_item %llu, max_item %llu, " + "state %#x, mask %#x, " + "found_seg %p, found_for_mask %p\n", + hdr, fragment, start_item, max_item, + state, mask, found_seg, found_for_mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_seg = U64_MAX; + *found_for_mask = U64_MAX; + *found_state_for_mask = SSDFS_SEG_STATE_MAX; + + fragment_start_item = le64_to_cpu(hdr->start_item); + + if (fragment_start_item == U64_MAX) { + SSDFS_ERR("invalid fragment start item\n"); + return -ERANGE; + } + + search_bytes = le16_to_cpu(hdr->fragment_bytes) - + sizeof(struct ssdfs_segbmap_fragment_header); + + if (search_bytes == 0 || search_bytes > PAGE_SIZE) { + SSDFS_ERR("invalid fragment_bytes %u\n", + search_bytes); + return -ERANGE; + } + + aligned_start = ALIGNED_START_ITEM(start_item, SSDFS_SEG_STATE_BITS); + aligned_end = ALIGNED_END_ITEM(max_item, SSDFS_SEG_STATE_BITS); + + byte_range = (aligned_end - fragment_start_item) / items_per_byte; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(byte_range >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + search_bytes = min_t(u32, search_bytes, (u32)byte_range); + + if (fragment_start_item <= aligned_start) { + u32 items_range = aligned_start - fragment_start_item; + byte_index = items_range / items_per_byte; + start_offset = (u8)(start_item - aligned_start); + } else { + byte_index = 0; + start_offset = 0; + } + + for (; byte_index < search_bytes; byte_index++) { + u8 *value = fragment + byte_index; + u8 found_offset; + + err = FIND_FIRST_ITEM_IN_BYTE(value, state, + SSDFS_SEG_STATE_BITS, + SSDFS_SEG_STATE_MASK, + start_offset, + BYTE_CONTAINS_STATE, + FIRST_STATE_IN_BYTE, + &found_offset); + + if (err != -ENODATA || *found_for_mask != U64_MAX) + goto ignore_search_for_mask; + + err = FIND_FIRST_ITEM_IN_BYTE(value, mask, + SSDFS_SEG_STATE_BITS, + SSDFS_SEG_STATE_MASK, + start_offset, + BYTE_CONTAINS_MASK, + FIRST_MASK_IN_BYTE, + &found_offset); + + if (!err && found_offset != U64_MAX) { + err = -ENOENT; + + *found_for_mask = fragment_start_item; + *found_for_mask += byte_index * items_per_byte; + *found_for_mask += found_offset; + + if (*found_for_mask >= max_item) { + *found_for_mask = U64_MAX; + goto ignore_search_for_mask; + } + + *found_state_for_mask = + ssdfs_segbmap_get_state_from_byte(value, + found_offset); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_for_mask %llu, " + "found_state_for_mask %#x\n", + *found_for_mask, + *found_state_for_mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (IS_STATE_GOOD_FOR_MASK(mask, *found_state_for_mask)) + break; + else { + err = -ENODATA; + *found_for_mask = U64_MAX; + *found_state_for_mask = SSDFS_SEG_STATE_MAX; + } + } + +ignore_search_for_mask: + if (err == -ENODATA) { + start_offset = 0; + continue; + } else if (err == -ENOENT) { + /* + * Value for mask has been found. + * Simply end the search. 
+ */ + break; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find items in byte: " + "byte_index %u, state %#x, " + "err %d\n", + byte_index, state, err); + goto end_search; + } + + *found_seg = fragment_start_item; + *found_seg += byte_index * items_per_byte; + *found_seg += found_offset; + + if (*found_seg >= max_item) + *found_seg = U64_MAX; + + break; + } + + if (*found_seg == U64_MAX && *found_for_mask == U64_MAX) + err = -ENODATA; + else if (*found_seg == U64_MAX && *found_for_mask != U64_MAX) + err = -ENOENT; + +#ifdef CONFIG_SSDFS_DEBUG + if (!err || err == -ENOENT) { + SSDFS_DBG("found_seg %llu, found_for_mask %llu\n", + *found_seg, *found_for_mask); + } else + SSDFS_DBG("nothing was found: err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + +end_search: + return err; +} + +/* + * ssdfs_segbmap_find_in_fragment() - find segment with state in fragment + * @segbmap: pointer on segment bitmap object + * @fragment_index: index of fragment + * @fragment_size: size of fragment in bytes + * @start: start segment number for search + * @max: upper bound of segment number for search + * @state: primary state for search + * @mask: mask of additonal states that can be retrieved too + * @found_seg: found segment number [out] + * @found_for_mask: found segment number for mask [out] + * @found_state_for_mask: found state for mask [out] + * + * This method tries to find segment number for requested state + * in fragment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - fragment is under initialization yet. + * %-EFAULT - fragment has inconsistent state. + */ +static +int ssdfs_segbmap_find_in_fragment(struct ssdfs_segment_bmap *segbmap, + u16 fragment_index, + u16 fragment_size, + u64 start, u64 max, + int state, int mask, + u64 *found_seg, u64 *found_for_mask, + int *found_state_for_mask) +{ + struct ssdfs_segbmap_fragment_desc *fragment; + size_t hdr_size = sizeof(struct ssdfs_segbmap_fragment_header); + struct page *page; + u64 first_item; + u32 items_per_fragment; + u16 items_count; + void *kaddr; + unsigned long *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap || !found_seg || !found_for_mask); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + if (start >= max) { + SSDFS_ERR("start %llu >= max %llu\n", + start, max); + return -EINVAL; + } + + SSDFS_DBG("segbmap %p, fragment_index %u, " + "fragment_size %u, start %llu, max %llu, " + "found_seg %p, found_for_mask %p\n", + segbmap, fragment_index, fragment_size, + start, max, + found_seg, found_for_mask); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_seg = U64_MAX; + *found_for_mask = U64_MAX; + + first_item = ssdfs_segbmap_define_first_fragment_item(fragment_index, + fragment_size); + items_per_fragment = ssdfs_segbmap_items_per_fragment(fragment_size); + + if (first_item >= max) { + SSDFS_ERR("first_item %llu >= max %llu\n", + first_item, max); + return -ERANGE; + } else if ((first_item + items_per_fragment) <= start) { + SSDFS_ERR("first_item %llu, items_per_fragment %u, " + "start %llu\n", + first_item, items_per_fragment, start); + return -ERANGE; + } + + err = ssdfs_segbmap_check_fragment_validity(segbmap, fragment_index); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u is not initilaized yet\n", + fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err == -EFAULT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u initialization was failed\n", + fragment_index); 
+#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fragment %u is corrupted: err %d\n", + fragment_index, err); + return err; + } + + fragment = &segbmap->desc_array[fragment_index]; + + items_count = ssdfs_segbmap_define_items_count(fragment, state, mask); + if (items_count == U16_MAX) { + SSDFS_ERR("segbmap has inconsistent state\n"); + return -ERANGE; + } else if (items_count == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u hasn't items for search\n", + fragment_index); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + items_count = fragment->total_segs; + + if (items_count == 0 || items_count > items_per_fragment) { + SSDFS_ERR("invalid total_segs %u\n", items_count); + return -ERANGE; + } + + page = find_lock_page(&segbmap->pages, fragment_index); + if (!page) { + SSDFS_ERR("fragment %u hasn't memory page\n", + fragment_index); + return -ERANGE; + } + + ssdfs_account_locked_page(page); + kaddr = kmap_local_page(page); + bmap = (unsigned long *)((u8 *)kaddr + hdr_size); + + err = FIND_FIRST_ITEM_IN_FRAGMENT(SSDFS_SBMP_FRAG_HDR(kaddr), + (u8 *)bmap, start, max, state, mask, + found_seg, found_for_mask, + found_state_for_mask); + + kunmap_local(kaddr); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_segbmap_find() - find segment with state + * @segbmap: pointer on segment bitmap object + * @start: start segment number for search + * @max: upper bound of segment number for search + * @state: primary state for search + * @mask: mask of additonal states that can be retrieved too + * @fragment_size: fragment size in bytes + * @seg: found segment number [out] + * @end: pointer on completion for waiting init ending [out] + * + * This method tries to find segment number for requested state. + * + * RETURN: + * [success] - found segment state + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-EAGAIN - fragment is under initialization yet. + * %-EOPNOTSUPP - operation is not supported. + * %-ENOMEM - fail to allocate memory. + * %-EFAULT - segbmap has inconsistent state. + * %-ERANGE - internal error. + * %-ENODATA - unable to find segment as for state as for mask. + */ +static +int __ssdfs_segbmap_find(struct ssdfs_segment_bmap *segbmap, + u64 start, u64 max, + int state, int mask, + u16 fragment_size, + u64 *seg, struct completion **end) +{ + unsigned long *fbmap; + int start_fragment, max_fragment, found_fragment; + u64 found = U64_MAX, found_for_mask = U64_MAX; + int found_state_for_mask = SSDFS_SEG_STATE_MAX; + int err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!segbmap || !seg); + BUG_ON(!rwsem_is_locked(&segbmap->search_lock)); + + SSDFS_DBG("segbmap %p, start %llu, max %llu, " + "state %#x, mask %#x, fragment_size %u, seg %p\n", + segbmap, start, max, state, mask, + fragment_size, seg); +#endif /* CONFIG_SSDFS_DEBUG */ + + *end = NULL; + + if (start >= max) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %llu >= max %llu\n", + start, max); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + fbmap = ssdfs_segbmap_choose_fbmap(segbmap, state, mask); + if (IS_ERR_OR_NULL(fbmap)) { + err = (fbmap == NULL ? 
-ENOMEM : PTR_ERR(fbmap)); + SSDFS_ERR("unable to choose fragment bitmap: err %d\n", + err); + return err; + } + + start_fragment = SEG_BMAP_FRAGMENTS(start + 1); + if (start_fragment > 0) + start_fragment -= 1; + + max_fragment = SEG_BMAP_FRAGMENTS(max); + + do { + u64 found_for_iter = U64_MAX; + int found_state_for_iter = -1; + + err = ssdfs_segbmap_find_fragment(segbmap, + fbmap, + start_fragment, + max_fragment, + &found_fragment); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find fragment: " + "state %#x, mask %#x, " + "start_fragment %d, max_fragment %d\n", + state, mask, + start_fragment, max_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_seg_search; + } else if (err == -EFAULT) { + ssdfs_fs_error(segbmap->fsi->sb, + __FILE__, __func__, __LINE__, + "segbmap inconsistent state: " + "found_fragment %d\n", + found_fragment); + goto finish_seg_search; + } else if (err == -EAGAIN) { + if (found_fragment >= U16_MAX) { + /* select the first fragment by default */ + found_fragment = 0; + } + + *end = &segbmap->desc_array[found_fragment].init_end; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u is not initilaized yet\n", + found_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_seg_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find fragment: " + "start_fragment %d, max_fragment %d, " + "err %d\n", + start_fragment, max_fragment, err); + goto finish_seg_search; + } else if (found_fragment >= U16_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to find fragment: " + "start_fragment %d, max_fragment %d, " + "err %d\n", + start_fragment, max_fragment, err); + goto finish_seg_search; + } + + start = ssdfs_segbmap_correct_search_start(found_fragment, + start, max, + fragment_size); + if (start == U64_MAX || start >= max) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("break search: start %llu, max %llu\n", + start, max); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + + *end = &segbmap->desc_array[found_fragment].init_end; + + err = ssdfs_segbmap_find_in_fragment(segbmap, found_fragment, + fragment_size, + start, max, + state, mask, + &found, &found_for_iter, + &found_state_for_iter); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find segment: " + "fragment %d, " + "state %#x, mask %#x, " + "start %llu, max %llu\n", + found_fragment, + state, mask, + start, max); +#endif /* CONFIG_SSDFS_DEBUG */ + /* try next fragment */ + } else if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("mask %#x, found_for_mask %llu, " + "found_for_iter %llu, " + "found_state %#x\n", + mask, found_for_mask, found_for_iter, + found_state_for_iter); +#endif /* CONFIG_SSDFS_DEBUG */ + err = 0; + found_for_mask = found_for_iter; + found_state_for_mask = found_state_for_iter; + goto check_search_result; + } else if (err == -EFAULT) { + /* Just try another iteration */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %d is inconsistent\n", + found_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("fragment %u is not initilaized yet\n", + found_fragment); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_seg_search; + } else if (unlikely(err < 0)) { + SSDFS_ERR("fail to find segment: " + "found_fragment %d, start %llu, " + "max %llu, err %d\n", + found_fragment, start, max, err); + goto finish_seg_search; + } else if (found == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid segment number: " + "found_fragment %d, start %llu, " + "max %llu\n", + found_fragment, start, max); + 
+        } else
+            break;
+
+        start_fragment = found_fragment + 1;
+    } while (start_fragment <= max_fragment);
+
+check_search_result:
+    if (unlikely(err < 0)) {
+        /* we have some error */
+        goto finish_seg_search;
+    } else if (found == U64_MAX) {
+        if (found_for_mask == U64_MAX) {
+            err = -ENODATA;
+            SSDFS_DBG("fail to find segment\n");
+        } else {
+            *seg = found_for_mask;
+            err = found_state_for_mask;
+#ifdef CONFIG_SSDFS_DEBUG
+            SSDFS_DBG("found for mask %llu, state %#x\n",
+                  *seg, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+        }
+    } else {
+        *seg = found;
+        err = state;
+#ifdef CONFIG_SSDFS_DEBUG
+        SSDFS_DBG("found segment %llu\n", *seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+    }
+
+finish_seg_search:
+    return err;
+}
+
+/*
+ * ssdfs_segbmap_find() - find segment with state
+ * @segbmap: pointer on segment bitmap object
+ * @start: start segment number for search
+ * @max: upper bound of segment number for search
+ * @state: primary state for search
+ * @mask: mask of additional states that can be retrieved too
+ * @seg: found segment number [out]
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to find segment number for requested state.
+ *
+ * RETURN:
+ * [success] - found segment state
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EAGAIN - fragment is still under initialization.
+ * %-EOPNOTSUPP - operation is not supported.
+ * %-ENOMEM - fail to allocate memory.
+ * %-EFAULT - segbmap has inconsistent state.
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to find segment either for state or for mask.
+ */
+int ssdfs_segbmap_find(struct ssdfs_segment_bmap *segbmap,
+               u64 start, u64 max,
+               int state, int mask,
+               u64 *seg, struct completion **end)
+{
+    u64 items_count;
+    u16 fragment_size;
+    int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!segbmap || !seg);
+
+    if (start >= segbmap->items_count) {
+        SSDFS_ERR("start %llu >= items_count %llu\n",
+              start, segbmap->items_count);
+        return -EINVAL;
+    }
+
+    if (start >= max) {
+        SSDFS_ERR("start %llu >= max %llu\n",
+              start, max);
+        return -EINVAL;
+    }
+
+    if (state < SSDFS_SEG_CLEAN || state >= SSDFS_SEG_STATE_MAX) {
+        SSDFS_ERR("unknown segment state %#x\n", state);
+        return -EINVAL;
+    }
+
+    if ((mask & SSDFS_SEG_CLEAN_USING_MASK) != mask &&
+        (mask & SSDFS_SEG_USED_DIRTY_MASK) != mask &&
+        (mask & SSDFS_SEG_BAD_STATE_MASK) != mask) {
+        SSDFS_ERR("unsupported set of flags %#x\n",
+              mask);
+        return -EOPNOTSUPP;
+    }
+
+    SSDFS_DBG("segbmap %p, start %llu, max %llu, "
+          "state %#x, mask %#x, seg %p\n",
+          segbmap, start, max, state, mask, seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+    *end = NULL;
+
+    inode_lock_shared(segbmap->fsi->segbmap_inode);
+    down_read(&segbmap->resize_lock);
+
+    items_count = segbmap->items_count;
+    fragment_size = segbmap->fragment_size;
+
+    if (segbmap->flags & SSDFS_SEGBMAP_ERROR) {
+        err = -EFAULT;
+        ssdfs_fs_error(segbmap->fsi->sb,
+                __FILE__, __func__, __LINE__,
+                "segbmap has corrupted state\n");
+        goto finish_search_preparation;
+    }
+
+    max = min_t(u64, max, items_count);
+
+    down_read(&segbmap->search_lock);
+    err = __ssdfs_segbmap_find(segbmap, start, max, state, mask,
+                   fragment_size, seg, end);
+    up_read(&segbmap->search_lock);
+
+finish_search_preparation:
+    up_read(&segbmap->resize_lock);
+    inode_unlock_shared(segbmap->fsi->segbmap_inode);
+
+    return err;
+}
+
+/*
+ * ssdfs_segbmap_find_and_set() - find segment and change state
+ * @segbmap: pointer on segment bitmap object
+ * @start: start segment number for search
+ * @max: upper bound of segment number for search
+ * @state: primary state for search
+ * @mask: mask of additional states that can be retrieved too
+ * @new_state: new state of segment
+ * @seg: found segment number [out]
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to find segment number for requested state
+ * and to set segment state as @new_state.
+ *
+ * RETURN:
+ * [success] - found segment state before changing
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EAGAIN - fragment is still under initialization.
+ * %-EOPNOTSUPP - operation is not supported.
+ * %-ENOMEM - fail to allocate memory.
+ * %-EFAULT - segbmap has inconsistent state.
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to find segment either for state or for mask.
+ */
+int ssdfs_segbmap_find_and_set(struct ssdfs_segment_bmap *segbmap,
+                   u64 start, u64 max,
+                   int state, int mask,
+                   int new_state,
+                   u64 *seg, struct completion **end)
+{
+    u64 items_count;
+    u16 fragments_count;
+    u16 fragment_size;
+    pgoff_t fragment_index;
+    int err = 0, res = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!segbmap || !seg);
+
+    if (start >= segbmap->items_count) {
+        SSDFS_ERR("start %llu >= items_count %llu\n",
+              start, segbmap->items_count);
+        return -EINVAL;
+    }
+
+    if (start >= max) {
+        SSDFS_ERR("start %llu >= max %llu\n",
+              start, max);
+        return -EINVAL;
+    }
+
+    if (state < SSDFS_SEG_CLEAN || state >= SSDFS_SEG_STATE_MAX) {
+        SSDFS_ERR("unknown segment state %#x\n", state);
+        return -EINVAL;
+    }
+
+    if ((mask & SSDFS_SEG_CLEAN_USING_MASK) != mask &&
+        (mask & SSDFS_SEG_USED_DIRTY_MASK) != mask &&
+        (mask & SSDFS_SEG_BAD_STATE_MASK) != mask) {
+        SSDFS_ERR("unsupported set of flags %#x\n",
+              mask);
+        return -EOPNOTSUPP;
+    }
+
+    if (new_state < SSDFS_SEG_CLEAN || new_state >= SSDFS_SEG_STATE_MAX) {
+        SSDFS_ERR("unknown new segment state %#x\n", new_state);
+        return -EINVAL;
+    }
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+    SSDFS_ERR("segbmap %p, start %llu, max %llu, "
+          "state %#x, mask %#x, new_state %#x, seg %p\n",
+          segbmap, start, max, state, mask, new_state, seg);
+#else
+    SSDFS_DBG("segbmap %p, start %llu, max %llu, "
+          "state %#x, mask %#x, new_state %#x, seg %p\n",
+          segbmap, start, max, state, mask, new_state, seg);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+    *end = NULL;
+
+    inode_lock_shared(segbmap->fsi->segbmap_inode);
+    down_read(&segbmap->resize_lock);
+
+    items_count = segbmap->items_count;
+    fragments_count = segbmap->fragments_count;
+    fragment_size = segbmap->fragment_size;
+
+    if (segbmap->flags & SSDFS_SEGBMAP_ERROR) {
+        err = -EFAULT;
+        ssdfs_fs_error(segbmap->fsi->sb,
+                __FILE__, __func__, __LINE__,
+                "segbmap has corrupted state\n");
+        goto finish_search_preparation;
+    }
+
+    max = min_t(u64, max, items_count);
+
+    down_write(&segbmap->search_lock);
+
+try_to_find_seg_id:
+    res = __ssdfs_segbmap_find(segbmap, start, max,
+                   state, mask,
+                   fragment_size, seg, end);
+    if (res == -ENODATA) {
+        err = res;
+        SSDFS_DBG("unable to find any segment\n");
+        goto finish_find_set;
+    } else if (res == -EAGAIN) {
+        err = res;
+        SSDFS_DBG("fragment is not initialized yet\n");
+        goto finish_find_set;
+    } else if (unlikely(res < 0)) {
+        err = res;
+        SSDFS_ERR("fail to find clean segment: err %d\n",
+              err);
+        goto finish_find_set;
+    }
+
+    if (res == new_state) {
+        /* everything is done */
+        goto finish_find_set;
+    } else if (res == SSDFS_SEG_CLEAN) {
+        /*
+         * we can change the clean state to any other one
+         */
+    } else {
+        start = *seg + 1;
+        *seg = U64_MAX;
+        goto try_to_find_seg_id;
+    }
+
+    if (*seg >= items_count) {
+        err = -ERANGE;
+        SSDFS_ERR("seg %llu >= items_count %llu\n",
+              *seg, items_count);
+        goto finish_find_set;
+    }
+
+    fragment_index = ssdfs_segbmap_seg_2_fragment_index(*seg);
+    if (fragment_index >= fragments_count) {
+        err = -EFAULT;
+        ssdfs_fs_error(segbmap->fsi->sb,
+                __FILE__, __func__, __LINE__,
+                "fragment_index %lu >= fragments_count %u\n",
+                fragment_index, fragments_count);
+        goto finish_find_set;
+    }
+
+    err = __ssdfs_segbmap_change_state(segbmap, *seg,
+                       new_state,
+                       fragment_index,
+                       fragment_size);
+    if (unlikely(err)) {
+        SSDFS_ERR("fail to reserve segment: err %d\n",
+              err);
+        goto finish_find_set;
+    }
+
+finish_find_set:
+    up_write(&segbmap->search_lock);
+
+finish_search_preparation:
+    up_read(&segbmap->resize_lock);
+    inode_unlock_shared(segbmap->fsi->segbmap_inode);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+    SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+    if (unlikely(err))
+        return err;
+
+    return res;
+}
+
+/*
+ * ssdfs_segbmap_reserve_clean_segment() - reserve clean segment
+ * @segbmap: pointer on segment bitmap object
+ * @start: start segment number for search
+ * @max: upper bound of segment number for search
+ * @seg: found segment number [out]
+ * @end: pointer on completion for waiting init ending [out]
+ *
+ * This method tries to find clean segment and to reserve it.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-EAGAIN - fragment is still under initialization.
+ * %-EOPNOTSUPP - operation is not supported.
+ * %-ENOMEM - fail to allocate memory.
+ * %-EFAULT - segbmap has inconsistent state.
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to find segment.
+ */
+int ssdfs_segbmap_reserve_clean_segment(struct ssdfs_segment_bmap *segbmap,
+                    u64 start, u64 max,
+                    u64 *seg, struct completion **end)
+{
+    u64 items_count;
+    u16 fragments_count;
+    u16 fragment_size;
+    pgoff_t fragment_index;
+    int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!segbmap || !seg);
+
+    if (start >= segbmap->items_count) {
+        SSDFS_ERR("start %llu >= items_count %llu\n",
+              start, segbmap->items_count);
+        return -EINVAL;
+    }
+
+    if (start >= max) {
+        SSDFS_ERR("start %llu >= max %llu\n",
+              start, max);
+        return -EINVAL;
+    }
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+    SSDFS_ERR("segbmap %p, start %llu, max %llu, "
+          "seg %p\n",
+          segbmap, start, max, seg);
+#else
+    SSDFS_DBG("segbmap %p, start %llu, max %llu, "
+          "seg %p\n",
+          segbmap, start, max, seg);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+    *end = NULL;
+
+    inode_lock_shared(segbmap->fsi->segbmap_inode);
+    down_read(&segbmap->resize_lock);
+
+    items_count = segbmap->items_count;
+    fragments_count = segbmap->fragments_count;
+    fragment_size = segbmap->fragment_size;
+
+    if (segbmap->flags & SSDFS_SEGBMAP_ERROR) {
+        err = -EFAULT;
+        ssdfs_fs_error(segbmap->fsi->sb,
+                __FILE__, __func__, __LINE__,
+                "segbmap has corrupted state\n");
+        goto finish_segment_check;
+    }
+
+    down_write(&segbmap->search_lock);
+
+    err = __ssdfs_segbmap_find(segbmap, start, max,
+                   SSDFS_SEG_CLEAN,
+                   SSDFS_SEG_CLEAN_STATE_FLAG,
+                   fragment_size, seg, end);
+    if (err == -ENODATA) {
+        SSDFS_DBG("unable to find clean segment\n");
+        goto finish_reserve_segment;
+    } else if (err == -EAGAIN) {
+        SSDFS_DBG("fragment is not initialized yet\n");
+        goto finish_reserve_segment;
+    } else if (unlikely(err < 0)) {
+        SSDFS_ERR("fail to find clean segment: err %d\n",
+              err);
+        goto finish_reserve_segment;
+    }
+
+    if (*seg >= items_count) {
+        err = -ERANGE;
+        SSDFS_ERR("seg %llu >= items_count %llu\n",
+              *seg, items_count);
+        goto finish_reserve_segment;
+    }
+
+    fragment_index = ssdfs_segbmap_seg_2_fragment_index(*seg);
+    if (fragment_index >= fragments_count) {
+        err = -EFAULT;
+        ssdfs_fs_error(segbmap->fsi->sb,
+                __FILE__, __func__, __LINE__,
+                "fragment_index %lu >= fragments_count %u\n",
+                fragment_index, fragments_count);
+        goto finish_reserve_segment;
+    }
+
+    err = __ssdfs_segbmap_change_state(segbmap, *seg,
+                       SSDFS_SEG_RESERVED,
+                       fragment_index,
+                       fragment_size);
+    if (unlikely(err)) {
+        SSDFS_ERR("fail to reserve segment: err %d\n",
+              err);
+        goto finish_reserve_segment;
+    }
+
+finish_reserve_segment:
+    up_write(&segbmap->search_lock);
+
+finish_segment_check:
+    up_read(&segbmap->resize_lock);
+    inode_unlock_shared(segbmap->fsi->segbmap_inode);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+    SSDFS_ERR("finished: seg %llu, err %d\n", *seg, err);
+#else
+    SSDFS_DBG("finished: seg %llu, err %d\n", *seg, err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+    return err;
+}

From patchwork Sat Feb 25 01:08:58 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151952
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 47/76] ssdfs: introduce b-tree object
Date: Fri, 24 Feb 2023 17:08:58 -0800
Message-Id: <20230225010927.813929-48-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

SSDFS file system uses the logical segment and logical extent concepts
together with the "Physical" Erase Block (PEB) migration scheme.
Generally speaking, these techniques make it possible to exclude the
wandering tree issue completely and to decrease write amplification
significantly. SSDFS stores data on the basis of a logical extent that
describes the data's position by means of segment ID and logical block
ID. The PEB migration technique guarantees that data will be described
by the same logical extent until the segment ID or logical block ID is
changed directly. In other words, the logical extent stays the same
while the data sits in the same logical segment. The responsibility of
the PEB migration technique is to implement the continuous migration of
data between PEBs inside the logical segment in the case of data
updates. Generally speaking, SSDFS's internal techniques guarantee that
the COW policy will not rewrite the content of a b-tree; the b-tree
content is updated only by regular end-user operations on the file
system.

SSDFS uses a b-tree architecture for metadata representation (for
example, the inodes, extents, dentries, and xattr trees) because it
provides a compact way of reserving metadata space without excessive
overprovisioning (as a plain table or array would require). The b-tree
provides an efficient item lookup technique, especially for an aged or
sparse b-tree that contains a mixture of used and deleted (or freed)
items; this feature can be very useful for extent invalidation, for
example. Also, SSDFS aggregates the b-tree's root node in the
superblock (inodes tree case) or in the inode (extents tree case). As a
result, an empty b-tree contains only the root node, without reserving
any b-tree node on the file system's volume. Moreover, if a b-tree
needs to contain only several items (two, for example), then the root
node's space can be used to store these items inline without creating a
full-featured b-tree node.
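To make the inline-root idea above concrete, here is a minimal
user-space sketch. All names in it (toy_index_record, toy_root_node,
toy_root_add_inline) are hypothetical simplifications for illustration
only, not the on-disk structures introduced by this patch set: the root
owns two slots that hold either index records or, while the items fit
and there are at most two of them, inline data records.

    /* Hypothetical, simplified sketch -- not the SSDFS on-disk layout. */
    #include <stdint.h>
    #include <string.h>

    struct toy_index_record {          /* stand-in for an index record */
        uint64_t hash;                 /* start hash of the child node */
        uint32_t node_id;              /* child node identifier */
    };

    struct toy_root_node {
        uint8_t items_count;           /* 0..2 slots in use */
        uint8_t inline_items;          /* 1: slots hold inline data records */
        struct toy_index_record slots[2];
    };

    /*
     * Keep a small item inline while the tree holds at most two items;
     * a third item forces creation of the first (hybrid) node, and the
     * root's slots revert to plain index records.
     */
    static int toy_root_add_inline(struct toy_root_node *root,
                                   const void *item, size_t item_size)
    {
        if (item_size > sizeof(struct toy_index_record))
            return -1;    /* item does not fit an index-record slot */
        if (root->items_count >= 2)
            return -2;    /* caller must add a hybrid node instead */

        root->inline_items = 1;
        memcpy(&root->slots[root->items_count], item, item_size);
        root->items_count++;
        return 0;
    }

Once a third item arrives, the sketch's toy_root_add_inline() refuses
the insert; at that point the first hybrid node has to be created and
the root's two slots go back to holding index records, which mirrors
the behavior described above.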
As a result, SSDFS uses b-trees to achieve a compact representation of
metadata, a flexible way to expand or shrink the b-tree's space
capacity, and an efficient item lookup mechanism.

SSDFS uses a hybrid b-tree architecture with the goal of eliminating
the index nodes' side effect. The hybrid b-tree operates with three
node types: (1) index node, (2) hybrid node, (3) leaf node. The
peculiarity of the hybrid node is that it mixes index and data records
in one node. A hybrid b-tree starts with a root node that can keep two
index records or two data records inline (if the size of a data record
is less than or equal to the size of an index record). If the b-tree
needs to contain more than two items, then the first hybrid node is
added to the b-tree. The root level of the b-tree can refer to only two
nodes because the root node is capable of storing only two index
records. Generally speaking, the initial goal of the hybrid node is to
store data records in the presence of a reserved index area.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree.c        | 1020 +++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/btree.h        |  218 +++++++++
 fs/ssdfs/btree_search.c |  885 +++++++++++++++++++++++++++++++++
 fs/ssdfs/btree_search.h |  359 ++++++++++++++
 4 files changed, 2482 insertions(+)
 create mode 100644 fs/ssdfs/btree.c
 create mode 100644 fs/ssdfs/btree.h
 create mode 100644 fs/ssdfs/btree_search.c
 create mode 100644 fs/ssdfs/btree_search.h

diff --git a/fs/ssdfs/btree.c b/fs/ssdfs/btree.c
new file mode 100644
index 000000000000..5780077a1eb9
--- /dev/null
+++ b/fs/ssdfs/btree.c
@@ -0,0 +1,1020 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/btree.c - generalized btree functionality implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "request_queue.h"
+#include "segment_bitmap.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb_container.h"
+#include "segment.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree_hierarchy.h"
+#include "peb_mapping_table.h"
+#include "btree.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_btree_page_leaks;
+atomic64_t ssdfs_btree_memory_leaks;
+atomic64_t ssdfs_btree_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_btree_cache_leaks_increment(void *kaddr)
+ * void ssdfs_btree_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_btree_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_btree_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_btree_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_btree_kfree(void *kaddr)
+ * struct page *ssdfs_btree_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_btree_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_btree_free_page(struct page *page)
+ * void ssdfs_btree_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+    SSDFS_MEMORY_LEAKS_CHECKER_FNS(btree)
+#else
+    SSDFS_MEMORY_ALLOCATOR_FNS(btree)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_btree_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+    atomic64_set(&ssdfs_btree_page_leaks, 0);
+    atomic64_set(&ssdfs_btree_memory_leaks, 0);
+    atomic64_set(&ssdfs_btree_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_btree_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+    if (atomic64_read(&ssdfs_btree_page_leaks) != 0) {
+        SSDFS_ERR("BTREE: "
+              "memory leaks include %lld pages\n",
+              atomic64_read(&ssdfs_btree_page_leaks));
+    }
+
+    if (atomic64_read(&ssdfs_btree_memory_leaks) != 0) {
+        SSDFS_ERR("BTREE: "
+              "memory allocator suffers from %lld leaks\n",
+              atomic64_read(&ssdfs_btree_memory_leaks));
+    }
+
+    if (atomic64_read(&ssdfs_btree_cache_leaks) != 0) {
+        SSDFS_ERR("BTREE: "
+              "caches suffer from %lld leaks\n",
+              atomic64_read(&ssdfs_btree_cache_leaks));
+    }
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_btree_radix_tree_insert() - insert node into the radix tree
+ * @tree: btree pointer
+ * @node_id: node ID number
+ * @node: pointer on btree node
+ */
+static
+int ssdfs_btree_radix_tree_insert(struct ssdfs_btree *tree,
+                  unsigned long node_id,
+                  struct ssdfs_btree_node *node)
+{
+    int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!tree || !node);
+
+    SSDFS_DBG("tree %p, node_id %llu, node %p\n",
+          tree, (u64)node_id, node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+    err = radix_tree_preload(GFP_NOFS);
+    if (unlikely(err)) {
+        SSDFS_ERR("fail to preload radix tree: err %d\n",
+              err);
+        return err;
+    }
+
+    spin_lock(&tree->nodes_lock);
+    err = radix_tree_insert(&tree->nodes, node_id, node);
+    spin_unlock(&tree->nodes_lock);
+
+    radix_tree_preload_end();
+
+    if (unlikely(err)) {
+        SSDFS_ERR("fail to add node into radix tree: "
+              "node_id %llu, node %p, err %d\n",
+              (u64)node_id, node, err);
+    }
+
+    return err;
+}
+
+/*
+ * ssdfs_btree_radix_tree_delete() - delete node from the radix tree
+ * @tree: btree pointer
+ * @node_id: node ID number
+ *
+ * This method tries to delete the node from the radix tree.
+ *
+ * RETURN:
+ * pointer on the node object that has been deleted from the radix tree
+ */
+static
+struct ssdfs_btree_node *ssdfs_btree_radix_tree_delete(struct ssdfs_btree *tree,
+                            unsigned long node_id)
+{
+    struct ssdfs_btree_node *ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!tree);
+
+    SSDFS_DBG("tree %p, node_id %llu\n",
+          tree, (u64)node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+    spin_lock(&tree->nodes_lock);
+    ptr = radix_tree_delete(&tree->nodes, node_id);
+    spin_unlock(&tree->nodes_lock);
+
+    return ptr;
+}
+
+/*
+ * ssdfs_btree_radix_tree_find() - find the node in the radix tree
+ * @tree: btree pointer
+ * @node_id: node ID number
+ * @node: pointer on btree node pointer [out]
+ *
+ * This method tries to find node in the radix tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOENT - tree doesn't contain the requested node.
+ */
+int ssdfs_btree_radix_tree_find(struct ssdfs_btree *tree,
+                unsigned long node_id,
+                struct ssdfs_btree_node **node)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!tree || !node);
+
+    SSDFS_DBG("tree %p, node_id %llu\n",
+          tree, (u64)node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+    spin_lock(&tree->nodes_lock);
+    *node = radix_tree_lookup(&tree->nodes, node_id);
+    spin_unlock(&tree->nodes_lock);
+
+    if (!*node) {
+#ifdef CONFIG_SSDFS_DEBUG
+        SSDFS_DBG("unable to find the node: id %llu\n",
+              (u64)node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+        return -ENOENT;
+    }
+
+    return 0;
+}
+
+static
+int __ssdfs_btree_find_item(struct ssdfs_btree *tree,
+                struct ssdfs_btree_search *search);
+
+/*
+ * ssdfs_btree_desc_init() - init the btree's descriptor
+ * @fsi: pointer on shared file system object
+ * @tree: pointer on btree object
+ * @desc: pointer on btree's descriptor
+ * @min_item_size: minimal possible item size
+ * @max_item_size: maximal possible item size
+ */
+int ssdfs_btree_desc_init(struct ssdfs_fs_info *fsi,
+              struct ssdfs_btree *tree,
+              struct ssdfs_btree_descriptor *desc,
+              u8 min_item_size,
+              u16 max_item_size)
+{
+    size_t index_size = sizeof(struct ssdfs_btree_index_key);
+    u32 pagesize;
+    u32 node_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!tree || !desc);
+
+    SSDFS_DBG("tree %p, desc %p\n",
+          tree, desc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+    pagesize = fsi->pagesize;
+    node_size = 1 << desc->log_node_size;
+
+    if (node_size != (pagesize * desc->pages_per_node)) {
+        SSDFS_ERR("invalid pages_per_node: "
+              "node_size %u, page_size %u, pages_per_node %u\n",
+              node_size, pagesize, desc->pages_per_node);
+        return -EIO;
+    }
+
+    if (desc->node_ptr_size != index_size) {
+        SSDFS_ERR("invalid node_ptr_size %u\n",
+              desc->node_ptr_size);
+        return -EIO;
+    }
+
+    if (le16_to_cpu(desc->index_size) != index_size) {
+        SSDFS_ERR("invalid index_size %u\n",
+              le16_to_cpu(desc->index_size));
+        return -EIO;
+    }
+
+    tree->type = desc->type;
+    atomic_set(&tree->flags, le16_to_cpu(desc->flags));
+    tree->node_size = node_size;
+    tree->pages_per_node = desc->pages_per_node;
+    tree->node_ptr_size = desc->node_ptr_size;
+    tree->index_size = le16_to_cpu(desc->index_size);
+    tree->item_size = le16_to_cpu(desc->item_size);
+    tree->min_item_size = min_item_size;
+    tree->max_item_size = max_item_size;
+    tree->index_area_min_size = le16_to_cpu(desc->index_area_min_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+    SSDFS_DBG("type %#x, node_size %u, "
+          "index_size %u, item_size %u\n",
+          tree->type, tree->node_size,
+          tree->index_size, tree->item_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+    return 0;
+}
+
+/*
+ * ssdfs_btree_create() - create generalized btree object
+ * @fsi: pointer on shared file system object
+ * @owner_ino: inode identification number of btree owner
+ * @desc_ops: pointer on btree descriptor operations
+ * @btree_ops: pointer on btree operations
+ * @tree: pointer on memory for btree creation
+ *
+ * This method tries to create generalized btree object.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_btree_create(struct ssdfs_fs_info *fsi,
+            u64 owner_ino,
+            const struct ssdfs_btree_descriptor_operations *desc_ops,
+            const struct ssdfs_btree_operations *btree_ops,
+            struct ssdfs_btree *tree)
+{
+    int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+    BUG_ON(!fsi || !desc_ops || !tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+    SSDFS_ERR("fsi %p, owner_ino %llu, "
+          "desc_ops %p, btree_ops %p, tree %p\n",
+          fsi, owner_ino, desc_ops, btree_ops, tree);
+#else
+    SSDFS_DBG("fsi %p, owner_ino %llu, "
+          "desc_ops %p, btree_ops %p, tree %p\n",
+          fsi, owner_ino, desc_ops, btree_ops, tree);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+    atomic_set(&tree->state, SSDFS_BTREE_UNKNOWN_STATE);
+
+    tree->owner_ino = owner_ino;
+
+    tree->fsi = fsi;
+    tree->desc_ops = desc_ops;
+    tree->btree_ops = btree_ops;
+
+    if (!desc_ops->init) {
+        SSDFS_ERR("empty btree descriptor init operation\n");
+        return -ERANGE;
+    }
+
+    err = desc_ops->init(fsi, tree);
+    if (unlikely(err)) {
+        SSDFS_ERR("fail to init btree descriptor: err %d\n",
+              err);
+        return err;
+    }
+
+    atomic_set(&tree->height, U8_MAX);
+
+    init_rwsem(&tree->lock);
+    spin_lock_init(&tree->nodes_lock);
+    tree->upper_node_id = SSDFS_BTREE_ROOT_NODE_ID;
+    INIT_RADIX_TREE(&tree->nodes, GFP_ATOMIC);
+
+    if (!btree_ops || !btree_ops->create_root_node)
+        SSDFS_WARN("empty create_root_node method\n");
+    else {
+        struct ssdfs_btree_node *node;
+
+        node = ssdfs_btree_node_create(tree,
+                           SSDFS_BTREE_ROOT_NODE_ID,
+                           NULL,
+                           SSDFS_BTREE_LEAF_NODE_HEIGHT,
+                           SSDFS_BTREE_ROOT_NODE,
+                           U64_MAX);
+        if (unlikely(IS_ERR_OR_NULL(node))) {
+            err = !node ?
-ENOMEM : PTR_ERR(node); + SSDFS_ERR("fail to create root node: err %d\n", + err); + return err; + } + + err = btree_ops->create_root_node(fsi, node); + if (unlikely(err)) { + SSDFS_ERR("fail to init the root node\n"); + goto finish_root_node_creation; + } + + err = ssdfs_btree_radix_tree_insert(tree, + SSDFS_BTREE_ROOT_NODE_ID, + node); + if (unlikely(err)) { + SSDFS_ERR("fail to insert node into radix tree: " + "err %d\n", + err); + goto finish_root_node_creation; + } + +finish_root_node_creation: + if (unlikely(err)) { + ssdfs_btree_node_destroy(node); + return err; + } + } + + atomic_set(&tree->state, SSDFS_BTREE_CREATED); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_destroy() - destroy generalized btree object + * @tree: btree object + */ +void ssdfs_btree_destroy(struct ssdfs_btree *tree) +{ + int tree_state; + struct radix_tree_iter iter; + void **slot; + struct ssdfs_btree_node *node; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#else + SSDFS_DBG("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + /* expected state */ + break; + + case SSDFS_BTREE_DIRTY: + if (!is_ssdfs_btree_empty(tree)) { + /* complain */ + SSDFS_WARN("tree is dirty\n"); + } else { + /* regular destroy */ + atomic_set(&tree->state, SSDFS_BTREE_UNKNOWN_STATE); + } + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return; + } + + if (rwsem_is_locked(&tree->lock)) { + /* inform about possible trouble */ + SSDFS_WARN("tree is locked under destruction\n"); + } + + spin_lock(&tree->nodes_lock); + radix_tree_for_each_slot(slot, &tree->nodes, &iter, + SSDFS_BTREE_ROOT_NODE_ID) { + node = + (struct ssdfs_btree_node *)radix_tree_delete(&tree->nodes, + iter.index); + + spin_unlock(&tree->nodes_lock); + if (!node) { + SSDFS_WARN("empty node pointer: " + "index %llu\n", + (u64)iter.index); + } else { + if (tree->btree_ops && tree->btree_ops->destroy_node) + tree->btree_ops->destroy_node(node); + + ssdfs_btree_node_destroy(node); + } + spin_lock(&tree->nodes_lock); + } + spin_unlock(&tree->nodes_lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_btree_desc_flush() - generalized btree's descriptor flush method + * @tree: btree object + * @desc: pointer on btree's descriptor [out] + */ +int ssdfs_btree_desc_flush(struct ssdfs_btree *tree, + struct ssdfs_btree_descriptor *desc) +{ + u32 pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !desc); + + SSDFS_DBG("owner_ino %llu, type %#x, state %#x\n", + tree->owner_ino, tree->type, + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + pagesize = tree->fsi->pagesize; + + if (tree->node_size != (pagesize * tree->pages_per_node)) { + SSDFS_ERR("invalid pages_per_node: " + "node_size %u, page_size %u, pages_per_node %u\n", + tree->node_size, pagesize, tree->pages_per_node); + return -ERANGE; + } + + if (tree->node_ptr_size != sizeof(struct ssdfs_btree_index_key)) { + SSDFS_ERR("invalid node_ptr_size %u\n", + tree->node_ptr_size); + return -ERANGE; + } + + if 
(tree->index_size != sizeof(struct ssdfs_btree_index_key)) { + SSDFS_ERR("invalid index_size %u\n", + tree->index_size); + return -ERANGE; + } + + desc->flags = cpu_to_le16(atomic_read(&tree->flags)); + desc->type = tree->type; + desc->log_node_size = ilog2(tree->node_size); + desc->pages_per_node = tree->pages_per_node; + desc->node_ptr_size = tree->node_ptr_size; + desc->index_size = cpu_to_le16(tree->index_size); + desc->index_area_min_size = cpu_to_le16(tree->index_area_min_size); + + return 0; +} + +/* + * ssdfs_btree_flush_nolock() - flush the current state of btree object + * @tree: btree object + * + * This method tries to flush dirty nodes of the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_flush_nolock(struct ssdfs_btree *tree) +{ + struct radix_tree_iter iter; + void **slot; + struct ssdfs_btree_node *node; + int tree_height, cur_height; + struct ssdfs_segment_request *req; + wait_queue_head_t *wq = NULL; + const atomic_t *state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, type %#x, state %#x\n", + tree, tree->type, atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_height = SSDFS_BTREE_LEAF_NODE_HEIGHT; + tree_height = atomic_read(&tree->height); + + for (; cur_height < tree_height; cur_height++) { + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + radix_tree_for_each_tagged(slot, &tree->nodes, &iter, + SSDFS_BTREE_ROOT_NODE_ID, + SSDFS_BTREE_NODE_DIRTY_TAG) { + + node = SSDFS_BTN(radix_tree_deref_slot(slot)); + if (unlikely(!node)) { + SSDFS_WARN("empty node ptr: node_id %llu\n", + (u64)iter.index); + radix_tree_tag_clear(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_DIRTY_TAG); + continue; + } + spin_unlock(&tree->nodes_lock); + + ssdfs_btree_node_get(node); + + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&node->height) != cur_height) { + ssdfs_btree_node_put(node); + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + continue; + } + + if (!is_ssdfs_btree_node_pre_deleted(node)) { + err = ssdfs_btree_node_pre_flush(node); + if (unlikely(err)) { + ssdfs_btree_node_put(node); + SSDFS_ERR("fail to pre-flush node: " + "node_id %llu, err %d\n", + (u64)iter.index, err); + goto finish_flush_tree_nodes; + } + + err = ssdfs_btree_node_flush(node); + if (unlikely(err)) { + ssdfs_btree_node_put(node); + SSDFS_ERR("fail to flush node: " + "node_id %llu, err %d\n", + (u64)iter.index, err); + goto finish_flush_tree_nodes; + } + } + + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + radix_tree_tag_clear(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_DIRTY_TAG); + radix_tree_tag_set(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_TOWRITE_TAG); + + ssdfs_btree_node_put(node); + } + spin_unlock(&tree->nodes_lock); + + rcu_read_unlock(); + } + + cur_height = SSDFS_BTREE_LEAF_NODE_HEIGHT; + + for (; cur_height < tree_height; cur_height++) { + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + radix_tree_for_each_tagged(slot, &tree->nodes, &iter, + SSDFS_BTREE_ROOT_NODE_ID, + SSDFS_BTREE_NODE_TOWRITE_TAG) { + + node = SSDFS_BTN(radix_tree_deref_slot(slot)); + if (unlikely(!node)) { + SSDFS_WARN("empty node ptr: node_id %llu\n", + (u64)iter.index); + radix_tree_tag_clear(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_TOWRITE_TAG); + continue; + } + spin_unlock(&tree->nodes_lock); + + 
ssdfs_btree_node_get(node); + + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&node->height) != cur_height) { + ssdfs_btree_node_put(node); + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + continue; + } + + if (is_ssdfs_btree_node_pre_deleted(node)) { + ssdfs_btree_node_put(node); + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + continue; + } + +check_flush_result_state: + state = &node->flush_req.result.state; + + switch (atomic_read(state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + req = &node->flush_req; + wq = &req->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_flush_result_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + ssdfs_btree_node_put(node); + err = node->flush_req.result.err; + + if (!err) { + err = -ERANGE; + SSDFS_ERR("error code is absent\n"); + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + goto finish_flush_tree_nodes; + + default: + ssdfs_btree_node_put(node); + err = -ERANGE; + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&node->flush_req.result.state)); + goto finish_flush_tree_nodes; + } + + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + ssdfs_btree_node_put(node); + } + spin_unlock(&tree->nodes_lock); + + rcu_read_unlock(); + } + + cur_height = SSDFS_BTREE_LEAF_NODE_HEIGHT; + + for (; cur_height < tree_height; cur_height++) { + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + radix_tree_for_each_slot(slot, &tree->nodes, &iter, + SSDFS_BTREE_ROOT_NODE_ID) { + + node = SSDFS_BTN(radix_tree_deref_slot(slot)); + if (unlikely(!node)) { + SSDFS_WARN("empty node ptr: node_id %llu\n", + (u64)iter.index); + radix_tree_tag_clear(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_TOWRITE_TAG); + continue; + } + spin_unlock(&tree->nodes_lock); + + ssdfs_btree_node_get(node); + + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&node->height) != cur_height) { + ssdfs_btree_node_put(node); + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + continue; + } + + if (atomic_read(&node->type) == SSDFS_BTREE_ROOT_NODE) { + /* + * Root node is inline. + * Commit log operation is not necessary. 
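+             * Its state is persisted via the superblock
+             * (or the owner inode), not via a dedicated log.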
+ */ + ssdfs_btree_node_put(node); + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + continue; + } + + if (is_ssdfs_btree_node_pre_deleted(node)) + err = ssdfs_btree_deleted_node_commit_log(node); + else + err = ssdfs_btree_node_commit_log(node); + + if (unlikely(err)) { + ssdfs_btree_node_put(node); + SSDFS_ERR("fail to request commit log: " + "node_id %llu, err %d\n", + (u64)iter.index, err); + goto finish_flush_tree_nodes; + } + + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + ssdfs_btree_node_put(node); + } + spin_unlock(&tree->nodes_lock); + + rcu_read_unlock(); + } + + cur_height = SSDFS_BTREE_LEAF_NODE_HEIGHT; + + for (; cur_height < tree_height; cur_height++) { + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + radix_tree_for_each_slot(slot, &tree->nodes, &iter, + SSDFS_BTREE_ROOT_NODE_ID) { + + node = SSDFS_BTN(radix_tree_deref_slot(slot)); + if (unlikely(!node)) { + SSDFS_WARN("empty node ptr: node_id %llu\n", + (u64)iter.index); + radix_tree_tag_clear(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_TOWRITE_TAG); + continue; + } + spin_unlock(&tree->nodes_lock); + + ssdfs_btree_node_get(node); + + rcu_read_unlock(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&node->height) != cur_height) { + ssdfs_btree_node_put(node); + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + continue; + } + + if (atomic_read(&node->type) == SSDFS_BTREE_ROOT_NODE) { + /* + * Root node is inline. + * Commit log operation is not necessary. + */ + goto clear_towrite_tag; + } + + if (is_ssdfs_btree_node_pre_deleted(node)) + goto clear_towrite_tag; + +check_commit_log_result_state: + state = &node->flush_req.result.state; + + switch (atomic_read(state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + req = &node->flush_req; + wq = &req->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_commit_log_result_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + ssdfs_btree_node_put(node); + err = node->flush_req.result.err; + + if (!err) { + err = -ERANGE; + SSDFS_ERR("error code is absent\n"); + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + goto finish_flush_tree_nodes; + + default: + ssdfs_btree_node_put(node); + err = -ERANGE; + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&node->flush_req.result.state)); + goto finish_flush_tree_nodes; + } + +clear_towrite_tag: + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + + radix_tree_tag_clear(&tree->nodes, iter.index, + SSDFS_BTREE_NODE_TOWRITE_TAG); + + ssdfs_btree_node_put(node); + + spin_unlock(&tree->nodes_lock); + rcu_read_unlock(); + + if (is_ssdfs_btree_node_pre_deleted(node)) { + clear_ssdfs_btree_node_pre_deleted(node); + + ssdfs_btree_radix_tree_delete(tree, + node->node_id); + + if (tree->btree_ops && + tree->btree_ops->delete_node) { + err = tree->btree_ops->delete_node(node); + if (unlikely(err)) { + SSDFS_ERR("delete node failure: " + "err %d\n", err); + } + } + + if (tree->btree_ops && + tree->btree_ops->destroy_node) + tree->btree_ops->destroy_node(node); + + ssdfs_btree_node_destroy(node); + } + + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + } + spin_unlock(&tree->nodes_lock); + + rcu_read_unlock(); + } + +finish_flush_tree_nodes: + if (unlikely(err)) + goto finish_btree_flush; + + if (tree->desc_ops && tree->desc_ops->flush) 
{ + err = tree->desc_ops->flush(tree); + if (unlikely(err)) { + SSDFS_ERR("fail to flush tree descriptor: " + "err %d\n", + err); + goto finish_btree_flush; + } + } + + atomic_set(&tree->state, SSDFS_BTREE_CREATED); + +finish_btree_flush: + return err; +} + +/* + * ssdfs_btree_flush() - flush the current state of btree object + * @tree: btree object + * + * This method tries to flush dirty nodes of the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_flush(struct ssdfs_btree *tree) +{ + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#else + SSDFS_DBG("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + /* do nothing */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("btree %#x is not dirty\n", + tree->type); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + down_write(&tree->lock); + err = ssdfs_btree_flush_nolock(tree); + up_write(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to flush btree: err %d\n", + err); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} diff --git a/fs/ssdfs/btree.h b/fs/ssdfs/btree.h new file mode 100644 index 000000000000..40009755d016 --- /dev/null +++ b/fs/ssdfs/btree.h @@ -0,0 +1,218 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/btree.h - btree declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BTREE_H
+#define _SSDFS_BTREE_H
+
+struct ssdfs_btree;
+
+/*
+ * struct ssdfs_btree_descriptor_operations - btree descriptor operations
+ * @init: initialize btree object by descriptor
+ * @flush: save btree descriptor into superblock
+ */
+struct ssdfs_btree_descriptor_operations {
+    int (*init)(struct ssdfs_fs_info *fsi,
+            struct ssdfs_btree *tree);
+    int (*flush)(struct ssdfs_btree *tree);
+};
+
+/*
+ * struct ssdfs_btree_operations - btree operations specialization
+ * @create_root_node: specialization of root node creation
+ * @create_node: specialization of node's construction operation
+ * @init_node: specialization of node's init operation
+ * @destroy_node: specialization of node's destroy operation
+ * @add_node: specialization of adding a new empty node into the tree
+ * @delete_node: specialization of deleting a node from the tree
+ * @pre_flush_root_node: specialized flush preparation of root node
+ * @flush_root_node: specialized method of root node flushing
+ * @pre_flush_node: specialized flush preparation of common node
+ * @flush_node: specialized method of common node flushing
+ */
+struct ssdfs_btree_operations {
+    int (*create_root_node)(struct ssdfs_fs_info *fsi,
+                struct ssdfs_btree_node *node);
+    int (*create_node)(struct ssdfs_btree_node *node);
+    int (*init_node)(struct ssdfs_btree_node *node);
+    void (*destroy_node)(struct ssdfs_btree_node *node);
+    int (*add_node)(struct ssdfs_btree_node *node);
+    int (*delete_node)(struct ssdfs_btree_node *node);
+    int (*pre_flush_root_node)(struct ssdfs_btree_node *node);
+    int (*flush_root_node)(struct ssdfs_btree_node *node);
+    int (*pre_flush_node)(struct ssdfs_btree_node *node);
+    int (*flush_node)(struct ssdfs_btree_node *node);
+};
+
+/*
+ * struct ssdfs_btree - generic btree
+ * @type: btree type
+ * @owner_ino: inode identification number of btree owner
+ * @node_size: size of the node in bytes
+ * @pages_per_node: physical pages per node
+ * @node_ptr_size: size in bytes of pointer on btree node
+ * @index_size: size in bytes of btree's index
+ * @item_size: default size of item in bytes
+ * @min_item_size: min size of item in bytes
+ * @max_item_size: max possible size of item in bytes
+ * @index_area_min_size: minimal size in bytes of index area in btree node
+ * @create_cno: btree's create checkpoint
+ * @state: btree state
+ * @flags: btree flags
+ * @height: current height of the tree
+ * @lock: btree's lock
+ * @nodes_lock: radix tree lock
+ * @upper_node_id: last allocated node id
+ * @nodes: nodes' radix tree
+ * @fsi: pointer on shared file system object
+ *
+ * Btree nodes are organized into a radix tree.
+ * Another benefit of the radix tree is its support
+ * for tracking dirty items.
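+ *
+ * (Dirty tracking uses the SSDFS_BTREE_NODE_DIRTY_TAG and
+ * SSDFS_BTREE_NODE_TOWRITE_TAG radix tree tags defined below;
+ * nodes are looked up by node_id via ssdfs_btree_radix_tree_find().)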
+ */ +struct ssdfs_btree { + /* static data */ + u8 type; + u64 owner_ino; + u32 node_size; + u8 pages_per_node; + u8 node_ptr_size; + u16 index_size; + u16 item_size; + u8 min_item_size; + u16 max_item_size; + u16 index_area_min_size; + u64 create_cno; + + /* operation specializations */ + const struct ssdfs_btree_descriptor_operations *desc_ops; + const struct ssdfs_btree_operations *btree_ops; + + /* mutable data */ + atomic_t state; + atomic_t flags; + atomic_t height; + + struct rw_semaphore lock; + + spinlock_t nodes_lock; + u32 upper_node_id; + struct radix_tree_root nodes; + + struct ssdfs_fs_info *fsi; +}; + +/* Btree object states */ +enum { + SSDFS_BTREE_UNKNOWN_STATE, + SSDFS_BTREE_CREATED, + SSDFS_BTREE_DIRTY, + SSDFS_BTREE_STATE_MAX +}; + +/* Radix tree tags */ +#define SSDFS_BTREE_NODE_DIRTY_TAG PAGECACHE_TAG_DIRTY +#define SSDFS_BTREE_NODE_TOWRITE_TAG PAGECACHE_TAG_TOWRITE + +/* + * Btree API + */ +int ssdfs_btree_create(struct ssdfs_fs_info *fsi, + u64 owner_ino, + const struct ssdfs_btree_descriptor_operations *desc_ops, + const struct ssdfs_btree_operations *btree_ops, + struct ssdfs_btree *tree); +void ssdfs_btree_destroy(struct ssdfs_btree *tree); +int ssdfs_btree_flush(struct ssdfs_btree *tree); + +int ssdfs_btree_find_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_find_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +bool is_ssdfs_btree_empty(struct ssdfs_btree *tree); +int ssdfs_btree_allocate_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_allocate_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_add_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_add_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_change_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_delete_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_delete_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_delete_all(struct ssdfs_btree *tree); + +/* + * Internal Btree API + */ +bool need_migrate_generic2inline_btree(struct ssdfs_btree *tree, + int items_threshold); +int ssdfs_btree_desc_init(struct ssdfs_fs_info *fsi, + struct ssdfs_btree *tree, + struct ssdfs_btree_descriptor *desc, + u8 min_item_size, + u16 max_item_size); +int ssdfs_btree_desc_flush(struct ssdfs_btree *tree, + struct ssdfs_btree_descriptor *desc); +struct ssdfs_btree_node * +ssdfs_btree_get_child_node_for_hash(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent, + u64 hash); +int ssdfs_btree_update_parent_node_pointer(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent); +int ssdfs_btree_add_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_insert_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_delete_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search); +int ssdfs_btree_get_head_range(struct ssdfs_btree *tree, + u32 expected_len, + struct ssdfs_btree_search *search); +int ssdfs_btree_extract_range(struct ssdfs_btree *tree, + u16 start_index, u16 count, + struct ssdfs_btree_search *search); +int ssdfs_btree_destroy_node_range(struct ssdfs_btree *tree, + u64 start_hash); +struct ssdfs_btree_node * +__ssdfs_btree_read_node(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent, + struct ssdfs_btree_index_key 
*node_index, + u8 node_type, u32 node_id); +int ssdfs_btree_radix_tree_find(struct ssdfs_btree *tree, + unsigned long node_id, + struct ssdfs_btree_node **node); +int ssdfs_btree_synchronize_root_node(struct ssdfs_btree *tree, + struct ssdfs_btree_inline_root_node *root); +int ssdfs_btree_get_next_hash(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + u64 *next_hash); + +void ssdfs_debug_show_btree_node_indexes(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent); +void ssdfs_check_btree_consistency(struct ssdfs_btree *tree); +void ssdfs_debug_btree_object(struct ssdfs_btree *tree); + +#endif /* _SSDFS_BTREE_H */ diff --git a/fs/ssdfs/btree_search.c b/fs/ssdfs/btree_search.c new file mode 100644 index 000000000000..27eb262690de --- /dev/null +++ b/fs/ssdfs/btree_search.c @@ -0,0 +1,885 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/btree_search.c - btree search object functionality. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "btree_search.h" +#include "btree_node.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_btree_search_page_leaks; +atomic64_t ssdfs_btree_search_memory_leaks; +atomic64_t ssdfs_btree_search_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_btree_search_cache_leaks_increment(void *kaddr) + * void ssdfs_btree_search_cache_leaks_decrement(void *kaddr) + * void *ssdfs_btree_search_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_btree_search_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_btree_search_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_btree_search_kfree(void *kaddr) + * struct page *ssdfs_btree_search_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_btree_search_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_btree_search_free_page(struct page *page) + * void ssdfs_btree_search_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(btree_search) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(btree_search) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_btree_search_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_btree_search_page_leaks, 0); + atomic64_set(&ssdfs_btree_search_memory_leaks, 0); + atomic64_set(&ssdfs_btree_search_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_btree_search_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_btree_search_page_leaks) != 0) { + SSDFS_ERR("BTREE SEARCH: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_btree_search_page_leaks)); + } + + if (atomic64_read(&ssdfs_btree_search_memory_leaks) != 0) { + SSDFS_ERR("BTREE SEARCH: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_btree_search_memory_leaks)); + } + + if (atomic64_read(&ssdfs_btree_search_cache_leaks) != 0) { + SSDFS_ERR("BTREE SEARCH: " + "caches suffers from %lld 
leaks\n", + atomic64_read(&ssdfs_btree_search_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * BTREE SEARCH OBJECT CACHE * + ******************************************************************************/ + +static struct kmem_cache *ssdfs_btree_search_obj_cachep; + +void ssdfs_zero_btree_search_obj_cache_ptr(void) +{ + ssdfs_btree_search_obj_cachep = NULL; +} + +static void ssdfs_init_btree_search_object_once(void *obj) +{ + struct ssdfs_btree_search *search_obj = obj; + + memset(search_obj, 0, sizeof(struct ssdfs_btree_search)); +} + +void ssdfs_shrink_btree_search_obj_cache(void) +{ + if (ssdfs_btree_search_obj_cachep) + kmem_cache_shrink(ssdfs_btree_search_obj_cachep); +} + +void ssdfs_destroy_btree_search_obj_cache(void) +{ + if (ssdfs_btree_search_obj_cachep) + kmem_cache_destroy(ssdfs_btree_search_obj_cachep); +} + +int ssdfs_init_btree_search_obj_cache(void) +{ + ssdfs_btree_search_obj_cachep = + kmem_cache_create_usercopy("ssdfs_btree_search_obj_cache", + sizeof(struct ssdfs_btree_search), 0, + SLAB_RECLAIM_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, + offsetof(struct ssdfs_btree_search, raw), + sizeof(union ssdfs_btree_search_raw_data) + + sizeof(struct ssdfs_name_string), + ssdfs_init_btree_search_object_once); + if (!ssdfs_btree_search_obj_cachep) { + SSDFS_ERR("unable to create btree search objects cache\n"); + return -ENOMEM; + } + + return 0; +} + +/****************************************************************************** + * BTREE SEARCH OBJECT FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_btree_search_alloc() - allocate memory for btree search object + */ +struct ssdfs_btree_search *ssdfs_btree_search_alloc(void) +{ + struct ssdfs_btree_search *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_btree_search_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = kmem_cache_alloc(ssdfs_btree_search_obj_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for btree search object\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_btree_search_cache_leaks_increment(ptr); + + return ptr; +} + +/* + * ssdfs_btree_search_free() - free memory for btree search object + */ +void ssdfs_btree_search_free(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_btree_search_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!search) + return; + + if (search->node.parent) { + ssdfs_btree_node_put(search->node.parent); + search->node.parent = NULL; + } + + if (search->node.child) { + ssdfs_btree_node_put(search->node.child); + search->node.child = NULL; + } + + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + + ssdfs_btree_search_free_result_buf(search); + ssdfs_btree_search_free_result_name(search); + + ssdfs_btree_search_cache_leaks_decrement(search); + kmem_cache_free(ssdfs_btree_search_obj_cachep, search); +} + +/* + * ssdfs_btree_search_init() - init btree search object + * @search: btree search object [out] + */ +void ssdfs_btree_search_init(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_search_free_result_buf(search); + ssdfs_btree_search_free_result_name(search); + + if (search->node.parent) { + ssdfs_btree_node_put(search->node.parent); + search->node.parent = NULL; + } + + if (search->node.child) { + ssdfs_btree_node_put(search->node.child); + search->node.child 
= NULL; + } + + memset(search, 0, sizeof(struct ssdfs_btree_search)); + search->request.type = SSDFS_BTREE_SEARCH_UNKNOWN_TYPE; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT; + search->result.err = 0; + search->result.buf = NULL; + search->result.buf_state = SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.name = NULL; + search->result.name_state = SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; +} + +/* + * need_initialize_btree_search() - check necessity to init the search object + * @search: btree search object + */ +bool need_initialize_btree_search(struct ssdfs_btree_search *search) +{ + bool need_initialize = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_RESULT: + case SSDFS_BTREE_SEARCH_FAILURE: + case SSDFS_BTREE_SEARCH_EMPTY_RESULT: + case SSDFS_BTREE_SEARCH_OBSOLETE_RESULT: + need_initialize = true; + break; + + case SSDFS_BTREE_SEARCH_VALID_ITEM: + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_MOVE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + need_initialize = false; + break; + + case SSDFS_BTREE_SEARCH_ALLOCATE_ITEM: + case SSDFS_BTREE_SEARCH_ALLOCATE_RANGE: + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + need_initialize = true; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_ERR("search->request.type %#x\n", + search->request.type); + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + }; + break; + + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ALLOCATE_ITEM: + case SSDFS_BTREE_SEARCH_ALLOCATE_RANGE: + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + need_initialize = false; + break; + + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_MOVE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + need_initialize = true; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_ERR("search->request.type %#x\n", + search->request.type); + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + }; + break; + + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + need_initialize = false; + break; + + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_MOVE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + need_initialize = true; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_ERR("search->request.type %#x\n", + search->request.type); + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + }; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_ERR("search->result.state %#x\n", + search->result.state); + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + }; + + return need_initialize; +} + +/* + * is_btree_search_request_valid() - check validity of search request + * @search: 
btree search object + */ +bool is_btree_search_request_valid(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_ALLOCATE_ITEM: + case SSDFS_BTREE_SEARCH_ALLOCATE_RANGE: + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_MOVE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* valid type */ + break; + + default: + SSDFS_WARN("invalid search request type %#x\n", + search->request.type); + return false; + }; + + if (search->request.flags & ~SSDFS_BTREE_SEARCH_REQUEST_FLAGS_MASK) { + SSDFS_WARN("invalid flags set: %#x\n", + search->request.flags); + return false; + } + + if (search->request.start.hash == U64_MAX) { + SSDFS_WARN("invalid start_hash\n"); + return false; + } else if (search->request.start.hash > search->request.end.hash) { + SSDFS_WARN("invalid range: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); + return false; + } + + return true; +} + +/* + * is_btree_index_search_request_valid() - check index node search request + * @search: btree search object + * @prev_node_id: node ID from previous search + * @prev_node_height: node height from previous search + */ +bool is_btree_index_search_request_valid(struct ssdfs_btree_search *search, + u32 prev_node_id, + u8 prev_node_height) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(prev_node_id == SSDFS_BTREE_NODE_INVALID_ID); + BUG_ON(prev_node_height == U8_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_btree_search_request_valid(search)) + return false; + + if (prev_node_id == search->node.id) + return false; + + if (search->node.height != (prev_node_height - 1)) + return false; + + if (search->node.state != SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC) + return false; + + return true; +} + +/* + * is_btree_leaf_node_found() - check that leaf btree node has been found + * @search: btree search object + */ +bool is_btree_leaf_node_found(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.state != SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC) + return false; + + if (search->node.id == SSDFS_BTREE_NODE_INVALID_ID) + return false; + + if (search->node.child == NULL) + return false; + + return true; +} + +/* + * is_btree_search_node_desc_consistent() - check node descriptor consistency + * @search: btree search object + */ +bool is_btree_search_node_desc_consistent(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.state != SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC) { + SSDFS_ERR("unexpected search->node.state %#x\n", + search->node.state); + return false; + } + + if (!search->node.parent) { + SSDFS_ERR("search->node.parent is NULL\n"); + return false; + } + + if (!search->node.child) { + SSDFS_ERR("search->node.child is NULL\n"); + return false; + } + + if (search->node.id != search->node.child->node_id) { + SSDFS_ERR("search->node.id %u != search->node.child->node_id %u\n", + search->node.id, search->node.child->node_id); + return false; + } + + if (search->node.height != atomic_read(&search->node.child->height)) { + 
SSDFS_ERR("invalid height: " + "search->node.height %u, " + "search->node.child->height %d\n", + search->node.height, + atomic_read(&search->node.child->height)); + return false; + } + + return true; +} + +/* + * ssdfs_btree_search_define_child_node() - define child node for the search + * @search: search object + * @child: child node object + */ +void ssdfs_btree_search_define_child_node(struct ssdfs_btree_search *search, + struct ssdfs_btree_node *child) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.child) + ssdfs_btree_node_put(search->node.child); + + search->node.child = child; + + if (search->node.child) + ssdfs_btree_node_get(search->node.child); +} + +/* + * ssdfs_btree_search_forget_child_node() - forget child node for the search + * @search: search object + */ +void ssdfs_btree_search_forget_child_node(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.child) { + ssdfs_btree_node_put(search->node.child); + search->node.child = NULL; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + } +} + +/* + * ssdfs_btree_search_define_parent_node() - define parent node for the search + * @search: search object + * @parent: parent node object + */ +void ssdfs_btree_search_define_parent_node(struct ssdfs_btree_search *search, + struct ssdfs_btree_node *parent) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.parent) + ssdfs_btree_node_put(search->node.parent); + + search->node.parent = parent; + + if (search->node.parent) + ssdfs_btree_node_get(search->node.parent); +} + +/* + * ssdfs_btree_search_forget_parent_node() - forget parent node for the search + * @search: search object + */ +void ssdfs_btree_search_forget_parent_node(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.parent) { + ssdfs_btree_node_put(search->node.parent); + search->node.parent = NULL; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + } +} + +/* + * ssdfs_btree_search_alloc_result_buf() - allocate result buffer + * @search: search object + * @buf_size: buffer size + */ +int ssdfs_btree_search_alloc_result_buf(struct ssdfs_btree_search *search, + size_t buf_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf = ssdfs_btree_search_kzalloc(buf_size, GFP_KERNEL); + if (!search->result.buf) { + SSDFS_ERR("fail to allocate buffer: size %zu\n", + buf_size); + return -ENOMEM; + } + + search->result.buf_size = buf_size; + search->result.buf_state = SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER; + search->result.items_in_buffer = 0; + return 0; +} + +/* + * ssdfs_btree_search_free_result_buf() - free result buffer + * @search: search object + */ +void ssdfs_btree_search_free_result_buf(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.buf_state == SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER) { + if (search->result.buf) { + ssdfs_btree_search_kfree(search->result.buf); + search->result.buf = NULL; + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + } + } +} + +/* + * ssdfs_btree_search_alloc_result_name() - allocate result name + * @search: search object + * @string_size: name string size + */ +int ssdfs_btree_search_alloc_result_name(struct ssdfs_btree_search *search, + size_t 
string_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.name = ssdfs_btree_search_kzalloc(string_size, + GFP_KERNEL); + if (!search->result.name) { + SSDFS_ERR("fail to allocate buffer: size %zu\n", + string_size); + return -ENOMEM; + } + + search->result.name_string_size = string_size; + search->result.name_state = SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER; + search->result.names_in_buffer = 0; + return 0; +} + +/* + * ssdfs_btree_search_free_result_name() - free result name + * @search: search object + */ +void ssdfs_btree_search_free_result_name(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.name_state == SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER) { + if (search->result.name) { + ssdfs_btree_search_kfree(search->result.name); + search->result.name = NULL; + search->result.name_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + } + } +} + +void ssdfs_debug_btree_search_object(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_btree_index_key *node_index; + struct ssdfs_shdict_ltbl2_item *ltbl2_item; + size_t item_size; + size_t count; + int i; + + BUG_ON(!search); + + SSDFS_DBG("REQUEST: type %#x, flags %#x, count %u, " + "START: name %p, name_len %zu, hash %llx, ino %llu, " + "END: name %p, name_len %zu, hash %llx, ino %llu\n", + search->request.type, + search->request.flags, + search->request.count, + search->request.start.name, + search->request.start.name_len, + search->request.start.hash, + search->request.start.ino, + search->request.end.name, + search->request.end.name_len, + search->request.end.hash, + search->request.end.ino); + + SSDFS_DBG("NODE: state %#x, id %u, height %u, " + "parent %p, child %p\n", + search->node.state, + search->node.id, + search->node.height, + search->node.parent, + search->node.child); + + node_index = &search->node.found_index; + SSDFS_DBG("NODE_INDEX: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(node_index->node_id), + node_index->node_type, + node_index->height, + le16_to_cpu(node_index->flags), + le64_to_cpu(node_index->index.hash), + le64_to_cpu(node_index->index.extent.seg_id), + le32_to_cpu(node_index->index.extent.logical_blk), + le32_to_cpu(node_index->index.extent.len)); + + if (search->node.parent) { + SSDFS_DBG("PARENT NODE: node_id %u, state %#x, " + "type %#x, height %d, refs_count %d\n", + search->node.parent->node_id, + atomic_read(&search->node.parent->state), + atomic_read(&search->node.parent->type), + atomic_read(&search->node.parent->height), + atomic_read(&search->node.parent->refs_count)); + } + + if (search->node.child) { + SSDFS_DBG("CHILD NODE: node_id %u, state %#x, " + "type %#x, height %d, refs_count %d\n", + search->node.child->node_id, + atomic_read(&search->node.child->state), + atomic_read(&search->node.child->type), + atomic_read(&search->node.child->height), + atomic_read(&search->node.child->refs_count)); + } + + SSDFS_DBG("RESULT: state %#x, err %d, start_index %u, count %u, " + "search_cno %llu\n", + search->result.state, + search->result.err, + search->result.start_index, + search->result.count, + search->result.search_cno); + + SSDFS_DBG("NAME: name_state %#x, name %p, " + "name_string_size %zu, names_in_buffer %u\n", + search->result.name_state, + search->result.name, + search->result.name_string_size, + search->result.names_in_buffer); + + SSDFS_DBG("LOOKUP: index %u, 
hash_lo %u, " + "start_index %u, range_len %u\n", + search->name.lookup.index, + le32_to_cpu(search->name.lookup.desc.hash_lo), + le16_to_cpu(search->name.lookup.desc.start_index), + le16_to_cpu(search->name.lookup.desc.range_len)); + + ltbl2_item = &search->name.strings_range.desc; + SSDFS_DBG("STRINGS_RANGE: index %u, hash_lo %u, " + "prefix_len %u, str_count %u, " + "hash_index %u\n", + search->name.strings_range.index, + le32_to_cpu(ltbl2_item->hash_lo), + ltbl2_item->prefix_len, + ltbl2_item->str_count, + le16_to_cpu(ltbl2_item->hash_index)); + + SSDFS_DBG("PREFIX: index %u, hash_hi %u, " + "str_offset %u, str_len %u, type %#x\n", + search->name.prefix.index, + le32_to_cpu(search->name.prefix.desc.hash_hi), + le16_to_cpu(search->name.prefix.desc.str_offset), + search->name.prefix.desc.str_len, + search->name.prefix.desc.type); + + SSDFS_DBG("LEFT_NAME: index %u, hash_hi %u, " + "str_offset %u, str_len %u, type %#x\n", + search->name.left_name.index, + le32_to_cpu(search->name.left_name.desc.hash_hi), + le16_to_cpu(search->name.left_name.desc.str_offset), + search->name.left_name.desc.str_len, + search->name.left_name.desc.type); + + SSDFS_DBG("RIGHT_NAME: index %u, hash_hi %u, " + "str_offset %u, str_len %u, type %#x\n", + search->name.right_name.index, + le32_to_cpu(search->name.right_name.desc.hash_hi), + le16_to_cpu(search->name.right_name.desc.str_offset), + search->name.right_name.desc.str_len, + search->name.right_name.desc.type); + + if (search->result.name) { + count = search->result.names_in_buffer; + + if (count > 0) + item_size = search->result.name_string_size / count; + else + item_size = 0; + + for (i = 0; i < search->result.names_in_buffer; i++) { + struct ssdfs_name_string *name; + u8 *addr; + + addr = (u8 *)search->result.name + (i * item_size); + name = (struct ssdfs_name_string *)addr; + + SSDFS_DBG("NAME: index %d, hash %llx, str_len %zu\n", + i, name->hash, name->len); + + SSDFS_DBG("LOOKUP: index %u, hash_lo %u, " + "start_index %u, range_len %u\n", + name->lookup.index, + le32_to_cpu(name->lookup.desc.hash_lo), + le16_to_cpu(name->lookup.desc.start_index), + le16_to_cpu(name->lookup.desc.range_len)); + + ltbl2_item = &name->strings_range.desc; + SSDFS_DBG("STRINGS_RANGE: index %u, hash_lo %u, " + "prefix_len %u, str_count %u, " + "hash_index %u\n", + name->strings_range.index, + le32_to_cpu(ltbl2_item->hash_lo), + ltbl2_item->prefix_len, + ltbl2_item->str_count, + le16_to_cpu(ltbl2_item->hash_index)); + + SSDFS_DBG("PREFIX: index %u, hash_hi %u, " + "str_offset %u, str_len %u, type %#x\n", + name->prefix.index, + le32_to_cpu(name->prefix.desc.hash_hi), + le16_to_cpu(name->prefix.desc.str_offset), + name->prefix.desc.str_len, + name->prefix.desc.type); + + SSDFS_DBG("LEFT_NAME: index %u, hash_hi %u, " + "str_offset %u, str_len %u, type %#x\n", + name->left_name.index, + le32_to_cpu(name->left_name.desc.hash_hi), + le16_to_cpu(name->left_name.desc.str_offset), + name->left_name.desc.str_len, + name->left_name.desc.type); + + SSDFS_DBG("RIGHT_NAME: index %u, hash_hi %u, " + "str_offset %u, str_len %u, type %#x\n", + name->right_name.index, + le32_to_cpu(name->right_name.desc.hash_hi), + le16_to_cpu(name->right_name.desc.str_offset), + name->right_name.desc.str_len, + name->right_name.desc.type); + + SSDFS_DBG("RAW STRING DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + name->str, + name->len); + SSDFS_DBG("\n"); + } + } + + SSDFS_DBG("RESULT BUFFER: buf_state %#x, buf %p, " + "buf_size %zu, items_in_buffer %u\n", + search->result.buf_state, 
+ search->result.buf, + search->result.buf_size, + search->result.items_in_buffer); + + if (search->result.buf) { + count = search->result.items_in_buffer; + + if (count > 0) + item_size = search->result.buf_size / count; + else + item_size = 0; + + for (i = 0; i < search->result.items_in_buffer; i++) { + void *item; + + item = (u8 *)search->result.buf + (i * item_size); + + SSDFS_DBG("RAW BUF DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + item, + item_size); + SSDFS_DBG("\n"); + } + } +#endif /* CONFIG_SSDFS_DEBUG */ +} diff --git a/fs/ssdfs/btree_search.h b/fs/ssdfs/btree_search.h new file mode 100644 index 000000000000..9fbdb796b4dd --- /dev/null +++ b/fs/ssdfs/btree_search.h @@ -0,0 +1,359 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/btree_search.h - btree search object declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_BTREE_SEARCH_H +#define _SSDFS_BTREE_SEARCH_H + +/* Search request types */ +enum { + SSDFS_BTREE_SEARCH_UNKNOWN_TYPE, + SSDFS_BTREE_SEARCH_FIND_ITEM, + SSDFS_BTREE_SEARCH_FIND_RANGE, + SSDFS_BTREE_SEARCH_ALLOCATE_ITEM, + SSDFS_BTREE_SEARCH_ALLOCATE_RANGE, + SSDFS_BTREE_SEARCH_ADD_ITEM, + SSDFS_BTREE_SEARCH_ADD_RANGE, + SSDFS_BTREE_SEARCH_CHANGE_ITEM, + SSDFS_BTREE_SEARCH_MOVE_ITEM, + SSDFS_BTREE_SEARCH_DELETE_ITEM, + SSDFS_BTREE_SEARCH_DELETE_RANGE, + SSDFS_BTREE_SEARCH_DELETE_ALL, + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL, + SSDFS_BTREE_SEARCH_TYPE_MAX +}; + +/* + * struct ssdfs_peb_timestamps - PEB timestamps + * @peb_id: PEB ID + * @create_time: PEB's create timestamp + * @last_log_time: PEB's last log create timestamp + */ +struct ssdfs_peb_timestamps { + u64 peb_id; + u64 create_time; + u64 last_log_time; +}; + +/* + * struct ssdfs_btree_search_hash - btree search hash + * @name: name of the searching object + * @name_len: length of the name in bytes + * @uuid: UUID of the searching object + * @hash: hash value + * @ino: inode ID + * @fingerprint: fingerprint value + * @peb2time: PEB timestamps + */ +struct ssdfs_btree_search_hash { + const char *name; + size_t name_len; + u8 *uuid; + u64 hash; + u64 ino; + struct ssdfs_fingerprint *fingerprint; + struct ssdfs_peb_timestamps *peb2time; +}; + +/* + * struct ssdfs_btree_search_request - btree search request + * @type: request type + * @flags: request flags + * @start: starting hash value + * @end: ending hash value + * @count: range of hashes length in the request + */ +struct ssdfs_btree_search_request { + int type; +#define SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE (1 << 0) +#define SSDFS_BTREE_SEARCH_HAS_VALID_COUNT (1 << 1) +#define SSDFS_BTREE_SEARCH_HAS_VALID_NAME (1 << 2) +#define SSDFS_BTREE_SEARCH_HAS_VALID_INO (1 << 3) +#define SSDFS_BTREE_SEARCH_NOT_INVALIDATE (1 << 4) +#define SSDFS_BTREE_SEARCH_HAS_VALID_UUID (1 << 5) +#define SSDFS_BTREE_SEARCH_HAS_VALID_FINGERPRINT (1 << 6) +#define SSDFS_BTREE_SEARCH_INCREMENT_REF_COUNT (1 << 7) +#define SSDFS_BTREE_SEARCH_DECREMENT_REF_COUNT (1 << 8) +#define SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM (1 << 9) +#define SSDFS_BTREE_SEARCH_DONT_EXTRACT_RECORD (1 << 10) +#define 
SSDFS_BTREE_SEARCH_HAS_PEB2TIME_PAIR (1 << 11) +#define SSDFS_BTREE_SEARCH_REQUEST_FLAGS_MASK 0xFFF + u32 flags; + + struct ssdfs_btree_search_hash start; + struct ssdfs_btree_search_hash end; + unsigned int count; +}; + +/* Node descriptor possible states */ +enum { + SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY, + SSDFS_BTREE_SEARCH_ROOT_NODE_DESC, + SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC, + SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC, + SSDFS_BTREE_SEARCH_NODE_DESC_STATE_MAX +}; + +/* + * struct ssdfs_btree_search_node_desc - btree node descriptor + * @state: descriptor state + * @id: node ID number + * @height: node height + * @found_index: index of child node + * @parent: last parent node + * @child: last child node + */ +struct ssdfs_btree_search_node_desc { + int state; + + u32 id; + u8 height; + + struct ssdfs_btree_index_key found_index; + struct ssdfs_btree_node *parent; + struct ssdfs_btree_node *child; +}; + +/* Search result possible states */ +enum { + SSDFS_BTREE_SEARCH_UNKNOWN_RESULT, + SSDFS_BTREE_SEARCH_FAILURE, + SSDFS_BTREE_SEARCH_EMPTY_RESULT, + SSDFS_BTREE_SEARCH_VALID_ITEM, + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND, + SSDFS_BTREE_SEARCH_OUT_OF_RANGE, + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT, + SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE, + SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE, + SSDFS_BTREE_SEARCH_PLEASE_MOVE_BUF_CONTENT, + SSDFS_BTREE_SEARCH_RESULT_STATE_MAX +}; + +/* Search result buffer possible states */ +enum { + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE, + SSDFS_BTREE_SEARCH_INLINE_BUFFER, + SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER, + SSDFS_BTREE_SEARCH_BUFFER_STATE_MAX +}; + +/* + * struct ssdfs_lookup_descriptor - lookup descriptor + * @index: index of item in the lookup1 table + * @desc: descriptor of lookup1 table's item + */ +struct ssdfs_lookup_descriptor { + u16 index; + struct ssdfs_shdict_ltbl1_item desc; +}; + +/* + * struct ssdfs_strings_range_descriptor - strings range descriptor + * @index: index of item in the lookup2 table + * @desc: descriptor of lookup2 table's item + */ +struct ssdfs_strings_range_descriptor { + u16 index; + struct ssdfs_shdict_ltbl2_item desc; +}; + +/* + * struct ssdfs_string_descriptor - string descriptor + * @index: index of item in the hash table + * @desc: descriptor of hash table's item + */ +struct ssdfs_string_descriptor { + u16 index; + struct ssdfs_shdict_htbl_item desc; +}; + +/* + * struct ssdfs_string_table_index - string table indexes + * @lookup1_index: index in lookup1 table + * @lookup2_index: index in lookup2 table + * @hash_index: index in hash table + * + * Search operation defines lookup, strings_range, prefix, + * left_name, and right_name. This information contains + * potential position to store the string. However, + * the final position to insert string and indexes can + * be defined during the insert operation. This field + * keeps the knowledge of finally used indexes to store + * the string and lookup1, lookup2, hash indexes. 
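+ * + * For example, inserting a new string can shift the existing strings + * inside the node, so the finally used hash table index can differ + * from the left_name/right_name positions found during the search.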
+ */ +struct ssdfs_string_table_index { + u16 lookup1_index; + u16 lookup2_index; + u16 hash_index; +}; + +/* + * struct ssdfs_name_string - name string + * @hash: name hash + * @lookup: lookup item descriptor + * @strings_range: range of strings descriptor + * @prefix: prefix descriptor + * @left_name: left name descriptor + * @right_name: right name descriptor + * @placement: stored indexes descriptor + * @len: name length + * @str: name buffer + */ +struct ssdfs_name_string { + u64 hash; + struct ssdfs_lookup_descriptor lookup; + struct ssdfs_strings_range_descriptor strings_range; + struct ssdfs_string_descriptor prefix; + struct ssdfs_string_descriptor left_name; + struct ssdfs_string_descriptor right_name; + + struct ssdfs_string_table_index placement; + + size_t len; + unsigned char str[SSDFS_MAX_NAME_LEN]; +}; + +/* + * struct ssdfs_btree_search_result - btree search result + * @state: result state + * @err: result error code + * @start_index: starting found item index + * @count: count of found items + * @search_cno: checkpoint of search activity + * @name_state: state of the name buffer + * @name: pointer on buffer with name(s) + * @name_string_size: size of the buffer in bytes + * @names_in_buffer: count of names in buffer + * @buf_state: state of the buffer + * @buf: pointer on buffer with item(s) + * @buf_size: size of the buffer in bytes + * @items_in_buffer: count of items in buffer + */ +struct ssdfs_btree_search_result { + int state; + int err; + + u16 start_index; + u16 count; + + u64 search_cno; + + int name_state; + struct ssdfs_name_string *name; + size_t name_string_size; + u32 names_in_buffer; + + int buf_state; + void *buf; + size_t buf_size; + u32 items_in_buffer; +}; + +/* Position check results */ +enum { + SSDFS_CORRECT_POSITION, + SSDFS_SEARCH_LEFT_DIRECTION, + SSDFS_SEARCH_RIGHT_DIRECTION, + SSDFS_CHECK_POSITION_FAILURE +}; + +/* + * struct ssdfs_btree_search - btree search + * @request: search request + * @node: btree node descriptor + * @result: search result + * @raw.fork: raw fork buffer + * @raw.inode: raw inode buffer + * @raw.dentry.header: raw directory entry header + * @raw.xattr.header: raw xattr entry header + * @raw.shared_extent: shared extent buffer + * @raw.snapshot: raw snapshot info buffer + * @raw.peb2time: raw PEB2time set + * @raw.invalidated_extent: invalidated extent buffer + * @name: name string + */ +struct ssdfs_btree_search { + struct ssdfs_btree_search_request request; + struct ssdfs_btree_search_node_desc node; + struct ssdfs_btree_search_result result; + union ssdfs_btree_search_raw_data { + struct ssdfs_raw_fork fork; + struct ssdfs_inode inode; + struct ssdfs_raw_dentry { + struct ssdfs_dir_entry header; + } dentry; + struct ssdfs_raw_xattr { + struct ssdfs_xattr_entry header; + } xattr; + struct ssdfs_shared_extent shared_extent; + struct ssdfs_snapshot snapshot; + struct ssdfs_peb2time_set peb2time; + struct ssdfs_raw_extent invalidated_extent; + } raw; + struct ssdfs_name_string name; +}; + +/* Btree height's classification */ +enum { + SSDFS_BTREE_PARENT2LEAF_HEIGHT = 1, + SSDFS_BTREE_PARENT2HYBRID_HEIGHT = 2, + SSDFS_BTREE_PARENT2INDEX_HEIGHT = 3, +}; + +/* + * Inline functions + */ + +static inline +bool is_btree_search_contains_new_item(struct ssdfs_btree_search *search) +{ + return search->request.flags & + SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM; +} + +/* + * Btree search object API + */ +struct ssdfs_btree_search *ssdfs_btree_search_alloc(void); +void ssdfs_btree_search_free(struct ssdfs_btree_search *search); +void 
ssdfs_btree_search_init(struct ssdfs_btree_search *search); +bool need_initialize_btree_search(struct ssdfs_btree_search *search); +bool is_btree_search_request_valid(struct ssdfs_btree_search *search); +bool is_btree_index_search_request_valid(struct ssdfs_btree_search *search, + u32 prev_node_id, + u8 prev_node_height); +bool is_btree_leaf_node_found(struct ssdfs_btree_search *search); +bool is_btree_search_node_desc_consistent(struct ssdfs_btree_search *search); +void ssdfs_btree_search_define_parent_node(struct ssdfs_btree_search *search, + struct ssdfs_btree_node *parent); +void ssdfs_btree_search_define_child_node(struct ssdfs_btree_search *search, + struct ssdfs_btree_node *child); +void ssdfs_btree_search_forget_parent_node(struct ssdfs_btree_search *search); +void ssdfs_btree_search_forget_child_node(struct ssdfs_btree_search *search); +int ssdfs_btree_search_alloc_result_buf(struct ssdfs_btree_search *search, + size_t buf_size); +void ssdfs_btree_search_free_result_buf(struct ssdfs_btree_search *search); +int ssdfs_btree_search_alloc_result_name(struct ssdfs_btree_search *search, + size_t string_size); +void ssdfs_btree_search_free_result_name(struct ssdfs_btree_search *search); + +void ssdfs_debug_btree_search_object(struct ssdfs_btree_search *search); + +#endif /* _SSDFS_BTREE_SEARCH_H */
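The declarations above make up the whole public API of the search object. As a minimal usage sketch (illustrative only: error handling is trimmed, 'tree' and 'hash' come from the caller's context, and ssdfs_btree_find_item() is assumed here as the tree-level lookup entry point, since the patches themselves only show the internal __ssdfs_btree_find_item() helper), a lookup of a single item could be driven like this:

	struct ssdfs_btree_search *search;
	int err = 0;

	search = ssdfs_btree_search_alloc();
	if (IS_ERR(search))
		return PTR_ERR(search);

	ssdfs_btree_search_init(search);
	search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM;
	search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE |
				SSDFS_BTREE_SEARCH_HAS_VALID_COUNT;
	search->request.start.hash = hash;	/* hash of the wanted item; must not be U64_MAX */
	search->request.end.hash = hash;	/* single item -> start == end */
	search->request.count = 1;

	if (is_btree_search_request_valid(search))
		err = ssdfs_btree_find_item(tree, search);	/* assumed entry point */

	/* on success the found item is described by search->result */
	ssdfs_btree_search_free(search);

On success the result state, item count, and buffer contents are reported through search->result, as dumped by ssdfs_debug_btree_search_object() above.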
From patchwork Sat Feb 25 01:08:59 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151953 From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 48/76] ssdfs: add/delete b-tree node Date: Fri, 24 Feb 2023 17:08:59 -0800 Message-Id: <20230225010927.813929-49-slava@dubeyko.com> In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> X-Mailing-List: linux-fsdevel@vger.kernel.org A b-tree can be viewed as a hierarchy of b-tree nodes. This patch implements the logic of adding a node to and deleting a node from a b-tree. The logic is reused by all of the specialized b-tree implementations. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/btree.c | 3054 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 3054 insertions(+) diff --git a/fs/ssdfs/btree.c b/fs/ssdfs/btree.c index 5780077a1eb9..d7778cdb67a1 100644 --- a/fs/ssdfs/btree.c +++ b/fs/ssdfs/btree.c @@ -1018,3 +1018,3057 @@ int ssdfs_btree_flush(struct ssdfs_btree *tree) return err; } + +/* + * ssdfs_btree_destroy_node_range() - destroy nodes from radix tree + * @tree: btree object + * @hash: starting hash for nodes destruction + * + * This method tries to flush and destroy + * some nodes from the radix tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
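+ * + * NOTE: the iteration below drops tree->nodes_lock and the RCU read + * lock before destroying a pre-deleted node and re-acquires them + * afterwards; pre-deleted nodes that are still shared are only marked + * SSDFS_BTREE_NODE_INVALID and stay in the radix tree.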
+ */ +int ssdfs_btree_destroy_node_range(struct ssdfs_btree *tree, + u64 hash) +{ + int tree_state; + struct radix_tree_iter iter; + void **slot; + struct ssdfs_btree_node *node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid tree state %#x\n", + tree_state); + return -ERANGE; + } + + down_write(&tree->lock); + + rcu_read_lock(); + + spin_lock(&tree->nodes_lock); + radix_tree_for_each_slot(slot, &tree->nodes, &iter, + SSDFS_BTREE_ROOT_NODE_ID) { + + node = (struct ssdfs_btree_node *)radix_tree_deref_slot(slot); + if (unlikely(!node)) { + SSDFS_WARN("empty node ptr: node_id %llu\n", + (u64)iter.index); + continue; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_btree_node_pre_deleted(node)) { + if (is_ssdfs_node_shared(node)) { + atomic_set(&node->state, + SSDFS_BTREE_NODE_INVALID); + continue; + } + } + + spin_unlock(&tree->nodes_lock); + rcu_read_unlock(); + + if (is_ssdfs_btree_node_pre_deleted(node)) { + clear_ssdfs_btree_node_pre_deleted(node); + + ssdfs_btree_radix_tree_delete(tree, + node->node_id); + + if (tree->btree_ops && + tree->btree_ops->delete_node) { + err = tree->btree_ops->delete_node(node); + if (unlikely(err)) { + SSDFS_ERR("delete node failure: " + "err %d\n", err); + } + } + + if (tree->btree_ops && + tree->btree_ops->destroy_node) + tree->btree_ops->destroy_node(node); + + ssdfs_btree_node_destroy(node); + } + + rcu_read_lock(); + spin_lock(&tree->nodes_lock); + } + spin_unlock(&tree->nodes_lock); + + rcu_read_unlock(); + + up_write(&tree->lock); + + return err; +} + +/* + * ssdfs_check_leaf_node_absence() - check that node is absent in the tree + * @tree: btree object + * @search: search object + * + * This method tries to detect that the node is really absent before + * starting to add a new node. The tree should be exclusively locked + * for this operation by the caller. + * + * RETURN: + * [success] - tree doesn't contain the requested node. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - tree is corrupted. + * %-EEXIST - node exists in the tree. 
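+ * + * NOTE: the check re-runs the item search under the exclusive tree + * lock; only the -ENODATA answer with the + * SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE result state is treated as a + * confirmed absence, any other outcome means that a suitable node + * already exists.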
+ */ +static +int ssdfs_check_leaf_node_absence(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("node_id %u, height %u\n", + search->node.id, search->node.height); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->node.state) { + case SSDFS_BTREE_SEARCH_ROOT_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search object state: " + "search->node.state %#x\n", + search->node.state); + return -ERANGE; + } + + if (!search->node.parent) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + err = __ssdfs_btree_find_item(tree, search); + if (err == -ENODATA) { + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* + * node doesn't exist in the tree + */ + err = 0; + break; + + default: + /* + * existing node has free space + */ + err = -EEXIST; + break; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find index: " + "start_hash %llx, err %d\n", + search->request.start.hash, + err); + } else + err = -EEXIST; + + return err; +} + +/* + * ssdfs_btree_define_new_node_type() - define the type of the new node + * @tree: btree object + * @parent: parent node object + * + * This method tries to define the type of the node being created. + * + * RETURN: + * [success] - type of the new node. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - tree is corrupted. + */ +static +int ssdfs_btree_define_new_node_type(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent) +{ + int tree_height; + int parent_height; + int parent_type; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, parent %p\n", + tree, parent); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = atomic_read(&tree->height); + + if (tree_height <= SSDFS_BTREE_PARENT2LEAF_HEIGHT) { + /* btree contains root node only */ + return SSDFS_BTREE_LEAF_NODE; + } + + if (!parent) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + parent_height = atomic_read(&parent->height); + + if (parent_height == 0) { + SSDFS_ERR("invalid parent height %u\n", + parent_height); + return -ERANGE; + } + + parent_type = atomic_read(&parent->type); + switch (parent_type) { + case SSDFS_BTREE_ROOT_NODE: + switch (parent_height) { + case SSDFS_BTREE_LEAF_NODE_HEIGHT: + case SSDFS_BTREE_PARENT2LEAF_HEIGHT: + if (can_add_new_index(parent)) + return SSDFS_BTREE_LEAF_NODE; + else + return SSDFS_BTREE_HYBRID_NODE; + + case SSDFS_BTREE_PARENT2HYBRID_HEIGHT: + if (can_add_new_index(parent)) + return SSDFS_BTREE_HYBRID_NODE; + else + return SSDFS_BTREE_INDEX_NODE; + + default: + return SSDFS_BTREE_INDEX_NODE; + } + + case SSDFS_BTREE_INDEX_NODE: + switch (parent_height) { + case SSDFS_BTREE_PARENT2HYBRID_HEIGHT: + if (can_add_new_index(parent)) + return SSDFS_BTREE_HYBRID_NODE; + else + return SSDFS_BTREE_INDEX_NODE; + + case SSDFS_BTREE_PARENT2LEAF_HEIGHT: + return SSDFS_BTREE_LEAF_NODE; + + default: + return SSDFS_BTREE_INDEX_NODE; + } + + case SSDFS_BTREE_HYBRID_NODE: + return SSDFS_BTREE_LEAF_NODE; + } + + SSDFS_ERR("invalid btree node's type %#x\n", + parent_type); + return -ERANGE; +} + +/* + * ssdfs_current_segment_pre_allocate_node() - pre-allocate the node + * @node_type: type of the node + * @node: node object + * + * This method tries to 
pre-allocate the node + * in the current segment. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - tree is corrupted. + * %-ENOSPC - volume hasn't free space. + */ +static +int ssdfs_current_segment_pre_allocate_node(int node_type, + struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_request *req; + struct ssdfs_segment_info *si; + u64 ino; + u64 logical_offset; + u64 seg_id; + int seg_type; + struct ssdfs_blk2off_range extent; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_type %#x\n", node_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!node) { + SSDFS_ERR("node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + ino = node->tree->owner_ino; + logical_offset = (u64)node->node_id * node->node_size; + ssdfs_request_prepare_logical_extent(ino, + logical_offset, + node->node_size, + 0, 0, req); + + switch (node_type) { + case SSDFS_BTREE_INDEX_NODE: + err = ssdfs_segment_pre_alloc_index_node_extent_async(fsi, req, + &seg_id, + &extent); + break; + + case SSDFS_BTREE_HYBRID_NODE: + err = ssdfs_segment_pre_alloc_hybrid_node_extent_async(fsi, + req, + &seg_id, + &extent); + break; + + case SSDFS_BTREE_LEAF_NODE: + err = ssdfs_segment_pre_alloc_leaf_node_extent_async(fsi, req, + &seg_id, + &extent); + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid node_type %#x\n", node_type); + goto finish_pre_allocate_node; + } + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to pre-allocate node: " + "free space is absent\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + goto free_segment_request; + } else if (unlikely(err)) { + SSDFS_ERR("fail to pre-allocate node: err %d\n", + err); + goto free_segment_request; + } + + if (node->pages_per_node != extent.len) { + err = -ERANGE; + SSDFS_ERR("invalid request result: " + "pages_per_node %u != len %u\n", + node->pages_per_node, + extent.len); + goto finish_pre_allocate_node; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg_id == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_type = NODE2SEG_TYPE(node_type); + + si = ssdfs_grab_segment(fsi, seg_type, seg_id, U64_MAX); + if (IS_ERR_OR_NULL(si)) { + err = (si == NULL ? 
-ERANGE : PTR_ERR(si)); + SSDFS_ERR("fail to grab segment object: " + "err %d\n", + err); + goto finish_pre_allocate_node; + } + + spin_lock(&node->descriptor_lock); + node->seg = si; + node->extent.seg_id = cpu_to_le64(seg_id); + node->extent.logical_blk = cpu_to_le32(extent.start_lblk); + node->extent.len = cpu_to_le32(extent.len); + ssdfs_memcpy(&node->node_index.index.extent, + 0, sizeof(struct ssdfs_raw_extent), + &node->extent, + 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree_type %#x, node_id %u, node_type %#x, " + "seg_id %llu, logical_blk %u, len %u\n", + node->tree->type, node->node_id, node_type, + seg_id, extent.start_lblk, extent.len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + +free_segment_request: + ssdfs_put_request(req); + ssdfs_request_free(req); + +finish_pre_allocate_node: + return err; +} + +/* + * ssdfs_check_leaf_node_state() - check the leaf node's state + * @search: search object + * + * This method checks the leaf node's state. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EEXIST - node exists. + */ +static +int ssdfs_check_leaf_node_state(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("node_id %u, height %u\n", + search->node.id, search->node.height); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = search->node.state; + if (state != SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC) { + SSDFS_ERR("invalid node state %#x\n", state); + return -ERANGE; + } + + if (!search->node.child) { + SSDFS_ERR("child node is NULL\n"); + return -ERANGE; + } + +check_leaf_node_state: + switch (atomic_read(&search->node.child->state)) { + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + err = -EEXIST; + break; + + case SSDFS_BTREE_NODE_CREATED: + case SSDFS_BTREE_NODE_CONTENT_PREPARED: + node = search->node.child; + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + } else { + err = -EEXIST; + goto check_leaf_node_state; + } + break; + + default: + BUG(); + } + + return err; +} + +/* + * ssdfs_prepare_empty_btree_for_add() - prepare empty btree for adding + * @tree: btree object + * @search: search object + * @hierarchy: hierarchy object + * + * This method prepares empty btree for adding a new node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
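+ * + * NOTE: this path expects a btree that contains the root node only: + * the parent must be the root node, and the hierarchy is prepared for + * adding one leaf node plus the corresponding index record in the root.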
+ */ +#ifdef CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC +static +int ssdfs_prepare_empty_btree_for_add(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_hierarchy *hierarchy) +{ + struct ssdfs_btree_level *level; + struct ssdfs_btree_node *parent_node; + int cur_height, tree_height; + u64 start_hash, end_hash; + int parent_node_type; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !hierarchy); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, hierarchy %p\n", + tree, search, hierarchy); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %u\n", + tree_height); + return -ERANGE; + } + + parent_node = search->node.parent; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + cur_height = search->node.height; + if (cur_height >= tree_height) { + SSDFS_ERR("cur_height %u >= tree_height %u\n", + cur_height, tree_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + parent_node_type = atomic_read(&parent_node->type); + + if (parent_node_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_ERR("corrupted hierarchy: " + "expected parent root node\n"); + return -ERANGE; + } + + if ((tree_height + 1) != hierarchy->desc.height) { + SSDFS_ERR("corrupted hierarchy: " + "tree_height %u, " + "hierarchy->desc.height %u\n", + tree_height, + hierarchy->desc.height); + return -ERANGE; + } + + if (!can_add_new_index(parent_node)) { + SSDFS_ERR("unable add index into the root\n"); + return -ERANGE; + } + + level = hierarchy->array_ptr[cur_height]; + ssdfs_btree_prepare_add_node(tree, SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + level, NULL); + + level = hierarchy->array_ptr[cur_height + 1]; + err = ssdfs_btree_prepare_add_index(level, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + return 0; +} +#endif /* CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC */ + +/* + * __ssdfs_btree_read_node() - create and initialize the node + * @tree: btree object + * @parent: parent node + * @node_index: index key of preparing node + * @node_type: type of the node + * @node_id: node ID + * + * This method tries to read the node's content from the disk. + * + * RETURN: + * [success] - pointer on created node object. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - node exists already. 
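+ * + * NOTE: the freshly created node object is published in the radix + * tree before its content is read from the volume; a concurrent + * lookup that finds the node in SSDFS_BTREE_NODE_CREATED state waits + * on init_end until the initialization completes.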
+ */ +struct ssdfs_btree_node * +__ssdfs_btree_read_node(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent, + struct ssdfs_btree_index_key *node_index, + u8 node_type, u32 node_id) +{ + struct ssdfs_btree_node *ptr, *node; + int height; + u64 start_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, parent %p, " + "node_index %p, node_type %#x, node_id %llu\n", + tree, parent, node_index, + node_type, (u64)node_id); + + BUG_ON(!tree || !parent || !node_index); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_WARN("invalid node type %#x\n", + node_type); + return ERR_PTR(-ERANGE); + } + + height = atomic_read(&parent->height); + if (height <= 0) { + SSDFS_ERR("invalid height %u, node_id %u\n", + height, parent->node_id); + return ERR_PTR(-ERANGE); + } else + height -= 1; + + start_hash = le64_to_cpu(node_index->index.hash); + ptr = ssdfs_btree_node_create(tree, node_id, parent, + height, node_type, start_hash); + if (unlikely(IS_ERR_OR_NULL(ptr))) { + err = !ptr ? -ENOMEM : PTR_ERR(ptr); + SSDFS_ERR("fail to create node: err %d\n", + err); + return ptr; + } + + if (tree->btree_ops && tree->btree_ops->create_node) { + err = tree->btree_ops->create_node(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to create the node: " + "err %d\n", err); + ssdfs_btree_node_destroy(ptr); + return ERR_PTR(err); + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NODE_INDEX: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(node_index->node_id), + node_index->node_type, + node_index->height, + le16_to_cpu(node_index->flags), + le64_to_cpu(node_index->index.hash), + le64_to_cpu(node_index->index.extent.seg_id), + le32_to_cpu(node_index->index.extent.logical_blk), + le32_to_cpu(node_index->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&ptr->descriptor_lock); + ssdfs_memcpy(&ptr->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&ptr->descriptor_lock); + +try_find_node: + spin_lock(&tree->nodes_lock); + node = radix_tree_lookup(&tree->nodes, node_id); + spin_unlock(&tree->nodes_lock); + + if (!node) { + err = radix_tree_preload(GFP_NOFS); + if (unlikely(err)) { + SSDFS_ERR("fail to preload radix tree: err %d\n", + err); + goto finish_insert_node; + } + + spin_lock(&tree->nodes_lock); + err = radix_tree_insert(&tree->nodes, node_id, ptr); + spin_unlock(&tree->nodes_lock); + + radix_tree_preload_end(); + + if (err == -EEXIST) + goto try_find_node; + else if (unlikely(err)) { + SSDFS_ERR("fail to add node into radix tree: " + "node_id %llu, node %p, err %d\n", + (u64)node_id, ptr, err); + goto finish_insert_node; + } + } else { + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: + err = -EAGAIN; + goto finish_insert_node; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u has been found\n", + node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_insert_node; + + default: + err = -ERANGE; + SSDFS_WARN("invalid node state %#x\n", + atomic_read(&node->state)); + goto finish_insert_node; + } + } + +finish_insert_node: + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err 
%d\n", err); + ssdfs_btree_node_destroy(ptr); + return ERR_PTR(err); + } else + goto try_find_node; + } else if (err == -EEXIST) { + ssdfs_btree_node_destroy(ptr); + return node; + } else if (unlikely(err)) { + ssdfs_btree_node_destroy(ptr); + return ERR_PTR(err); + } + + err = ssdfs_btree_node_prepare_content(ptr, node_index); + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto fail_read_node; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare btree node's content: " + "err %d\n", err); + goto fail_read_node; + } + + if (tree->btree_ops && tree->btree_ops->init_node) { + err = tree->btree_ops->init_node(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to init btree node: " + "err %d\n", err); + goto fail_read_node; + } + } + + atomic_set(&ptr->state, SSDFS_BTREE_NODE_INITIALIZED); + complete_all(&ptr->init_end); + return ptr; + +fail_read_node: + complete_all(&ptr->init_end); + ssdfs_btree_radix_tree_delete(tree, node_id); + if (tree->btree_ops && tree->btree_ops->delete_node) + tree->btree_ops->delete_node(ptr); + if (tree->btree_ops && tree->btree_ops->destroy_node) + tree->btree_ops->destroy_node(ptr); + ssdfs_btree_node_destroy(ptr); + return ERR_PTR(err); +} + +/* + * ssdfs_btree_read_node() - create and initialize the node + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to read the node's content from the disk. + * + * RETURN: + * [success] - pointer on created node object. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +struct ssdfs_btree_node * +ssdfs_btree_read_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + u8 node_type; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, id %u, node_id %u, " + "hash %llx, " + "extent (seg %u, logical_blk %u, len %u)\n", + tree, + search->node.id, + le32_to_cpu(search->node.found_index.node_id), + le64_to_cpu(search->node.found_index.index.hash), + le32_to_cpu(search->node.found_index.index.extent.seg_id), + le32_to_cpu(search->node.found_index.index.extent.logical_blk), + le32_to_cpu(search->node.found_index.index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + node_type = search->node.found_index.node_type; + return __ssdfs_btree_read_node(tree, search->node.parent, + &search->node.found_index, + node_type, search->node.id); +} + +/* + * ssdfs_btree_get_child_node_for_hash() - get child node for hash + * @tree: btree object + * @parent: parent node + * @upper_hash: upper value of the hash + * + * This method tries to extract child node for the hash value. + * + * RETURN: + * [success] - pointer on the child node. + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EACCES - node is under initialization. + * %-ENOENT - index area is absent. 
+ */ +struct ssdfs_btree_node * +ssdfs_btree_get_child_node_for_hash(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent, + u64 upper_hash) +{ + struct ssdfs_btree_node *child = ERR_PTR(-ERANGE); + struct ssdfs_btree_node_index_area area; + struct ssdfs_btree_index_key index_key; + int parent_type; + u16 found_index = U16_MAX; + u32 node_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !parent); + BUG_ON(upper_hash >= U64_MAX); + + SSDFS_DBG("node_id %u, upper_hash %llx\n", + parent->node_id, upper_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&parent->state)) { + case SSDFS_BTREE_NODE_CREATED: + err = -EACCES; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + parent->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return ERR_PTR(err); + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&parent->state)); + return ERR_PTR(err); + } + + if (!is_ssdfs_btree_node_index_area_exist(parent)) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + parent->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return ERR_PTR(err); + } + + down_read(&parent->full_lock); + + parent_type = atomic_read(&parent->type); + if (parent_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + parent_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + child = ERR_PTR(-ERANGE); + SSDFS_ERR("invalid node type %#x\n", + parent_type); + goto finish_child_search; + } + + down_read(&parent->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &parent->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + err = ssdfs_find_index_by_hash(parent, &area, upper_hash, + &found_index); + up_read(&parent->header_lock); + + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if (err == -ENODATA) { + child = ERR_PTR(err); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find an index: " + "node_id %u, hash %llx\n", + parent->node_id, upper_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_child_search; + } else if (unlikely(err)) { + child = ERR_PTR(err); + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + parent->node_id, upper_hash, + err); + goto finish_child_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found_index == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(parent, + found_index, + &index_key); + } else { + err = __ssdfs_btree_common_node_extract_index(parent, &area, + found_index, + &index_key); + } + + if (unlikely(err)) { + child = ERR_PTR(err); + SSDFS_ERR("fail to extract index: " + "node_id %u, node_type %#x, " + "found_index %u, err %d\n", + parent->node_id, parent_type, + found_index, err); + goto finish_child_search; + } + + node_id = le32_to_cpu(index_key.node_id); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x\n", + node_id, index_key.node_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_radix_tree_find(tree, node_id, &child); + if (err == -ENOENT) { + err = 0; + child = __ssdfs_btree_read_node(tree, parent, + &index_key, + index_key.node_type, + node_id); + if (unlikely(IS_ERR_OR_NULL(child))) { + err = !child ? 
-ENOMEM : PTR_ERR(child);
+			SSDFS_ERR("fail to read: "
+				  "node %llu, err %d\n",
+				  (u64)node_id, err);
+			goto finish_child_search;
+		}
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find node in radix tree: "
+			  "node_id %llu, err %d\n",
+			  (u64)node_id, err);
+		goto finish_child_search;
+	} else if (!child) {
+		child = ERR_PTR(-ERANGE);
+		SSDFS_WARN("empty node pointer\n");
+		goto finish_child_search;
+	}
+
+finish_child_search:
+	up_read(&parent->full_lock);
+
+	if (err) {
+		SSDFS_ERR("node_id %u, upper_hash %llx\n",
+			  parent->node_id, upper_hash);
+		SSDFS_ERR("index_area: index_count %u, index_capacity %u, "
+			  "start_hash %llx, end_hash %llx\n",
+			  area.index_count, area.index_capacity,
+			  area.start_hash, area.end_hash);
+		ssdfs_debug_show_btree_node_indexes(parent->tree, parent);
+	}
+
+	return child;
+}
+
+/*
+ * ssdfs_btree_generate_node_id() - generate new node ID
+ * @tree: btree object
+ *
+ * A simple technique is used here: the upper node ID is the latest
+ * allocated ID number, so generating a new node ID means simply
+ * incrementing the upper node ID value. In the case of node deletion
+ * the empty node has to be kept until the whole branch of the tree
+ * is deleted. The goal is to keep the upper node ID in a valid state.
+ * The upper node ID can be decreased only after a whole branch of
+ * empty nodes has been deleted.
+ *
+ * RETURN:
+ * [success] - new node ID
+ * [failure] - U32_MAX
+ */
+u32 ssdfs_btree_generate_node_id(struct ssdfs_btree *tree)
+{
+	struct ssdfs_btree_node *node;
+	u32 node_id = U32_MAX;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&tree->nodes_lock);
+	node_id = tree->upper_node_id;
+	if (node_id < U32_MAX) {
+		node_id++;
+		tree->upper_node_id = node_id;
+	}
+	spin_unlock(&tree->nodes_lock);
+
+	if (node_id == U32_MAX) {
+		SSDFS_DBG("node IDs are completely exhausted\n");
+		return node_id;
+	}
+
+	err = ssdfs_btree_radix_tree_find(tree,
+					  SSDFS_BTREE_ROOT_NODE_ID,
+					  &node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find root node in radix tree: "
+			  "err %d\n", err);
+		return U32_MAX;
+	} else if (!node) {
+		SSDFS_WARN("empty node pointer\n");
+		return U32_MAX;
+	}
+
+	set_ssdfs_btree_node_dirty(node);
+
+	return node_id;
+}
+
+/*
+ * ssdfs_btree_destroy_empty_node() - destroy the empty node.
+ * @tree: btree object
+ * @node: node object
+ *
+ * This method tries to destroy the empty node.
+ */
+static inline
+void ssdfs_btree_destroy_empty_node(struct ssdfs_btree *tree,
+				    struct ssdfs_btree_node *node)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!node)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, height %u\n",
+		  node->node_id,
+		  atomic_read(&node->height));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (tree->btree_ops && tree->btree_ops->destroy_node)
+		tree->btree_ops->destroy_node(node);
+
+	ssdfs_btree_node_destroy(node);
+}
+
+/*
+ * ssdfs_btree_create_empty_node() - create empty node.
+ * @tree: btree object
+ * @cur_height: height for node creation
+ * @hierarchy: hierarchy object
+ *
+ * This method tries to create the empty node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
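+ *
+ * In short, the creation path below is (a simplified sketch; error
+ * handling and node type correction are omitted):
+ *
+ *	node_id = ssdfs_btree_generate_node_id(tree);
+ *	node_type = ssdfs_btree_define_new_node_type(tree, parent);
+ *	ptr = ssdfs_btree_node_create(tree, node_id, parent, cur_height,
+ *				      node_type, start_hash);
+ *	err = ssdfs_current_segment_pre_allocate_node(node_type, ptr);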
+ */
+static
+int ssdfs_btree_create_empty_node(struct ssdfs_btree *tree,
+				  int cur_height,
+				  struct ssdfs_btree_hierarchy *hierarchy)
+{
+	struct ssdfs_btree_level *level;
+	struct ssdfs_btree_node *parent = NULL, *ptr = NULL;
+	u32 node_id;
+	int node_type;
+	int tree_height;
+	u16 flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !hierarchy);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, cur_height %d\n",
+		  tree, cur_height);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree_height = atomic_read(&tree->height);
+
+	if (cur_height > tree_height) {
+		SSDFS_ERR("cur_height %d > tree_height %d\n",
+			  cur_height, tree_height);
+		return -ERANGE;
+	}
+
+	level = hierarchy->array_ptr[cur_height];
+
+	if (!(level->flags & SSDFS_BTREE_LEVEL_ADD_NODE))
+		return 0;
+
+	node_id = ssdfs_btree_generate_node_id(tree);
+	if (node_id == SSDFS_BTREE_NODE_INVALID_ID) {
+		SSDFS_ERR("fail to generate node_id\n");
+		return -ERANGE;
+	}
+
+	level = hierarchy->array_ptr[cur_height + 1];
+	if (level->flags & SSDFS_BTREE_LEVEL_ADD_NODE)
+		parent = level->nodes.new_node.ptr;
+	else if (level->nodes.new_node.type == SSDFS_BTREE_ROOT_NODE)
+		parent = level->nodes.new_node.ptr;
+	else
+		parent = level->nodes.old_node.ptr;
+
+	if (!parent) {
+		SSDFS_ERR("parent is NULL\n");
+		return -ERANGE;
+	}
+
+	node_type = ssdfs_btree_define_new_node_type(tree, parent);
+	switch (node_type) {
+	case SSDFS_BTREE_INDEX_NODE:
+	case SSDFS_BTREE_HYBRID_NODE:
+	case SSDFS_BTREE_LEAF_NODE:
+		/* expected state */
+		break;
+
+	default:
+		if (node_type < 0) {
+			SSDFS_ERR("fail to define the new node type: "
+				  "err %d\n", node_type);
+		} else {
+			SSDFS_ERR("invalid node type %#x\n",
+				  node_type);
+		}
+		return node_type < 0 ? node_type : -ERANGE;
+	}
+
+	level = hierarchy->array_ptr[cur_height];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("level: flags %#x, move.direction %#x\n",
+		  level->flags,
+		  level->index_area.move.direction);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (level->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) {
+		switch (level->index_area.move.direction) {
+		case SSDFS_BTREE_MOVE_TO_RIGHT:
+		case SSDFS_BTREE_MOVE_TO_LEFT:
+			/* correct node type */
+			SSDFS_DBG("correct node type\n");
+			node_type = SSDFS_BTREE_INDEX_NODE;
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, node_type %#x\n",
+		  node_id, node_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	level = hierarchy->array_ptr[cur_height];
+	ptr = ssdfs_btree_node_create(tree, node_id, parent, cur_height,
+				      node_type,
+				      level->items_area.hash.start);
+	if (unlikely(IS_ERR_OR_NULL(ptr))) {
+		err = !ptr ?
-ENOMEM : PTR_ERR(ptr); + SSDFS_ERR("fail to create node: err %d\n", + err); + return err; + } + + if (tree->btree_ops && tree->btree_ops->create_node) { + err = tree->btree_ops->create_node(ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to create the node: " + "err %d\n", err); + goto finish_create_node; + } + } + + err = ssdfs_current_segment_pre_allocate_node(node_type, ptr); + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to preallocate node: id %u\n", + node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_create_node; + } else if (unlikely(err)) { + SSDFS_ERR("fail to preallocate node: id %u, err %d\n", + node_id, err); + goto finish_create_node; + } + + atomic_or(SSDFS_BTREE_NODE_PRE_ALLOCATED, + &ptr->flags); + + flags = le16_to_cpu(ptr->node_index.flags); + flags |= SSDFS_BTREE_INDEX_SHOW_PREALLOCATED_CHILD; + ptr->node_index.flags = cpu_to_le16(flags); + + level->nodes.new_node.type = node_type; + level->nodes.new_node.ptr = ptr; + return 0; + +finish_create_node: + ssdfs_btree_destroy_empty_node(tree, ptr); + return err; +} + +/* + * ssdfs_btree_update_parent_node_pointer() - update child's parent's pointer + * @tree: btree object + * @parent: parent node + * + * This method tries to update parent pointer in child nodes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_btree_update_parent_node_pointer(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent) +{ + struct ssdfs_btree_node *child = NULL; + struct ssdfs_btree_node_index_area area; + struct ssdfs_btree_index_key index_key; + int type; + u32 node_id; + spinlock_t *lock; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent); + + SSDFS_DBG("node_id %u, height %u\n", + parent->node_id, + atomic_read(&parent->height)); +#endif /* CONFIG_SSDFS_DEBUG */ + + type = atomic_read(&parent->type); + + switch (type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + if (!is_ssdfs_btree_node_index_area_exist(parent)) { + SSDFS_ERR("corrupted node %u\n", + parent->node_id); + return -ERANGE; + } + break; + + case SSDFS_BTREE_LEAF_NODE: + /* do nothing */ + return 0; + + default: + BUG(); + } + + if (is_ssdfs_btree_node_index_area_empty(parent)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u has empty index area\n", + parent->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + down_read(&parent->full_lock); + + down_read(&parent->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &parent->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + up_read(&parent->header_lock); + + for (i = 0; i < area.index_count; i++) { + if (type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(parent, + i, + &index_key); + } else { + err = __ssdfs_btree_common_node_extract_index(parent, + &area, i, + &index_key); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to extract index key: " + "index_position %d, err %d\n", + i, err); + goto finish_index_processing; + } + + node_id = le32_to_cpu(index_key.node_id); + + if (node_id == parent->node_id) { + /* + * Hybrid node contains index on itself + * in the index area. Ignore this node_id. 
+ */
+			if (type != index_key.node_type) {
+				SSDFS_WARN("type %#x != node_type %#x\n",
+					   type, index_key.node_type);
+			}
+			continue;
+		}
+
+		up_read(&parent->full_lock);
+
+		err = ssdfs_btree_radix_tree_find(tree, node_id, &child);
+
+		if (!child) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("empty node pointer: "
+				  "node_id %u\n", node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		/* update the parent pointer only if the child is present */
+		if (!err && child) {
+			lock = &child->descriptor_lock;
+			spin_lock(lock);
+			child->parent_node = parent;
+			spin_unlock(lock);
+			lock = NULL;
+		}
+
+		down_read(&parent->full_lock);
+
+		if (err == -ENOENT) {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("node %u is absent\n",
+				  node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find node in radix tree: "
+				  "node_id %llu, err %d\n",
+				  (u64)node_id, err);
+			goto finish_index_processing;
+		}
+	}
+
+finish_index_processing:
+	up_read(&parent->full_lock);
+
+	if (unlikely(err))
+		ssdfs_debug_show_btree_node_indexes(tree, parent);
+
+	return err;
+}
+
+/*
+ * ssdfs_btree_process_hierarchy_for_add_nolock() - process hierarchy for add
+ * @tree: btree object
+ * @search: search object [in|out]
+ * @hierarchy: hierarchy object [in|out]
+ *
+ * This method tries to add a node into the tree with the goal
+ * to increase capacity of items in the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_btree_process_hierarchy_for_add_nolock(struct ssdfs_btree *tree,
+					struct ssdfs_btree_search *search,
+					struct ssdfs_btree_hierarchy *hierarchy)
+{
+	struct ssdfs_btree_level *level;
+	struct ssdfs_btree_node *node;
+	int cur_height, tree_height;
+	u32 node_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !tree->fsi || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, start_hash %llx\n",
+		  tree, search->request.start.hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree_height = atomic_read(&tree->height);
+	if (tree_height <= 0) {
+		SSDFS_ERR("invalid tree_height %d\n",
+			  tree_height);
+		return -ERANGE;
+	}
+
+	for (cur_height = tree_height; cur_height >= 0; cur_height--) {
+		level = hierarchy->array_ptr[cur_height];
+
+		if (!need_add_node(level))
+			continue;
+
+		err = ssdfs_btree_create_empty_node(tree, cur_height,
+						    hierarchy);
+		if (err) {
+			if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("unable to create empty node: "
+					  "err %d\n",
+					  err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			} else {
+				SSDFS_ERR("fail to create empty node: "
+					  "err %d\n",
+					  err);
+			}
+
+			for (cur_height++; cur_height <= tree_height; cur_height++) {
+				/* refresh the level for the current height */
+				level = hierarchy->array_ptr[cur_height];
+
+				if (!need_add_node(level))
+					continue;
+
+				node = level->nodes.new_node.ptr;
+
+				if (!node)
+					continue;
+
+				node_id = node->node_id;
+				ssdfs_btree_radix_tree_delete(tree, node_id);
+				ssdfs_btree_destroy_empty_node(tree, node);
+			}
+
+			goto finish_create_node;
+		}
+
+		node = level->nodes.new_node.ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_btree_radix_tree_insert(tree, node->node_id,
+						    node);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to insert node %u into radix tree: "
+				  "err %d\n",
+				  node->node_id, err);
+
+			for (; cur_height < tree_height; cur_height++) {
+				level = hierarchy->array_ptr[cur_height];
+
+				if (!need_add_node(level))
+					continue;
+
+				node = level->nodes.new_node.ptr;
+				node_id = node->node_id;
+				ssdfs_btree_radix_tree_delete(tree, node_id);
+				ssdfs_btree_destroy_empty_node(tree, node);
+			}
+
+			goto
finish_create_node; + } + + set_ssdfs_btree_node_dirty(node); + } + + cur_height = 0; + for (; cur_height < hierarchy->desc.height; cur_height++) { + err = ssdfs_btree_process_level_for_add(hierarchy, cur_height, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to process the tree's level: " + "cur_height %u, err %d\n", + cur_height, err); + goto finish_create_node; + } + } + + for (cur_height = tree_height; cur_height >= 0; cur_height--) { + level = hierarchy->array_ptr[cur_height]; + + if (!need_add_node(level)) + continue; + + node = level->nodes.new_node.ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (tree->btree_ops && tree->btree_ops->add_node) { + err = tree->btree_ops->add_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to add the node: " + "err %d\n", err); + + for (; cur_height < tree_height; cur_height++) { + level = hierarchy->array_ptr[cur_height]; + + if (!need_add_node(level)) + continue; + + node = level->nodes.new_node.ptr; + node_id = node->node_id; + ssdfs_btree_radix_tree_delete(tree, + node_id); + if (tree->btree_ops && + tree->btree_ops->delete_node) { + tree->btree_ops->delete_node(node); + } + ssdfs_btree_destroy_empty_node(tree, + node); + } + + goto finish_create_node; + } + } + } + + if (hierarchy->desc.increment_height) { + /* increase tree's height */ + atomic_inc(&tree->height); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree->height %d\n", + atomic_read(&tree->height)); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_create_node: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_btree_add_node() - add a node into the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to add a node into the tree with the goal + * to increase capacity of items in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-EEXIST - node exists already. + */ +static +int __ssdfs_btree_add_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy; + struct ssdfs_btree_level *level; + struct ssdfs_btree_node *node; + struct ssdfs_btree_node *parent_node; + int cur_height, tree_height; +#define SSDFS_BTREE_MODIFICATION_PHASE_MAX (3) + int phase_id; + spinlock_t *lock; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + + SSDFS_DBG("tree %p, start_hash %llx\n", + tree, search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->node.state) { + case SSDFS_BTREE_SEARCH_ROOT_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + search->node.state); + return -ERANGE; + } + + if (!search->node.parent) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %d\n", + tree_height); + return -ERANGE; + } + + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (IS_ERR_OR_NULL(hierarchy)) { + err = !hierarchy ? 
-ENOMEM : PTR_ERR(hierarchy);
+		SSDFS_ERR("fail to allocate tree levels' array: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	down_write(&tree->lock);
+
+	err = ssdfs_check_leaf_node_absence(tree, search);
+	if (err == -EEXIST) {
+		up_write(&tree->lock);
+		/* the hierarchy object is not needed anymore */
+		ssdfs_btree_hierarchy_free(hierarchy);
+		SSDFS_DBG("new node has been added\n");
+		return ssdfs_check_leaf_node_state(search);
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to check leaf node absence: "
+			  "err %d\n", err);
+		goto finish_create_node;
+	}
+
+	err = ssdfs_btree_check_hierarchy_for_add(tree, search, hierarchy);
+
+	phase_id = 0;
+	while (err == -EAGAIN) {
+		if (phase_id > SSDFS_BTREE_MODIFICATION_PHASE_MAX) {
+			err = -ERANGE;
+			SSDFS_WARN("too many phases of modification\n");
+			goto finish_create_node;
+		}
+
+		err = ssdfs_btree_process_hierarchy_for_add_nolock(tree,
+								   search,
+								   hierarchy);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to process hierarchy for add: "
+				  "err %d\n", err);
+			goto finish_create_node;
+		}
+
+		ssdfs_btree_hierarchy_free(hierarchy);
+
+		hierarchy = ssdfs_btree_hierarchy_allocate(tree);
+		if (IS_ERR_OR_NULL(hierarchy)) {
+			err = !hierarchy ? -ENOMEM : PTR_ERR(hierarchy);
+			SSDFS_ERR("fail to allocate tree levels' array: "
+				  "err %d\n", err);
+			goto finish_create_node;
+		}
+
+		/* correct parent node */
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		lock = &search->node.child->descriptor_lock;
+		spin_lock(lock);
+		parent_node = search->node.child->parent_node;
+		spin_unlock(lock);
+		lock = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!parent_node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_btree_search_define_parent_node(search, parent_node);
+
+		err = ssdfs_btree_check_hierarchy_for_add(tree, search,
+							  hierarchy);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare info about hierarchy: "
+				  "err %d\n", err);
+			goto finish_create_node;
+		}
+
+		phase_id++;
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare information about hierarchy: "
+			  "err %d\n", err);
+		goto finish_create_node;
+	}
+
+	err = ssdfs_btree_process_hierarchy_for_add_nolock(tree, search,
+							   hierarchy);
+	if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to process hierarchy for add: "
+			  "err %d\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_create_node;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to process hierarchy for add: "
+			  "err %d\n", err);
+		goto finish_create_node;
+	}
+
+	up_write(&tree->lock);
+
+	if (search->node.parent)
+		complete_all(&search->node.parent->init_end);
+
+	tree_height = atomic_read(&tree->height);
+	for (cur_height = 0; cur_height < tree_height; cur_height++) {
+		level = hierarchy->array_ptr[cur_height];
+
+		if (!need_add_node(level))
+			continue;
+
+		node = level->nodes.new_node.ptr;
+		complete_all(&node->init_end);
+	}
+
+	ssdfs_btree_hierarchy_free(hierarchy);
+
+	search->result.err = 0;
+	search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY;
+	search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT;
+	return 0;
+
+finish_create_node:
+	up_write(&tree->lock);
+
+	if (search->node.parent)
+		complete_all(&search->node.parent->init_end);
+
+	if (err != -ENOSPC)
+		ssdfs_show_btree_hierarchy_object(hierarchy);
+
+	ssdfs_btree_hierarchy_free(hierarchy);
+
+	search->result.err = err;
+	search->result.state = SSDFS_BTREE_SEARCH_FAILURE;
+	return err;
+}
+
+/*
+ * ssdfs_btree_node_convert_index2id() - convert index into node ID
+ * @tree: btree object
+ * @search: search object [in|out]
+ */
+static inline
+int ssdfs_btree_node_convert_index2id(struct ssdfs_btree *tree,
+				      struct ssdfs_btree_search *search)
+{
+	u32 id;
+	u8 height;
+	u8 tree_height;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	id = le32_to_cpu(search->node.found_index.node_id);
+	height = search->node.found_index.height;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, height %u\n",
+		  id, height);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (id == SSDFS_BTREE_NODE_INVALID_ID) {
+		SSDFS_ERR("invalid node_id\n");
+		return -ERANGE;
+	}
+
+	tree_height = atomic_read(&tree->height);
+
+	if (height >= tree_height) {
+		SSDFS_ERR("height %u >= tree->height %u\n",
+			  height, tree_height);
+		return -ERANGE;
+	}
+
+	search->node.id = id;
+	search->node.height = height;
+	return 0;
+}
+
+/*
+ * ssdfs_btree_find_right_sibling_leaf_node() - find sibling right leaf node
+ * @tree: btree object
+ * @node: btree node object
+ * @search: search object [in|out]
+ */
+static
+int ssdfs_btree_find_right_sibling_leaf_node(struct ssdfs_btree *tree,
+					     struct ssdfs_btree_node *node,
+					     struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_node *parent_node = NULL;
+	struct ssdfs_btree_node_index_area area;
+	struct ssdfs_btree_index_key index_key;
+	spinlock_t *lock;
+	size_t desc_len = sizeof(struct ssdfs_btree_node_index_area);
+	u64 start_hash = U64_MAX, end_hash = U64_MAX;
+	u64 search_hash;
+	u16 items_count, items_capacity;
+	u16 index_count = 0;
+	u16 index_capacity = 0;
+	u16 index_position;
+	int node_type;
+	u32 node_id;
+	bool is_found = false;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("node_id %u, start_hash %llx\n",
+		  node->node_id, search->request.start.hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	lock = &node->descriptor_lock;
+	spin_lock(lock);
+	search_hash = le64_to_cpu(node->node_index.index.hash);
+	node = node->parent_node;
+	spin_unlock(lock);
+	lock = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->header_lock);
+
+	switch (atomic_read(&node->index_area.state)) {
+	case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+		index_count = node->index_area.index_count;
+		index_capacity = node->index_area.index_capacity;
+		break;
+
+	default:
+		err = -ERANGE;
+		break;
+	}
+
+	up_read(&node->header_lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("index area is absent\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_btree_node_find_index_position(node, search_hash,
+						   &index_position);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find the index position: "
+			  "search_hash %llx, err %d\n",
+			  search_hash, err);
+		return err;
+	}
+
+	index_position++;
+
+	if (index_position >= index_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index_position %u >= index_count %u\n",
+			  index_position, index_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOENT;
+	}
+
+	node_type = atomic_read(&node->type);
+
+	down_read(&node->full_lock);
+
+	if (node_type == SSDFS_BTREE_ROOT_NODE) {
+		err = __ssdfs_btree_root_node_extract_index(node,
+							    index_position,
+							    &index_key);
+	} else {
+		down_read(&node->header_lock);
+		ssdfs_memcpy(&area, 0, desc_len,
+			     &node->index_area, 0, desc_len,
+			     desc_len);
+		up_read(&node->header_lock);
+
+		err = __ssdfs_btree_common_node_extract_index(node,
+							      &area,
+							      index_position,
+							      &index_key);
+	}
+
+	up_read(&node->full_lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to extract index key: "
+			  "index_position %u, err %d\n",
+			  index_position, err);
+		ssdfs_debug_show_btree_node_indexes(tree, node);
+		return err;
+	}
+
+ parent_node = node; + node_id = le32_to_cpu(index_key.node_id); + + err = ssdfs_btree_radix_tree_find(tree, node_id, &node); + if (err == -ENOENT) { + err = 0; + node = __ssdfs_btree_read_node(tree, parent_node, + &index_key, + index_key.node_type, + node_id); + if (unlikely(IS_ERR_OR_NULL(node))) { + err = !node ? -ENOMEM : PTR_ERR(node); + SSDFS_ERR("fail to read: " + "node %llu, err %d\n", + (u64)node_id, err); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find node in radix tree: " + "node_id %llu, err %d\n", + (u64)node_id, err); + return err; + } else if (!node) { + SSDFS_WARN("empty node pointer\n"); + return -ERANGE; + } + + ssdfs_btree_search_define_parent_node(search, parent_node); + ssdfs_btree_search_define_child_node(search, node); + + ssdfs_memcpy(&search->node.found_index, + 0, sizeof(struct ssdfs_btree_index_key), + &index_key, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + + err = ssdfs_btree_node_convert_index2id(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to convert index to ID: " + "err %d\n", err); + return err; + } + + if (!is_btree_leaf_node_found(search)) { + SSDFS_ERR("leaf node hasn't been found\n"); + return -ERANGE; + } + + node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_WARN("corrupted node %u\n", + node->node_id); + return -ERANGE; + } + + down_read(&node->header_lock); + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + up_read(&node->header_lock); + + if (start_hash == U64_MAX || end_hash == U64_MAX) { + SSDFS_ERR("invalid items area's hash range: " + "start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + return -ERANGE; + } + + is_found = start_hash <= search->request.start.hash && + search->request.start.hash <= end_hash; + + if (!is_found && items_count >= items_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "items_count %u, items_capacity %u\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + items_count, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + return 0; +} + +/* + * ssdfs_btree_check_found_leaf_node() - check found leaf node + * @tree: btree object + * @search: search object [in|out] + */ +static +int ssdfs_btree_check_found_leaf_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node = NULL; + u64 start_hash = U64_MAX, end_hash = U64_MAX; + u16 items_count, items_capacity; + bool is_found = false; + bool is_right_adjacent = false; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, start_hash %llx\n", + tree, search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_btree_leaf_node_found(search)) { + SSDFS_ERR("leaf node hasn't been found\n"); + return -EINVAL; + } + + node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_WARN("corrupted 
node %u\n", + node->node_id); + return -ERANGE; + } + + down_read(&node->header_lock); + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + up_read(&node->header_lock); + + if (start_hash == U64_MAX || end_hash == U64_MAX) { + SSDFS_ERR("invalid items area's hash range: " + "start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + return -ERANGE; + } + + is_found = start_hash <= search->request.start.hash && + search->request.start.hash <= end_hash; + is_right_adjacent = search->request.start.hash > end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "is_found %#x, is_right_adjacent %#x, " + "items_count %u, items_capacity %u\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + is_found, is_right_adjacent, + items_count, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (node->tree->type) { + case SSDFS_INODES_BTREE: + if (!is_found) { + SSDFS_DBG("unable to find leaf node\n"); + goto unable_find_leaf_node; + } + break; + + default: + if (!is_found && items_count >= items_capacity) { + if (!is_right_adjacent) + goto unable_find_leaf_node; + + err = ssdfs_btree_find_right_sibling_leaf_node(tree, + node, + search); + if (err == -ENOENT) { + SSDFS_DBG("unable to find leaf node\n"); + goto unable_find_leaf_node; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: " + "node %u, err %d\n", + node->node_id, err); + return err; + } + } + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found leaf node: " + "node_id %u, start_hash %llx, " + "end_hash %llx, search_hash %llx\n", + node->node_id, start_hash, end_hash, + search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + +unable_find_leaf_node: + search->node.state = SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find a leaf node: " + "start_hash %llx, end_hash %llx, " + "search_hash %llx\n", + start_hash, end_hash, + search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENOENT; +} + +/* + * ssdfs_btree_find_leaf_node() - find a leaf node in the tree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to find a leaf node for the requested + * start hash and end hash pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - try the old search result. + * %-ENOENT - leaf node hasn't been found. 
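+ *
+ * In short, the lookup below descends from the root level by level
+ * (a simplified sketch; hybrid node handling is omitted):
+ *
+ *	search->node.id = SSDFS_BTREE_ROOT_NODE_ID;
+ *	do {
+ *		node = <radix tree lookup or ssdfs_btree_read_node()>;
+ *		ssdfs_btree_node_find_index(search);
+ *		ssdfs_btree_node_convert_index2id(tree, search);
+ *	} while (height > SSDFS_BTREE_LEAF_NODE_HEIGHT);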
+ */ +static +int ssdfs_btree_find_leaf_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + u8 upper_height; + u8 prev_height; + u64 start_hash = U64_MAX, end_hash = U64_MAX; + bool is_found = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, start_hash %llx\n", + tree, search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.state == SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try to use old search result: " + "node_id %llu, height %u\n", + (u64)search->node.id, search->node.height); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EEXIST; + } + + if (search->request.start.hash == U64_MAX || + search->request.end.hash == U64_MAX) { + SSDFS_ERR("invalid hash range in the request: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); + return -ERANGE; + } + + upper_height = atomic_read(&tree->height); + if (upper_height <= 0) { + SSDFS_ERR("invalid tree height %u\n", + upper_height); + return -ERANGE; + } else + upper_height--; + + search->node.id = SSDFS_BTREE_ROOT_NODE_ID; + search->node.height = upper_height; + search->node.state = SSDFS_BTREE_SEARCH_ROOT_NODE_DESC; + + do { + unsigned long prev_id = search->node.id; + int node_height; + int node_type; + prev_height = search->node.height; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, hash %llx\n", + search->node.id, + search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_search_define_parent_node(search, + search->node.child); + + err = ssdfs_btree_radix_tree_find(tree, search->node.id, + &node); + if (err == -ENOENT) { + err = 0; + node = ssdfs_btree_read_node(tree, search); + if (unlikely(IS_ERR_OR_NULL(node))) { + err = !node ? 
-ENOMEM : PTR_ERR(node);
+				SSDFS_ERR("fail to read: "
+					  "node %llu, err %d\n",
+					  (u64)search->node.id, err);
+				goto finish_search_leaf_node;
+			}
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find node in radix tree: "
+				  "node_id %llu, err %d\n",
+				  (u64)search->node.id, err);
+			goto finish_search_leaf_node;
+		} else if (!node) {
+			err = -ERANGE;
+			SSDFS_WARN("empty node pointer\n");
+			goto finish_search_leaf_node;
+		}
+
+		ssdfs_btree_search_define_child_node(search, node);
+		node_height = atomic_read(&node->height);
+
+		if (search->node.height != node_height) {
+			err = -ERANGE;
+			SSDFS_WARN("search->height %u != height %u\n",
+				   search->node.height,
+				   node_height);
+			goto finish_search_leaf_node;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(node_height >= U8_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		search->node.height = (u8)node_height;
+
+		if (node_height == SSDFS_BTREE_LEAF_NODE_HEIGHT) {
+			if (upper_height == SSDFS_BTREE_LEAF_NODE_HEIGHT) {
+				/* there is only root node */
+				search->node.state =
+					SSDFS_BTREE_SEARCH_ROOT_NODE_DESC;
+			} else {
+				search->node.state =
+					SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC;
+			}
+			goto check_found_node;
+		}
+
+		down_read(&node->header_lock);
+		start_hash = node->index_area.start_hash;
+		end_hash = node->index_area.end_hash;
+		up_read(&node->header_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node_id %u, start_hash %llx, "
+			  "end_hash %llx\n",
+			  node->node_id, start_hash,
+			  end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		node_type = atomic_read(&node->type);
+		if (node_type == SSDFS_BTREE_HYBRID_NODE) {
+			switch (atomic_read(&node->items_area.state)) {
+			case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+				/* expected state */
+				break;
+
+			default:
+				err = -ERANGE;
+				SSDFS_WARN("corrupted node %u\n",
+					   node->node_id);
+				goto finish_search_leaf_node;
+			}
+
+			switch (atomic_read(&node->index_area.state)) {
+			case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+				/* expected state */
+				break;
+
+			default:
+				err = -ERANGE;
+				SSDFS_WARN("corrupted node %u\n",
+					   node->node_id);
+				goto finish_search_leaf_node;
+			}
+
+			down_read(&node->header_lock);
+			start_hash = node->items_area.start_hash;
+			end_hash = node->items_area.end_hash;
+			is_found = start_hash <= search->request.start.hash &&
+				   search->request.start.hash <= end_hash;
+			up_read(&node->header_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("node_id %u, start_hash %llx, "
+				  "end_hash %llx, is_found %#x\n",
+				  node->node_id, start_hash,
+				  end_hash, is_found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (start_hash < U64_MAX && end_hash == U64_MAX) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid items area's hash range: "
+					  "start_hash %llx, end_hash %llx\n",
+					  start_hash, end_hash);
+				goto finish_search_leaf_node;
+			}
+
+			if (is_found) {
+				search->node.state =
+					SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC;
+				goto check_found_node;
+			} else if (search->request.start.hash > end_hash) {
+				/*
+				 * Hybrid node is exhausted already.
+				 * It needs to use this node as the
+				 * starting point for adding a new node.
+ */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, " + "request.start.hash %llx, " + "end_hash %llx\n", + node->node_id, + search->request.start.hash, + end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->node.state = + SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC; + goto check_found_node; + } + } + +try_find_index: + err = ssdfs_btree_node_find_index(search); + if (err == -ENODATA) { + err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find node index: " + "node_state %#x, node_id %llu, " + "height %u\n", + search->node.state, + (u64)search->node.id, + search->node.height); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (upper_height == 0) { + search->node.state = + SSDFS_BTREE_SEARCH_ROOT_NODE_DESC; + } else { + search->node.state = + SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC; + } + goto check_found_node; + } else if (err == -EACCES) { + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_search_leaf_node; + } else + goto try_find_index; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find index: " + "start_hash %llx, err %d\n", + search->request.start.hash, + err); + goto finish_search_leaf_node; + } + + err = ssdfs_btree_node_convert_index2id(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to convert index to ID: " + "err %d\n", err); + goto finish_search_leaf_node; + } + + search->node.state = SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC; + + if (!is_btree_index_search_request_valid(search, + prev_id, + prev_height)) { + err = -ERANGE; + SSDFS_ERR("invalid index search request: " + "prev_id %llu, prev_height %u, " + "id %llu, height %u\n", + (u64)prev_id, prev_height, + (u64)search->node.id, + search->node.height); + goto finish_search_leaf_node; + } + } while (prev_height > SSDFS_BTREE_LEAF_NODE_HEIGHT); + +check_found_node: + if (search->node.state == SSDFS_BTREE_SEARCH_ROOT_NODE_DESC) { + err = -ENOENT; + ssdfs_btree_search_define_parent_node(search, + search->node.child); + ssdfs_btree_search_define_child_node(search, NULL); + SSDFS_DBG("btree has empty root node\n"); + goto finish_search_leaf_node; + } else if (is_btree_leaf_node_found(search)) { + err = ssdfs_btree_check_found_leaf_node(tree, search); + if (err) + goto finish_search_leaf_node; + } else { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("invalid leaf node descriptor: " + "node_state %#x, node_id %llu, " + "height %u\n", + search->node.state, + (u64)search->node.id, + search->node.height); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +finish_search_leaf_node: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node descriptor: " + "node_state %#x, node_id %llu, " + "height %u\n", + search->node.state, + (u64)search->node.id, + search->node.height); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_btree_add_node() - add a node into the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to add a node into the tree with the goal + * to increase capacity of items in the tree. It means that + * the new leaf node should be added into the tail of leaf + * nodes' chain. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - tree is corrupted. + * %-ENOSPC - unable to add the new node. 
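+ *
+ * Example (a minimal usage sketch; the request has to carry a valid
+ * hash range before the call):
+ *
+ *	search->request.start.hash = start_hash;
+ *	search->request.end.hash = end_hash;
+ *	err = ssdfs_btree_add_node(tree, search);
+ *	if (err == -ENOSPC)
+ *		...the volume has no free space for one more node...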
+ */ +int ssdfs_btree_add_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + + SSDFS_DBG("tree %p, start_hash %llx\n", + tree, search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->state)) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + fsi = tree->fsi; + + err = ssdfs_reserve_free_pages(fsi, tree->pages_per_node, + SSDFS_METADATA_PAGES); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add the new node: " + "pages_per_node %u, err %d\n", + tree->pages_per_node, err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } + + down_read(&tree->lock); + err = ssdfs_btree_find_leaf_node(tree, search); + up_read(&tree->lock); + + if (!err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found leaf node %u\n", + search->node.id); +#endif /* CONFIG_SSDFS_DEBUG */ + return ssdfs_check_leaf_node_state(search); + } else if (err == -ENOENT) { + /* + * Parent node was found. + */ + err = 0; + } else { + err = -ERANGE; + SSDFS_ERR("fail to define the parent node: " + "hash %llx, err %d\n", + search->request.start.hash, + err); + return err; + } + + err = __ssdfs_btree_add_node(tree, search); + if (err == -EEXIST) { + SSDFS_DBG("node has been added\n"); + } else if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add a new node: err %d\n", + err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to add a new node: err %d\n", + err); + } + + ssdfs_debug_btree_object(tree); + ssdfs_check_btree_consistency(tree); + + return err; +} + +/* + * ssdfs_btree_insert_node() - insert a node into the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to insert a node into the tree for + * the requested hash value. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - tree is corrupted. + * %-ENOSPC - unable to insert the new node. 
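+ *
+ * NOTE: unlike ssdfs_btree_add_node(), this method doesn't search for
+ * a leaf node beforehand; it reserves the metadata pages and calls
+ * __ssdfs_btree_add_node() for the requested hash directly.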
+ */
+int ssdfs_btree_insert_node(struct ssdfs_btree *tree,
+			    struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !tree->fsi || !search);
+
+	SSDFS_DBG("tree %p, start_hash %llx\n",
+		  tree, search->request.start.hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_BTREE_CREATED:
+	case SSDFS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#else
+		SSDFS_WARN("invalid tree state %#x\n",
+			   atomic_read(&tree->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ERANGE;
+	}
+
+	fsi = tree->fsi;
+
+	err = ssdfs_reserve_free_pages(fsi, tree->pages_per_node,
+				       SSDFS_METADATA_PAGES);
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to add the new node: "
+			  "pages_per_node %u, err %d\n",
+			  tree->pages_per_node, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	err = __ssdfs_btree_add_node(tree, search);
+	if (err == -EEXIST)
+		SSDFS_DBG("node has been added\n");
+	else if (unlikely(err)) {
+		SSDFS_ERR("fail to add a new node: err %d\n",
+			  err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+
+	ssdfs_debug_btree_object(tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_check_btree_consistency(tree);
+
+	return err;
+}
+
+/*
+ * ssdfs_segment_invalidate_node() - invalidate the node in the segment
+ * @node: node object
+ *
+ * This method tries to invalidate the node
+ * in the corresponding segment.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EFAULT - tree is corrupted.
+ */
+static
+int ssdfs_segment_invalidate_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_segment_info *seg;
+	u32 start_blk;
+	u32 len;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&node->descriptor_lock);
+	start_blk = le32_to_cpu(node->extent.logical_blk);
+	len = le32_to_cpu(node->extent.len);
+	seg = node->seg;
+	spin_unlock(&node->descriptor_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!seg);
+
+	SSDFS_DBG("node_id %u, seg_id %llu, start_blk %u, len %u\n",
+		  node->node_id, seg->seg_id, start_blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_segment_invalidate_logical_extent(seg, start_blk, len);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to invalidate node: "
+			  "node_id %u, seg_id %llu, "
+			  "start_blk %u, len %u\n",
+			  node->node_id, seg->seg_id,
+			  start_blk, len);
+	}
+
+	/* propagate the invalidation error to the caller */
+	return err;
+}
+
+/*
+ * ssdfs_btree_delete_index_in_parent_node() - delete index in parent node
+ * @tree: btree object
+ * @search: search object
+ *
+ * This method tries to delete the index records in all parent nodes.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EFAULT - tree is corrupted.
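+ *
+ * In short, the logic below is (a simplified sketch):
+ *
+ *	ssdfs_btree_check_hierarchy_for_delete(tree, search, hierarchy);
+ *	for every level:
+ *		ssdfs_btree_process_level_for_delete(hierarchy, height, search);
+ *	for every node that became unused:
+ *		set_ssdfs_btree_node_pre_deleted(node);
+ *		ssdfs_segment_invalidate_node(node);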
+ */ +static +int ssdfs_btree_delete_index_in_parent_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy; + struct ssdfs_btree_level *level; + struct ssdfs_btree_node *node; + int cur_height, tree_height; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + + SSDFS_DBG("tree %p, start_hash %llx\n", + tree, search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->state)) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + switch (search->node.state) { + case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + search->node.state); + return -ERANGE; + } + + if (!search->node.child) { + SSDFS_ERR("child node is NULL\n"); + return -ERANGE; + } + + if (!search->node.parent) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %u\n", + tree_height); + return -ERANGE; + } + + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (IS_ERR_OR_NULL(hierarchy)) { + err = !hierarchy ? -ENOMEM : PTR_ERR(hierarchy); + SSDFS_ERR("fail to allocate tree levels' array: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_check_hierarchy_for_delete(tree, search, hierarchy); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare information about hierarchy: " + "err %d\n", + err); + goto finish_delete_index; + } + + for (cur_height = 0; cur_height < tree_height; cur_height++) { + err = ssdfs_btree_process_level_for_delete(hierarchy, + cur_height, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to process the tree's level: " + "cur_height %u, err %d\n", + cur_height, err); + goto finish_delete_index; + } + } + + for (cur_height = 0; cur_height < (tree_height - 1); cur_height++) { + level = hierarchy->array_ptr[cur_height]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_height %d, tree_height %d\n", + cur_height, tree_height); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_delete_node(level)) + continue; + + node = level->nodes.old_node.ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_pre_deleted(node); + + err = ssdfs_segment_invalidate_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate node: id %u, err %d\n", + node->node_id, err); + } + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + ssdfs_btree_hierarchy_free(hierarchy); + + ssdfs_btree_search_define_child_node(search, NULL); + search->result.err = 0; + search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT; + return 0; + +finish_delete_index: + ssdfs_btree_hierarchy_free(hierarchy); + + search->result.err = err; + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + return err; +} + +/* + * ssdfs_btree_delete_node() - delete the node from the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to delete a node from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - cannot delete the node. + * %-EBUSY - node has several owners. 
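+ *
+ * Example (a minimal usage sketch; the node has to be completely empty
+ * beforehand, otherwise the method fails with -EFAULT):
+ *
+ *	search->result.state = SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE;
+ *	err = ssdfs_btree_delete_node(tree, search);
+ *	if (err == -EBUSY)
+ *		...the node has several owners; it cannot be deleted now...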
+ */
+int ssdfs_btree_delete_node(struct ssdfs_btree *tree,
+			    struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_node *node;
+	u16 items_count;
+	u16 items_capacity;
+	u16 index_count;
+	u16 index_capacity;
+	bool cannot_delete = false;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+
+	SSDFS_DBG("tree %p, start_hash %llx\n",
+		  tree, search->request.start.hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_BTREE_CREATED:
+	case SSDFS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#else
+		SSDFS_WARN("invalid tree state %#x\n",
+			   atomic_read(&tree->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ERANGE;
+	}
+
+	if (search->result.state != SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE) {
+		SSDFS_ERR("invalid search->result.state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	switch (search->node.state) {
+	case SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC:
+	case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC:
+		/* expected state */
+		break;
+
+	case SSDFS_BTREE_SEARCH_ROOT_NODE_DESC:
+		SSDFS_ERR("fail to delete root node\n");
+		return -ERANGE;
+
+	default:
+		BUG();
+	}
+
+	if (!search->node.child) {
+		SSDFS_ERR("child node pointer is NULL\n");
+		return -ERANGE;
+	}
+
+	if (!search->node.parent) {
+		SSDFS_ERR("parent node pointer is NULL\n");
+		return -ERANGE;
+	}
+
+	node = search->node.child;
+
+	if (node->node_id != search->node.id ||
+	    atomic_read(&node->height) != search->node.height) {
+		SSDFS_ERR("corrupted search object: "
+			  "node->node_id %u, search->node.id %u, "
+			  "node->height %u, search->node.height %u\n",
+			  node->node_id, search->node.id,
+			  atomic_read(&node->height),
+			  search->node.height);
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&node->state)) {
+	case SSDFS_BTREE_NODE_INITIALIZED:
+	case SSDFS_BTREE_NODE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid node state: id %u, state %#x\n",
+			  node->node_id,
+			  atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+	items_count = node->items_area.items_count;
+	items_capacity = node->items_area.items_capacity;
+	index_count = node->index_area.index_count;
+	index_capacity = node->index_area.index_capacity;
+	if (items_count != 0)
+		cannot_delete = true;
+	if (index_count != 0)
+		cannot_delete = true;
+	up_read(&node->header_lock);
+
+	if (cannot_delete) {
+		SSDFS_ERR("node has content in index/items area: "
+			  "items_count %u, items_capacity %u, "
+			  "index_count %u, index_capacity %u\n",
+			  items_count, items_capacity,
+			  index_count, index_capacity);
+		return -EFAULT;
+	}
+
+	if (is_ssdfs_node_shared(node)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u has several owners %d\n",
+			  node->node_id,
+			  atomic_read(&node->refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EBUSY;
+	}
+
+	down_write(&tree->lock);
+	err = ssdfs_btree_delete_index_in_parent_node(tree, search);
+	up_write(&tree->lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete index from parent node: "
+			  "err %d\n", err);
+	}
+
+	ssdfs_debug_btree_object(tree);
+	ssdfs_check_btree_consistency(tree);
+
+	return err;
+}

From patchwork Sat Feb 25 01:09:00 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151956
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 49/76] ssdfs: b-tree API implementation
Date: Fri, 24 Feb 2023 17:09:00 -0800
Message-Id: <20230225010927.813929-50-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Generalized b-tree functionality implements API:
(1) create - create empty b-tree object
(2) destroy - destroy b-tree object
(3) flush - flush dirty b-tree object
(4) find_item - find item in b-tree hierarchy
(5) find_range - find range of items in b-tree hierarchy
(6) allocate_item - allocate item in existing b-tree's node
(7) allocate_range - allocate range of items in existing b-tree's node
(8) add_item - add item into b-tree
(9) add_range - add range of items into b-tree
(10) change_item - change existing item in b-tree
(11) delete_item - delete item from b-tree
(12) delete_range - delete range of items from b-tree
(13) delete_all - delete all items from b-tree

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree.c | 3713 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 3713 insertions(+)

diff --git a/fs/ssdfs/btree.c b/fs/ssdfs/btree.c
index d7778cdb67a1..f88f599a72df 100644
--- a/fs/ssdfs/btree.c
+++ b/fs/ssdfs/btree.c
@@ -4072,3 +4072,3716 @@ int ssdfs_btree_delete_node(struct ssdfs_btree *tree,
 	return err;
 }
+
+/*
+ * node_needs_in_additional_check() - does it need to check the node?
+ * @err: error code
+ * @search: search object
+ */
+static inline
+bool node_needs_in_additional_check(int err,
+				    struct ssdfs_btree_search *search)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err == -ENODATA &&
+		search->result.state == SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE;
+}
+
+/*
+ * ssdfs_btree_switch_on_hybrid_parent_node() - change current node
+ * @tree: btree object
+ * @search: search object [in|out]
+ *
+ * This method tries to change the current node on hybrid parent one.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - item can be added.
+ * %-ENOENT - no space for a new item.
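+ *
+ * On success the search object is redirected to the items area of the
+ * hybrid parent node (a sketch of the resulting state; see the logic
+ * below):
+ *
+ *	search->node.state = SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC;
+ *	search->result.state = SSDFS_BTREE_SEARCH_OUT_OF_RANGE;
+ *	search->result.err = -ENODATA;
+ *	search->result.start_index = items_count;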
+ */
+static
+int ssdfs_btree_switch_on_hybrid_parent_node(struct ssdfs_btree *tree,
+					     struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_node *node;
+	int state;
+	u64 start_hash, end_hash;
+	u16 items_count, items_capacity;
+	u16 free_items;
+	u16 flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+
+	SSDFS_DBG("tree %p, type %#x, "
+		  "request->type %#x, request->flags %#x, "
+		  "result->err %d, result->state %#x, "
+		  "start_hash %llx, end_hash %llx\n",
+		  tree, tree->type,
+		  search->request.type, search->request.flags,
+		  search->result.err,
+		  search->result.state,
+		  search->request.start.hash,
+		  search->request.end.hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (search->result.err != -ENODATA) {
+		SSDFS_ERR("unexpected result's error %d\n",
+			  search->result.err);
+		return -ERANGE;
+	}
+
+	if (search->result.state != SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE) {
+		SSDFS_ERR("unexpected result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->request.start.hash == U64_MAX ||
+	    search->request.end.hash == U64_MAX) {
+		SSDFS_ERR("invalid request: "
+			  "start_hash %llx, end_hash %llx\n",
+			  search->request.start.hash,
+			  search->request.end.hash);
+		return -ERANGE;
+	}
+
+	node = search->node.child;
+	if (!node) {
+		node = search->node.parent;
+
+		if (!node) {
+			SSDFS_ERR("corrupted search request: "
+				  "child and parent nodes are NULL\n");
+			return -ERANGE;
+		}
+
+		if (atomic_read(&node->type) == SSDFS_BTREE_ROOT_NODE) {
+			SSDFS_DBG("parent is root node\n");
+			return -ENOENT;
+		} else {
+			SSDFS_ERR("corrupted search request: "
+				  "child node is NULL\n");
+			return -ERANGE;
+		}
+	}
+
+	if (atomic_read(&node->type) == SSDFS_BTREE_ROOT_NODE) {
+		SSDFS_DBG("child is root node\n");
+		return -ENOENT;
+	}
+
+	state = atomic_read(&node->items_area.state);
+	if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		SSDFS_ERR("invalid items area's state %#x\n",
+			  state);
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+	items_count = node->items_area.items_count;
+	items_capacity = node->items_area.items_capacity;
+	start_hash = node->items_area.start_hash;
+	end_hash = node->items_area.end_hash;
+	up_read(&node->header_lock);
+
+	if (start_hash == U64_MAX || end_hash == U64_MAX) {
+		SSDFS_ERR("corrupted items area: "
+			  "start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		return -ERANGE;
+	}
+
+	if (start_hash > end_hash) {
+		SSDFS_ERR("corrupted items area: "
+			  "start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		return -ERANGE;
+	}
+
+	if (items_count > items_capacity) {
+		SSDFS_ERR("corrupted items area: "
+			  "items_count %u, items_capacity %u\n",
+			  items_count, items_capacity);
+		return -ERANGE;
+	}
+
+	free_items = items_capacity - items_count;
+
+	/* the child leaf node is expected to be full here */
+	if (free_items != 0) {
+		SSDFS_WARN("invalid free_items %u, "
+			   "items_count %u, items_capacity %u\n",
+			   free_items, items_count, items_capacity);
+		return -ERANGE;
+	}
+
+	node = search->node.parent;
+	if (!node) {
+		SSDFS_ERR("corrupted search request: parent node is NULL\n");
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_ROOT_NODE:
+	case SSDFS_BTREE_INDEX_NODE:
+		/* nothing can be done */
+		SSDFS_DBG("node is root or index\n");
+		return -ENOENT;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		/* it needs to check the node's state */
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		SSDFS_WARN("btree is corrupted: "
+			   "leaf node %u cannot be the parent\n",
+			   node->node_id);
+		return -ERANGE;
+
+	default:
+		SSDFS_ERR("invalid node's type %#x\n",
atomic_read(&node->type)); + return -ERANGE; + } + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid node's %u state %#x\n", + node->node_id, + atomic_read(&node->state)); + return -ERANGE; + } + + flags = atomic_read(&node->flags); + + if (!(flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA)) { + SSDFS_WARN("hybrid node %u hasn't items area\n", + node->node_id); + return -ENOENT; + } + + ssdfs_btree_search_define_child_node(search, node); + + spin_lock(&node->descriptor_lock); + ssdfs_btree_search_define_parent_node(search, node->parent_node); + spin_unlock(&node->descriptor_lock); + + ssdfs_memcpy(&search->node.found_index, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + + err = ssdfs_btree_node_convert_index2id(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to convert index to ID: " + "node %u, height %u\n", + node->node_id, + atomic_read(&node->height)); + return err; + } + + down_read(&node->header_lock); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + free_items = items_capacity - items_count; + + if (free_items == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is exhausted: free_items %u, " + "items_count %u, items_capacity %u\n", + node->node_id, + free_items, items_count, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + return -ENOENT; + } + + if (items_count == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is empty: free_items %u, " + "items_count %u, items_capacity %u\n", + node->node_id, + free_items, items_count, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_node_check; + } + + if (search->request.start.hash < start_hash) { + if (search->request.end.hash < start_hash) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("request (start_hash %llx, end_hash %llx), " + "area (start_hash %llx, end_hash %llx)\n", + search->request.start.hash, + search->request.end.hash, + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_node_check; + } else { + SSDFS_ERR("invalid request: " + "request (start_hash %llx, end_hash %llx), " + "area (start_hash %llx, end_hash %llx)\n", + search->request.start.hash, + search->request.end.hash, + start_hash, end_hash); + return -ERANGE; + } + } + +finish_node_check: + search->node.state = SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC; + search->result.state = SSDFS_BTREE_SEARCH_OUT_OF_RANGE; + search->result.err = -ENODATA; + + if (items_count == 0) + search->result.start_index = 0; + else + search->result.start_index = items_count; + + search->result.count = search->request.count; + search->result.search_cno = ssdfs_current_cno(tree->fsi->sb); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->node_id, items_count, items_capacity, + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENODATA; +} + +/* + * __ssdfs_btree_find_item() - find item into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to find an item into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. 
+ * %-ERANGE - internal error. + * %-ENODATA - item hasn't been found + * %-ENOSPC - node hasn't free space. + */ +static +int __ssdfs_btree_find_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_ALLOCATE_ITEM: + case SSDFS_BTREE_SEARCH_ALLOCATE_RANGE: + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try to find an item */ + } else if (err == -ENOENT) { + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index node was found\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + err = ssdfs_btree_switch_on_hybrid_parent_node(tree, + search); + if (err == -ENODATA) + goto finish_search_item; + else if (err == -ENOENT) { + err = -ENODATA; + goto finish_search_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to switch on parent node: " + "err %d\n", err); + } else { + err = -ENODATA; + goto finish_search_item; + } + break; + + default: + /* do nothing */ + break; + } + + goto finish_search_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_search_item; + } + + if (search->request.type == SSDFS_BTREE_SEARCH_ADD_ITEM) { +try_another_node: + err = ssdfs_btree_node_find_item(search); + if (node_needs_in_additional_check(err, search)) { + search->result.err = -ENODATA; + err = ssdfs_btree_switch_on_hybrid_parent_node(tree, + search); + if (err == -ENODATA) + goto finish_search_item; + else if (err == -ENOENT) { + err = -ENODATA; + goto finish_search_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to switch on parent node: " + "err %d\n", err); + goto finish_search_item; + } else { + err = -ENODATA; + goto finish_search_item; + } + } else if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_search_item; + } else + goto try_another_node; + } + } else { 
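+ /* + * Any other request type is a simple lookup here: -EACCES means + * that the node is still under initialization, so wait for the + * init_end completion and retry the search. + */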
+try_find_item_again: + err = ssdfs_btree_node_find_item(search); + if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_search_item; + } else + goto try_find_item_again; + } + } + + if (err == -EAGAIN) { + if (search->result.items_in_buffer > 0 && + search->result.state == SSDFS_BTREE_SEARCH_VALID_ITEM) { + /* finish search */ + err = 0; + search->result.err = 0; + goto finish_search_item; + } else { + err = -ENODATA; + SSDFS_DBG("node hasn't requested data\n"); + goto finish_search_item; + } + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find item: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search_item; + } else if (err == -ENOSPC) { + err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + SSDFS_DBG("index node was found\n"); + goto finish_search_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find item: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_search_item; + } + +finish_search_item: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.state %#x, err %d\n", + search->result.state, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_btree_find_item() - find item into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to find an item into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - item hasn't been found + */ +int ssdfs_btree_find_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, type %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&tree->lock); + err = __ssdfs_btree_find_item(tree, search); + up_read(&tree->lock); + + return err; +} + +/* + * __ssdfs_btree_find_range() - find a range of items into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to find a range of item into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
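+ * %-ENODATA - range hasn't been found.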
+ */ +static +int __ssdfs_btree_find_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + +try_next_search: + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try to find an item */ + } else if (err == -ENOENT) { + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + SSDFS_DBG("index node was found\n"); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_RANGE: + err = ssdfs_btree_switch_on_hybrid_parent_node(tree, + search); + if (err == -ENODATA) { + /* + * do nothing + */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to switch on parent node: " + "err %d\n", err); + } else { + /* finish search */ + err = -ENODATA; + } + break; + + default: + /* do nothing */ + break; + } + + goto finish_search_range; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_search_range; + } + + if (search->request.type == SSDFS_BTREE_SEARCH_ADD_RANGE) { +try_another_node: + err = ssdfs_btree_node_find_range(search); + + if (node_needs_in_additional_check(err, search)) { + search->result.err = -ENODATA; + err = ssdfs_btree_switch_on_hybrid_parent_node(tree, + search); + if (err == -ENODATA) + goto finish_search_range; + else if (unlikely(err)) { + SSDFS_ERR("fail to switch on parent node: " + "err %d\n", err); + goto finish_search_range; + } else { + /* finish search */ + err = -ENODATA; + goto finish_search_range; + } + } else if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_search_range; + } else + goto try_another_node; + } + } else { +try_find_range_again: + err = ssdfs_btree_node_find_range(search); + if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_search_range; + } else + goto try_find_range_again; + } + } + +
if (err == -EAGAIN) { + if (search->result.items_in_buffer > 0 && + search->result.state == SSDFS_BTREE_SEARCH_VALID_ITEM) { + /* finish search */ + err = 0; + search->result.err = 0; + goto finish_search_range; + } else { + err = 0; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + SSDFS_DBG("try next search\n"); + goto try_next_search; + } + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find range: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search_range; + } else if (err == -ENOSPC) { + err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + SSDFS_DBG("index node was found\n"); + goto finish_search_range; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find range: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_search_range; + } + +finish_search_range: + return err; +} + +/* + * ssdfs_btree_find_range() - find a range of items into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to find a range of item into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_find_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, type %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&tree->lock); + err = __ssdfs_btree_find_range(tree, search); + up_read(&tree->lock); + + return err; +} + +/* + * ssdfs_btree_allocate_item() - allocate item into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to allocate the item into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
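+ * + * NOTE: if no existing leaf node is able to serve the requested hash, + * a new node is inserted into the tree and the allocation is retried, + * so the tree can grow as a side effect of this call.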
+ */ +int ssdfs_btree_allocate_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + if (search->request.type != SSDFS_BTREE_SEARCH_ALLOCATE_ITEM) { + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + down_read(&tree->lock); + +try_next_search: + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try the old search result */ + } else if (err == -ENOENT) { + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index node was found\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + up_read(&tree->lock); + err = ssdfs_btree_insert_node(tree, search); + down_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to insert node: err %d\n", + err); + goto finish_allocate_item; + } + + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try the old search result */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_allocate_item; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_allocate_item; + } + +try_allocate_item: + err = ssdfs_btree_node_allocate_item(search); + if (err == -EAGAIN) { + err = 0; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + goto try_next_search; + } else if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_allocate_item; + } else + goto try_allocate_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to allocate item: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_allocate_item; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_allocate_item: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* 
CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * ssdfs_btree_allocate_range() - allocate range of items into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to allocate the range of items into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_allocate_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + if (search->request.type != SSDFS_BTREE_SEARCH_ALLOCATE_RANGE) { + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + down_read(&tree->lock); + +try_next_search: + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try the old search result */ + } else if (err == -ENOENT) { + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index node was found\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + up_read(&tree->lock); + err = ssdfs_btree_insert_node(tree, search); + down_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to insert node: err %d\n", + err); + goto finish_allocate_range; + } + + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try the old search result */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_allocate_range; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_allocate_range; + } + +try_allocate_range: + err = ssdfs_btree_node_allocate_range(search); + if (err == -EAGAIN) { + err = 0; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + goto try_next_search; + } else if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_allocate_range; + } else + goto try_allocate_range; + } else if (unlikely(err)) { + SSDFS_ERR("fail to allocate range: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto 
finish_allocate_range; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_allocate_range: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * need_update_parent_node() - check necessity to update index in parent node + * @search: search object + */ +static inline +bool need_update_parent_node(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *child; + u64 start_hash; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_hash = search->request.start.hash; + + child = search->node.child; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!child); +#endif /* CONFIG_SSDFS_DEBUG */ + + return need_update_parent_index_area(start_hash, child); +} + +/* + * ssdfs_btree_update_index_in_parent_node() - update index in parent node + * @tree: btree object + * @search: search object [in|out] + * @ptr: hierarchy object + * + * This method tries to update an index in parent nodes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_update_index_in_parent_node(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_hierarchy *ptr) +{ + int cur_height, tree_height; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !ptr); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, hierarchy %p\n", + tree, ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %u\n", + tree_height); + return -ERANGE; + } + + for (cur_height = 0; cur_height < tree_height; cur_height++) { + err = ssdfs_btree_process_level_for_update(ptr, + cur_height, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to process the tree's level: " + "cur_height %u, err %d\n", + cur_height, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_add_item() - add item into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to add the item into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - item exists in the tree. 
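+ * + * NOTE: the item has to be absent in the tree (-EEXIST is returned + * otherwise). If no existing node is able to store the new item, a new + * node is inserted first; after a successful insertion the index records + * in the parent nodes are updated when the node's hash range has changed.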
+ */ +int ssdfs_btree_add_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy = NULL; + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + if (search->request.type != SSDFS_BTREE_SEARCH_ADD_ITEM) { + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + down_read(&tree->lock); + +try_find_item: + err = __ssdfs_btree_find_item(tree, search); + if (!err) { + err = -EEXIST; + SSDFS_ERR("item exists in the tree: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); + goto finish_add_item; + } else if (err == -ENODATA) { + err = 0; + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* position in node was found */ + break; + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* none node is able to store the new item */ + break; + default: + err = -ERANGE; + SSDFS_ERR("invalid search result: " + "start_hash %llx, end_hash %llx, " + "state %#x\n", + search->request.start.hash, + search->request.end.hash, + search->result.state); + goto finish_add_item; + }; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find item: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_add_item; + } + + if (search->result.state == SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE) { + up_read(&tree->lock); + err = ssdfs_btree_insert_node(tree, search); + down_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to insert node: err %d\n", + err); + goto finish_add_item; + } + + err = __ssdfs_btree_find_item(tree, search); + if (!err) { + err = -EEXIST; + SSDFS_ERR("item exists in the tree: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); + goto finish_add_item; + } else if (err == -ENODATA) { + err = 0; + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* position in node was found */ + break; + default: + err = -ERANGE; + SSDFS_ERR("invalid search result: " + "start_hash %llx, end_hash %llx, " + "state %#x\n", + search->request.start.hash, + search->request.end.hash, + search->result.state); + goto finish_add_item; + }; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find item: " + "start_hash %llx, end_hash 
%llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_add_item; + } + } + +try_insert_item: + err = ssdfs_btree_node_insert_item(search); + if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_add_item; + } else + goto try_insert_item; + } else if (err == -EFBIG) { + int state = search->result.state; + + err = 0; + + if (state != SSDFS_BTREE_SEARCH_PLEASE_MOVE_BUF_CONTENT) { + err = -ERANGE; + SSDFS_WARN("invalid search's result state %#x\n", + state); + goto finish_add_item; + } else + goto try_find_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to insert item: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_add_item; + } + + if (need_update_parent_node(search)) { + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (!hierarchy) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate tree levels' array\n"); + goto finish_add_item; + } + + err = ssdfs_btree_check_hierarchy_for_update(tree, search, + hierarchy); + if (unlikely(err)) { + atomic_set(&search->node.child->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to prepare hierarchy information : " + "err %d\n", + err); + goto finish_update_parent; + } + + err = ssdfs_btree_update_index_in_parent_node(tree, search, + hierarchy); + if (unlikely(err)) { + SSDFS_ERR("fail to update index records: " + "err %d\n", + err); + goto finish_update_parent; + } + +finish_update_parent: + ssdfs_btree_hierarchy_free(hierarchy); + + if (unlikely(err)) + goto finish_add_item; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_add_item: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * ssdfs_btree_add_range() - add a range of items into btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to add the range of items into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - range exists in the tree. 
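+ * + * NOTE: the node-level insert can return -EAGAIN; in this case the + * search is repeated and the insertion of the range is retried.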
+ */ +int ssdfs_btree_add_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy = NULL; + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + if (search->request.type != SSDFS_BTREE_SEARCH_ADD_RANGE) { + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + down_read(&tree->lock); + +try_find_range: + err = __ssdfs_btree_find_range(tree, search); + if (!err) { + err = -EEXIST; + SSDFS_ERR("range exists in the tree: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); + goto finish_add_range; + } else if (err == -ENODATA) { + err = 0; + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* position in node was found */ + break; + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* none node is able to store the new range */ + break; + default: + err = -ERANGE; + SSDFS_ERR("invalid search result: " + "start_hash %llx, end_hash %llx, " + "state %#x\n", + search->request.start.hash, + search->request.end.hash, + search->result.state); + goto finish_add_range; + }; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find range: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_add_range; + } + + if (search->result.state == SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE) { + up_read(&tree->lock); + err = ssdfs_btree_insert_node(tree, search); + down_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to insert node: err %d\n", + err); + goto finish_add_range; + } + + err = __ssdfs_btree_find_range(tree, search); + if (!err) { + err = -EEXIST; + SSDFS_ERR("range exists in the tree: " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash); + goto finish_add_range; + } else if (err == -ENODATA) { + err = 0; + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* position in node was found */ + break; + default: + err = -ERANGE; + SSDFS_ERR("invalid search result: " + "start_hash %llx, end_hash %llx, " + "state %#x\n", + search->request.start.hash, + search->request.end.hash, + search->result.state); + goto finish_add_range; + }; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find range: " + "start_hash %llx, end_hash %llx, err %d\n", + 
search->request.start.hash, + search->request.end.hash, + err); + goto finish_add_range; + } + } + +try_insert_range: + err = ssdfs_btree_node_insert_range(search); + if (err == -EAGAIN) { + err = 0; + goto try_find_range; + } else if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_add_range; + } else + goto try_insert_range; + } else if (err == -EFBIG) { + int state = search->result.state; + + err = 0; + + if (state != SSDFS_BTREE_SEARCH_PLEASE_MOVE_BUF_CONTENT) { + err = -ERANGE; + SSDFS_WARN("invalid search's result state %#x\n", + state); + goto finish_add_range; + } else + goto try_find_range; + } else if (unlikely(err)) { + SSDFS_ERR("fail to insert item: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_add_range; + } + + if (need_update_parent_node(search)) { + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (!hierarchy) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate tree levels' array\n"); + goto finish_add_range; + } + + err = ssdfs_btree_check_hierarchy_for_update(tree, search, + hierarchy); + if (unlikely(err)) { + atomic_set(&search->node.child->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to prepare hierarchy information : " + "err %d\n", + err); + goto finish_update_parent; + } + + err = ssdfs_btree_update_index_in_parent_node(tree, search, + hierarchy); + if (unlikely(err)) { + SSDFS_ERR("fail to update index records: " + "err %d\n", + err); + goto finish_update_parent; + } + +finish_update_parent: + ssdfs_btree_hierarchy_free(hierarchy); + + if (unlikely(err)) + goto finish_add_range; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_add_range: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * ssdfs_btree_change_item() - change an existing item in the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to change the existing item in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
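+ * + * NOTE: changing an item can modify the node's starting hash; in this + * case the index records in the parent node(s) are updated as well.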
+ */ +int ssdfs_btree_change_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy = NULL; + int tree_state; + int tree_height; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* expected type */ + break; + + default: + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %u\n", + tree_height); + return -ERANGE; + } + + down_read(&tree->lock); + +try_next_search: + err = ssdfs_btree_find_leaf_node(tree, search); + if (err == -EEXIST) { + err = 0; + /* try the old search result */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find leaf node: err %d\n", + err); + goto finish_change_item; + } + +try_change_item: + err = ssdfs_btree_node_change_item(search); + if (err == -EAGAIN) { + err = 0; + search->node.state = SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY; + goto try_next_search; + } else if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_change_item; + } else + goto try_change_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to change item: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_change_item; + } + + if (need_update_parent_node(search)) { + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (!hierarchy) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate tree levels' array\n"); + goto finish_change_item; + } + + err = ssdfs_btree_check_hierarchy_for_update(tree, search, + hierarchy); + if (unlikely(err)) { + atomic_set(&search->node.child->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to prepare hierarchy information : " + "err %d\n", + err); + goto finish_update_parent; + } + + err = ssdfs_btree_update_index_in_parent_node(tree, search, + hierarchy); + if (unlikely(err)) { + SSDFS_ERR("fail to update index records: " + "err %d\n", + err); + goto finish_update_parent; + } + +finish_update_parent: + ssdfs_btree_hierarchy_free(hierarchy); + + if 
(unlikely(err)) + goto finish_change_item; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_change_item: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * ssdfs_btree_delete_item() - delete an existing item in the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to delete the existing item in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_delete_item(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy = NULL; + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* expected type */ + break; + + default: + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + down_read(&tree->lock); + + err = __ssdfs_btree_find_item(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find item: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_delete_item; + } + +try_delete_item: + err = ssdfs_btree_node_delete_item(search); + if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_delete_item; + } else + goto try_delete_item; + } else if (unlikely(err)) { + SSDFS_ERR("fail to delete item: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto finish_delete_item; + } + + if (search->result.state == SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE) { + up_read(&tree->lock); + err = ssdfs_btree_delete_node(tree, search); + down_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete btree node: " + "node_id %llu, err %d\n", + (u64)search->node.id, err); + goto 
finish_delete_item; + } + } else if (need_update_parent_node(search)) { + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (!hierarchy) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate tree levels' array\n"); + goto finish_delete_item; + } + + err = ssdfs_btree_check_hierarchy_for_update(tree, search, + hierarchy); + if (unlikely(err)) { + atomic_set(&search->node.child->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to prepare hierarchy information : " + "err %d\n", + err); + goto finish_update_parent; + } + + err = ssdfs_btree_update_index_in_parent_node(tree, search, + hierarchy); + if (unlikely(err)) { + SSDFS_ERR("fail to update index records: " + "err %d\n", + err); + goto finish_update_parent; + } + +finish_update_parent: + ssdfs_btree_hierarchy_free(hierarchy); + + if (unlikely(err)) + goto finish_delete_item; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +finish_delete_item: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * ssdfs_btree_delete_range() - delete a range of items in the btree + * @tree: btree object + * @search: search object [in|out] + * + * This method tries to delete a range of existing items in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_delete_range(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_hierarchy *hierarchy = NULL; + int tree_state; + bool need_continue_deletion = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#else + SSDFS_DBG("tree %p, type %#x, state %#x, " + "request->type %#x, request->flags %#x, " + "start_hash %llx, end_hash %llx\n", + tree, tree->type, tree_state, + search->request.type, search->request.flags, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + if (!is_btree_search_request_valid(search)) { + SSDFS_ERR("invalid search object\n"); + return -EINVAL; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -EINVAL; + } + + down_read(&tree->lock); + +try_delete_next_range: + err = __ssdfs_btree_find_range(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find range: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + up_read(&tree->lock); + return err; + } + +try_delete_range_again: + err = 
ssdfs_btree_node_delete_range(search); + if (err == -EACCES) { + struct ssdfs_btree_node *node = search->node.child; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = SSDFS_WAIT_COMPLETION(&node->init_end); + if (unlikely(err)) { + SSDFS_ERR("node init failed: " + "err %d\n", err); + goto finish_delete_range; + } else + goto try_delete_range_again; + } + +finish_delete_range: + if (err == -EAGAIN) { + /* the range has to be deleted in the next node */ + err = 0; + need_continue_deletion = true; + } else if (unlikely(err)) { + SSDFS_ERR("fail to delete range: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + up_read(&tree->lock); + return err; + } + + if (search->result.state == SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE) { + up_read(&tree->lock); + err = ssdfs_btree_delete_node(tree, search); + down_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete btree node: " + "node_id %llu, err %d\n", + (u64)search->node.id, err); + goto fail_delete_range; + } + } else if (need_update_parent_node(search)) { + hierarchy = ssdfs_btree_hierarchy_allocate(tree); + if (!hierarchy) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate tree levels' array\n"); + goto fail_delete_range; + } + + err = ssdfs_btree_check_hierarchy_for_update(tree, search, + hierarchy); + if (unlikely(err)) { + atomic_set(&search->node.child->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to prepare hierarchy information: " + "err %d\n", + err); + goto finish_update_parent; + } + + err = ssdfs_btree_update_index_in_parent_node(tree, search, + hierarchy); + if (unlikely(err)) { + SSDFS_ERR("fail to update index records: " + "err %d\n", + err); + goto finish_update_parent; + } + +finish_update_parent: + ssdfs_btree_hierarchy_free(hierarchy); + + if (unlikely(err)) + goto fail_delete_range; + } + + if (need_continue_deletion) { + need_continue_deletion = false; + goto try_delete_next_range; + } + + atomic_set(&tree->state, SSDFS_BTREE_DIRTY); + +fail_delete_range: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_object(tree); + +#ifdef CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK + ssdfs_check_btree_consistency(tree); +#endif /* CONFIG_SSDFS_BTREE_STRICT_CONSISTENCY_CHECK */ + + return err; +} + +/* + * ssdfs_btree_delete_all() - delete all items in the btree + * @tree: btree object + * + * This method tries to delete all items in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error.
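+ * + * NOTE: the search object is allocated and initialized internally; the + * whole hash range [0, U64_MAX] is deleted by means of a delete_range request.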
+ */ +int ssdfs_btree_delete_all(struct ssdfs_btree *tree) +{ + struct ssdfs_btree_search *search; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p\n", tree); +#else + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + ssdfs_btree_search_init(search); + + search->request.type = SSDFS_BTREE_SEARCH_DELETE_ALL; + search->request.start.hash = 0; + search->request.end.hash = U64_MAX; + + err = ssdfs_btree_delete_range(tree, search); + if (unlikely(err)) + SSDFS_ERR("fail to delete all items: err %d\n", err); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_btree_search_free(search); + return err; +} + +/* + * ssdfs_btree_get_head_range() - extract head range of the tree + * @tree: btree object + * @expected_len: expected length of the range + * @search: search object + * + * This method tries to extract a head range of items from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - expected length of the range is not extracted + */ +int ssdfs_btree_get_head_range(struct ssdfs_btree *tree, + u32 expected_len, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + struct ssdfs_btree_index_key key; + int tree_state; + u64 hash; + u32 buf_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x, " + "expected_len %u\n", + tree, tree->type, tree_state, + expected_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + down_read(&tree->lock); + + err = ssdfs_btree_radix_tree_find(tree, + SSDFS_BTREE_ROOT_NODE_ID, + &node); + if (unlikely(err)) { + SSDFS_ERR("fail to find root node: err %d\n", + err); + goto finish_get_range; + } else if (!node) { + err = -ERANGE; + SSDFS_ERR("node is NULL\n"); + goto finish_get_range; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + err = -ERANGE; + SSDFS_WARN("root node hasn't index area\n"); + goto finish_get_range; + } + + if (is_ssdfs_btree_node_index_area_empty(node)) + goto finish_get_range; + + down_read(&node->full_lock); + err = __ssdfs_btree_root_node_extract_index(node, + SSDFS_ROOT_NODE_LEFT_LEAF_NODE, + &key); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to get index: err %d\n", + err); + goto finish_get_range; + } + + hash = le64_to_cpu(key.index.hash); + if (hash >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid hash\n"); + goto finish_get_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx\n", hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = hash; + search->request.end.hash = hash; + search->request.count = 1; + + err = __ssdfs_btree_find_item(tree, 
search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the item: " + "hash %llx, err %d\n", + hash, err); + goto finish_get_range; + } + + buf_size = expected_len * tree->max_item_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(expected_len >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_extract_range(search->result.start_index, + (u16)expected_len, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the range: " + "start_index %u, expected_len %u, err %d\n", + search->result.start_index, + expected_len, err); + goto finish_get_range; + } + + if (expected_len != search->result.count) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("expected_len %u != search->result.count %u\n", + expected_len, search->result.count); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +finish_get_range: + up_read(&tree->lock); + + return err; +} + +/* + * ssdfs_btree_extract_range() - extract range from the node + * @tree: btree object + * @start_index: start index in the node + * @count: count of items in the range + * @search: search object + * + * This method tries to extract a range of items from the found node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_extract_range(struct ssdfs_btree *tree, + u16 start_index, u16 count, + struct ssdfs_btree_search *search) +{ + int tree_state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x, " + "start_index %u, count %u\n", + tree, tree->type, tree_state, + start_index, count); + + ssdfs_debug_btree_object(tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + down_read(&tree->lock); + + err = ssdfs_btree_node_extract_range(start_index, count, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the range: " + "start_index %u, count %u, err %d\n", + start_index, count, err); + goto finish_get_range; + } + +finish_get_range: + up_read(&tree->lock); + + return err; +} + +/* + * is_ssdfs_btree_empty() - check that btree is empty + * @tree: btree object + */ +bool is_ssdfs_btree_empty(struct ssdfs_btree *tree) +{ + struct ssdfs_btree_node *node; + struct ssdfs_btree_index_key key1, key2; + int tree_state; + u32 node_id1, node_id2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + down_read(&tree->lock); + + err = ssdfs_btree_radix_tree_find(tree, + SSDFS_BTREE_ROOT_NODE_ID, + &node); + if (unlikely(err)) { + SSDFS_ERR("fail to find root node: err %d\n", + err); + goto finish_check_tree; + } else if (!node) { + err = -ERANGE; + SSDFS_ERR("node is NULL\n"); + goto finish_check_tree; + } + + if 
(!is_ssdfs_btree_node_index_area_exist(node)) { + err = -ERANGE; + SSDFS_WARN("root node hasn't index area\n"); + goto finish_check_tree; + } + + if (is_ssdfs_btree_node_index_area_empty(node)) + goto finish_check_tree; + + down_read(&node->full_lock); + err = __ssdfs_btree_root_node_extract_index(node, + SSDFS_ROOT_NODE_LEFT_LEAF_NODE, + &key1); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to get index: err %d\n", + err); + goto finish_check_tree; + } + + node_id1 = le32_to_cpu(key1.node_id); + if (node_id1 == SSDFS_BTREE_NODE_INVALID_ID) { + SSDFS_WARN("index is invalid\n"); + goto finish_check_tree; + } + + down_read(&node->full_lock); + err = __ssdfs_btree_root_node_extract_index(node, + SSDFS_ROOT_NODE_RIGHT_LEAF_NODE, + &key2); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to get index: err %d\n", + err); + goto finish_check_tree; + } + + node_id2 = le32_to_cpu(key2.node_id); + if (node_id2 != SSDFS_BTREE_NODE_INVALID_ID) { + err = -EEXIST; + goto finish_check_tree; + } + + err = ssdfs_btree_radix_tree_find(tree, node_id1, &node); + if (unlikely(err)) { + SSDFS_ERR("fail to find node: node_id %u, err %d\n", + node_id1, err); + goto finish_check_tree; + } else if (!node) { + err = -ERANGE; + SSDFS_ERR("node is NULL\n"); + goto finish_check_tree; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_LEAF_NODE: + if (!is_ssdfs_btree_node_items_area_empty(node)) { + err = -EEXIST; + goto finish_check_tree; + } else { + /* empty node */ + goto finish_check_tree; + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + if (!is_ssdfs_btree_node_items_area_empty(node)) { + err = -EEXIST; + goto finish_check_tree; + } else if (!is_ssdfs_btree_node_index_area_empty(node)) { + err = -EEXIST; + goto finish_check_tree; + } else { + /* empty node */ + goto finish_check_tree; + } + break; + + case SSDFS_BTREE_INDEX_NODE: + err = -EEXIST; + goto finish_check_tree; + + case SSDFS_BTREE_ROOT_NODE: + err = -ERANGE; + SSDFS_WARN("node %u has root node type\n", + node_id1); + goto finish_check_tree; + + default: + err = -ERANGE; + SSDFS_ERR("invalid node type %#x\n", + atomic_read(&node->type)); + goto finish_check_tree; + } + +finish_check_tree: + up_read(&tree->lock); + + return err ? false : true; +} + +/* + * need_migrate_generic2inline_btree() - is it time to migrate? 
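+ *
+ * (The tree is treated as ready for migration back into the inline
+ *  form when it has collapsed to a single leaf or hybrid node whose
+ *  items count does not exceed @items_threshold.)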
+ * @tree: btree object + * @items_threshold: items migration threshold + */ +bool need_migrate_generic2inline_btree(struct ssdfs_btree *tree, + int items_threshold) +{ + struct ssdfs_btree_node *node; + struct ssdfs_btree_index_key key1, key2; + int tree_state; + u32 node_id1, node_id2; + u16 items_count; + bool need_migrate = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, type %#x, state %#x\n", + tree, tree->type, tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + down_read(&tree->lock); + + err = ssdfs_btree_radix_tree_find(tree, + SSDFS_BTREE_ROOT_NODE_ID, + &node); + if (unlikely(err)) { + SSDFS_ERR("fail to find root node: err %d\n", + err); + goto finish_check_tree; + } else if (!node) { + err = -ERANGE; + SSDFS_ERR("node is NULL\n"); + goto finish_check_tree; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + err = -ERANGE; + SSDFS_WARN("root node hasn't index area\n"); + goto finish_check_tree; + } + + if (is_ssdfs_btree_node_index_area_empty(node)) + goto finish_check_tree; + + down_read(&node->full_lock); + err = __ssdfs_btree_root_node_extract_index(node, + SSDFS_ROOT_NODE_LEFT_LEAF_NODE, + &key1); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to get index: err %d\n", + err); + goto finish_check_tree; + } + + node_id1 = le32_to_cpu(key1.node_id); + if (node_id1 == SSDFS_BTREE_NODE_INVALID_ID) { + SSDFS_WARN("index is invalid\n"); + goto finish_check_tree; + } + + down_read(&node->full_lock); + err = __ssdfs_btree_root_node_extract_index(node, + SSDFS_ROOT_NODE_RIGHT_LEAF_NODE, + &key2); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to get index: err %d\n", + err); + goto finish_check_tree; + } + + node_id2 = le32_to_cpu(key2.node_id); + if (node_id2 != SSDFS_BTREE_NODE_INVALID_ID) { + err = -EEXIST; + goto finish_check_tree; + } + + err = ssdfs_btree_radix_tree_find(tree, node_id1, &node); + if (unlikely(err)) { + SSDFS_ERR("fail to find node: node_id %u, err %d\n", + node_id1, err); + goto finish_check_tree; + } else if (!node) { + err = -ERANGE; + SSDFS_ERR("node is NULL\n"); + goto finish_check_tree; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_LEAF_NODE: + if (!is_ssdfs_btree_node_items_area_empty(node)) { + down_read(&node->header_lock); + items_count = node->items_area.items_count; + up_read(&node->header_lock); + + if (items_count <= items_threshold) { + /* time to migrate */ + need_migrate = true; + } + + goto finish_check_tree; + } else { + /* empty node */ + goto finish_check_tree; + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + if (is_ssdfs_btree_node_index_area_empty(node) && + !is_ssdfs_btree_node_items_area_empty(node)) { + down_read(&node->header_lock); + items_count = node->items_area.items_count; + up_read(&node->header_lock); + + if (items_count <= items_threshold) { + /* time to migrate */ + need_migrate = true; + } + + goto finish_check_tree; + } else if (!is_ssdfs_btree_node_index_area_empty(node)) { + err = -EEXIST; + goto finish_check_tree; + } else { + /* empty node */ + goto finish_check_tree; + } + break; + + case SSDFS_BTREE_INDEX_NODE: + err = -EEXIST; 
+ goto finish_check_tree; + + case SSDFS_BTREE_ROOT_NODE: + err = -ERANGE; + SSDFS_WARN("node %u has root node type\n", + node_id1); + goto finish_check_tree; + + default: + err = -ERANGE; + SSDFS_ERR("invalid node type %#x\n", + atomic_read(&node->type)); + goto finish_check_tree; + } + +finish_check_tree: + up_read(&tree->lock); + + return need_migrate; +} + +/* + * ssdfs_btree_synchronize_root_node() - synchronize root node state + * @tree: btree object + * @root: root node + */ +int ssdfs_btree_synchronize_root_node(struct ssdfs_btree *tree, + struct ssdfs_btree_inline_root_node *root) +{ + int tree_state; + struct ssdfs_btree_node *node; + u16 items_count; + int height; + size_t ids_array_size = sizeof(__le32) * + SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; + size_t indexes_size = sizeof(struct ssdfs_btree_index) * + SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !root); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_state = atomic_read(&tree->state); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, root %p, type %#x, state %#x\n", + tree, root, tree->type, tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (tree_state) { + case SSDFS_BTREE_CREATED: + case SSDFS_BTREE_DIRTY: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid tree state %#x\n", + tree_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + down_read(&tree->lock); + + err = ssdfs_btree_radix_tree_find(tree, + SSDFS_BTREE_ROOT_NODE_ID, + &node); + if (unlikely(err)) { + SSDFS_ERR("fail to find root node: err %d\n", + err); + goto finish_synchronize_root; + } else if (!node) { + err = -ERANGE; + SSDFS_ERR("node is NULL\n"); + goto finish_synchronize_root; + } + + down_read(&node->header_lock); + height = atomic_read(&node->tree->height); + root->header.height = (u8)height; + items_count = node->index_area.index_count; + root->header.items_count = cpu_to_le16(items_count); + root->header.flags = (u8)atomic_read(&node->flags); + root->header.type = (u8)atomic_read(&node->type); + ssdfs_memcpy(root->header.node_ids, + 0, ids_array_size, + node->raw.root_node.header.node_ids, + 0, ids_array_size, + ids_array_size); + ssdfs_memcpy(root->indexes, 0, indexes_size, + node->raw.root_node.indexes, 0, indexes_size, + indexes_size); + up_read(&node->header_lock); + + spin_lock(&node->tree->nodes_lock); + root->header.upper_node_id = + cpu_to_le32(node->tree->upper_node_id); + spin_unlock(&node->tree->nodes_lock); + +finish_synchronize_root: + up_read(&tree->lock); + + return err; +} + +/* + * ssdfs_btree_get_next_hash() - get next node's starting hash + * @tree: btree object + * @search: search object + * @next_hash: next node's starting hash [out] + */ +int ssdfs_btree_get_next_hash(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + u64 *next_hash) +{ + struct ssdfs_btree_node *parent; + struct ssdfs_btree_node_index_area area; + struct ssdfs_btree_index_key index_key; + u64 old_hash = U64_MAX; + int type; + spinlock_t *lock; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !next_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + old_hash = le64_to_cpu(search->node.found_index.index.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search %p, next_hash %p, old (node %u, hash %llx)\n", + search, next_hash, search->node.id, old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + *next_hash = U64_MAX; + + parent = search->node.parent; + + if (!parent) { + SSDFS_ERR("node pointer is NULL\n"); + return 
-ERANGE; + } + + type = atomic_read(&parent->type); + + down_read(&tree->lock); + + do { + u16 found_pos; + + err = -ENOENT; + + down_read(&parent->full_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_hash %llx\n", old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&parent->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &parent->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + err = ssdfs_find_index_by_hash(parent, &area, + old_hash, + &found_pos); + up_read(&parent->header_lock); + + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find the index position: " + "old_hash %llx\n", + old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the index position: " + "old_hash %llx, err %d\n", + old_hash, err); + goto finish_index_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found_pos == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + found_pos++; + + if (found_pos >= area.index_count) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area is finished: " + "found_pos %u, area.index_count %u\n", + found_pos, area.index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } + + if (type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(parent, + found_pos, + &index_key); + } else { + err = __ssdfs_btree_common_node_extract_index(parent, + &area, + found_pos, + &index_key); + } + +finish_index_search: + up_read(&parent->full_lock); + + if (err == -ENOENT) { + if (type == SSDFS_BTREE_ROOT_NODE) { + SSDFS_DBG("no more next hashes\n"); + goto finish_get_next_hash; + } + + spin_lock(&parent->descriptor_lock); + old_hash = le64_to_cpu(parent->node_index.index.hash); + spin_unlock(&parent->descriptor_lock); + + /* try next parent */ + lock = &parent->descriptor_lock; + spin_lock(lock); + parent = parent->parent_node; + spin_unlock(lock); + lock = NULL; + + if (!parent) { + err = -ERANGE; + SSDFS_ERR("node pointer is NULL\n"); + goto finish_get_next_hash; + } + } else if (err == -ENODATA) { + SSDFS_DBG("unable to find the index position\n"); + goto finish_get_next_hash; + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract index key: " + "index_position %u, err %d\n", + found_pos, err); + ssdfs_debug_show_btree_node_indexes(parent->tree, + parent); + goto finish_get_next_hash; + } else { + /* next hash has been found */ + err = 0; + *next_hash = le64_to_cpu(index_key.index.hash); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("next_hash %llx\n", *next_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_get_next_hash; + } + + type = atomic_read(&parent->type); + } while (parent != NULL); + +finish_get_next_hash: + up_read(&tree->lock); + + return err; +} + +void ssdfs_debug_show_btree_node_indexes(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent) +{ +#ifdef CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK + struct ssdfs_btree_node_index_area area; + struct ssdfs_btree_index_key index_key; + int type; + int i; + int err = 0; + + BUG_ON(!tree || !parent); + + type = atomic_read(&parent->type); + + if (!is_ssdfs_btree_node_index_area_exist(parent)) { + SSDFS_ERR("corrupted node %u\n", + parent->node_id); + BUG(); + } + + if (is_ssdfs_btree_node_index_area_empty(parent)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u has empty index area\n", + parent->node_id); +#endif /* 
CONFIG_SSDFS_DEBUG */ + return; + } + + down_read(&parent->full_lock); + + down_read(&parent->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &parent->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + up_read(&parent->header_lock); + + for (i = 0; i < area.index_count; i++) { + if (type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(parent, + i, + &index_key); + } else { + err = __ssdfs_btree_common_node_extract_index(parent, + &area, i, + &index_key); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to extract index key: " + "index_position %d, err %d\n", + i, err); + goto finish_index_processing; + } + + SSDFS_ERR("index %d, node_id %u, " + "node_type %#x, height %u, " + "flags %#x, hash %llx, seg_id %llu, " + "logical_blk %u, len %u\n", + i, + le32_to_cpu(index_key.node_id), + index_key.node_type, + index_key.height, + le16_to_cpu(index_key.flags), + le64_to_cpu(index_key.index.hash), + le64_to_cpu(index_key.index.extent.seg_id), + le32_to_cpu(index_key.index.extent.logical_blk), + le32_to_cpu(index_key.index.extent.len)); + } + +finish_index_processing: + up_read(&parent->full_lock); + + ssdfs_show_btree_node_info(parent); +#endif /* CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK */ +} + +#ifdef CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK +static +void ssdfs_debug_btree_check_indexes(struct ssdfs_btree *tree, + struct ssdfs_btree_node *parent) +{ + struct ssdfs_btree_node *child = NULL; + struct ssdfs_btree_node_index_area area; + struct ssdfs_btree_index_key index_key; + int type; + u32 node_id1, node_id2; + u64 start_hash1 = U64_MAX; + u64 start_hash2 = U64_MAX; + u64 prev_hash = U64_MAX; + int i; + int err = 0; + + BUG_ON(!tree || !parent); + + type = atomic_read(&parent->type); + + switch (type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + if (!is_ssdfs_btree_node_index_area_exist(parent)) { + SSDFS_ERR("corrupted node %u\n", + parent->node_id); + ssdfs_show_btree_node_info(parent); + BUG(); + } + break; + + case SSDFS_BTREE_LEAF_NODE: + /* do nothing */ + return; + + default: + BUG(); + } + + if (is_ssdfs_btree_node_index_area_empty(parent)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u has empty index area\n", + parent->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return; + } + + down_read(&parent->full_lock); + + down_read(&parent->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &parent->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + up_read(&parent->header_lock); + + node_id1 = parent->node_id; + + for (i = 0; i < area.index_count; i++) { + if (type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(parent, + i, + &index_key); + } else { + err = __ssdfs_btree_common_node_extract_index(parent, + &area, i, + &index_key); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to extract index key: " + "index_position %d, err %d\n", + i, err); + goto finish_index_processing; + } + + node_id2 = le32_to_cpu(index_key.node_id); + start_hash1 = le64_to_cpu(index_key.index.hash); + + up_read(&parent->full_lock); + + err = ssdfs_btree_radix_tree_find(tree, node_id2, &child); + + if (err || !child) { + SSDFS_ERR("node_id %u is absent\n", + node_id2); + goto continue_tree_check; + } + + switch (atomic_read(&child->type)) { + case SSDFS_BTREE_INDEX_NODE: + if (!is_ssdfs_btree_node_index_area_exist(child)) { + 
SSDFS_ERR("corrupted node %u\n", + child->node_id); + ssdfs_show_btree_node_info(child); + BUG(); + } + + if (node_id1 == node_id2) { + SSDFS_WARN("node_id1 %u == node_id2 %u\n", + node_id1, node_id2); + ssdfs_show_btree_node_info(child); + BUG(); + } + + down_read(&child->header_lock); + start_hash2 = child->index_area.start_hash; + up_read(&child->header_lock); + break; + + case SSDFS_BTREE_HYBRID_NODE: + if (!is_ssdfs_btree_node_index_area_exist(child)) { + SSDFS_ERR("corrupted node %u\n", + child->node_id); + ssdfs_show_btree_node_info(child); + BUG(); + } + + if (!is_ssdfs_btree_node_items_area_exist(child)) { + SSDFS_ERR("corrupted node %u\n", + child->node_id); + ssdfs_show_btree_node_info(child); + BUG(); + } + + if (node_id1 == node_id2) { + down_read(&child->header_lock); + start_hash2 = child->items_area.start_hash; + up_read(&child->header_lock); + } else { + down_read(&child->header_lock); + start_hash2 = child->index_area.start_hash; + up_read(&child->header_lock); + } + break; + + case SSDFS_BTREE_LEAF_NODE: + if (!is_ssdfs_btree_node_items_area_exist(child)) { + SSDFS_ERR("corrupted node %u\n", + child->node_id); + ssdfs_show_btree_node_info(child); + BUG(); + } + + if (node_id1 == node_id2) { + SSDFS_WARN("node_id1 %u == node_id2 %u\n", + node_id1, node_id2); + ssdfs_show_btree_node_info(child); + BUG(); + } + + down_read(&child->header_lock); + start_hash2 = child->items_area.start_hash; + up_read(&child->header_lock); + break; + + default: + BUG(); + } + + if (start_hash1 != start_hash2) { + SSDFS_WARN("parent: node_id %u, " + "index %d, hash %llx, " + "child: node_id %u, type %#x, " + "start_hash %llx\n", + node_id1, i, start_hash1, + node_id2, atomic_read(&child->type), + start_hash2); + ssdfs_debug_show_btree_node_indexes(tree, + parent); + ssdfs_show_btree_node_info(parent); + ssdfs_show_btree_node_info(child); + BUG(); + } + + if (i > 1) { + if (prev_hash >= start_hash1) { + SSDFS_WARN("node_id %u, index_position %d, " + "prev_hash %llx >= hash %llx\n", + node_id1, i, + prev_hash, start_hash1); + ssdfs_debug_show_btree_node_indexes(tree, + parent); + BUG(); + } + } + +continue_tree_check: + prev_hash = start_hash1; + + down_read(&parent->full_lock); + } + +finish_index_processing: + up_read(&parent->full_lock); +} +#endif /* CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK */ + +void ssdfs_check_btree_consistency(struct ssdfs_btree *tree) +{ +#ifdef CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK + struct radix_tree_iter iter1, iter2; + void **slot1; + void **slot2; + struct ssdfs_btree_node *node1; + struct ssdfs_btree_node *node2; + u32 node_id1, node_id2; + u64 start_hash1, start_hash2; + u64 end_hash1, end_hash2; + u16 items_count; + u16 index_position; + bool is_exist = false; + int err; + + BUG_ON(!tree); + + down_read(&tree->lock); + + rcu_read_lock(); + radix_tree_for_each_slot(slot1, &tree->nodes, &iter1, + SSDFS_BTREE_ROOT_NODE_ID) { + node1 = SSDFS_BTN(radix_tree_deref_slot(slot1)); + + if (!node1) + continue; + + if (is_ssdfs_btree_node_pre_deleted(node1)) { + SSDFS_DBG("node %u has pre-deleted state\n", + node1->node_id); + continue; + } + + rcu_read_unlock(); + + ssdfs_debug_btree_check_indexes(tree, node1); + + switch (atomic_read(&node1->type)) { + case SSDFS_BTREE_ROOT_NODE: + rcu_read_lock(); + continue; + + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + if (!is_ssdfs_btree_node_index_area_exist(node1)) { + SSDFS_ERR("corrupted node %u\n", + node1->node_id); + ssdfs_show_btree_node_info(node1); + BUG(); + } + + down_read(&node1->header_lock); + 
start_hash1 = node1->index_area.start_hash; + end_hash1 = node1->index_area.end_hash; + up_read(&node1->header_lock); + break; + + case SSDFS_BTREE_LEAF_NODE: + if (!is_ssdfs_btree_node_items_area_exist(node1)) { + SSDFS_ERR("corrupted node %u\n", + node1->node_id); + ssdfs_show_btree_node_info(node1); + BUG(); + } + + down_read(&node1->header_lock); + start_hash1 = node1->items_area.start_hash; + end_hash1 = node1->items_area.end_hash; + up_read(&node1->header_lock); + break; + + default: + BUG(); + } + + SSDFS_DBG("node %u, type %#x, " + "start_hash %llx, end_hash %llx\n", + node1->node_id, atomic_read(&node1->type), + start_hash1, end_hash1); + + err = ssdfs_btree_node_find_index_position(node1->parent_node, + start_hash1, + &index_position); + if (unlikely(err)) { + SSDFS_WARN("fail to find the index position: " + "search_hash %llx, err %d\n", + start_hash1, err); + ssdfs_show_btree_node_info(node1); + BUG(); + } + + node_id1 = node1->node_id; + + down_read(&node1->header_lock); + start_hash1 = node1->items_area.start_hash; + end_hash1 = node1->items_area.end_hash; + items_count = node1->items_area.items_count; + up_read(&node1->header_lock); + + if (start_hash1 >= U64_MAX && end_hash1 >= U64_MAX) { + if (items_count == 0) { + /* + * empty node + */ + rcu_read_lock(); + continue; + } else { + SSDFS_WARN("node_id %u, " + "start_hash1 %llx, end_hash1 %llx\n", + node_id1, start_hash1, end_hash1); + ssdfs_show_btree_node_info(node1); + BUG(); + } + } else if (start_hash1 >= U64_MAX || end_hash1 >= U64_MAX) { + SSDFS_WARN("node_id %u, " + "start_hash1 %llx, end_hash1 %llx\n", + node_id1, start_hash1, end_hash1); + ssdfs_show_btree_node_info(node1); + BUG(); + } + + rcu_read_lock(); + radix_tree_for_each_slot(slot2, &tree->nodes, &iter2, + SSDFS_BTREE_ROOT_NODE_ID) { + node2 = SSDFS_BTN(radix_tree_deref_slot(slot2)); + + if (!node2) + continue; + + if (is_ssdfs_btree_node_pre_deleted(node2)) { + SSDFS_DBG("node %u has pre-deleted state\n", + node2->node_id); + continue; + } + + rcu_read_unlock(); + + is_exist = is_ssdfs_btree_node_items_area_exist(node2); + + switch (atomic_read(&node2->type)) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + rcu_read_lock(); + continue; + + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + if (!is_exist) { + SSDFS_ERR("corrupted node %u\n", + node2->node_id); + ssdfs_show_btree_node_info(node2); + BUG(); + } + break; + + default: + BUG(); + } + + node_id2 = node2->node_id; + + if (node_id1 == node_id2) { + rcu_read_lock(); + continue; + } + + down_read(&node2->header_lock); + start_hash2 = node2->items_area.start_hash; + end_hash2 = node2->items_area.end_hash; + items_count = node2->items_area.items_count; + up_read(&node2->header_lock); + + if (start_hash2 >= U64_MAX && end_hash2 >= U64_MAX) { + if (items_count == 0) { + /* + * empty node + */ + rcu_read_lock(); + continue; + } else { + SSDFS_WARN("node_id %u, " + "start_hash2 %llx, " + "end_hash2 %llx\n", + node_id2, + start_hash2, + end_hash2); + ssdfs_show_btree_node_info(node2); + BUG(); + } + } else if (start_hash2 >= U64_MAX || + end_hash2 >= U64_MAX) { + SSDFS_WARN("node_id %u, start_hash2 %llx, " + "end_hash2 %llx\n", + node_id2, start_hash2, end_hash2); + ssdfs_show_btree_node_info(node2); + BUG(); + } + + if (RANGE_HAS_PARTIAL_INTERSECTION(start_hash1, + end_hash1, + start_hash2, + end_hash2)) { + SSDFS_WARN("there is intersection: " + "node_id %u (start_hash %llx, " + "end_hash %llx), " + "node_id %u (start_hash %llx, " + "end_hash %llx)\n", + node_id1, start_hash1, 
end_hash1,
+				   node_id2, start_hash2, end_hash2);
+			ssdfs_debug_show_btree_node_indexes(tree,
+						node1->parent_node);
+			ssdfs_show_btree_node_info(node1);
+			ssdfs_show_btree_node_info(node2);
+				BUG();
+			}
+
+			rcu_read_lock();
+		}
+
+		rcu_read_lock();
+	}
+	rcu_read_unlock();
+
+	up_read(&tree->lock);
+#endif /* CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK */
+}
+
+void ssdfs_debug_btree_object(struct ssdfs_btree *tree)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct radix_tree_iter iter;
+	void **slot;
+	struct ssdfs_btree_node *node;
+
+	BUG_ON(!tree);
+
+	down_read(&tree->lock);
+
+	SSDFS_DBG("STATIC DATA: "
+		  "type %#x, owner_ino %llu, node_size %u, "
+		  "pages_per_node %u, node_ptr_size %u, "
+		  "index_size %u, item_size %u, "
+		  "min_item_size %u, max_item_size %u, "
+		  "index_area_min_size %u, create_cno %llu, "
+		  "fsi %p\n",
+		  tree->type, tree->owner_ino,
+		  tree->node_size, tree->pages_per_node,
+		  tree->node_ptr_size, tree->index_size,
+		  tree->item_size, tree->min_item_size,
+		  tree->max_item_size, tree->index_area_min_size,
+		  tree->create_cno, tree->fsi);
+
+	SSDFS_DBG("OPERATIONS: "
+		  "desc_ops %p, btree_ops %p\n",
+		  tree->desc_ops, tree->btree_ops);
+
+	SSDFS_DBG("MUTABLE DATA: "
+		  "state %#x, flags %#x, height %d, "
+		  "upper_node_id %u\n",
+		  atomic_read(&tree->state),
+		  atomic_read(&tree->flags),
+		  atomic_read(&tree->height),
+		  tree->upper_node_id);
+
+	SSDFS_DBG("tree->lock %d, nodes_lock %d\n",
+		  rwsem_is_locked(&tree->lock),
+		  spin_is_locked(&tree->nodes_lock));
+
+	rcu_read_lock();
+	radix_tree_for_each_slot(slot, &tree->nodes, &iter,
+				 SSDFS_BTREE_ROOT_NODE_ID) {
+		node = SSDFS_BTN(radix_tree_deref_slot(slot));
+
+		if (!node)
+			continue;
+
+		SSDFS_DBG("NODE: node_id %u, state %#x, "
+			  "type %#x, height %d, refs_count %d\n",
+			  node->node_id,
+			  atomic_read(&node->state),
+			  atomic_read(&node->type),
+			  atomic_read(&node->height),
+			  atomic_read(&node->refs_count));
+
+		SSDFS_DBG("INDEX_AREA: state %#x, "
+			  "offset %u, size %u, "
+			  "index_size %u, index_count %u, "
+			  "index_capacity %u, "
+			  "start_hash %llx, end_hash %llx\n",
+			  atomic_read(&node->index_area.state),
+			  node->index_area.offset,
+			  node->index_area.area_size,
+			  node->index_area.index_size,
+			  node->index_area.index_count,
+			  node->index_area.index_capacity,
+			  node->index_area.start_hash,
+			  node->index_area.end_hash);
+
+		SSDFS_DBG("ITEMS_AREA: state %#x, "
+			  "offset %u, size %u, free_space %u, "
+			  "item_size %u, min_item_size %u, "
+			  "max_item_size %u, items_count %u, "
+			  "items_capacity %u, "
+			  "start_hash %llx, end_hash %llx\n",
+			  atomic_read(&node->items_area.state),
+			  node->items_area.offset,
+			  node->items_area.area_size,
+			  node->items_area.free_space,
+			  node->items_area.item_size,
+			  node->items_area.min_item_size,
+			  node->items_area.max_item_size,
+			  node->items_area.items_count,
+			  node->items_area.items_capacity,
+			  node->items_area.start_hash,
+			  node->items_area.end_hash);
+	}
+	rcu_read_unlock();
+
+	up_read(&tree->lock);
+#endif /* CONFIG_SSDFS_DEBUG */
+}

From patchwork Sat Feb 25 01:09:01 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151955
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 50/76] ssdfs: introduce b-tree node object
Date: Fri, 24 Feb 2023 17:09:01 -0800
Message-Id: <20230225010927.813929-51-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

SSDFS file system uses a hybrid b-tree architecture with the goal of
eliminating the index nodes' side effect. The hybrid b-tree operates
with three node types: (1) index node, (2) hybrid node, (3) leaf node.
Generally speaking, the peculiarity of a hybrid node is that it mixes
index and data records in one node.
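For illustration only, the space of a hybrid node can be pictured
roughly like this (a simplified sketch, not the exact on-disk layout;
the two areas correspond to the index_area and items_area descriptors
of struct ssdfs_btree_node introduced by this patch):

   +--------+---------------------+----------------------------+
   | header | index area          | items area                 |
   |        | (index records)     | (data records)             |
   +--------+---------------------+----------------------------+

A pure index node uses only the index area, a leaf node uses only the
items area, and a hybrid node keeps both areas in the same node.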
A hybrid b-tree starts with a root node that is capable of keeping two
index records or two data records inline (if the size of a data record
is equal to or less than the size of an index record). If the b-tree
needs to contain more than two items, then the first hybrid node is
added into the b-tree. The root level of the b-tree is able to refer
to only two nodes because the root node is capable of storing only two
index records. Generally speaking, the initial goal of a hybrid node
is to store data records in the presence of a reserved index area.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_node.c | 2176 +++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/btree_node.h |  768 +++++++++++++++
 2 files changed, 2944 insertions(+)
 create mode 100644 fs/ssdfs/btree_node.c
 create mode 100644 fs/ssdfs/btree_node.h

diff --git a/fs/ssdfs/btree_node.c b/fs/ssdfs/btree_node.c
new file mode 100644
index 000000000000..9f09090e5cfd
--- /dev/null
+++ b/fs/ssdfs/btree_node.c
@@ -0,0 +1,2176 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/btree_node.c - generalized btree node implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "offset_translation_table.h"
+#include "page_array.h"
+#include "page_vector.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "extents_queue.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "shared_extents_tree.h"
+#include "diff_on_write.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_btree_node_page_leaks;
+atomic64_t ssdfs_btree_node_memory_leaks;
+atomic64_t ssdfs_btree_node_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_btree_node_cache_leaks_increment(void *kaddr)
+ * void ssdfs_btree_node_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_btree_node_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_btree_node_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_btree_node_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_btree_node_kfree(void *kaddr)
+ * struct page *ssdfs_btree_node_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_btree_node_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_btree_node_free_page(struct page *page)
+ * void ssdfs_btree_node_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(btree_node)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(btree_node)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_btree_node_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_btree_node_page_leaks, 0);
+	atomic64_set(&ssdfs_btree_node_memory_leaks, 0);
+	atomic64_set(&ssdfs_btree_node_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_btree_node_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if
(atomic64_read(&ssdfs_btree_node_page_leaks) != 0) { + SSDFS_ERR("BTREE NODE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_btree_node_page_leaks)); + } + + if (atomic64_read(&ssdfs_btree_node_memory_leaks) != 0) { + SSDFS_ERR("BTREE NODE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_btree_node_memory_leaks)); + } + + if (atomic64_read(&ssdfs_btree_node_cache_leaks) != 0) { + SSDFS_ERR("BTREE NODE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_btree_node_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * BTREE NODE CACHE * + ******************************************************************************/ + +static struct kmem_cache *ssdfs_btree_node_obj_cachep; + +void ssdfs_zero_btree_node_obj_cache_ptr(void) +{ + ssdfs_btree_node_obj_cachep = NULL; +} + +static void ssdfs_init_btree_node_object_once(void *obj) +{ + struct ssdfs_btree_node *node_obj = obj; + + memset(node_obj, 0, sizeof(struct ssdfs_btree_node)); +} + +void ssdfs_shrink_btree_node_obj_cache(void) +{ + if (ssdfs_btree_node_obj_cachep) + kmem_cache_shrink(ssdfs_btree_node_obj_cachep); +} + +void ssdfs_destroy_btree_node_obj_cache(void) +{ + if (ssdfs_btree_node_obj_cachep) + kmem_cache_destroy(ssdfs_btree_node_obj_cachep); +} + +int ssdfs_init_btree_node_obj_cache(void) +{ + ssdfs_btree_node_obj_cachep = + kmem_cache_create("ssdfs_btree_node_obj_cache", + sizeof(struct ssdfs_btree_node), 0, + SLAB_RECLAIM_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, + ssdfs_init_btree_node_object_once); + if (!ssdfs_btree_node_obj_cachep) { + SSDFS_ERR("unable to create btree node objects cache\n"); + return -ENOMEM; + } + + return 0; +} + +/* + * ssdfs_btree_node_alloc() - allocate memory for btree node object + */ +static +struct ssdfs_btree_node *ssdfs_btree_node_alloc(void) +{ + struct ssdfs_btree_node *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_btree_node_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = kmem_cache_alloc(ssdfs_btree_node_obj_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for btree node object\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_btree_node_cache_leaks_increment(ptr); + + return ptr; +} + +/* + * ssdfs_btree_node_free() - free memory for btree node object + */ +static +void ssdfs_btree_node_free(struct ssdfs_btree_node *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_btree_node_obj_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ptr) + return; + + ssdfs_btree_node_cache_leaks_decrement(ptr); + kmem_cache_free(ssdfs_btree_node_obj_cachep, ptr); +} + +/****************************************************************************** + * BTREE NODE OBJECT FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_btree_node_create_empty_index_area() - create empty index area + * @tree: btree object + * @node: node object + * @type: node's type + * @start_hash: starting hash of the node + * + * This method tries to create the empty index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. 
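+ *
+ * (Note: a root node gets a small inline index area; index and hybrid
+ *  nodes get a partially initialized index area that is finalized by
+ *  the specialized btree_ops methods; a leaf node gets no index area
+ *  at all.)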
+ */
+static
+int ssdfs_btree_node_create_empty_index_area(struct ssdfs_btree *tree,
+					     struct ssdfs_btree_node *node,
+					     int type,
+					     u64 start_hash)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node);
+
+	if (type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE ||
+	    type >= SSDFS_BTREE_NODE_TYPE_MAX) {
+		SSDFS_WARN("invalid node type %#x\n", type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("tree %p, node %p, "
+		  "type %#x, start_hash %llx\n",
+		  tree, node, type, start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(&node->index_area, 0xFF,
+		sizeof(struct ssdfs_btree_node_index_area));
+
+	switch (type) {
+	case SSDFS_BTREE_ROOT_NODE:
+		atomic_set(&node->index_area.state,
+			   SSDFS_BTREE_NODE_INDEX_AREA_EXIST);
+		node->index_area.offset =
+			offsetof(struct ssdfs_btree_inline_root_node, indexes);
+		node->index_area.index_size = sizeof(struct ssdfs_btree_index);
+		node->index_area.index_capacity =
+			SSDFS_BTREE_ROOT_NODE_INDEX_COUNT;
+		node->index_area.area_size = node->index_area.index_size;
+		node->index_area.area_size *= node->index_area.index_capacity;
+		node->index_area.index_count = 0;
+		node->index_area.start_hash = start_hash;
+		node->index_area.end_hash = U64_MAX;
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+	case SSDFS_BTREE_INDEX_NODE:
+		/*
+		 * Partial preliminary initialization.
+		 * The final creation should be done in specialized
+		 * tree->btree_ops->create_node() and
+		 * tree->btree_ops->init_node() methods.
+		 */
+		atomic_set(&node->index_area.state,
+			   SSDFS_BTREE_NODE_INDEX_AREA_EXIST);
+		atomic_or(SSDFS_BTREE_NODE_HAS_INDEX_AREA, &node->flags);
+		node->index_area.index_size =
+			sizeof(struct ssdfs_btree_index_key);
+		node->index_area.index_count = 0;
+		node->index_area.start_hash = start_hash;
+		node->index_area.end_hash = U64_MAX;
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		atomic_set(&node->index_area.state,
+			   SSDFS_BTREE_NODE_AREA_ABSENT);
+		node->index_area.index_size = 0;
+		node->index_area.index_capacity = 0;
+		node->index_area.area_size = 0;
+		node->index_area.index_count = 0;
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n", type);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_node_create_empty_items_area() - create empty items area
+ * @tree: btree object
+ * @node: node object
+ * @type: node's type
+ * @start_hash: starting hash of the node
+ *
+ * This method tries to create the empty items area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ */
+static
+int ssdfs_btree_node_create_empty_items_area(struct ssdfs_btree *tree,
+					     struct ssdfs_btree_node *node,
+					     int type,
+					     u64 start_hash)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node);
+
+	if (type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE ||
+	    type >= SSDFS_BTREE_NODE_TYPE_MAX) {
+		SSDFS_WARN("invalid node type %#x\n", type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("tree %p, node %p, "
+		  "type %#x, start_hash %llx\n",
+		  tree, node, type, start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(&node->items_area, 0xFF,
+		sizeof(struct ssdfs_btree_node_items_area));
+
+	switch (type) {
+	case SSDFS_BTREE_ROOT_NODE:
+	case SSDFS_BTREE_INDEX_NODE:
+		atomic_set(&node->items_area.state,
+			   SSDFS_BTREE_NODE_AREA_ABSENT);
+		node->items_area.area_size = 0;
+		node->items_area.item_size = 0;
+		node->items_area.min_item_size = 0;
+		node->items_area.max_item_size = 0;
+		node->items_area.items_count = 0;
+		node->items_area.items_capacity = 0;
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		/*
+		 * Partial preliminary initialization.
+ * The final creation should be done in specialized + * tree->btree_ops->create_node() and + * tree->btree_ops->init_node() methods. + */ + atomic_set(&node->items_area.state, + SSDFS_BTREE_NODE_ITEMS_AREA_EXIST); + atomic_or(SSDFS_BTREE_NODE_HAS_ITEMS_AREA, &node->flags); + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + node->items_area.start_hash = start_hash; + node->items_area.end_hash = start_hash; + break; + + case SSDFS_BTREE_LEAF_NODE: + atomic_set(&node->items_area.state, + SSDFS_BTREE_NODE_ITEMS_AREA_EXIST); + atomic_or(SSDFS_BTREE_NODE_HAS_ITEMS_AREA, &node->flags); + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + node->items_area.start_hash = start_hash; + node->items_area.end_hash = start_hash; + break; + + default: + SSDFS_WARN("invalid node type %#x\n", type); + return -EINVAL; + } + + return 0; +} + +/* + * ssdfs_btree_node_create_empty_lookup_table() - create empty lookup table + * @node: node object + * + * This method tries to create the empty lookup table area. + */ +static +void ssdfs_btree_node_create_empty_lookup_table(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node %p\n", node); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&node->lookup_tbl_area, 0xFF, + sizeof(struct ssdfs_btree_node_index_area)); + + atomic_set(&node->lookup_tbl_area.state, + SSDFS_BTREE_NODE_AREA_ABSENT); + node->lookup_tbl_area.index_size = 0; + node->lookup_tbl_area.index_capacity = 0; + node->lookup_tbl_area.area_size = 0; + node->lookup_tbl_area.index_count = 0; +} + +/* + * ssdfs_btree_node_create_empty_hash_table() - create empty hash table + * @node: node object + * + * This method tries to create the empty hash table area. + */ +static +void ssdfs_btree_node_create_empty_hash_table(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node %p\n", node); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&node->hash_tbl_area, 0xFF, + sizeof(struct ssdfs_btree_node_index_area)); + + atomic_set(&node->hash_tbl_area.state, + SSDFS_BTREE_NODE_AREA_ABSENT); + node->hash_tbl_area.index_size = 0; + node->hash_tbl_area.index_capacity = 0; + node->hash_tbl_area.area_size = 0; + node->hash_tbl_area.index_count = 0; +} + +/* + * ssdfs_btree_node_create() - create btree node object + * @tree: btree object + * @node_id: node ID number + * @parent: parent node + * @height: node's height + * @type: node's type + * @start_hash: starting hash of the node + * + * This method tries to create a btree node object. + * + * RETURN: + * [success] - pointer on created btree node object. + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - cannot allocate memory. + * %-ERANGE - internal error. 
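+ *
+ * (Note: the returned object is only partially initialized; per the
+ *  inline comments below, the final setup is expected in the
+ *  specialized tree->btree_ops->create_node() and
+ *  tree->btree_ops->init_node() methods.)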
+ */ +struct ssdfs_btree_node * +ssdfs_btree_node_create(struct ssdfs_btree *tree, + u32 node_id, + struct ssdfs_btree_node *parent, + u8 height, int type, + u64 start_hash) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *ptr; + u8 tree_height; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi); + + if (type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_WARN("invalid node type %#x\n", type); + return ERR_PTR(-EINVAL); + } + + if (type != SSDFS_BTREE_ROOT_NODE && !parent) { + SSDFS_WARN("node %u should have parent\n", + node_id); + return ERR_PTR(-EINVAL); + } +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, parent %p, node_id %u, " + "height %u, type %#x, start_hash %llx\n", + tree, parent, node_id, height, + type, start_hash); +#else + SSDFS_DBG("tree %p, parent %p, node_id %u, " + "height %u, type %#x, start_hash %llx\n", + tree, parent, node_id, height, + type, start_hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = tree->fsi; + + ptr = ssdfs_btree_node_alloc(); + if (!ptr) { + SSDFS_ERR("fail to allocate btree node object\n"); + return ERR_PTR(-ENOMEM); + } + + ptr->parent_node = parent; + ptr->tree = tree; + + if (node_id == SSDFS_BTREE_NODE_INVALID_ID) { + err = -EINVAL; + SSDFS_WARN("invalid node_id\n"); + goto fail_create_node; + } + ptr->node_id = node_id; + + tree_height = atomic_read(&tree->height); + if (height > tree_height) { + err = -EINVAL; + SSDFS_WARN("height %u > tree->height %u\n", + height, tree_height); + goto fail_create_node; + } + + atomic_set(&ptr->height, height); + +#ifdef CONFIG_SSDFS_DEBUG + if (tree->node_size < fsi->pagesize || + tree->node_size > fsi->erasesize) { + err = -EINVAL; + SSDFS_WARN("invalid node_size %u, " + "pagesize %u, erasesize %u\n", + tree->node_size, + fsi->pagesize, + fsi->erasesize); + goto fail_create_node; + } +#endif /* CONFIG_SSDFS_DEBUG */ + ptr->node_size = tree->node_size; + +#ifdef CONFIG_SSDFS_DEBUG + if (tree->pages_per_node != (ptr->node_size / fsi->pagesize)) { + err = -EINVAL; + SSDFS_WARN("invalid pages_per_node %u, " + "node_size %u, pagesize %u\n", + tree->pages_per_node, + ptr->node_size, + fsi->pagesize); + goto fail_create_node; + } +#endif /* CONFIG_SSDFS_DEBUG */ + ptr->pages_per_node = tree->pages_per_node; + + ptr->create_cno = ssdfs_current_cno(fsi->sb); + ptr->node_ops = NULL; + + atomic_set(&ptr->refs_count, 0); + atomic_set(&ptr->flags, 0); + atomic_set(&ptr->type, type); + + init_rwsem(&ptr->header_lock); + memset(&ptr->raw, 0xFF, sizeof(ptr->raw)); + + err = ssdfs_btree_node_create_empty_index_area(tree, ptr, + type, + start_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to create empty index area: err %d\n", + err); + goto fail_create_node; + } + + err = ssdfs_btree_node_create_empty_items_area(tree, ptr, + type, + start_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to create empty items area: err %d\n", + err); + goto fail_create_node; + } + + ssdfs_btree_node_create_empty_lookup_table(ptr); + ssdfs_btree_node_create_empty_hash_table(ptr); + + spin_lock_init(&ptr->descriptor_lock); + ptr->update_cno = ptr->create_cno; + + /* + * Partial preliminary initialization. + * The final creation should be done in specialized + * tree->btree_ops->create_node() and + * tree->btree_ops->init_node() methods. 
+ */ + memset(&ptr->extent, 0xFF, sizeof(struct ssdfs_raw_extent)); + ptr->seg = NULL; + + ptr->node_index.node_id = cpu_to_le32(node_id); + ptr->node_index.node_type = (u8)type; + ptr->node_index.height = height; + ptr->node_index.flags = cpu_to_le16(SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE); + ptr->node_index.index.hash = cpu_to_le64(start_hash); + + init_completion(&ptr->init_end); + + /* + * Partial preliminary initialization. + * The final creation should be done in specialized + * tree->btree_ops->create_node() and + * tree->btree_ops->init_node() methods. + */ + init_rwsem(&ptr->bmap_array.lock); + ptr->bmap_array.bits_count = 0; + ptr->bmap_array.bmap_bytes = 0; + ptr->bmap_array.index_start_bit = ULONG_MAX; + ptr->bmap_array.item_start_bit = ULONG_MAX; + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + spin_lock_init(&ptr->bmap_array.bmap[i].lock); + ptr->bmap_array.bmap[i].flags = 0; + ptr->bmap_array.bmap[i].ptr = NULL; + } + + init_waitqueue_head(&ptr->wait_queue); + init_rwsem(&ptr->full_lock); + + atomic_set(&ptr->state, SSDFS_BTREE_NODE_CREATED); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return ptr; + +fail_create_node: + ptr->parent_node = NULL; + ptr->tree = NULL; + ssdfs_btree_node_free(ptr); + return ERR_PTR(err); +} + +/* + * ssdfs_btree_create_root_node() - create root node + * @node: node object + * @root_node: pointer on the on-disk root node object + * + * This method tries to create the root node of the btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - corrupted root node object. + */ +int ssdfs_btree_create_root_node(struct ssdfs_btree_node *node, + struct ssdfs_btree_inline_root_node *root_node) +{ + struct ssdfs_btree_root_node_header *hdr; + struct ssdfs_btree_index *index1, *index2; + size_t rnode_size = sizeof(struct ssdfs_btree_inline_root_node); + u8 height; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !root_node); + + SSDFS_DBG("node %p, root_node %p\n", + node, root_node); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = &root_node->header; + + if (hdr->type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_ERR("invalid node type %#x\n", + hdr->type); + return -EIO; + } + + if (hdr->items_count > SSDFS_BTREE_ROOT_NODE_INDEX_COUNT) { + SSDFS_ERR("invalid items_count %u\n", + hdr->items_count); + return -EIO; + } + + height = hdr->height; + + if (height >= U8_MAX) { + SSDFS_ERR("invalid height %u\n", + height); + return -EIO; + } + + if (le32_to_cpu(hdr->upper_node_id) == 0) { + height = 1; + atomic_set(&node->tree->height, height); + atomic_set(&node->height, height - 1); + } else { + if (height == 0) { + SSDFS_ERR("invalid height %u\n", + height); + return -EIO; + } + + atomic_set(&node->tree->height, height); + atomic_set(&node->height, height - 1); + } + + node->node_size = rnode_size; + node->pages_per_node = 0; + node->create_cno = le64_to_cpu(0); + node->tree->create_cno = node->create_cno; + node->node_id = SSDFS_BTREE_ROOT_NODE_ID; + + node->parent_node = NULL; + node->node_ops = NULL; + + atomic_set(&node->flags, hdr->flags); + atomic_set(&node->type, hdr->type); + + down_write(&node->header_lock); + ssdfs_memcpy(&node->raw.root_node, 0, rnode_size, + root_node, 0, rnode_size, + rnode_size); + node->index_area.index_count = hdr->items_count; + node->index_area.start_hash = U64_MAX; + node->index_area.end_hash = U64_MAX; + if (hdr->items_count > 0) { + index1 = &root_node->indexes[SSDFS_ROOT_NODE_LEFT_LEAF_NODE]; + node->index_area.start_hash = 
le64_to_cpu(index1->hash); + } + if (hdr->items_count > 1) { + index2 = &root_node->indexes[SSDFS_ROOT_NODE_RIGHT_LEAF_NODE]; + node->index_area.end_hash = le64_to_cpu(index2->hash); + } + up_write(&node->header_lock); + + spin_lock(&node->tree->nodes_lock); + node->tree->upper_node_id = + le32_to_cpu(root_node->header.upper_node_id); + spin_unlock(&node->tree->nodes_lock); + + atomic_set(&node->state, SSDFS_BTREE_NODE_INITIALIZED); + return 0; +} + +/* + * ssdfs_btree_node_allocate_content_space() - allocate content space + * @node: node object + * @node_size: node size + */ +int ssdfs_btree_node_allocate_content_space(struct ssdfs_btree_node *node, + u32 node_size) +{ + struct page *page; + u32 pages_count; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + + SSDFS_DBG("node_id %u, node_size %u\n", + node->node_id, node_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + pages_count = node_size / PAGE_SIZE; + + if (pages_count == 0 || pages_count > PAGEVEC_SIZE) { + SSDFS_ERR("invalid pages_count %u\n", + pages_count); + return -ERANGE; + } + + down_write(&node->full_lock); + + pagevec_init(&node->content.pvec); + for (i = 0; i < pages_count; i++) { + page = ssdfs_btree_node_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (IS_ERR_OR_NULL(page)) { + err = (page == NULL ? -ENOMEM : PTR_ERR(page)); + SSDFS_ERR("unable to allocate memory page\n"); + goto finish_init_pvec; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + pagevec_add(&node->content.pvec, page); + } + +finish_init_pvec: + up_write(&node->full_lock); + + return err; +} + +/* + * ssdfs_btree_node_allocate_bmaps() - allocate node's bitmaps + * @addr: array of pointers + * @bmap_bytes: size of the bitmap in bytes + */ +int ssdfs_btree_node_allocate_bmaps(void *addr[SSDFS_BTREE_NODE_BMAP_COUNT], + size_t bmap_bytes) +{ + int i; + + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + addr[i] = ssdfs_btree_node_kzalloc(bmap_bytes, GFP_KERNEL); + if (!addr[i]) { + SSDFS_ERR("fail to allocate node's bmap: index %d\n", + i); + for (; i >= 0; i--) { + ssdfs_btree_node_kfree(addr[i]); + addr[i] = NULL; + } + return -ENOMEM; + } + } + + return 0; +} + +/* + * ssdfs_btree_node_init_bmaps() - init node's bitmaps + * @node: node object + * @addr: array of pointers + */ +void ssdfs_btree_node_init_bmaps(struct ssdfs_btree_node *node, + void *addr[SSDFS_BTREE_NODE_BMAP_COUNT]) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->bmap_array.lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + void *tmp_addr = NULL; + + spin_lock(&node->bmap_array.bmap[i].lock); + tmp_addr = node->bmap_array.bmap[i].ptr; + node->bmap_array.bmap[i].ptr = addr[i]; + addr[i] = NULL; + spin_unlock(&node->bmap_array.bmap[i].lock); + + if (tmp_addr) + ssdfs_btree_node_kfree(tmp_addr); + } +} + +/* + * ssdfs_btree_node_destroy() - destroy the btree node + * @node: node object + */ +void ssdfs_btree_node_destroy(struct ssdfs_btree_node *node) +{ + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#else + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch 
(atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + /* ignore root node dirty state */ + break; + + default: + SSDFS_WARN("node %u is dirty\n", node->node_id); + break; + } + /* FALLTHRU */ + fallthrough; + case SSDFS_BTREE_NODE_CREATED: + /* FALLTHRU */ + fallthrough; + case SSDFS_BTREE_NODE_INITIALIZED: + atomic_set(&node->state, SSDFS_BTREE_NODE_UNKNOWN_STATE); + wake_up_all(&node->wait_queue); + complete_all(&node->init_end); + + spin_lock(&node->descriptor_lock); + if (node->seg) { + ssdfs_segment_put_object(node->seg); + node->seg = NULL; + } + spin_unlock(&node->descriptor_lock); + + if (rwsem_is_locked(&node->bmap_array.lock)) { + /* inform about possible trouble */ + SSDFS_WARN("node is locked under destruction\n"); + } + + node->bmap_array.bits_count = 0; + node->bmap_array.bmap_bytes = 0; + node->bmap_array.index_start_bit = ULONG_MAX; + node->bmap_array.item_start_bit = ULONG_MAX; + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + spin_lock(&node->bmap_array.bmap[i].lock); + ssdfs_btree_node_kfree(node->bmap_array.bmap[i].ptr); + node->bmap_array.bmap[i].ptr = NULL; + spin_unlock(&node->bmap_array.bmap[i].lock); + } + + if (rwsem_is_locked(&node->full_lock)) { + /* inform about possible trouble */ + SSDFS_WARN("node is locked under destruction\n"); + } + + if (atomic_read(&node->type) != SSDFS_BTREE_ROOT_NODE) { + struct pagevec *pvec = &node->content.pvec; + ssdfs_btree_node_pagevec_release(pvec); + } + break; + + default: + SSDFS_WARN("invalid node state: " + "node %u, state %#x\n", + node->node_id, + atomic_read(&node->state)); + break; + } + + ssdfs_btree_node_free(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * __ssdfs_btree_node_prepare_content() - prepare the btree node's content + * @fsi: pointer on shared file system object + * @ptr: btree node's index + * @node_size: size of the node + * @owner_ino: owner inode ID + * @si: segment object [out] + * @pvec: pagevec with node's content [out] + * + * This method tries to read the raw node from the volume. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
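+ *
+ * (Note: the node's location is taken from @ptr->index.extent; the
+ *  seg_id is used to grab the segment object, the logical_blk/len
+ *  pair is translated through the blk2off table, and the node's pages
+ *  are read by a synchronous readahead request before being moved
+ *  into @pvec.)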
+ */ +int __ssdfs_btree_node_prepare_content(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_index_key *ptr, + u32 node_size, + u64 owner_ino, + struct ssdfs_segment_info **si, + struct pagevec *pvec) +{ + struct ssdfs_segment_request *req; + struct ssdfs_peb_container *pebc; + struct ssdfs_blk2off_table *table; + struct ssdfs_offset_position pos; + u32 node_id; + u8 node_type; + u8 height; + u64 seg_id; + u32 logical_blk; + u32 len; + u32 pvec_size; + u64 logical_offset; + u32 data_bytes; + struct completion *end; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !ptr || !si || !pvec); +#endif /* CONFIG_SSDFS_DEBUG */ + + node_id = le32_to_cpu(ptr->node_id); + node_type = ptr->node_type; + height = ptr->height; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_size %u, height %u, type %#x\n", + node_id, node_size, height, node_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_WARN("invalid node type %#x\n", + node_type); + return -ERANGE; + } + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + SSDFS_WARN("root node should be initialize during creation\n"); + return -ERANGE; + } + + seg_id = le64_to_cpu(ptr->index.extent.seg_id); + logical_blk = le32_to_cpu(ptr->index.extent.logical_blk); + len = le32_to_cpu(ptr->index.extent.len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, logical_blk %u, len %u\n", + seg_id, logical_blk, len); + + BUG_ON(seg_id == U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + *si = ssdfs_grab_segment(fsi, NODE2SEG_TYPE(node_type), + seg_id, U64_MAX); + if (unlikely(IS_ERR_OR_NULL(*si))) { + err = !*si ? -ENOMEM : PTR_ERR(*si); + if (err == -EINTR) { + /* + * Ignore this error. + */ + } else { + SSDFS_ERR("fail to grab segment object: " + "seg %llu, err %d\n", + seg_id, err); + } + goto fail_get_segment; + } + + pvec_size = node_size >> PAGE_SHIFT; + + if (pvec_size == 0 || pvec_size > PAGEVEC_SIZE) { + err = -ERANGE; + SSDFS_WARN("invalid memory pages count: " + "node_size %u, pvec_size %u\n", + node_size, pvec_size); + goto finish_prepare_content; + } + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto finish_prepare_content; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + logical_offset = (u64)node_id * node_size; + data_bytes = node_size; + ssdfs_request_prepare_logical_extent(owner_ino, + (u64)logical_offset, + (u32)data_bytes, + 0, 0, req); + + for (i = 0; i < pvec_size; i++) { + err = ssdfs_request_add_allocated_page_locked(req); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into request: " + "err %d\n", + err); + goto fail_read_node; + } + } + + ssdfs_request_define_segment(seg_id, req); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(logical_blk >= U16_MAX); + BUG_ON(len >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_request_define_volume_extent((u16)logical_blk, (u16)len, req); + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGES_READAHEAD, + SSDFS_REQ_SYNC, + req); + + table = (*si)->blk2off_table; + + err = ssdfs_blk2off_table_get_offset_position(table, logical_blk, &pos); + if (err == -EAGAIN) { + end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "seg_id %llu, logical_blk %u, " + "len %u, err %d\n", + seg_id, logical_blk, len, err); + + for (i = 0; i < (*si)->pebs_count; i++) { + u64 peb_id1 = U64_MAX; + u64 peb_id2 = U64_MAX; + + pebc = &(*si)->peb_array[i]; + + if (pebc->src_peb) + peb_id1 = pebc->src_peb->peb_id; + + if (pebc->dst_peb) + peb_id2 = pebc->dst_peb->peb_id; + + SSDFS_ERR("seg_id %llu, peb_index %u, " + "src_peb %llu, dst_peb %llu\n", + seg_id, i, peb_id1, peb_id2); + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto fail_read_node; + } + + err = ssdfs_blk2off_table_get_offset_position(table, + logical_blk, + &pos); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to convert: " + "seg_id %llu, logical_blk %u, len %u, err %d\n", + seg_id, logical_blk, len, err); + goto fail_read_node; + } + + pebc = &(*si)->peb_array[pos.peb_index]; + + err = ssdfs_peb_readahead_pages(pebc, req, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("PEB init failed: " + "err %d\n", err); + goto fail_read_node; + } + + err = ssdfs_peb_readahead_pages(pebc, req, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to read page: err %d\n", + err); + goto fail_read_node; + } + + for (i = 0; i < req->result.processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebc, req, i); + +#ifdef CONFIG_SSDFS_DEBUG + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + void *kaddr; + struct page *page = req->result.pvec.pages[i]; + + kaddr = kmap_local_page(page); + SSDFS_DBG("PAGE DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); + + WARN_ON(!PageLocked(page)); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_node_pagevec_release(pvec); + + for (i = 0; i < pagevec_count(&req->result.pvec); i++) { + pagevec_add(pvec, req->result.pvec.pages[i]); + ssdfs_btree_node_account_page(req->result.pvec.pages[i]); + ssdfs_request_unlock_and_remove_page(req, i); + } + pagevec_reinit(&req->result.pvec); + + ssdfs_request_unlock_and_remove_diffs(req); + + ssdfs_put_request(req); + ssdfs_request_free(req); + + return 0; + +fail_read_node: + ssdfs_request_unlock_and_remove_pages(req); + ssdfs_put_request(req); + ssdfs_request_free(req); + +finish_prepare_content: + ssdfs_segment_put_object(*si); + +fail_get_segment: 
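+	/*
+	 * Error path summary: fail_read_node unlocks and drops the
+	 * request's pages and releases the request itself, then falls
+	 * through to finish_prepare_content, which puts the grabbed
+	 * segment object; fail_get_segment returns the error directly
+	 * because nothing has been acquired at that point.
+	 */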
+ return err; +} + +/* + * ssdfs_btree_node_prepare_content() - prepare the btree node's content + * @node: node object + * @ptr: btree node's index + * + * This method tries to read the raw node from the volume. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_node_prepare_content(struct ssdfs_btree_node *node, + struct ssdfs_btree_index_key *ptr) +{ + struct ssdfs_segment_info *si = NULL; + size_t extent_size = sizeof(struct ssdfs_raw_extent); + u32 node_id; + u8 node_type; + u8 height; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !ptr); + + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + node_id = le32_to_cpu(ptr->node_id); + node_type = ptr->node_type; + height = ptr->height; + +#ifdef CONFIG_SSDFS_DEBUG + if (node->node_id != node_id) { + SSDFS_WARN("node->node_id %u != node_id %u\n", + node->node_id, node_id); + return -EINVAL; + } + + if (atomic_read(&node->type) != node_type) { + SSDFS_WARN("node->type %#x != node_type %#x\n", + atomic_read(&node->type), node_type); + return -EINVAL; + } + + if (atomic_read(&node->height) != height) { + SSDFS_WARN("node->height %u != height %u\n", + atomic_read(&node->height), height); + return -EINVAL; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&node->full_lock); + err = __ssdfs_btree_node_prepare_content(node->tree->fsi, ptr, + node->node_size, + node->tree->owner_ino, + &si, + &node->content.pvec); + up_write(&node->full_lock); + + if (err == -EINTR) { + /* + * Ignore this error. + */ + goto finish_prepare_node_content; + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare node's content: " + "node_id %u, err %d\n", + node->node_id, err); + goto finish_prepare_node_content; + } + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&node->extent, 0, extent_size, + &ptr->index.extent, 0, extent_size, + extent_size); + node->seg = si; + spin_unlock(&node->descriptor_lock); + + atomic_set(&node->state, SSDFS_BTREE_NODE_CONTENT_PREPARED); + +finish_prepare_node_content: + return err; +} + +/* + * __ssdfs_define_memory_page() - define memory page for the position + * @area_offset: area offset from the node's beginning + * @area_size: size of the area + * @node_size: node size in bytes + * @item_size: size of the item in bytes + * @position: position of index record in the node + * @page_index: index of memory page in the node [out] + * @page_off: offset from the memory page's beginning in bytes [out] + * + * This method tries to define a memory page's index and byte + * offset to the index record. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
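+ *
+ * Worked example (hypothetical numbers, PAGE_SIZE assumed to be 4096):
+ * with area_offset = 4096, area_size = 8192, item_size = 32 and
+ * position = 130, item_offset = 130 * 32 + 4096 = 8256, so
+ * *page_index = 8256 >> PAGE_SHIFT = 2 and *page_off = 8256 % 4096 = 64.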
+ */ +int __ssdfs_define_memory_page(u32 area_offset, u32 area_size, + u32 node_size, size_t item_size, + u16 position, + u32 *page_index, u32 *page_off) +{ + u32 item_offset; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page_index || !page_off); + + SSDFS_DBG("area_offset %u, area_size %u, " + "node_size %u, item_size %zu, position %u\n", + area_offset, area_size, + node_size, item_size, position); +#endif /* CONFIG_SSDFS_DEBUG */ + + *page_index = U32_MAX; + *page_off = U32_MAX; + + item_offset = position * item_size; + if (item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + return -ERANGE; + } + + item_offset += area_offset; + + if (item_offset >= (area_offset + area_size)) { + SSDFS_ERR("invalid index offset: " + "item_offset %u, area_offset %u, " + "area_size %u\n", + item_offset, area_offset, area_size); + return -ERANGE; + } + + *page_index = item_offset >> PAGE_SHIFT; + *page_off = item_offset % PAGE_SIZE; + + if ((*page_off + item_size) > PAGE_SIZE) { + SSDFS_ERR("invalid request: " + "page_off %u, item_size %zu\n", + *page_off, item_size); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_define_memory_page() - define memory page for the position + * @node: node object + * @area: pointer on index area descriptor + * @position: position of index record in the node + * @page_index: index of memory page in the node [out] + * @page_off: offset from the memory page's beginning in bytes [out] + * + * This method tries to define a memory page's index and byte + * offset to the index record. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_define_memory_page(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 position, + u32 *page_index, u32 *page_off) +{ + size_t index_size = sizeof(struct ssdfs_btree_index_key); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !page_index || !page_off); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + + SSDFS_DBG("node_id %u, node_type %#x, position %u\n", + node->node_id, atomic_read(&node->type), + position); +#endif /* CONFIG_SSDFS_DEBUG */ + + *page_index = U32_MAX; + *page_off = U32_MAX; + + if (atomic_read(&area->state) != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + atomic_read(&area->state)); + return -ERANGE; + } + + if (area->index_capacity == 0 || + area->index_count > area->index_capacity) { + SSDFS_ERR("invalid area: " + "index_count %u, index_capacity %u\n", + area->index_count, + area->index_capacity); + return -ERANGE; + } + + if (position > area->index_count) { + SSDFS_ERR("position %u > index_count %u\n", + position, area->index_count); + return -ERANGE; + } + + if ((area->offset + area->area_size) > node->node_size) { + SSDFS_ERR("invalid area: " + "offset %u, area_size %u, node_size %u\n", + area->offset, + area->area_size, + node->node_size); + return -ERANGE; + } + + if (area->index_size != index_size) { + SSDFS_ERR("invalid index size %u\n", + area->index_size); + return -ERANGE; + } + + err = __ssdfs_define_memory_page(area->offset, area->area_size, + node->node_size, index_size, + position, page_index, page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define page index: err %d\n", + err); + return err; + } + + if ((*page_off + area->index_size) > PAGE_SIZE) { + SSDFS_ERR("invalid offset into the page: " + "offset %u, index_size %u\n", + *page_off, area->index_size); + return 
-ERANGE; + } + + if (*page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page index: " + "page_index %u, pagevec_count %u\n", + *page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + return 0; +} + +/* + * __ssdfs_init_index_area_hash_range() - extract hash range of index area + * @node: node object + * @index_count: count of indexes in the node + * @start_hash: starting hash of index area [out] + * @end_hash: ending hash of index area [out] + * + * This method tries to extract start and end hash from + * the raw index area. + * + + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_init_index_area_hash_range(struct ssdfs_btree_node *node, + u16 index_count, + u64 *start_hash, u64 *end_hash) +{ + struct ssdfs_btree_index_key *ptr; + struct page *page; + void *kaddr; + u32 page_index; + u32 page_off; + u16 position; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + BUG_ON(!start_hash || !end_hash); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, height %u\n", + node->node_id, + atomic_read(&node->height)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + + if (index_count == 0) + return 0; + + position = 0; + + err = ssdfs_define_memory_page(node, &node->index_area, + position, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + ptr = (struct ssdfs_btree_index_key *)((u8 *)kaddr + page_off); + *start_hash = le64_to_cpu(ptr->index.hash); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + position = index_count - 1; + + if (position == 0) { + *end_hash = *start_hash; + return 0; + } + + err = ssdfs_define_memory_page(node, &node->index_area, + position, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + ptr = (struct ssdfs_btree_index_key *)((u8 *)kaddr + page_off); + *end_hash = le64_to_cpu(ptr->index.hash); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + return 0; +} + +/* + * ssdfs_init_index_area_hash_range() - extract hash range of index area + * @node: node object + * @hdr: node's header + * @start_hash: starting hash of index area [out] + * @end_hash: ending hash of index area [out] + * + * This method tries 
to extract start and end hash from + * the raw index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_init_index_area_hash_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_header *hdr, + u64 *start_hash, u64 *end_hash) +{ + u16 flags; + u16 index_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !hdr); + BUG_ON(!start_hash || !end_hash); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, height %u\n", + node->node_id, + atomic_read(&node->height)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + + flags = le16_to_cpu(hdr->flags); + if (!(flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA)) + return 0; + + index_count = le16_to_cpu(hdr->index_count); + if (index_count == 0) + return 0; + + return __ssdfs_init_index_area_hash_range(node, index_count, + start_hash, end_hash); +} + +/* + * ssdfs_btree_init_node_index_area() - init the node's index area + * @node: node object + * @hdr: node's header + * @hdr_size: size of the header + * + * This method tries to init the node's index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - header is corrupted. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_init_node_index_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_header *hdr, + size_t hdr_size) +{ + u16 flags; + u32 index_area_size; + u8 index_size; + u16 index_count; + u16 index_capacity; + u32 offset; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !hdr); + BUG_ON(hdr_size <= sizeof(struct ssdfs_btree_node_header)); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, height %u\n", + node->node_id, + atomic_read(&node->height)); +#endif /* CONFIG_SSDFS_DEBUG */ + + flags = le16_to_cpu(hdr->flags); + index_area_size = 0; + + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + index_area_size = 1 << hdr->log_index_area_size; + + if (index_area_size == 0 || + index_area_size > node->node_size) { + SSDFS_ERR("invalid index area size %u\n", + index_area_size); + return -EIO; + } + + switch (hdr->type) { + case SSDFS_BTREE_INDEX_NODE: + if (index_area_size != node->node_size) { + SSDFS_ERR("invalid index area's size: " + "index_area_size %u, node_size %u\n", + index_area_size, + node->node_size); + return -EIO; + } + + index_area_size -= hdr_size; + break; + + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node type %#x\n", + hdr->type); + return -EIO; + } + } else { + if (index_area_size != 0) { + SSDFS_ERR("invalid index area size %u\n", + index_area_size); + return -EIO; + } + + switch (hdr->type) { + case SSDFS_BTREE_LEAF_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node type %#x\n", + hdr->type); + return -EIO; + } + } + + index_size = hdr->index_size; + index_count = le16_to_cpu(hdr->index_count); + + if (index_area_size < ((u32)index_count * index_size)) { + SSDFS_ERR("index area is corrupted: " + "index_area_size %u, index_count %u, " + "index_size %u\n", + index_area_size, + index_count, + index_size); + return -EIO; + } + + index_capacity = index_area_size / index_size; + if (index_capacity < index_count) { + SSDFS_ERR("index_capacity %u < index_count %u\n", + index_capacity, index_count); + return -ERANGE; + } + + if 
(flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + atomic_set(&node->index_area.state, + SSDFS_BTREE_NODE_INDEX_AREA_EXIST); + + offset = le16_to_cpu(hdr->index_area_offset); + + if (offset != hdr_size) { + SSDFS_ERR("invalid index_area_offset %u\n", + offset); + return -EIO; + } + + if ((offset + index_area_size) > node->node_size) { + SSDFS_ERR("offset %u + area_size %u > node_size %u\n", + offset, index_area_size, node->node_size); + return -ERANGE; + } + + node->index_area.offset = offset; + node->index_area.area_size = index_area_size; + node->index_area.index_size = index_size; + node->index_area.index_count = index_count; + node->index_area.index_capacity = index_capacity; + + err = ssdfs_init_index_area_hash_range(node, hdr, + &start_hash, + &end_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to retrieve index area hash range: " + "err %d\n", + err); + return err; + } + + node->index_area.start_hash = start_hash; + node->index_area.end_hash = end_hash; + + if (start_hash > end_hash) { + SSDFS_WARN("node_id %u, height %u, " + "start_hash %llx, end_hash %llx\n", + node->node_id, + atomic_read(&node->height), + start_hash, + end_hash); +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + return -EIO; +#endif /* CONFIG_SSDFS_DEBUG */ + } + } else { + atomic_set(&node->index_area.state, + SSDFS_BTREE_NODE_AREA_ABSENT); + node->index_area.offset = U32_MAX; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(index_area_size != 0); +#endif /* CONFIG_SSDFS_DEBUG */ + node->index_area.area_size = index_area_size; + node->index_area.index_size = index_size; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(index_count != 0); +#endif /* CONFIG_SSDFS_DEBUG */ + node->index_area.index_count = index_count; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(index_capacity != 0); +#endif /* CONFIG_SSDFS_DEBUG */ + node->index_area.index_capacity = index_capacity; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start_hash != U64_MAX); + BUG_ON(end_hash != U64_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + node->index_area.start_hash = start_hash; + node->index_area.end_hash = end_hash; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "index_count %u, index_capacity %u\n", + start_hash, end_hash, + index_count, index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_init_node_items_area() - init the node's items area + * @node: node object + * @hdr: node's header + * @hdr_size: size of the header + * + * This method tries to init the node's items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EIO - header is corrupted. + * %-ERANGE - internal error. 
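+ *
+ * Layout sketch (hypothetical sizes): in a hybrid node of 32768 bytes
+ * with hdr_size = 1024 and log_index_area_size = 12 (a 4096 byte
+ * index area), the items area occupies the remaining
+ * 32768 - 4096 - 1024 = 27648 bytes at offset
+ * hdr_size + index_area_size = 5120.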
+ */
+static
+int ssdfs_btree_init_node_items_area(struct ssdfs_btree_node *node,
+				     struct ssdfs_btree_node_header *hdr,
+				     size_t hdr_size)
+{
+	u16 flags;
+	u32 index_area_size;
+	u32 items_area_size;
+	u8 min_item_size;
+	u16 max_item_size;
+	u32 offset;
+	u64 start_hash;
+	u64 end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree || !hdr);
+	BUG_ON(hdr_size <= sizeof(struct ssdfs_btree_node_header));
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u, height %u\n",
+		  node->node_id,
+		  atomic_read(&node->height));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	flags = le16_to_cpu(hdr->flags);
+
+	if (hdr->log_index_area_size > 0) {
+		index_area_size = 1 << hdr->log_index_area_size;
+
+		switch (hdr->type) {
+		case SSDFS_BTREE_INDEX_NODE:
+			if (index_area_size != node->node_size) {
+				SSDFS_ERR("invalid index area's size: "
+					  "index_area_size %u, node_size %u\n",
+					  index_area_size,
+					  node->node_size);
+				return -EIO;
+			}
+
+			index_area_size -= hdr_size;
+			break;
+
+		case SSDFS_BTREE_HYBRID_NODE:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid node type %#x\n",
+				  hdr->type);
+			return -EIO;
+		}
+	} else
+		index_area_size = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON((index_area_size + hdr_size) > node->node_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_area_size = node->node_size;
+	items_area_size -= index_area_size;
+	items_area_size -= hdr_size;
+
+	if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) {
+		if (items_area_size == 0) {
+			SSDFS_ERR("invalid items area size %u\n",
+				  items_area_size);
+			return -EIO;
+		}
+
+		switch (hdr->type) {
+		case SSDFS_BTREE_HYBRID_NODE:
+		case SSDFS_BTREE_LEAF_NODE:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid node type %#x\n",
+				  hdr->type);
+			return -EIO;
+		}
+	} else {
+		if (items_area_size != 0) {
+			SSDFS_ERR("invalid items area size %u\n",
+				  items_area_size);
+			return -EIO;
+		}
+
+		switch (hdr->type) {
+		case SSDFS_BTREE_INDEX_NODE:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid node type %#x\n",
+				  hdr->type);
+			return -EIO;
+		}
+	}
+
+	offset = hdr_size + index_area_size;
+
+	switch (hdr->type) {
+	case SSDFS_BTREE_HYBRID_NODE:
+	case SSDFS_BTREE_LEAF_NODE:
+		if (offset != le32_to_cpu(hdr->item_area_offset)) {
+			SSDFS_ERR("invalid item_area_offset %u\n",
+				  le32_to_cpu(hdr->item_area_offset));
+			return -EIO;
+		}
+		break;
+	}
+
+	if ((offset + items_area_size) > node->node_size) {
+		SSDFS_ERR("offset %u + items_area_size %u > node_size %u\n",
+			  offset, items_area_size, node->node_size);
+		return -ERANGE;
+	}
+
+	min_item_size = hdr->min_item_size;
+	max_item_size = le16_to_cpu(hdr->max_item_size);
+
+	if (max_item_size < min_item_size) {
+		SSDFS_ERR("invalid item size: "
+			  "min size %u, max size %u\n",
+			  min_item_size, max_item_size);
+		return -EIO;
+	}
+
+	start_hash = le64_to_cpu(hdr->start_hash);
+	end_hash = le64_to_cpu(hdr->end_hash);
+
+	if (start_hash > end_hash) {
+		SSDFS_WARN("start_hash %llx > end_hash %llx\n",
+			   start_hash, end_hash);
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#else
+		return -EIO;
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) {
+		atomic_set(&node->items_area.state,
+			   SSDFS_BTREE_NODE_ITEMS_AREA_EXIST);
+		node->items_area.offset = offset;
+		node->items_area.area_size = items_area_size;
+		node->items_area.item_size = node->tree->item_size;
+		node->items_area.min_item_size = min_item_size;
+		node->items_area.max_item_size = max_item_size;
+		node->items_area.items_count =
+					U16_MAX;
+		node->items_area.items_capacity = U16_MAX;
+		node->items_area.start_hash = start_hash;
+		node->items_area.end_hash = end_hash;
+	} else {
+		atomic_set(&node->items_area.state,
+			   SSDFS_BTREE_NODE_AREA_ABSENT);
+		node->items_area.offset = U32_MAX;
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(items_area_size != 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+		node->items_area.area_size = items_area_size;
+		node->items_area.item_size = node->tree->item_size;
+		node->items_area.min_item_size = min_item_size;
+		node->items_area.max_item_size = max_item_size;
+		node->items_area.items_count = 0;
+		node->items_area.items_capacity = 0;
+		node->items_area.start_hash = U64_MAX;
+		node->items_area.end_hash = U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, end_hash %llx, "
+		  "items_count %u, items_capacity %u\n",
+		  start_hash, end_hash,
+		  node->items_area.items_count,
+		  node->items_area.items_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_init_node() - init node object
+ * @node: node object
+ * @hdr: node's header
+ * @hdr_size: size of the header
+ *
+ * This method tries to init the node object.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO - header is corrupted.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_btree_init_node(struct ssdfs_btree_node *node,
+			  struct ssdfs_btree_node_header *hdr,
+			  size_t hdr_size)
+{
+	u8 tree_height;
+	u64 create_cno;
+	u16 flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree || !hdr);
+	BUG_ON(hdr_size <= sizeof(struct ssdfs_btree_node_header));
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("node %p, hdr %p, node_id %u\n",
+		  node, hdr, node->node_id);
+#else
+	SSDFS_DBG("node %p, hdr %p, node_id %u\n",
+		  node, hdr, node->node_id);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	tree_height = atomic_read(&node->tree->height);
+	if (hdr->height >= tree_height) {
+		SSDFS_ERR("invalid height: "
+			  "tree_height %u, node_height %u\n",
+			  tree_height, hdr->height);
+		return -EIO;
+	}
+	atomic_set(&node->height, hdr->height);
+
+	if (node->node_size != (1 << hdr->log_node_size)) {
+		SSDFS_ERR("invalid node size: "
+			  "node->node_size %u != header's node_size %u\n",
+			  node->node_size,
+			  (1 << hdr->log_node_size));
+		return -EIO;
+	}
+
+	if (le32_to_cpu(hdr->node_id) != node->node_id) {
+		SSDFS_WARN("node->node_id %u != hdr->node_id %u\n",
+			   node->node_id,
+			   le32_to_cpu(hdr->node_id));
+		return -EIO;
+	}
+
+	create_cno = le64_to_cpu(hdr->create_cno);
+	if (create_cno < node->tree->create_cno) {
+		SSDFS_ERR("create_cno %llu < node->tree->create_cno %llu\n",
+			  create_cno,
+			  node->tree->create_cno);
+		return -EIO;
+	}
+	node->create_cno = create_cno;
+
+	flags = le16_to_cpu(hdr->flags);
+	if (flags & ~SSDFS_BTREE_NODE_FLAGS_MASK) {
+		SSDFS_ERR("invalid flags %#x\n",
+			  flags);
+		return -EIO;
+	}
+	atomic_set(&node->flags, flags);
+
+	if (hdr->type <= SSDFS_BTREE_ROOT_NODE ||
+	    hdr->type >= SSDFS_BTREE_NODE_TYPE_MAX) {
+		SSDFS_ERR("invalid type %#x\n",
+			  hdr->type);
+		return -EIO;
+	}
+	atomic_set(&node->type, hdr->type);
+
+	switch (hdr->type) {
+	case SSDFS_BTREE_INDEX_NODE:
+		if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA &&
+		    !(flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA)) {
+			/*
+			 * expected set of flags
+			 */
+		} else {
+			SSDFS_ERR("invalid set of flags %#x for index node\n",
+				  flags);
+			return -EIO;
+		}
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		if (flags &
SSDFS_BTREE_NODE_HAS_INDEX_AREA && + flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) { + /* + * expected set of flags + */ + } else { + SSDFS_ERR("invalid set of flags %#x for hybrid node\n", + flags); + return -EIO; + } + break; + + case SSDFS_BTREE_LEAF_NODE: + if (!(flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) && + flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) { + /* + * expected set of flags + */ + } else { + SSDFS_ERR("invalid set of flags %#x for leaf node\n", + flags); + return -EIO; + } + break; + + default: + SSDFS_ERR("invalid node type %#x\n", hdr->type); + return -ERANGE; + }; + + err = ssdfs_btree_init_node_index_area(node, hdr, hdr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to init index area: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + + err = ssdfs_btree_init_node_items_area(node, hdr, hdr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to init items area: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} diff --git a/fs/ssdfs/btree_node.h b/fs/ssdfs/btree_node.h new file mode 100644 index 000000000000..4dbb98ab1d61 --- /dev/null +++ b/fs/ssdfs/btree_node.h @@ -0,0 +1,768 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/btree_node.h - btree node declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. + * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_BTREE_NODE_H +#define _SSDFS_BTREE_NODE_H + +#include "request_queue.h" + +/* + * struct ssdfs_btree_node_operations - node operations specialization + * @find_item: specialized item searching algorithm + * @find_range: specialized range searching algorithm + * @extract_range: specialized extract range operation + * @allocate_item: specialized item allocation operation + * @allocate_range: specialized range allocation operation + * @insert_item: specialized insert item operation + * @insert_range: specialized insert range operation + * @change_item: specialized change item operation + * @delete_item: specialized delete item operation + * @delete_range: specialized delete range operation + * @move_items_range: specialized move items operation + * @resize_items_area: specialized resize items area operation + */ +struct ssdfs_btree_node_operations { + int (*find_item)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*find_range)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*extract_range)(struct ssdfs_btree_node *node, + u16 start_index, u16 count, + struct ssdfs_btree_search *search); + int (*allocate_item)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*allocate_range)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*insert_item)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*insert_range)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*change_item)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*delete_item)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int 
(*delete_range)(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search); + int (*move_items_range)(struct ssdfs_btree_node *src, + struct ssdfs_btree_node *dst, + u16 start_item, u16 count); + int (*resize_items_area)(struct ssdfs_btree_node *node, + u32 new_size); +}; + +/* Btree node area's states */ +enum { + SSDFS_BTREE_NODE_AREA_UNKNOWN_STATE, + SSDFS_BTREE_NODE_AREA_ABSENT, + SSDFS_BTREE_NODE_INDEX_AREA_EXIST, + SSDFS_BTREE_NODE_ITEMS_AREA_EXIST, + SSDFS_BTREE_NODE_LOOKUP_TBL_EXIST, + SSDFS_BTREE_NODE_HASH_TBL_EXIST, + SSDFS_BTREE_NODE_AREA_STATE_MAX +}; + +/* + * struct ssdfs_btree_node_index_area - btree node's index area + * @state: area state + * @offset: area offset from node's beginning + * @area_size: area size in bytes + * @index_size: index size in bytes + * @index_count: count of indexes in area + * @index_capacity: index area capacity + * @start_hash: starting hash in index area + * @end_hash: ending hash in index area + */ +struct ssdfs_btree_node_index_area { + atomic_t state; + + u32 offset; + u32 area_size; + + u8 index_size; + u16 index_count; + u16 index_capacity; + + u64 start_hash; + u64 end_hash; +}; + +/* + * struct ssdfs_btree_node_items_area - btree node's data area + * @state: area state + * @offset: area offset from node's beginning + * @area_size: area size in bytes + * @free_space: free space in bytes + * @item_size: item size in bytes + * @min_item_size: minimal possible item size in bytes + * @max_item_size: maximal possible item size in bytes + * @items_count: count of allocated items in area + * @items_capacity: items area capacity + * @start_hash: starting hash in items area + * @end_hash: ending hash in items area + */ +struct ssdfs_btree_node_items_area { + atomic_t state; + + u32 offset; + u32 area_size; + u32 free_space; + + u16 item_size; + u8 min_item_size; + u16 max_item_size; + + u16 items_count; + u16 items_capacity; + + u64 start_hash; + u64 end_hash; +}; + +struct ssdfs_btree; + +/* + * struct ssdfs_state_bitmap - bitmap of states + * @lock: bitmap lock + * @flags: bitmap's flags + * @ptr: bitmap + */ +struct ssdfs_state_bitmap { + spinlock_t lock; + +#define SSDFS_LOOKUP_TBL2_IS_USING (1 << 0) +#define SSDFS_HASH_TBL_IS_USING (1 << 1) +#define SSDFS_BMAP_ARRAY_FLAGS_MASK 0x3 + u32 flags; + + unsigned long *ptr; +}; + +/* + * struct ssdfs_state_bitmap_array - array of bitmaps + * @lock: bitmap array lock + * @bits_count: whole bits count in the bitmap + * @bmap_bytes: size in bytes of every bitmap + * @index_start_bit: starting bit of index area in the bitmap + * @item_start_bit: starting bit of items area in the bitmap + * @bmap: partial locks, alloc and dirty bitmaps + */ +struct ssdfs_state_bitmap_array { + struct rw_semaphore lock; + unsigned long bits_count; + size_t bmap_bytes; + unsigned long index_start_bit; + unsigned long item_start_bit; + +#define SSDFS_BTREE_NODE_LOCK_BMAP (0) +#define SSDFS_BTREE_NODE_ALLOC_BMAP (1) +#define SSDFS_BTREE_NODE_DIRTY_BMAP (2) +#define SSDFS_BTREE_NODE_BMAP_COUNT (3) + struct ssdfs_state_bitmap bmap[SSDFS_BTREE_NODE_BMAP_COUNT]; +}; + +/* + * struct ssdfs_btree_node_content - btree node's content + * @pvec: page vector + */ +struct ssdfs_btree_node_content { + struct pagevec pvec; +}; + +union ssdfs_aggregated_btree_node_header { + struct ssdfs_inodes_btree_node_header inodes_header; + struct ssdfs_dentries_btree_node_header dentries_header; + struct ssdfs_extents_btree_node_header extents_header; + struct ssdfs_xattrs_btree_node_header xattrs_header; +}; + +/* + * struct 
ssdfs_btree_node - btree node
+ * @height: node's height
+ * @node_size: node size in bytes
+ * @pages_per_node: count of memory pages per node
+ * @create_cno: creation checkpoint
+ * @node_id: node identification number
+ * @tree: pointer on node's parent tree
+ * @node_ops: btree's node operation specialization
+ * @refs_count: reference counter
+ * @state: node state
+ * @flags: node's flags
+ * @type: node type
+ * @header_lock: header lock
+ * @raw.root_node: root node copy
+ * @raw.generic_header: generic node's header
+ * @raw.inodes_header: inodes node's header
+ * @raw.dentries_header: dentries node's header
+ * @raw.extents_header: extents node's header
+ * @raw.dict_header: shared dictionary node's header
+ * @raw.xattrs_header: xattrs node's header
+ * @raw.shextree_header: shared extents tree's header
+ * @raw.snapshots_header: snapshots node's header
+ * @raw.invextree_header: invalidated extents tree's header
+ * @index_area: index area descriptor
+ * @items_area: items area descriptor
+ * @lookup_tbl_area: lookup table's area descriptor
+ * @hash_tbl_area: hash table's area descriptor
+ * @descriptor_lock: node's descriptor lock
+ * @update_cno: last update checkpoint
+ * @parent_node: pointer on parent node
+ * @node_index: node's index (used in search operations)
+ * @extent: node's location
+ * @seg: pointer on segment object
+ * @init_end: wait for the end of node initialization
+ * @flush_req: flush request
+ * @bmap_array: partial locks, alloc and dirty bitmaps
+ * @wait_queue: queue of threads waiting for a partial lock
+ * @full_lock: the whole node lock
+ * @content: node's content
+ */
+struct ssdfs_btree_node {
+	/* static data */
+	atomic_t height;
+	u32 node_size;
+	u8 pages_per_node;
+	u64 create_cno;
+	u32 node_id;
+
+	struct ssdfs_btree *tree;
+
+	/* btree's node operation specialization */
+	const struct ssdfs_btree_node_operations *node_ops;
+
+	/*
+	 * Reference counter
+	 * The reference counter accounts for how many btree
+	 * search objects are referencing the node object.
+	 * If some thread deletes all records in a node, the
+	 * node is left in the tree as long as @refs_count is
+	 * greater than one.
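+	 *
+	 * Typical pairing (illustrative):
+	 *
+	 *	ssdfs_btree_node_get(node);
+	 *	... use the node in a search operation ...
+	 *	ssdfs_btree_node_put(node);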
+ */ + atomic_t refs_count; + + /* mutable data */ + atomic_t state; + atomic_t flags; + atomic_t type; + + /* node's header */ + struct rw_semaphore header_lock; + union { + struct ssdfs_btree_inline_root_node root_node; + struct ssdfs_btree_node_header generic_header; + struct ssdfs_inodes_btree_node_header inodes_header; + struct ssdfs_dentries_btree_node_header dentries_header; + struct ssdfs_extents_btree_node_header extents_header; + struct ssdfs_shared_dictionary_node_header dict_header; + struct ssdfs_xattrs_btree_node_header xattrs_header; + struct ssdfs_shextree_node_header shextree_header; + struct ssdfs_snapshots_btree_node_header snapshots_header; + struct ssdfs_invextree_node_header invextree_header; + } raw; + struct ssdfs_btree_node_index_area index_area; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_btree_node_index_area lookup_tbl_area; + struct ssdfs_btree_node_index_area hash_tbl_area; + + /* node's descriptor */ + spinlock_t descriptor_lock; + u64 update_cno; + struct ssdfs_btree_node *parent_node; + struct ssdfs_btree_index_key node_index; + struct ssdfs_raw_extent extent; + struct ssdfs_segment_info *seg; + struct completion init_end; + struct ssdfs_segment_request flush_req; + + /* partial locks, alloc and dirty bitmaps */ + struct ssdfs_state_bitmap_array bmap_array; + wait_queue_head_t wait_queue; + + /* node raw content */ + struct rw_semaphore full_lock; + struct ssdfs_btree_node_content content; +}; + +/* Btree node states */ +enum { + SSDFS_BTREE_NODE_UNKNOWN_STATE, + SSDFS_BTREE_NODE_CREATED, + SSDFS_BTREE_NODE_CONTENT_PREPARED, + SSDFS_BTREE_NODE_INITIALIZED, + SSDFS_BTREE_NODE_DIRTY, + SSDFS_BTREE_NODE_PRE_DELETED, + SSDFS_BTREE_NODE_INVALID, + SSDFS_BTREE_NODE_CORRUPTED, + SSDFS_BTREE_NODE_STATE_MAX +}; + +/* + * TODO: it is possible to use knowledge about partial + * updates and to send only changed pieces of + * data for the case of Diff-On-Write approach. + * Metadata is good case for determination of + * partial updates and to send changed part(s) + * only. For example, bitmap could show dirty + * items in the node. + */ + +/* + * Inline functions + */ + +/* + * NODE2SEG_TYPE() - convert node type into segment type + * @node_type: node type + */ +static inline +u8 NODE2SEG_TYPE(u8 node_type) +{ + switch (node_type) { + case SSDFS_BTREE_INDEX_NODE: + return SSDFS_INDEX_NODE_SEG_TYPE; + + case SSDFS_BTREE_HYBRID_NODE: + return SSDFS_HYBRID_NODE_SEG_TYPE; + + case SSDFS_BTREE_LEAF_NODE: + return SSDFS_LEAF_NODE_SEG_TYPE; + } + + SSDFS_WARN("invalid node type %#x\n", node_type); + + return SSDFS_UNKNOWN_SEG_TYPE; +} + +/* + * RANGE_WITHOUT_INTERSECTION() - check that ranges have intersection + * @start1: starting hash of the first range + * @end1: ending hash of the first range + * @start2: starting hash of the second range + * @end2: ending hash of the second range + * + * This method checks that ranges have intersection. 
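+ *
+ * For example (hypothetical hash values): [0x100, 0x1ff] vs
+ * [0x200, 0x2ff] returns -1 (range1 lies completely below range2),
+ * [0x300, 0x3ff] vs [0x200, 0x2ff] returns 1, and [0x100, 0x2ff] vs
+ * [0x200, 0x3ff] returns 0 because the ranges intersect.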
+ * + * RETURN: + * 0 - ranges have intersection + * 1 - range1 > range2 + * -1 - range1 < range2 + */ +static inline +int RANGE_WITHOUT_INTERSECTION(u64 start1, u64 end1, u64 start2, u64 end2) +{ + SSDFS_DBG("start1 %llx, end1 %llx, start2 %llx, end2 %llx\n", + start1, end1, start2, end2); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start1 >= U64_MAX || end1 >= U64_MAX || + start2 >= U64_MAX || end2 >= U64_MAX); + BUG_ON(start1 > end1); + BUG_ON(start2 > end2); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start1 > end2) + return 1; + + if (end1 < start2) + return -1; + + return 0; +} + +/* + * RANGE_HAS_PARTIAL_INTERSECTION() - check that ranges intersect partially + * @start1: starting hash of the first range + * @end1: ending hash of the first range + * @start2: starting hash of the second range + * @end2: ending hash of the second range + */ +static inline +bool RANGE_HAS_PARTIAL_INTERSECTION(u64 start1, u64 end1, + u64 start2, u64 end2) +{ + SSDFS_DBG("start1 %llx, end1 %llx, start2 %llx, end2 %llx\n", + start1, end1, start2, end2); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start1 >= U64_MAX || end1 >= U64_MAX || + start2 >= U64_MAX || end2 >= U64_MAX); + BUG_ON(start1 > end1); + BUG_ON(start2 > end2); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start1 > end2) + return false; + + if (end1 < start2) + return false; + + return true; +} + +/* + * __ssdfs_items_per_lookup_index() - calculate items per lookup index + * @items_per_node: number of items per node + * @lookup_table_capacity: maximal number of items in lookup table + */ +static inline +u16 __ssdfs_items_per_lookup_index(u32 items_per_node, + int lookup_table_capacity) +{ + u32 items_per_lookup_index; + + items_per_lookup_index = items_per_node / lookup_table_capacity; + + if (items_per_node % lookup_table_capacity) + items_per_lookup_index++; + + SSDFS_DBG("items_per_lookup_index %u\n", items_per_lookup_index); + + return items_per_lookup_index; +} + +/* + * __ssdfs_convert_lookup2item_index() - convert lookup into item index + * @lookup_index: lookup index + * @node_size: size of the node in bytes + * @item_size: size of the item in bytes + * @lookup_table_capacity: maximal number of items in lookup table + */ +static inline +u16 __ssdfs_convert_lookup2item_index(u16 lookup_index, + u32 node_size, + size_t item_size, + int lookup_table_capacity) +{ + u32 items_per_node; + u32 items_per_lookup_index; + u32 item_index; + + SSDFS_DBG("lookup_index %u, node_size %u, " + "item_size %zu, table_capacity %d\n", + lookup_index, node_size, + item_size, lookup_table_capacity); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(lookup_index >= lookup_table_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_per_node = node_size / item_size; + items_per_lookup_index = __ssdfs_items_per_lookup_index(items_per_node, + lookup_table_capacity); + + item_index = (u32)lookup_index * items_per_lookup_index; + + SSDFS_DBG("lookup_index %u, item_index %u\n", + lookup_index, item_index); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(item_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return (u16)item_index; +} + +/* + * __ssdfs_convert_item2lookup_index() - convert item into lookup index + * @item_index: item index + * @node_size: size of the node in bytes + * @item_size: size of the item in bytes + * @lookup_table_capacity: maximal number of items in lookup table + */ +static inline +u16 __ssdfs_convert_item2lookup_index(u16 item_index, + u32 node_size, + size_t item_size, + int lookup_table_capacity) +{ + u32 items_per_node; + u32 items_per_lookup_index; + u16 lookup_index; 
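+
+	/*
+	 * Mapping sketch (hypothetical numbers): with node_size = 8192,
+	 * item_size = 64 and lookup_table_capacity = 16 there are
+	 * 128 items per node and 128 / 16 = 8 items per lookup index,
+	 * so item_index 50 maps to lookup_index 50 / 8 = 6.
+	 */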
+ + SSDFS_DBG("item_index %u, node_size %u, " + "item_size %zu, table_capacity %d\n", + item_index, node_size, + item_size, lookup_table_capacity); + + items_per_node = node_size / item_size; + items_per_lookup_index = __ssdfs_items_per_lookup_index(items_per_node, + lookup_table_capacity); + lookup_index = item_index / items_per_lookup_index; + + SSDFS_DBG("item_index %u, lookup_index %u, table_capacity %d\n", + item_index, lookup_index, lookup_table_capacity); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(lookup_index >= lookup_table_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return lookup_index; +} + +/* + * Btree node API + */ +struct ssdfs_btree_node * +ssdfs_btree_node_create(struct ssdfs_btree *tree, + u32 node_id, + struct ssdfs_btree_node *parent, + u8 height, int type, u64 start_hash); +void ssdfs_btree_node_destroy(struct ssdfs_btree_node *node); +int ssdfs_btree_node_prepare_content(struct ssdfs_btree_node *node, + struct ssdfs_btree_index_key *index); +int ssdfs_btree_init_node(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_header *hdr, + size_t hdr_size); +int ssdfs_btree_pre_flush_root_node(struct ssdfs_btree_node *node); +void ssdfs_btree_flush_root_node(struct ssdfs_btree_node *node, + struct ssdfs_btree_inline_root_node *root_node); +int ssdfs_btree_node_pre_flush(struct ssdfs_btree_node *node); +int ssdfs_btree_node_flush(struct ssdfs_btree_node *node); + +void ssdfs_btree_node_get(struct ssdfs_btree_node *node); +void ssdfs_btree_node_put(struct ssdfs_btree_node *node); +bool is_ssdfs_node_shared(struct ssdfs_btree_node *node); + +bool is_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node); +void set_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node); +void clear_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node); +bool is_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node); +void set_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node); +void clear_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node); + +bool is_ssdfs_btree_node_index_area_exist(struct ssdfs_btree_node *node); +bool is_ssdfs_btree_node_index_area_empty(struct ssdfs_btree_node *node); +int ssdfs_btree_node_resize_index_area(struct ssdfs_btree_node *node, + u32 new_size); +int ssdfs_btree_node_find_index(struct ssdfs_btree_search *search); +bool can_add_new_index(struct ssdfs_btree_node *node); +int ssdfs_btree_node_add_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_index_key *key); +int ssdfs_btree_node_change_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_index_key *old_key, + struct ssdfs_btree_index_key *new_key); +int ssdfs_btree_node_delete_index(struct ssdfs_btree_node *node, + u64 hash); + +bool is_ssdfs_btree_node_items_area_exist(struct ssdfs_btree_node *node); +bool is_ssdfs_btree_node_items_area_empty(struct ssdfs_btree_node *node); +int ssdfs_btree_node_find_item(struct ssdfs_btree_search *search); +int ssdfs_btree_node_find_range(struct ssdfs_btree_search *search); +int ssdfs_btree_node_allocate_item(struct ssdfs_btree_search *search); +int ssdfs_btree_node_allocate_range(struct ssdfs_btree_search *search); +int ssdfs_btree_node_insert_item(struct ssdfs_btree_search *search); +int ssdfs_btree_node_insert_range(struct ssdfs_btree_search *search); +int ssdfs_btree_node_change_item(struct ssdfs_btree_search *search); +int ssdfs_btree_node_delete_item(struct ssdfs_btree_search *search); +int ssdfs_btree_node_delete_range(struct ssdfs_btree_search *search); + +/* + * Internal Btree node API + */ +int ssdfs_lock_items_range(struct 
ssdfs_btree_node *node, + u16 start_index, u16 count); +void ssdfs_unlock_items_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count); +int ssdfs_lock_whole_index_area(struct ssdfs_btree_node *node); +void ssdfs_unlock_whole_index_area(struct ssdfs_btree_node *node); +int ssdfs_allocate_items_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 items_capacity, + u16 start_index, u16 count); +bool is_ssdfs_node_items_range_allocated(struct ssdfs_btree_node *node, + u16 items_capacity, + u16 start_index, u16 count); +int ssdfs_free_items_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count); +int ssdfs_set_node_header_dirty(struct ssdfs_btree_node *node, + u16 items_capacity); +void ssdfs_clear_node_header_dirty_state(struct ssdfs_btree_node *node); +int ssdfs_set_dirty_items_range(struct ssdfs_btree_node *node, + u16 items_capacity, + u16 start_index, u16 count); +void ssdfs_clear_dirty_items_range_state(struct ssdfs_btree_node *node, + u16 start_index, u16 count); + +int ssdfs_btree_node_allocate_bmaps(void *addr[SSDFS_BTREE_NODE_BMAP_COUNT], + size_t bmap_bytes); +void ssdfs_btree_node_init_bmaps(struct ssdfs_btree_node *node, + void *addr[SSDFS_BTREE_NODE_BMAP_COUNT]); +int ssdfs_btree_node_allocate_content_space(struct ssdfs_btree_node *node, + u32 node_size); +int __ssdfs_btree_node_prepare_content(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_index_key *ptr, + u32 node_size, + u64 owner_id, + struct ssdfs_segment_info **si, + struct pagevec *pvec); +int ssdfs_btree_create_root_node(struct ssdfs_btree_node *node, + struct ssdfs_btree_inline_root_node *root_node); +int ssdfs_btree_node_pre_flush_header(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_header *hdr); +int ssdfs_btree_common_node_flush(struct ssdfs_btree_node *node); +int ssdfs_btree_node_commit_log(struct ssdfs_btree_node *node); +int ssdfs_btree_deleted_node_commit_log(struct ssdfs_btree_node *node); +int __ssdfs_btree_root_node_extract_index(struct ssdfs_btree_node *node, + u16 found_index, + struct ssdfs_btree_index_key *ptr); +int ssdfs_btree_root_node_delete_index(struct ssdfs_btree_node *node, + u16 position); +int ssdfs_btree_common_node_delete_index(struct ssdfs_btree_node *node, + u16 position); +int ssdfs_find_index_by_hash(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u64 hash, + u16 *found_index); +int ssdfs_btree_node_find_index_position(struct ssdfs_btree_node *node, + u64 hash, + u16 *found_position); +int ssdfs_btree_node_extract_range(u16 start_index, u16 count, + struct ssdfs_btree_search *search); +int ssdfs_btree_node_get_index(struct pagevec *pvec, + u32 area_offset, u32 area_size, + u32 node_size, u16 position, + struct ssdfs_btree_index_key *ptr); +int ssdfs_btree_node_move_index_range(struct ssdfs_btree_node *src, + u16 src_start, + struct ssdfs_btree_node *dst, + u16 dst_start, u16 count); +int ssdfs_btree_node_move_items_range(struct ssdfs_btree_node *src, + struct ssdfs_btree_node *dst, + u16 start_item, u16 count); +int ssdfs_copy_item_in_buffer(struct ssdfs_btree_node *node, + u16 index, + size_t item_size, + struct ssdfs_btree_search *search); +bool is_last_leaf_node_found(struct ssdfs_btree_search *search); +int ssdfs_btree_node_find_lookup_index_nolock(struct ssdfs_btree_search *search, + __le64 *lookup_table, + int table_capacity, + u16 *lookup_index); +typedef int (*ssdfs_check_found_item)(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + void *kaddr, + u16 item_index, + u64 
*start_hash, + u64 *end_hash, + u16 *found_index); +typedef int (*ssdfs_prepare_result_buffer)(struct ssdfs_btree_search *search, + u16 found_index, + u64 start_hash, + u64 end_hash, + u16 items_count, + size_t item_size); +typedef int (*ssdfs_extract_found_item)(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + size_t item_size, + void *kaddr, + u64 *start_hash, + u64 *end_hash); +int __ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node, + u16 lookup_index, + int lookup_table_capacity, + size_t item_size, + struct ssdfs_btree_search *search, + ssdfs_check_found_item check_item, + ssdfs_prepare_result_buffer prepare_buffer, + ssdfs_extract_found_item extract_item); +int ssdfs_shift_range_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift); +int ssdfs_shift_range_right2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift); +int ssdfs_shift_range_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift); +int ssdfs_shift_range_left2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift); +int ssdfs_shift_memory_range_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 offset, u16 range_len, + u16 shift); +int ssdfs_shift_memory_range_right2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 offset, u16 range_len, + u16 shift); +int ssdfs_shift_memory_range_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 offset, u16 range_len, + u16 shift); +int ssdfs_shift_memory_range_left2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 offset, u16 range_len, + u16 shift); +int ssdfs_generic_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + struct ssdfs_btree_search *search); +int ssdfs_invalidate_root_node_hierarchy(struct ssdfs_btree_node *node); +int __ssdfs_btree_node_extract_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count, + size_t item_size, + struct ssdfs_btree_search *search); +int __ssdfs_btree_node_resize_items_area(struct ssdfs_btree_node *node, + size_t item_size, + size_t index_size, + u32 new_size); +int __ssdfs_define_memory_page(u32 area_offset, u32 area_size, + u32 node_size, size_t item_size, + u16 position, + u32 *page_index, u32 *page_off); +int ssdfs_btree_node_get_hash_range(struct ssdfs_btree_search *search, + u64 *start_hash, u64 *end_hash, + u16 *items_count); +int __ssdfs_btree_common_node_extract_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 found_index, + struct ssdfs_btree_index_key *ptr); +int ssdfs_btree_node_check_hash_range(struct ssdfs_btree_node *node, + u16 items_count, + u16 items_capacity, + u64 start_hash, + u64 end_hash, + struct ssdfs_btree_search *search); +int ssdfs_btree_node_clear_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + struct ssdfs_btree_search *search); +int __ssdfs_btree_node_clear_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + u16 start_index, + unsigned int range_len); +int 
ssdfs_btree_node_copy_header_nolock(struct ssdfs_btree_node *node,
+					struct page *page,
+					u32 *write_offset);
+
+void ssdfs_show_btree_node_info(struct ssdfs_btree_node *node);
+void ssdfs_debug_btree_node_object(struct ssdfs_btree_node *node);
+
+#endif /* _SSDFS_BTREE_NODE_H */

From patchwork Sat Feb 25 01:09:02 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151954
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 51/76] ssdfs: flush b-tree node object
Date: Fri, 24 Feb 2023 17:09:02 -0800
Message-Id: <20230225010927.813929-52-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

A dirty b-tree implies the presence of one or several dirty b-tree
nodes. The b-tree flush logic detects the dirty b-tree nodes and
requests the flush operation for every dirty b-tree node. A b-tree
node can include several memory pages (8K, for example). It means
that one b-tree node can be located in one or several logical blocks.
Finally, the flush operation means that the b-tree node's flush logic
has to issue update request(s) for all logical blocks that contain
the b-tree node's content.

Every b-tree node is described by an index record (or key) that
includes: (1) node ID, (2) node type, (3) node height, (4) starting
hash value, (5) raw extent. The raw extent describes the segment ID,
logical block ID, and length. As a result, the flush logic needs to
add an update request into the update queue of the particular PEB for
the segment ID. Also, the flush logic has to request the log commit
operation because the b-tree node has to be stored persistently right
away. Flush thread(s) of the particular PEB(s) execute the update
requests. Finally, the b-tree flush logic has to wait for the
completion of the update operations for all dirty b-tree node(s).

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_node.c | 3048 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 3048 insertions(+)

diff --git a/fs/ssdfs/btree_node.c b/fs/ssdfs/btree_node.c
index 9f09090e5cfd..a826b1c9699d 100644
--- a/fs/ssdfs/btree_node.c
+++ b/fs/ssdfs/btree_node.c
@@ -2174,3 +2174,3051 @@ int ssdfs_btree_init_node(struct ssdfs_btree_node *node,
 	return 0;
 }
+
+/*
+ * ssdfs_btree_pre_flush_root_node() - pre-flush the dirty root node
+ * @node: node object
+ *
+ * This method tries to pre-flush the dirty root node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
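+ *
+ * On success the validated in-core state is mirrored into the on-disk
+ * inline root node, in essence (condensed from the body below):
+ *
+ *	root_node->header.height = (u8)height;
+ *	root_node->header.type = (u8)type;
+ *	root_node->header.items_count = (u8)index_count;
+ *	root_node->header.upper_node_id =
+ *			cpu_to_le32(node->tree->upper_node_id);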
+ */ +int ssdfs_btree_pre_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree_inline_root_node *root_node; + size_t root_node_size = sizeof(struct ssdfs_btree_inline_root_node); + int height, tree_height; + int type; + u32 area_size, calculated_area_size; + u32 area_offset; + u16 index_count; + u16 index_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + root_node = &node->raw.root_node; + + height = atomic_read(&node->height); + if (height >= U8_MAX || height <= 0) { + SSDFS_ERR("invalid height %d\n", height); + return -ERANGE; + } + + tree_height = atomic_read(&node->tree->height); + if (tree_height >= U8_MAX || tree_height <= 0) { + SSDFS_ERR("invalid tree's height %d\n", + tree_height); + return -ERANGE; + } + + if ((tree_height - 1) != height) { + SSDFS_ERR("tree_height %d, root node's height %d\n", + tree_height, height); + return -ERANGE; + } + + root_node->header.height = (u8)height; + + if (node->node_size != root_node_size) { + SSDFS_ERR("corrupted root node size %u\n", + node->node_size); + return -ERANGE; + } + + calculated_area_size = sizeof(struct ssdfs_btree_index); + calculated_area_size *= SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; + + area_size = node->index_area.area_size; + if (area_size != calculated_area_size) { + SSDFS_ERR("corrupted index area size %u\n", + area_size); + return -ERANGE; + } + + type = atomic_read(&node->type); + if (type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_ERR("invalid node type %#x\n", + type); + return -ERANGE; + } + + root_node->header.type = (u8)type; + + area_offset = node->index_area.offset; + if (area_offset < sizeof(struct ssdfs_btree_root_node_header) || + area_offset >= node->node_size) { + SSDFS_ERR("corrupted index area offset %u\n", + area_offset); + return -ERANGE; + } + + if (node->index_area.index_count > node->index_area.index_capacity) { + SSDFS_ERR("corrupted index area descriptor: " + "index_count %u, index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); + return -ERANGE; + } + + index_count = node->index_area.index_count; + + if (index_count > SSDFS_BTREE_ROOT_NODE_INDEX_COUNT) { + SSDFS_ERR("invalid index count %u\n", + index_count); + return -ERANGE; + } + + root_node->header.items_count = (u8)index_count; + + index_size = node->index_area.index_size; + + if (index_size != sizeof(struct ssdfs_btree_index)) { + SSDFS_ERR("invalid index size %u\n", index_size); + return -ERANGE; + } + + if (((u32)index_count * index_size) > area_size) { + SSDFS_ERR("corrupted index area: " + "index_count %u, index_size %u, area_size %u\n", + index_count, + index_size, + area_size); + return -ERANGE; + } + + root_node->header.upper_node_id = + cpu_to_le32(node->tree->upper_node_id); + + return 0; +} + +/* + * ssdfs_btree_node_pre_flush_header() - pre-flush node's header + * @node: node object + * @hdr: node's header + * + * This method tries to pre-flush the node's header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
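+ *
+ * Editorial note: the on-disk header encodes sizes as power-of-two
+ * exponents. For example, a node_size of 8192 bytes is stored as
+ * log_node_size == ilog2(8192) == 13, which is why a non-power-of-two
+ * node or index area size is treated as corruption below.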
+ */
+int ssdfs_btree_node_pre_flush_header(struct ssdfs_btree_node *node,
+				      struct ssdfs_btree_node_header *hdr)
+{
+	int height;
+	int type;
+	int flags;
+	u32 area_size;
+	u32 area_offset;
+	u8 index_size;
+	u16 index_count;
+	u16 index_capacity;
+	u16 items_capacity;
+	u16 item_size;
+	u8 min_item_size;
+	u16 max_item_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !hdr);
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	height = atomic_read(&node->height);
+	if (height >= U8_MAX || height < 0) {
+		SSDFS_ERR("invalid height %d\n", height);
+		return -ERANGE;
+	}
+
+	hdr->height = (u8)height;
+
+	if ((1 << ilog2(node->node_size)) != node->node_size) {
+		SSDFS_ERR("corrupted node size %u\n",
+			  node->node_size);
+		return -ERANGE;
+	}
+
+	hdr->log_node_size = (u8)ilog2(node->node_size);
+
+	type = atomic_read(&node->type);
+	if (type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE ||
+	    type >= SSDFS_BTREE_NODE_TYPE_MAX) {
+		SSDFS_ERR("invalid node type %#x\n",
+			  type);
+		return -ERANGE;
+	}
+
+	hdr->type = (u8)type;
+
+	flags = atomic_read(&node->flags);
+	if (flags & ~SSDFS_BTREE_NODE_FLAGS_MASK) {
+		SSDFS_ERR("corrupted set of flags %#x\n",
+			  flags);
+		return -ERANGE;
+	}
+
+	/*
+	 * The flag SSDFS_BTREE_NODE_PRE_ALLOCATED needs to be excluded.
+	 * The pre-allocated node will be created during the flush
+	 * operation. This flag is needed on the kernel side only.
+	 */
+	flags &= ~SSDFS_BTREE_NODE_PRE_ALLOCATED;
+
+	hdr->flags = cpu_to_le16((u16)flags);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, type %#x, flags %#x\n",
+		  node->node_id, type, flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&node->index_area.state)) {
+	case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+		switch (atomic_read(&node->type)) {
+		case SSDFS_BTREE_INDEX_NODE:
+			/*
+			 * The initialization code expects the node size.
+			 * As a result, the index area's size is calculated
+			 * by subtracting the header size from the
+			 * node size.
+			 */
+			area_size = node->node_size;
+			if ((1 << ilog2(area_size)) != area_size) {
+				SSDFS_ERR("corrupted index area size %u\n",
+					  area_size);
+				return -ERANGE;
+			}
+
+			hdr->log_index_area_size = (u8)ilog2(area_size);
+
+			/*
+			 * The real area size is used for checking
+			 * the rest of the fields.
+ */ + area_size = node->index_area.area_size; + break; + + default: + area_size = node->index_area.area_size; + if ((1 << ilog2(area_size)) != area_size) { + SSDFS_ERR("corrupted index area size %u\n", + area_size); + return -ERANGE; + } + + hdr->log_index_area_size = (u8)ilog2(area_size); + break; + } + + area_offset = node->index_area.offset; + if (area_offset <= sizeof(struct ssdfs_btree_node_header) || + area_offset >= node->node_size || + area_offset >= node->items_area.offset) { + SSDFS_ERR("corrupted index area offset %u\n", + area_offset); + return -ERANGE; + } + + hdr->index_area_offset = cpu_to_le16((u16)area_offset); + + index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + + if (index_count > index_capacity) { + SSDFS_ERR("corrupted index area descriptor: " + "index_count %u, index_capacity %u\n", + index_count, index_capacity); + return -ERANGE; + } + + hdr->index_count = cpu_to_le16(index_count); + + index_size = node->index_area.index_size; + + if (((u32)index_count * index_size) > area_size) { + SSDFS_ERR("corrupted index area: " + "index_count %u, index_size %u, " + "area_size %u\n", + index_count, index_size, area_size); + return -ERANGE; + } + + hdr->index_size = index_size; + break; + + default: + hdr->log_index_area_size = (u8)ilog2(0); + hdr->index_area_offset = cpu_to_le16(U16_MAX); + hdr->index_count = cpu_to_le16(0); + hdr->index_size = U8_MAX; + break; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + item_size = node->items_area.item_size; + min_item_size = node->items_area.min_item_size; + max_item_size = node->items_area.max_item_size; + items_capacity = node->items_area.items_capacity; + area_size = node->items_area.area_size; + break; + + default: + item_size = U16_MAX; + min_item_size = 0; + max_item_size = 0; + items_capacity = 0; + area_size = 0; + break; + } + + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + if (item_size == 0) { + SSDFS_ERR("corrupted items area: " + "item_size %u\n", + item_size); + return -ERANGE; + } else if (min_item_size > item_size) { + SSDFS_ERR("corrupted items area: " + "min_item_size %u, " + "item_size %u\n", + min_item_size, item_size); + return -ERANGE; + } else if (item_size > max_item_size) { + SSDFS_ERR("corrupted items area: " + "item_size %u, " + "max_item_size %u\n", + item_size, max_item_size); + return -ERANGE; + } else if (item_size > area_size) { + SSDFS_ERR("corrupted items area: " + "item_size %u, " + "area_size %u\n", + item_size, area_size); + return -ERANGE; + } else + hdr->min_item_size = min_item_size; + + if (max_item_size == 0) { + SSDFS_ERR("corrupted items area: " + "max_item_size %u\n", + max_item_size); + return -ERANGE; + } else if (max_item_size > area_size) { + SSDFS_ERR("corrupted items area: " + "max_item_size %u, " + "area_size %u\n", + max_item_size, area_size); + return -ERANGE; + } else + hdr->max_item_size = cpu_to_le16(max_item_size); + + if (items_capacity == 0) { + SSDFS_ERR("corrupted items area's state\n"); + return -ERANGE; + } else if (((u32)items_capacity * item_size) > area_size) { + SSDFS_ERR("corrupted items area's state: " + "items_capacity %u, item_size %u, " + "area_size %u\n", + items_capacity, + item_size, + area_size); + return -ERANGE; + } else + hdr->items_capacity = cpu_to_le16(items_capacity); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "start_hash %llx, end_hash %llx\n", + node->node_id, + atomic_read(&node->type), + node->items_area.start_hash, + 
node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		hdr->start_hash = cpu_to_le64(node->items_area.start_hash);
+		hdr->end_hash = cpu_to_le64(node->items_area.end_hash);
+
+		area_offset = node->items_area.offset;
+		area_size = node->items_area.area_size;
+		if ((area_offset + area_size) > node->node_size) {
+			SSDFS_ERR("corrupted items area offset %u\n",
+				  area_offset);
+			return -ERANGE;
+		}
+
+		hdr->item_area_offset = cpu_to_le32(area_offset);
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		if (item_size == 0) {
+			SSDFS_ERR("corrupted items area: "
+				  "item_size %u\n",
+				  item_size);
+			return -ERANGE;
+		} else if (min_item_size > item_size) {
+			SSDFS_ERR("corrupted items area: "
+				  "min_item_size %u, "
+				  "item_size %u\n",
+				  min_item_size, item_size);
+			return -ERANGE;
+		} else if (item_size > max_item_size) {
+			SSDFS_ERR("corrupted items area: "
+				  "item_size %u, "
+				  "max_item_size %u\n",
+				  item_size, max_item_size);
+			return -ERANGE;
+		} else if (item_size > area_size) {
+			SSDFS_ERR("corrupted items area: "
+				  "item_size %u, "
+				  "area_size %u\n",
+				  item_size, area_size);
+			return -ERANGE;
+		} else
+			hdr->min_item_size = min_item_size;
+
+		if (max_item_size == 0) {
+			SSDFS_ERR("corrupted items area: "
+				  "max_item_size %u\n",
+				  max_item_size);
+			return -ERANGE;
+		} else if (max_item_size > area_size) {
+			SSDFS_ERR("corrupted items area: "
+				  "max_item_size %u, "
+				  "area_size %u\n",
+				  max_item_size, area_size);
+			return -ERANGE;
+		} else
+			hdr->max_item_size = cpu_to_le16(max_item_size);
+
+		if (items_capacity == 0) {
+			SSDFS_ERR("corrupted items area's state\n");
+			return -ERANGE;
+		} else if (((u32)items_capacity * min_item_size) > area_size) {
+			SSDFS_ERR("corrupted items area's state: "
+				  "items_capacity %u, min_item_size %u, "
+				  "area_size %u\n",
+				  items_capacity,
+				  min_item_size,
+				  area_size);
+			return -ERANGE;
+		} else
+			hdr->items_capacity = cpu_to_le16(items_capacity);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node_id %u, node_type %#x, "
+			  "start_hash %llx, end_hash %llx\n",
+			  node->node_id,
+			  atomic_read(&node->type),
+			  node->items_area.start_hash,
+			  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		hdr->start_hash = cpu_to_le64(node->items_area.start_hash);
+		hdr->end_hash = cpu_to_le64(node->items_area.end_hash);
+
+		area_offset = node->items_area.offset;
+		area_size = node->items_area.area_size;
+		if ((area_offset + area_size) > node->node_size) {
+			SSDFS_ERR("corrupted items area offset %u\n",
+				  area_offset);
+			return -ERANGE;
+		}
+
+		hdr->item_area_offset = cpu_to_le32(area_offset);
+
+		area_offset = node->index_area.offset;
+		area_size = node->index_area.area_size;
+		if ((area_offset + area_size) > node->node_size) {
+			SSDFS_ERR("corrupted index area offset %u\n",
+				  area_offset);
+			return -ERANGE;
+		} else if ((area_offset + area_size) > node->items_area.offset) {
+			SSDFS_ERR("corrupted index area offset %u\n",
+				  area_offset);
+			return -ERANGE;
+		}
+
+		hdr->index_area_offset = cpu_to_le32(area_offset);
+		break;
+
+	case SSDFS_BTREE_INDEX_NODE:
+		if (min_item_size != 0) {
+			SSDFS_ERR("corrupted items area: "
+				  "min_item_size %u\n",
+				  min_item_size);
+			return -ERANGE;
+		} else
+			hdr->min_item_size = min_item_size;
+
+		if (max_item_size != 0) {
+			SSDFS_ERR("corrupted items area: "
+				  "max_item_size %u\n",
+				  max_item_size);
+			return -ERANGE;
+		} else
+			hdr->max_item_size = cpu_to_le16(max_item_size);
+
+		if (items_capacity != 0) {
+			SSDFS_ERR("corrupted items area's state\n");
+			return -ERANGE;
+		} else
+			hdr->items_capacity = cpu_to_le16(items_capacity);
+
+#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "start_hash %llx, end_hash %llx\n", + node->node_id, + atomic_read(&node->type), + node->index_area.start_hash, + node->index_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr->start_hash = cpu_to_le64(node->index_area.start_hash); + hdr->end_hash = cpu_to_le64(node->index_area.end_hash); + + area_offset = node->index_area.offset; + area_size = node->index_area.area_size; + if ((area_offset + area_size) > node->node_size) { + SSDFS_ERR("corrupted index area offset %u\n", + area_offset); + return -ERANGE; + } + + hdr->index_area_offset = cpu_to_le32(area_offset); + break; + + default: + SSDFS_ERR("invalid node type %#x\n", type); + return -ERANGE; + } + + hdr->create_cno = cpu_to_le64(node->create_cno); + hdr->node_id = cpu_to_le32(node->node_id); + + return 0; +} + +/* + * ssdfs_btree_node_pre_flush() - pre-flush the dirty btree node + * @node: node object + * + * This method tries to pre-flush the dirty btree node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_node_pre_flush(struct ssdfs_btree_node *node) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + + SSDFS_DBG("node_id %u, height %u, type %#x, state %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type), + atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_ssdfs_btree_node_dirty(node)) + return 0; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + if (!node->tree->btree_ops || + !node->tree->btree_ops->pre_flush_root_node) { + SSDFS_WARN("unable to pre-flush the root node\n"); + return -EOPNOTSUPP; + } + + err = node->tree->btree_ops->pre_flush_root_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-flush root node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + break; + + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + if (!node->tree->btree_ops || + !node->tree->btree_ops->pre_flush_node) { + SSDFS_WARN("unable to pre-flush common node\n"); + return -EOPNOTSUPP; + } + + err = node->tree->btree_ops->pre_flush_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-flush common node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + break; + } + + return err; +} + +/* + * ssdfs_btree_flush_root_node() - flush root node + * @node: node object + * @root_node: pointer on the on-disk root node object + */ +void ssdfs_btree_flush_root_node(struct ssdfs_btree_node *node, + struct ssdfs_btree_inline_root_node *root_node) +{ + size_t node_ids_len = sizeof(__le32) * + SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; + size_t indexes_len = sizeof(struct ssdfs_btree_index) * + SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; + u16 items_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !root_node); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&node->header_lock); + + items_count = node->index_area.index_count; + root_node->header.height = (u8)atomic_read(&node->tree->height); + root_node->header.items_count = cpu_to_le16(items_count); + 
root_node->header.flags = (u8)atomic_read(&node->flags); + root_node->header.type = (u8)atomic_read(&node->type); + ssdfs_memcpy(root_node->header.node_ids, 0, node_ids_len, + node->raw.root_node.header.node_ids, 0, node_ids_len, + node_ids_len); + ssdfs_memcpy(root_node->indexes, 0, indexes_len, + node->raw.root_node.indexes, 0, indexes_len, + indexes_len); + clear_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("left index (node_id %u, hash %llx), " + "right index (node_id %u, hash %llx)\n", + le32_to_cpu(root_node->header.node_ids[0]), + le64_to_cpu(root_node->indexes[0].hash), + le32_to_cpu(root_node->header.node_ids[1]), + le64_to_cpu(root_node->indexes[1].hash)); +#endif /* CONFIG_SSDFS_DEBUG */ + + up_write(&node->header_lock); + + spin_lock(&node->tree->nodes_lock); + root_node->header.upper_node_id = + cpu_to_le32(node->tree->upper_node_id); + spin_unlock(&node->tree->nodes_lock); + + ssdfs_request_init(&node->flush_req); + atomic_set(&node->flush_req.result.state, SSDFS_REQ_FINISHED); +} + +/* + * ssdfs_btree_node_copy_header_nolock() - copy btree node's header + * @node: node object + * @page: memory page to store the metadata [out] + * @write_offset: current write offset [out] + * + * This method tries to save the btree node's header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_node_copy_header_nolock(struct ssdfs_btree_node *node, + struct page *page, + u32 *write_offset) +{ + size_t hdr_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !page); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, height %u, type %#x, write_offset %u\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type), *write_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr_size = sizeof(node->raw); + + if (*write_offset >= PAGE_SIZE) { + SSDFS_ERR("invalid write_offset %u\n", + *write_offset); + return -EINVAL; + } + + /* all btrees have the same node's header size */ + err = ssdfs_memcpy_to_page(page, *write_offset, PAGE_SIZE, + &node->raw, 0, hdr_size, + hdr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy node's header: " + "write_offset %u, size %zu, err %d\n", + *write_offset, hdr_size, err); + return err; + } + + *write_offset += hdr_size; + + if (*write_offset >= PAGE_SIZE) { + SSDFS_ERR("invalid write_offset %u\n", + *write_offset); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_btree_node_prepare_flush_request() - prepare node's content for flush + * @node: node object + * + * This method tries to prepare the node's content + * for flush operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
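+ *
+ * Editorial sketch: the preparation boils down to three steps --
+ * allocate one request page per memory page of the node, copy the
+ * node's header and content into those pages, and submit an
+ * asynchronous update for the node's extent. All helpers below are
+ * the real calls used in the body of this function; page0 stands for
+ * the node's first content page:
+ *
+ *	ssdfs_request_init(&node->flush_req);
+ *	ssdfs_get_request(&node->flush_req);
+ *	ssdfs_btree_node_copy_header_nolock(node, page0, &write_offset);
+ *	ssdfs_request_define_segment(seg_id, &node->flush_req);
+ *	ssdfs_request_define_volume_extent(logical_blk, len,
+ *					   &node->flush_req);
+ *	ssdfs_segment_update_extent_async(si, SSDFS_REQ_ASYNC_NO_FREE,
+ *					  &node->flush_req);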
+ */ +static +int ssdfs_btree_node_prepare_flush_request(struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + struct ssdfs_state_bitmap *bmap; + struct page *page; + u64 logical_offset; + u32 data_bytes; + u64 seg_id; + u32 logical_blk; + u32 len; + u32 pvec_size; + int node_flags; + u32 write_offset = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + /* expected state */ + break; + + default: + BUG(); + }; + + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + pvec_size = node->node_size >> PAGE_SHIFT; + + if (pvec_size == 0 || pvec_size > PAGEVEC_SIZE) { + SSDFS_WARN("invalid memory pages count: " + "node_size %u, pvec_size %u\n", + node->node_size, pvec_size); + return -ERANGE; + } + + if (pagevec_count(&node->content.pvec) != pvec_size) { + SSDFS_ERR("invalid pvec_size: " + "pvec_size1 %u != pvec_size2 %u\n", + pagevec_count(&node->content.pvec), + pvec_size); + return -ERANGE; + } + + ssdfs_request_init(&node->flush_req); + ssdfs_get_request(&node->flush_req); + + logical_offset = (u64)node->node_id * node->node_size; + data_bytes = node->node_size; + ssdfs_request_prepare_logical_extent(node->tree->owner_ino, + (u64)logical_offset, + (u32)data_bytes, + 0, 0, &node->flush_req); + + for (i = 0; i < pvec_size; i++) { + err = ssdfs_request_add_allocated_page_locked(&node->flush_req); + if (unlikely(err)) { + SSDFS_ERR("fail to add page into request: " + "err %d\n", + err); + goto fail_prepare_flush_request; + } + + page = node->flush_req.result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_page_writeback(page); + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + ssdfs_lock_page(node->content.pvec.pages[0]); + ssdfs_btree_node_copy_header_nolock(node, + node->content.pvec.pages[0], + &write_offset); + ssdfs_unlock_page(node->content.pvec.pages[0]); + + spin_lock(&node->descriptor_lock); + si = node->seg; + seg_id = le64_to_cpu(node->extent.seg_id); + logical_blk = le32_to_cpu(node->extent.logical_blk); + len = le32_to_cpu(node->extent.len); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + BUG_ON(seg_id != si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_define_segment(seg_id, &node->flush_req); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(logical_blk >= U16_MAX); + BUG_ON(len >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_request_define_volume_extent((u16)logical_blk, (u16)len, + &node->flush_req); + + for (i = 0; i < pvec_size; i++) { + struct page *page; + + ssdfs_lock_page(node->content.pvec.pages[i]); + + page = node->flush_req.result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("REQUEST: page %p, count %d, " + "flags %#lx, page_index %lu\n", + page, page_ref_count(page), + page->flags, page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = node->content.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NODE CONTENT: page %p, count %d, " + "flags %#lx, page_index %lu\n", + page, page_ref_count(page), + page->flags, page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memmove_page(node->flush_req.result.pvec.pages[i], + 0, PAGE_SIZE, + 
node->content.pvec.pages[i], + 0, PAGE_SIZE, + PAGE_SIZE); + + ssdfs_unlock_page(node->content.pvec.pages[i]); + } + + node_flags = atomic_read(&node->flags); + + if (node_flags & SSDFS_BTREE_NODE_PRE_ALLOCATED) { + /* update pre-allocated extent */ + err = ssdfs_segment_update_pre_alloc_extent_async(si, + SSDFS_REQ_ASYNC_NO_FREE, + &node->flush_req); + } else { + /* update extent */ + err = ssdfs_segment_update_extent_async(si, + SSDFS_REQ_ASYNC_NO_FREE, + &node->flush_req); + } + + if (!err) { + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + } + + up_write(&node->header_lock); + up_write(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("update request failed: " + "ino %llu, logical_offset %llu, size %u, err %d\n", + node->flush_req.extent.ino, + node->flush_req.extent.logical_offset, + node->flush_req.extent.data_bytes, + err); + return err; + } + + return 0; + +fail_prepare_flush_request: + for (i = 0; i < pagevec_count(&node->flush_req.result.pvec); i++) { + page = node->flush_req.result.pvec.pages[i]; + + if (!page) + continue; + + SetPageError(page); + end_page_writeback(page); + } + + ssdfs_request_unlock_and_remove_pages(&node->flush_req); + ssdfs_put_request(&node->flush_req); + + return err; +} + +/* + * ssdfs_btree_common_node_flush() - common method of node flushing + * @node: node object + * + * This method tries to flush the node in general way. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_btree_common_node_flush(struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct page *page; + size_t index_key_size = sizeof(struct ssdfs_btree_index_key); + u32 pvec_size; + int node_flags; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + /* expected state */ + break; + + default: + BUG(); + }; + + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + pvec_size = node->node_size >> PAGE_SHIFT; + + if (pvec_size == 0 || pvec_size > PAGEVEC_SIZE) { + SSDFS_WARN("invalid memory pages count: " + "node_size %u, pvec_size %u\n", + node->node_size, pvec_size); + return -ERANGE; + } + + if (pagevec_count(&node->content.pvec) != pvec_size) { + SSDFS_ERR("invalid pvec_size: " + "pvec_size1 %u != pvec_size2 %u\n", + pagevec_count(&node->content.pvec), + pvec_size); + return -ERANGE; + } + + node_flags = atomic_read(&node->flags); + + if (can_diff_on_write_metadata_be_used(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_prepare_diff(node); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is not ready for diff: " + "ino %llu, logical_offset %llu, size %u\n", + node->node_id, + node->flush_req.extent.ino, + node->flush_req.extent.logical_offset, + node->flush_req.extent.data_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_prepare_flush_request(node); + } + } 
else {
+		err = ssdfs_btree_node_prepare_flush_request(node);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("update request failed: "
+			  "ino %llu, logical_offset %llu, size %u, err %d\n",
+			  node->flush_req.extent.ino,
+			  node->flush_req.extent.logical_offset,
+			  node->flush_req.extent.data_bytes,
+			  err);
+		goto fail_flush_node;
+	} else if (node_flags & SSDFS_BTREE_NODE_PRE_ALLOCATED) {
+		struct ssdfs_btree_node *parent;
+		struct ssdfs_btree_index_key old_key, new_key;
+		u16 flags;
+
+		spin_lock(&node->descriptor_lock);
+		ssdfs_memcpy(&old_key, 0, index_key_size,
+			     &node->node_index, 0, index_key_size,
+			     index_key_size);
+		spin_unlock(&node->descriptor_lock);
+
+		ssdfs_memcpy(&new_key, 0, index_key_size,
+			     &old_key, 0, index_key_size,
+			     index_key_size);
+
+		flags = le16_to_cpu(old_key.flags);
+		flags &= ~SSDFS_BTREE_INDEX_SHOW_PREALLOCATED_CHILD;
+		new_key.flags = cpu_to_le16(flags);
+
+		spin_lock(&node->descriptor_lock);
+		parent = node->parent_node;
+		spin_unlock(&node->descriptor_lock);
+
+		err = ssdfs_btree_node_change_index(parent,
+						    &old_key,
+						    &new_key);
+		if (!err) {
+			spin_lock(&node->descriptor_lock);
+			ssdfs_memcpy(&node->node_index, 0, index_key_size,
+				     &new_key, 0, index_key_size,
+				     index_key_size);
+			spin_unlock(&node->descriptor_lock);
+
+			atomic_and(~SSDFS_BTREE_NODE_PRE_ALLOCATED,
+				   &node->flags);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("index_start_bit %lu, item_start_bit %lu, "
+		  "bits_count %lu\n",
+		  node->bmap_array.index_start_bit,
+		  node->bmap_array.item_start_bit,
+		  node->bmap_array.bits_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+
+fail_flush_node:
+	for (i = 0; i < pagevec_count(&node->flush_req.result.pvec); i++) {
+		page = node->flush_req.result.pvec.pages[i];
+
+		if (!page)
+			continue;
+
+		SetPageError(page);
+		end_page_writeback(page);
+	}
+
+	ssdfs_request_unlock_and_remove_pages(&node->flush_req);
+	ssdfs_put_request(&node->flush_req);
+
+	return err;
+}
+
+/*
+ * ssdfs_btree_node_flush() - flush the dirty btree node
+ * @node: node object
+ *
+ * This method tries to flush the dirty btree node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
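+ *
+ * Editorial note: this function only dispatches through the tree's
+ * btree_ops table (flush_root_node for the root node, flush_node for
+ * index/hybrid/leaf nodes); specialized b-trees presumably point these
+ * callbacks at ssdfs_btree_flush_root_node() and
+ * ssdfs_btree_common_node_flush() above, or at wrappers around them.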
+ */ +int ssdfs_btree_node_flush(struct ssdfs_btree_node *node) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("node_id %u, height %u, type %#x, state %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type), + atomic_read(&node->state)); +#else + SSDFS_DBG("node_id %u, height %u, type %#x, state %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type), + atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!is_ssdfs_btree_node_dirty(node)) + return 0; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + if (!node->tree->btree_ops || + !node->tree->btree_ops->flush_root_node) { + SSDFS_WARN("unable to flush the root node\n"); + return -EOPNOTSUPP; + } + + err = node->tree->btree_ops->flush_root_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush root node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + break; + + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + if (!node->tree->btree_ops || + !node->tree->btree_ops->flush_node) { + SSDFS_WARN("unable to flush the common node\n"); + return -EOPNOTSUPP; + } + + err = node->tree->btree_ops->flush_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush common node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + break; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * ssdfs_btree_node_commit_log() - request the log commit for the node + * @node: node object + * + * This method tries to request the log commit for the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
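+ *
+ * Editorial note: the commit request reuses the node's extent
+ * (seg_id, logical_blk, len) but carries a zero-length payload; it is
+ * submitted through ssdfs_segment_commit_log_async(), so the PEB's
+ * flush thread makes the log persistent without blocking the caller.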
+ */ +int ssdfs_btree_node_commit_log(struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + u64 logical_offset; + u64 seg_id; + u32 logical_blk; + u32 len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + /* expected state */ + break; + + default: + BUG(); + }; +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + ssdfs_request_init(&node->flush_req); + ssdfs_get_request(&node->flush_req); + + logical_offset = (u64)node->node_id * node->node_size; + ssdfs_request_prepare_logical_extent(node->tree->owner_ino, + (u64)logical_offset, + 0, 0, 0, &node->flush_req); + + spin_lock(&node->descriptor_lock); + si = node->seg; + seg_id = le64_to_cpu(node->extent.seg_id); + logical_blk = le32_to_cpu(node->extent.logical_blk); + len = le32_to_cpu(node->extent.len); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + BUG_ON(seg_id != si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_request_define_segment(seg_id, &node->flush_req); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(logical_blk >= U16_MAX); + BUG_ON(len >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_request_define_volume_extent((u16)logical_blk, (u16)len, + &node->flush_req); + + err = ssdfs_segment_commit_log_async(si, SSDFS_REQ_ASYNC_NO_FREE, + &node->flush_req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "ino %llu, logical_offset %llu, err %d\n", + node->flush_req.extent.ino, + node->flush_req.extent.logical_offset, + err); + ssdfs_put_request(&node->flush_req); + } + + return err; +} + +/* + * ssdfs_btree_deleted_node_commit_log() - request the log commit (deleted node) + * @node: node object + * + * This method tries to request the log commit for the deleted node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_btree_deleted_node_commit_log(struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_segment_info *si; + u64 seg_id; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + SSDFS_DBG("node_id %u, height %u, type %#x\n", + node->node_id, atomic_read(&node->height), + atomic_read(&node->type)); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + /* expected state */ + break; + + default: + BUG(); + }; +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_ssdfs_btree_node_pre_deleted(node)) { + SSDFS_ERR("node %u is not pre-deleted\n", + node->node_id); + return -ERANGE; + } + + fsi = node->tree->fsi; + + spin_lock(&node->descriptor_lock); + si = node->seg; + seg_id = le64_to_cpu(node->extent.seg_id); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!si); + BUG_ON(seg_id != si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_segment_request *req; + struct ssdfs_peb_container *pebc; + struct ssdfs_requests_queue *rq; + wait_queue_head_t *wait; + + pebc = &si->peb_array[i]; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? 
-ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + ssdfs_request_prepare_internal_data(SSDFS_PEB_UPDATE_REQ, + SSDFS_COMMIT_LOG_NOW, + SSDFS_REQ_ASYNC, req); + ssdfs_request_define_segment(si->seg_id, req); + + ssdfs_segment_create_request_cno(si); + + rq = &pebc->update_rq; + ssdfs_requests_queue_add_tail_inc(si->fsi, rq, req); + + wait = &si->wait_queue[SSDFS_PEB_FLUSH_THREAD]; + wake_up_all(wait); + } + + return 0; +} + +/* + * is_ssdfs_btree_node_dirty() - check that btree node is dirty + * @node: node object + */ +bool is_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->state); + + switch (state) { + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_PRE_DELETED: + return true; + + case SSDFS_BTREE_NODE_INITIALIZED: + return false; + + default: + SSDFS_WARN("invalid node state %#x\n", + state); + /* FALLTHRU */ + }; + + return false; +} + +/* + * set_ssdfs_btree_node_dirty() - set btree node in dirty state + * @node: node object + */ +void set_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->state); + + switch (state) { + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_CREATED: + atomic_set(&node->state, SSDFS_BTREE_NODE_DIRTY); + spin_lock(&node->tree->nodes_lock); + radix_tree_tag_set(&node->tree->nodes, node->node_id, + SSDFS_BTREE_NODE_DIRTY_TAG); + spin_unlock(&node->tree->nodes_lock); + break; + + default: + SSDFS_WARN("invalid node state %#x\n", + state); + /* FALLTHRU */ + }; +} + +/* + * clear_ssdfs_btree_node_dirty() - clear dirty state of btree node + * @node: node object + */ +void clear_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->state); + + switch (state) { + case SSDFS_BTREE_NODE_DIRTY: + atomic_set(&node->state, SSDFS_BTREE_NODE_INITIALIZED); + spin_lock(&node->tree->nodes_lock); + radix_tree_tag_clear(&node->tree->nodes, node->node_id, + SSDFS_BTREE_NODE_DIRTY_TAG); + spin_unlock(&node->tree->nodes_lock); + break; + + case SSDFS_BTREE_NODE_CORRUPTED: + spin_lock(&node->tree->nodes_lock); + radix_tree_tag_clear(&node->tree->nodes, node->node_id, + SSDFS_BTREE_NODE_DIRTY_TAG); + spin_unlock(&node->tree->nodes_lock); + break; + + case SSDFS_BTREE_NODE_INITIALIZED: + /* do nothing */ + break; + + default: + SSDFS_WARN("invalid node state %#x\n", + state); + /* FALLTHRU */ + }; +} + +/* + * is_ssdfs_btree_node_pre_deleted() - check that btree node is pre-deleted + * @node: node object + */ +bool is_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->state); + + switch (state) { + case SSDFS_BTREE_NODE_PRE_DELETED: + return true; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + return false; + + default: + SSDFS_WARN("invalid node state %#x\n", + state); + /* FALLTHRU */ + }; + + return false; +} + +/* + * set_ssdfs_btree_node_pre_deleted() - set btree node in pre-deleted state + * @node: node 
object + */ +void set_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->state); + + switch (state) { + case SSDFS_BTREE_NODE_PRE_DELETED: + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_CREATED: + atomic_set(&node->state, SSDFS_BTREE_NODE_PRE_DELETED); + spin_lock(&node->tree->nodes_lock); + radix_tree_tag_set(&node->tree->nodes, node->node_id, + SSDFS_BTREE_NODE_DIRTY_TAG); + spin_unlock(&node->tree->nodes_lock); + break; + + default: + SSDFS_WARN("invalid node state %#x\n", + state); + /* FALLTHRU */ + }; +} + +/* + * clear_ssdfs_btree_node_pre_deleted() - clear pre-deleted state of btree node + * @node: node object + */ +void clear_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node) +{ + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->state); + + switch (state) { + case SSDFS_BTREE_NODE_PRE_DELETED: + atomic_set(&node->state, SSDFS_BTREE_NODE_INITIALIZED); + spin_lock(&node->tree->nodes_lock); + radix_tree_tag_clear(&node->tree->nodes, node->node_id, + SSDFS_BTREE_NODE_DIRTY_TAG); + spin_unlock(&node->tree->nodes_lock); + break; + + case SSDFS_BTREE_NODE_CORRUPTED: + spin_lock(&node->tree->nodes_lock); + radix_tree_tag_clear(&node->tree->nodes, node->node_id, + SSDFS_BTREE_NODE_DIRTY_TAG); + spin_unlock(&node->tree->nodes_lock); + break; + + case SSDFS_BTREE_NODE_INITIALIZED: + /* do nothing */ + break; + + default: + SSDFS_WARN("invalid node state %#x\n", + state); + /* FALLTHRU */ + }; +} + +/* + * is_ssdfs_btree_node_index_area_exist() - check that node has index area + * @node: node object + */ +bool is_ssdfs_btree_node_index_area_exist(struct ssdfs_btree_node *node) +{ + u16 flags; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is not initialized\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_PRE_DELETED: + /* expected state */ + break; + + default: + BUG(); + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + return true; + + case SSDFS_BTREE_INDEX_NODE: + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) + return true; + else { + SSDFS_WARN("index node %u hasn't index area\n", + node->node_id); + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) + return true; + break; + + case SSDFS_BTREE_LEAF_NODE: + /* do nothing */ + break; + + default: + BUG(); + } + + return false; +} + +/* + * is_ssdfs_btree_node_index_area_empty() - check that index area is empty + * @node: node object + */ +bool is_ssdfs_btree_node_index_area_empty(struct ssdfs_btree_node *node) +{ + bool is_empty = false; + int state; + int flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is not initialized\n", + node->node_id); +#endif 
/* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_PRE_DELETED: + /* expected state */ + break; + + default: + BUG(); + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + /* need to check the index area */ + break; + + case SSDFS_BTREE_INDEX_NODE: + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + /* + * need to check the index area + */ + } else { + SSDFS_WARN("index node %u hasn't index area\n", + node->node_id); + return false; + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + /* + * need to check the index area + */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return true; + } + break; + + case SSDFS_BTREE_LEAF_NODE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is leaf node\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return true; + + default: + BUG(); + } + + down_read(&node->header_lock); + state = atomic_read(&node->index_area.state); + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) + err = -ERANGE; + else if (node->index_area.index_capacity == 0) + err = -ERANGE; + else + is_empty = node->index_area.index_count == 0; + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_WARN("node %u is corrupted\n", node->node_id); + return false; + } + + return is_empty; +} + +/* + * is_ssdfs_btree_node_items_area_exist() - check that node has items area + * @node: node object + */ +bool is_ssdfs_btree_node_items_area_exist(struct ssdfs_btree_node *node) +{ + u16 flags; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_PRE_DELETED: + /* expected state */ + break; + + default: + BUG(); + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + return false; + + case SSDFS_BTREE_HYBRID_NODE: + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) + return true; + break; + + case SSDFS_BTREE_LEAF_NODE: + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) + return true; + else { + SSDFS_WARN("corrupted leaf node %u\n", + node->node_id); + } + break; + + default: + BUG(); + } + + return false; +} + +/* + * is_ssdfs_btree_node_items_area_empty() - check that items area is empty + * @node: node object + */ +bool is_ssdfs_btree_node_items_area_empty(struct ssdfs_btree_node *node) +{ + bool is_empty = false; + int state; + int flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + case SSDFS_BTREE_NODE_PRE_DELETED: + /* expected state */ + break; + + default: + BUG(); + } + + switch (atomic_read(&node->type)) { + case 
SSDFS_BTREE_ROOT_NODE:
+	case SSDFS_BTREE_INDEX_NODE:
+		return true;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		flags = atomic_read(&node->flags);
+		if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) {
+			/*
+			 * need to check the items area
+			 */
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("node %u hasn't items area\n",
+				  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return true;
+		}
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		flags = atomic_read(&node->flags);
+		if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) {
+			/*
+			 * need to check the items area
+			 */
+		} else {
+			SSDFS_WARN("leaf node %u hasn't items area\n",
+				   node->node_id);
+			return false;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+	down_read(&node->header_lock);
+	state = atomic_read(&node->items_area.state);
+	if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST)
+		err = -ERANGE;
+	else if (node->items_area.items_capacity == 0)
+		err = -ERANGE;
+	else
+		is_empty = node->items_area.items_count == 0;
+	up_read(&node->header_lock);
+
+	if (unlikely(err)) {
+		SSDFS_WARN("node %u is corrupted\n", node->node_id);
+		return false;
+	}
+
+	return is_empty;
+}
+
+/*
+ * ssdfs_btree_node_shrink_index_area() - shrink the index area
+ * @node: node object
+ * @new_size: the new size of index area in bytes
+ *
+ * This method tries to shrink the index area in size.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EOPNOTSUPP - requested action is not supported.
+ */
+static
+int ssdfs_btree_node_shrink_index_area(struct ssdfs_btree_node *node,
+				       u32 new_size)
+{
+	u8 index_size;
+	u16 index_count;
+	u16 index_capacity;
+	u32 area_size;
+	u32 cur_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u, new_size %u\n",
+		  node->node_id, new_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	index_size = node->index_area.index_size;
+	index_count = node->index_area.index_count;
+	index_capacity = node->index_area.index_capacity;
+	area_size = node->index_area.area_size;
+
+	cur_size = (u32)index_size * index_count;
+
+	if (area_size <= new_size) {
+		SSDFS_ERR("cannot grow index area: "
+			  "area_size %u, new_size %u\n",
+			  area_size, new_size);
+		return -EOPNOTSUPP;
+	}
+
+	if (new_size % index_size) {
+		SSDFS_ERR("unaligned new_size: "
+			  "index_size %u, new_size %u\n",
+			  index_size, new_size);
+		return -ERANGE;
+	}
+
+	if (cur_size > area_size) {
+		SSDFS_WARN("invalid cur_size: "
+			   "cur_size %u, area_size %u\n",
+			   cur_size, area_size);
+		return -ERANGE;
+	}
+
+	if (cur_size == area_size || cur_size > new_size) {
+		SSDFS_ERR("unable to shrink index area: "
+			  "cur_size %u, new_size %u, area_size %u\n",
+			  cur_size, new_size, area_size);
+		return -ERANGE;
+	}
+
+	node->index_area.area_size = new_size;
+	node->index_area.index_capacity = new_size / index_size;
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_node_grow_index_area() - grow the index area
+ * @node: node object
+ * @new_size: the new size of index area in bytes
+ *
+ * This method tries to increase the size of index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EOPNOTSUPP - requested action is not supported.
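+ *
+ * Editorial example (assuming index_size == 16 bytes): growing the
+ * index area from 1024 to 2048 bytes raises index_capacity from
+ * 1024 / 16 == 64 to 2048 / 16 == 128 indexes. The new size must be
+ * a multiple of index_size, and the extra bytes come out of the items
+ * area, which the caller shrinks by the same amount.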
+ */ +static +int ssdfs_btree_node_grow_index_area(struct ssdfs_btree_node *node, + u32 new_size) +{ + u8 index_size; + u16 index_count; + u16 index_capacity; + u32 area_size; + u32 cur_size; + unsigned long offset1, offset2; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, new_size %u\n", + node->node_id, new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (new_size > node->node_size) { + SSDFS_ERR("invalid new size: " + "new_size %u, node_size %u\n", + new_size, node->node_size); + return -ERANGE; + } + + index_size = node->index_area.index_size; + index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + area_size = node->index_area.area_size; + + cur_size = (u32)index_size * index_count; + + if (area_size > new_size) { + SSDFS_ERR("cannot shrink index area: " + "area_size %u, new_size %u\n", + area_size, new_size); + return -EOPNOTSUPP; + } + + if (new_size % index_size) { + SSDFS_ERR("unaligned new_size: " + "index_size %u, new_size %u\n", + index_size, new_size); + return -ERANGE; + } + + if (cur_size > area_size) { + SSDFS_WARN("invalid cur_size: " + "cur_size %u, area_size %u\n", + cur_size, area_size); + return -ERANGE; + } + + offset1 = node->items_area.offset; + offset2 = node->index_area.offset; + + if (new_size == node->node_size) { + node->index_area.index_capacity = + (new_size - node->index_area.offset) / index_size; + } else if ((offset1 - offset2) != new_size) { + SSDFS_ERR("unable to resize the index area: " + "items_area.offset %u, index_area.offset %u, " + "new_size %u\n", + node->items_area.offset, + node->index_area.offset, + new_size); + return -ERANGE; + } else + node->index_area.index_capacity = new_size / index_size; + + down_read(&node->bmap_array.lock); + offset1 = node->bmap_array.item_start_bit; + offset2 = node->bmap_array.index_start_bit; + if ((offset1 - offset2) < node->index_area.index_capacity) + err = -ERANGE; + up_read(&node->bmap_array.lock); + + if (unlikely(err)) { + SSDFS_ERR("unable to resize the index area: " + "items_start_bit %lu, index_start_bit %lu, " + "new_index_capacity %u\n", + node->bmap_array.item_start_bit, + node->bmap_array.index_start_bit, + new_size / index_size); + return -ERANGE; + } + + if (new_size == node->node_size) + node->index_area.area_size = new_size - node->index_area.offset; + else + node->index_area.area_size = new_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_area: offset %u, area_size %u, " + "free_space %u, capacity %u; " + "index_area: offset %u, area_size %u, " + "capacity %u\n", + node->items_area.offset, + node->items_area.area_size, + node->items_area.free_space, + node->items_area.items_capacity, + node->index_area.offset, + node->index_area.area_size, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_check_btree_node_after_resize() - check btree node's consistency + * @node: node object + * + * This method tries to check the consistency of btree node + * after resize. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - btree node is inconsistent. 
+ */ +#ifdef CONFIG_SSDFS_DEBUG +static +int ssdfs_check_btree_node_after_resize(struct ssdfs_btree_node *node) +{ + u32 offset; + u32 area_size; + u8 index_size; + u16 index_count; + u16 index_capacity; + u16 items_count; + u16 items_capacity; + u32 average_item_size; + unsigned long bits_count; + unsigned long index_start_bit, item_start_bit; + int err = 0; + + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + offset = node->index_area.offset; + area_size = node->index_area.area_size; + + if ((offset + area_size) == node->node_size) { + /* + * Continue logic + */ + } else if ((offset + area_size) != node->items_area.offset) { + SSDFS_ERR("invalid index area: " + "index_area.offset %u, " + "index_area.area_size %u, " + "items_area.offset %u\n", + node->index_area.offset, + node->index_area.area_size, + node->items_area.offset); + return -ERANGE; + } + + index_size = node->index_area.index_size; + index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + + if (index_count > index_capacity) { + SSDFS_ERR("index_count %u > index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); + return -ERANGE; + } + + if (((u32)index_size * index_capacity) > area_size) { + SSDFS_ERR("invalid index area: " + "index_size %u, index_capacity %u, " + "area_size %u\n", + node->index_area.index_size, + node->index_area.index_capacity, + node->index_area.area_size); + return -ERANGE; + } + + offset = node->items_area.offset; + area_size = node->items_area.area_size; + + if (area_size > 0) { + if ((offset + area_size) != node->node_size) { + SSDFS_ERR("invalid items area: " + "items_area.offset %u, " + "items_area.area_size %u, " + "node_size %u\n", + node->items_area.offset, + node->items_area.area_size, + node->node_size); + return -ERANGE; + } + } + + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + + if (items_count > items_capacity) { + SSDFS_ERR("invalid items area: " + "items_area.items_count %u, " + "items_area.items_capacity %u\n", + node->items_area.items_count, + node->items_area.items_capacity); + return -ERANGE; + } + + if (items_capacity > 0) { + average_item_size = area_size / items_capacity; + if (average_item_size < node->items_area.item_size || + average_item_size > node->items_area.max_item_size) { + SSDFS_ERR("invalid items area: " + "average_item_size %u, " + "item_size %u, max_item_size %u\n", + average_item_size, + node->items_area.item_size, + node->items_area.max_item_size); + return -ERANGE; + } + } + + down_read(&node->bmap_array.lock); + bits_count = node->bmap_array.bits_count; + index_start_bit = node->bmap_array.index_start_bit; + item_start_bit = node->bmap_array.item_start_bit; + if ((index_capacity + items_capacity + 1) > bits_count) + err = -ERANGE; + if ((item_start_bit - index_start_bit) < index_capacity) + err = -ERANGE; + if ((bits_count - item_start_bit) < items_capacity) + err = -ERANGE; + up_read(&node->bmap_array.lock); + + if (unlikely(err)) { + SSDFS_ERR("invalid bmap_array: " + "bits_count %lu, index_start_bit %lu, " + "item_start_bit %lu, index_capacity %u, " + "items_capacity %u\n", + bits_count, index_start_bit, + item_start_bit, index_capacity, + items_capacity); + return err; + } + + return 0; +} +#endif /* CONFIG_SSDFS_DEBUG */ + +static inline +void 
ssdfs_set_node_update_cno(struct ssdfs_btree_node *node) +{ + u64 current_cno = ssdfs_current_cno(node->tree->fsi->sb); + + spin_lock(&node->descriptor_lock); + node->update_cno = current_cno; + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("current_cno %llu\n", current_cno); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_btree_node_resize_index_area() - resize the node's index area + * @node: node object + * @new_size: new size of node's index area + * + * This method tries to resize the index area of btree node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EACCES - node is under initialization yet. + * %-ENOENT - index area is absent. + * %-ENOSPC - index area cannot be resized. + * %-EOPNOTSUPP - resize operation is not supported. + */ +int ssdfs_btree_node_resize_index_area(struct ssdfs_btree_node *node, + u32 new_size) +{ + struct ssdfs_fs_info *fsi; + u16 flags; + u8 index_size; + u16 index_count; + u16 index_capacity; + u32 area_size; + u32 cur_size; + u32 new_items_area_size; + int err = 0, err2; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + SSDFS_DBG("node_id %u, new_size %u\n", + node->node_id, new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("resize operation is unavailable: " + "node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + + case SSDFS_BTREE_LEAF_NODE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area is absent: " + "node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + + case SSDFS_BTREE_HYBRID_NODE: + /* expected node type */ + break; + + default: + BUG(); + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area is absent: " + "node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + flags = atomic_read(&node->tree->flags); + if (!(flags & SSDFS_BTREE_DESC_INDEX_AREA_RESIZABLE)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to resize the index area: " + "node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + if (new_size < node->tree->index_area_min_size || + new_size > node->node_size) { + SSDFS_ERR("invalid new_size %u\n", + new_size); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->resize_items_area) { + SSDFS_DBG("unable to resize items area\n"); + return -EOPNOTSUPP; + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + index_size = node->index_area.index_size; + index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + area_size = node->index_area.area_size; + + if (index_count > index_capacity) { + err = -ERANGE; + SSDFS_ERR("index_count %u > index_capacity %u\n", + index_count, index_capacity); + goto finish_resize_operation; + } + + if (new_size % 
index_size) {
+		err = -ERANGE;
+		SSDFS_ERR("unaligned new_size: "
+			  "new_size %u, index_size %u\n",
+			  new_size, index_size);
+		goto finish_resize_operation;
+	}
+
+	if ((index_size * index_capacity) != area_size) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid index area descriptor: "
+			  "index_size %u, index_capacity %u, "
+			  "area_size %u\n",
+			  index_size, index_capacity, area_size);
+		goto finish_resize_operation;
+	}
+
+	cur_size = (u32)index_size * index_count;
+
+	if (cur_size > area_size) {
+		err = -ERANGE;
+		SSDFS_ERR("cur_size %u > area_size %u\n",
+			  cur_size, area_size);
+		goto finish_resize_operation;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("index_size %u, index_count %u, "
+		  "index_capacity %u, index_area_size %u, "
+		  "cur_size %u, new_size %u\n",
+		  index_size, index_count,
+		  index_capacity, area_size,
+		  cur_size, new_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (new_size < node->index_area.area_size) {
+		/* shrink index area */
+
+		if (cur_size > new_size) {
+			err = -ENOSPC;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to resize: "
+				  "cur_size %u, new_size %u\n",
+				  cur_size, new_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_resize_operation;
+		}
+
+		err = ssdfs_btree_node_shrink_index_area(node, new_size);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to shrink index area: "
+				  "new_size %u\n",
+				  new_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_resize_operation;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to shrink index area: "
+				  "new_size %u, err %d\n",
+				  new_size, err);
+			goto finish_resize_operation;
+		}
+
+		new_items_area_size = node->items_area.area_size;
+		new_items_area_size += area_size - new_size;
+
+		err = node->node_ops->resize_items_area(node,
+							new_items_area_size);
+		if (err) {
+			/* try to recover the index area's previous size */
+			err2 = ssdfs_btree_node_grow_index_area(node,
+								cur_size);
+			if (unlikely(err2)) {
+				err = err2;
+				SSDFS_ERR("fail to recover node state: "
+					  "err %d\n", err);
+				goto finish_resize_operation;
+			}
+		}
+
+		if (err == -EOPNOTSUPP) {
+			err = -ENOSPC;
+			SSDFS_DBG("resize operation is unavailable\n");
+			goto finish_resize_operation;
+		} else if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to resize items area: "
+				  "new_items_area_size %u\n",
+				  new_items_area_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_resize_operation;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to resize items area: "
+				  "new_items_area_size %u, err %d\n",
+				  new_items_area_size, err);
+			goto finish_resize_operation;
+		}
+	} else if (new_size > node->index_area.area_size) {
+		/* grow index area */
+
+		if (new_size == node->node_size) {
+			/* eliminate items area */
+			new_items_area_size = 0;
+		} else if ((new_size - area_size) > node->items_area.area_size) {
+			err = -ENOSPC;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to resize items area: "
+				  "new_size %u, index_area_size %u, "
+				  "items_area_size %u\n",
+				  new_size,
+				  node->index_area.area_size,
+				  node->items_area.area_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_resize_operation;
+		} else {
+			new_items_area_size = node->items_area.area_size;
+			new_items_area_size -= new_size - area_size;
+		}
+
+		err = node->node_ops->resize_items_area(node,
+							new_items_area_size);
+		if (err == -EOPNOTSUPP) {
+			err = -ENOSPC;
+			SSDFS_DBG("resize operation is unavailable\n");
+			goto finish_resize_operation;
+		} else if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to resize items area: "
+				  "new_items_area_size %u\n",
+				  new_items_area_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_resize_operation;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to resize items area: "
+				  "new_items_area_size %u, err %d\n",
+				  new_items_area_size, err);
+			goto finish_resize_operation;
+		}
+
+		err = ssdfs_btree_node_grow_index_area(node, new_size);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to grow index area: "
+				  "new_size %u\n",
+				  new_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_resize_operation;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to grow index area: "
+				  "new_size %u, err %d\n",
+				  new_size, err);
+			goto finish_resize_operation;
+		}
+	} else {
+		err = -EOPNOTSUPP;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("resize is not necessary: "
+			  "old_size %u, new_size %u\n",
+			  node->index_area.area_size,
+			  new_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_resize_operation;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	err = ssdfs_check_btree_node_after_resize(node);
+	if (unlikely(err)) {
+		SSDFS_ERR("node %u is corrupted after resize\n",
+			  node->node_id);
+		goto finish_resize_operation;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_resize_operation:
+	up_write(&node->header_lock);
+	up_write(&node->full_lock);
+
+	if (err == -EOPNOTSUPP)
+		return 0;
+	else if (unlikely(err))
+		return err;
+
+	ssdfs_set_node_update_cno(node);
+	set_ssdfs_btree_node_dirty(node);
+
+	return 0;
+}
+
+/*
+ * ssdfs_set_dirty_index_range() - set index range as dirty
+ * @node: node object
+ * @start_index: starting index
+ * @count: count of indexes in the range
+ *
+ * This method tries to mark an index range as dirty.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EEXIST - area is dirty already.
+ */
+static
+int ssdfs_set_dirty_index_range(struct ssdfs_btree_node *node,
+				u16 start_index, u16 count)
+{
+	struct ssdfs_state_bitmap *bmap;
+	unsigned long found = ULONG_MAX;
+	unsigned long start_area;
+	u16 capacity = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+
+	start_area = node->bmap_array.index_start_bit;
+	if (start_area == ULONG_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid index_start_bit\n");
+		goto finish_set_dirty_index;
+	}
+
+	if (node->bmap_array.item_start_bit == ULONG_MAX)
+		capacity = node->bmap_array.bits_count;
+	else
+		capacity = node->bmap_array.item_start_bit - start_area;
+
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP];
+	if (!bmap->ptr) {
+		err = -ERANGE;
+		SSDFS_WARN("dirty bitmap is empty\n");
+		goto finish_set_dirty_index;
+	}
+
+	spin_lock(&bmap->lock);
+
+	found = bitmap_find_next_zero_area(bmap->ptr, capacity,
+					   start_area + start_index,
+					   count, 0);
+	if (found != (start_area + start_index)) {
+		/* area is dirty already */
+		err = -EEXIST;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("set bit: start_area %lu, start_index %u, len %u\n",
+		  start_area, start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bitmap_set(bmap->ptr, start_area + start_index, count);
+
+	spin_unlock(&bmap->lock);
+
+	if (unlikely(err)) {
+		err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("found %lu != start %lu\n",
+			  found, start_area + start_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+finish_set_dirty_index:
+	up_read(&node->bmap_array.lock);
+
+	if (!err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u, tree_type %#x, "
+			  "start_index %u, count %u\n",
+			  node->node_id, node->tree->type,
+			  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_clear_dirty_index_range_state() - clear the dirty state of an index range
+ * @node: node object
+ * @start_index: starting index
+ * @count: count of indexes in the range
+ *
+ * This method tries to clear the dirty state of an index range.
+ */
+#ifdef CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC
+static
+void ssdfs_clear_dirty_index_range_state(struct ssdfs_btree_node *node,
+					 u16 start_index, u16 count)
+{
+	struct ssdfs_state_bitmap *bmap;
+	unsigned long start_area;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+
+	start_area = node->bmap_array.index_start_bit;
+	BUG_ON(start_area == ULONG_MAX);
+
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP];
+	BUG_ON(!bmap->ptr);
+
+	spin_lock(&bmap->lock);
+	bitmap_clear(bmap->ptr, start_area + start_index, count);
+	spin_unlock(&bmap->lock);
+
+	up_read(&node->bmap_array.lock);
+}
+#endif /* CONFIG_SSDFS_UNDER_DEVELOPMENT_FUNC */
+
+/*
+ * __ssdfs_lock_index_range() - lock index range
+ * @node: node object
+ * @start_index: starting index
+ * @count: count of indexes in the range
+ *
+ * This method tries to lock an index range without semaphore protection.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to lock the index range.
+ */
+static
+int __ssdfs_lock_index_range(struct ssdfs_btree_node *node,
+			     u16 start_index, u16 count)
+{
+	DEFINE_WAIT(wait);
+	struct ssdfs_state_bitmap *bmap;
+	unsigned long start_area;
+	unsigned long upper_bound;
+	int i = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->bmap_array.lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	start_area = node->bmap_array.index_start_bit;
+	if (start_area == ULONG_MAX) {
+		SSDFS_ERR("invalid index_start_bit\n");
+		return -ERANGE;
+	}
+
+	upper_bound = start_area + start_index + count;
+	if (upper_bound > node->bmap_array.item_start_bit) {
+		SSDFS_ERR("invalid request: "
+			  "start_area %lu, start_index %u, "
+			  "count %u, item_start_bit %lu\n",
+			  start_area, start_index, count,
+			  node->bmap_array.item_start_bit);
+		return -ERANGE;
+	}
+
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP];
+	if (!bmap->ptr) {
+		SSDFS_WARN("lock bitmap is empty\n");
+		return -ERANGE;
+	}
+
+try_lock_area:
+	spin_lock(&bmap->lock);
+
+	for (; i < count; i++) {
+		err = bitmap_allocate_region(bmap->ptr,
+					     start_area + start_index + i, 0);
+		if (err)
+			break;
+	}
+
+	if (err == -EBUSY) {
+		err = 0;
+		/* release everything acquired so far before sleeping */
+		bitmap_clear(bmap->ptr, start_area + start_index, i);
+		prepare_to_wait(&node->wait_queue, &wait,
+				TASK_UNINTERRUPTIBLE);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("waiting unlocked state of item %u\n",
+			  start_index + i);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		spin_unlock(&bmap->lock);
+
+		schedule();
+		finish_wait(&node->wait_queue, &wait);
+
+		/* all acquired bits were released; retry from scratch */
+		i = 0;
+		goto try_lock_area;
+	}
+
+	spin_unlock(&bmap->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_lock_index_range() - lock index range
+ * @node: node object
+ * @start_index: starting index
+ * @count: count of indexes in the range
+ *
+ * This method tries to lock an index range.
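+ *
+ * Under the hood one bit in the lock bitmap guards every index: on
+ * contention the caller drops the bits it has already taken and sleeps
+ * uninterruptibly on node->wait_queue until an unlock path wakes it up.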
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to lock the index range.
+ */
+static inline
+int ssdfs_lock_index_range(struct ssdfs_btree_node *node,
+			   u16 start_index, u16 count)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+	err = __ssdfs_lock_index_range(node, start_index, count);
+	up_read(&node->bmap_array.lock);
+
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to lock range: "
+			  "start %u, count %u, err %d\n",
+			  start_index, count, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_lock_whole_index_area() - lock the whole index area
+ * @node: node object
+ *
+ * This method tries to lock the whole index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to lock the index range.
+ */
+int ssdfs_lock_whole_index_area(struct ssdfs_btree_node *node)
+{
+	unsigned long start, count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("node_id %u\n", node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+	start = node->bmap_array.index_start_bit;
+	count = node->bmap_array.item_start_bit - start;
+#ifdef CONFIG_SSDFS_DEBUG
+	if (start >= U16_MAX || count >= U16_MAX) {
+		SSDFS_ERR("start %lu, count %lu\n",
+			  start, count);
+	}
+
+	BUG_ON(start >= U16_MAX);
+	BUG_ON(count >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	err = __ssdfs_lock_index_range(node, 0, (u16)count);
+	up_read(&node->bmap_array.lock);
+
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to lock range: "
+			  "start %lu, count %lu, err %d\n",
+			  start, count, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return err;
+}
+
+/*
+ * __ssdfs_unlock_index_range() - unlock an index range
+ * @node: node object
+ * @start_index: starting index
+ * @count: count of indexes in the range
+ *
+ * This method tries to unlock an index range without node's
+ * semaphore protection.
+ */
+static
+void __ssdfs_unlock_index_range(struct ssdfs_btree_node *node,
+				u16 start_index, u16 count)
+{
+	struct ssdfs_state_bitmap *bmap;
+	unsigned long upper_bound;
+	unsigned long start_area;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->bmap_array.lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP];
+	start_area = node->bmap_array.index_start_bit;
+	upper_bound = start_area + start_index + count;
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap->ptr);
+	BUG_ON(start_area == ULONG_MAX);
+	BUG_ON(upper_bound > node->bmap_array.item_start_bit);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&bmap->lock);
+	bitmap_clear(bmap->ptr, start_area + start_index, count);
+	spin_unlock(&bmap->lock);
+}
+
+/*
+ * ssdfs_unlock_index_range() - unlock an index range
+ * @node: node object
+ * @start_index: starting index
+ * @count: count of indexes in the range
+ *
+ * This method tries to unlock an index range.
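+ *
+ * Unlocking simply clears the corresponding bits in the lock bitmap and
+ * then wakes up all threads parked on node->wait_queue, so a waiter in
+ * __ssdfs_lock_index_range() can retry taking the whole range.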
+ */
+static inline
+void ssdfs_unlock_index_range(struct ssdfs_btree_node *node,
+			      u16 start_index, u16 count)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+	__ssdfs_unlock_index_range(node, start_index, count);
+	up_read(&node->bmap_array.lock);
+	wake_up_all(&node->wait_queue);
+}
+
+/*
+ * ssdfs_unlock_whole_index_area() - unlock the whole index area
+ * @node: node object
+ *
+ * This method tries to unlock the whole index area.
+ */
+void ssdfs_unlock_whole_index_area(struct ssdfs_btree_node *node)
+{
+	unsigned long start, count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("node_id %u\n", node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+	start = node->bmap_array.index_start_bit;
+	count = node->bmap_array.item_start_bit - start;
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(start >= U16_MAX);
+	BUG_ON(count >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	__ssdfs_unlock_index_range(node, 0, (u16)count);
+	up_read(&node->bmap_array.lock);
+	wake_up_all(&node->wait_queue);
+}
+
+/*
+ * ssdfs_btree_node_get() - increment node's reference counter
+ * @node: pointer on node object
+ */
+void ssdfs_btree_node_get(struct ssdfs_btree_node *node)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	WARN_ON(atomic_inc_return(&node->refs_count) <= 0);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("tree_type %#x, node_id %u, refs_count %d\n",
+		  node->tree->type, node->node_id,
+		  atomic_read(&node->refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_btree_node_put() - decrement node's reference counter
+ * @node: pointer on node object
+ */
+void ssdfs_btree_node_put(struct ssdfs_btree_node *node)
+{
+	if (!node)
+		return;
+
+	WARN_ON(atomic_dec_return(&node->refs_count) < 0);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("tree_type %#x, node_id %u, refs_count %d\n",
+		  node->tree->type, node->node_id,
+		  atomic_read(&node->refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * is_ssdfs_node_shared() - check that node is shared between threads
+ * @node: pointer on node object
+ */
+bool is_ssdfs_node_shared(struct ssdfs_btree_node *node)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return atomic_read(&node->refs_count) > 1;
+}

From patchwork Sat Feb 25 01:09:03 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151958
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 52/76] ssdfs: b-tree node index operations
Date: Fri, 24 Feb 2023 17:09:03 -0800
Message-Id: <20230225010927.813929-53-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

Every b-tree node is described by an index record (or key) that includes:
(1) node ID, (2) node type, (3) node height, (4) starting hash value,
(5) raw extent. The raw extent describes the segment ID, logical block ID,
and length of the described range. Index records are stored in index and
hybrid b-tree nodes. These records implement the mechanism of lookup and
traverse operations in the b-tree.
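The sketch below illustrates one possible layout of such an index record.
It is inferred from how the fields are accessed in this patch (node_id,
node_type, height, flags, index.hash, index.extent.seg_id/logical_blk/len);
the exact field order, widths, and padding here are assumptions for
illustration only, and the authoritative definitions are the on-disk layout
declarations introduced earlier in the series:

/* Illustrative sketch only: field order and widths are assumed, */
/* not copied from the SSDFS on-disk layout header. */
struct ssdfs_raw_extent {
	__le64 seg_id;		/* segment ID */
	__le32 logical_blk;	/* logical block ID inside the segment */
	__le32 len;		/* extent length in logical blocks */
};

struct ssdfs_btree_index {
	__le64 hash;			/* starting hash value */
	struct ssdfs_raw_extent extent;	/* location of the child node */
};

struct ssdfs_btree_index_key {
	__le32 node_id;			/* child node ID */
	__u8 node_type;			/* index/hybrid/leaf node */
	__u8 height;			/* height of the child node */
	__le16 flags;			/* e.g. SSDFS_BTREE_INDEX_HAS_VALID_EXTENT */
	struct ssdfs_btree_index index;	/* hash + raw extent */
};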
Index operations include:
(1) resize_index_area - increase the size of the index area in a hybrid
    b-tree node by redistributing free space between the index and item
    areas and shifting the item area
(2) find_index - find an index in a hybrid or index b-tree node
(3) add_index - add an index into a hybrid or index b-tree node
(4) change_index - change an index record in a hybrid or index b-tree node
(5) delete_index - delete an index record from a hybrid or index b-tree node

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_node.c | 2985 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2985 insertions(+)

diff --git a/fs/ssdfs/btree_node.c b/fs/ssdfs/btree_node.c
index a826b1c9699d..f4402cb8df64 100644
--- a/fs/ssdfs/btree_node.c
+++ b/fs/ssdfs/btree_node.c
@@ -5222,3 +5222,2988 @@ bool is_ssdfs_node_shared(struct ssdfs_btree_node *node)
 	return atomic_read(&node->refs_count) > 1;
 }
+
+/*
+ * ssdfs_btree_root_node_find_index() - find index record in root node
+ * @node: node object
+ * @search_hash: hash for search in the index area
+ * @found_index: identification number of found index [out]
+ *
+ * This method tries to find the index record for the requested hash.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA - unable to find the node's index.
+ * %-EEXIST - search hash has been found.
+ */
+static
+int ssdfs_btree_root_node_find_index(struct ssdfs_btree_node *node,
+				     u64 search_hash,
+				     u16 *found_index)
+{
+	int i;
+	int err = -ENODATA;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !found_index);
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u, node_type %#x, search_hash %llx\n",
+		  node->node_id, atomic_read(&node->type),
+		  search_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_index = U16_MAX;
+
+	for (i = 0; i < SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; i++) {
+		struct ssdfs_btree_index *ptr = &node->raw.root_node.indexes[i];
+		u64 hash = le64_to_cpu(ptr->hash);
+
+		if (hash == U64_MAX)
+			break;
+
+		if (search_hash < hash)
+			break;
+
+		err = 0;
+		*found_index = i;
+
+		if (search_hash == hash) {
+			err = -EEXIST;
+			break;
+		}
+	}
+
+	return err;
+}
+
+#define CUR_INDEX(kaddr, page_off, index) \
+	((struct ssdfs_btree_index_key *)((u8 *)kaddr + \
+	 page_off + (index * sizeof(struct ssdfs_btree_index_key))))
+
+/*
+ * ssdfs_get_index_key_hash() - get hash from a range
+ * @node: node object
+ * @kaddr: pointer on starting address in the page
+ * @page_off: offset from page's beginning in bytes
+ * @index: requested starting index in the range
+ * @upper_index: last index in the available range
+ * @hash_index: available (not locked) index in the range [out]
+ * @hash: hash value of found index [out]
+ *
+ * This method tries to find any unlocked index in the suggested
+ * range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA - unable to find the node's index.
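+ *
+ * The helper locks exactly one index at a time, reads its hash under
+ * that lock, and unlocks it again before returning, so the caller only
+ * ever holds the node's full_lock plus one short-lived per-index lock.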
+ */ +static +int ssdfs_get_index_key_hash(struct ssdfs_btree_node *node, + void *kaddr, u32 page_off, + u32 index, u32 upper_index, + u32 *hash_index, u64 *hash) +{ + struct ssdfs_btree_index_key *ptr; + int err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !kaddr || !hash_index || !hash); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("kaddr %p, page_off %u, " + "index %u, upper_index %u\n", + kaddr, page_off, index, upper_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *hash = U64_MAX; + + for (*hash_index = index; *hash_index <= upper_index; ++(*hash_index)) { + err = ssdfs_lock_index_range(node, *hash_index, 1); + if (unlikely(err)) { + SSDFS_ERR("fail to lock index %u, err %d\n", + *hash_index, err); + break; + } + + err = -EEXIST; + ptr = CUR_INDEX(kaddr, page_off, *hash_index); + *hash = le64_to_cpu(ptr->index.hash); + + ssdfs_unlock_index_range(node, *hash_index, 1); + + if (err == -EEXIST) { + err = 0; + break; + } else if (err == -ENODATA) + continue; + else if (unlikely(err)) + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash_index %u, hash %llx\n", + *hash_index, *hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_check_last_index() - check last index in the search + * @node: node object + * @kaddr: pointer on starting address in the page + * @page_off: offset from page's beginning in bytes + * @index: requested index for the check + * @search_hash: hash for search + * @range_start: first index in the index area + * @range_end: last index in the index area + * @prev_found: processed index on previous iteration + * @found_index: value of found index [out] + * + * This method tries to check the index for the case when + * range has only one index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOENT - requested hash is located in previous range. + * %-ENODATA - unable to find the node's index. 
+ */ +static +int ssdfs_check_last_index(struct ssdfs_btree_node *node, + void *kaddr, u32 page_off, + u32 index, u64 search_hash, + u32 range_start, u32 range_end, + u32 prev_found, u16 *found_index) +{ + u32 hash_index; + u64 hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !kaddr || !found_index); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("kaddr %p, page_off %u, " + "index %u, search_hash %llx, " + "range_start %u, range_end %u, " + "prev_found %u\n", + kaddr, page_off, index, search_hash, + range_start, range_end, prev_found); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_get_index_key_hash(node, kaddr, page_off, + index, index, + &hash_index, &hash); + if (unlikely(err)) { + SSDFS_ERR("fail to get hash: " + "index %u, err %d\n", + index, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash_index %u, hash %llx\n", + hash_index, hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (hash_index != index) { + SSDFS_ERR("hash_index %u != index %u\n", + hash_index, index); + return -ERANGE; + } + + if (search_hash < hash) { + err = -ENOENT; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find index: " + "index %u, search_hash %llx, " + "hash %llx\n", + hash_index, search_hash, + hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (prev_found < U16_MAX) + *found_index = prev_found; + else + *found_index = hash_index; + } else if (search_hash == hash) { + err = 0; + *found_index = hash_index; + } else { + err = -ENODATA; + *found_index = hash_index; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_found %u, found_index %u\n", + prev_found, *found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_check_last_index_pair() - check last pair of indexes in the search + * @node: node object + * @kaddr: pointer on starting address in the page + * @page_off: offset from page's beginning in bytes + * @lower_index: starting index in the search + * @upper_index: ending index in the search + * @search_hash: hash for search + * @range_start: first index in the index area + * @range_end: last index in the index area + * @prev_found: processed index on previous iteration + * @found_index: value of found index [out] + * + * This method tries to find an index for the case when + * range has only two indexes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOENT - requested hash is located in previous range. + * %-ENODATA - unable to find the node's index. 
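+ *
+ * %-ENOENT means the requested hash sits in front of this pair; callers
+ * then fall back to @prev_found, the best candidate from a previous
+ * iteration of the search.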
+ */
+static
+int ssdfs_check_last_index_pair(struct ssdfs_btree_node *node,
+				void *kaddr, u32 page_off,
+				u32 lower_index, u32 upper_index,
+				u64 search_hash,
+				u32 range_start, u32 range_end,
+				u32 prev_found, u16 *found_index)
+{
+	u32 hash_index;
+	u64 hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !kaddr || !found_index);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("kaddr %p, page_off %u, "
+		  "lower_index %u, upper_index %u, "
+		  "search_hash %llx, range_start %u, "
+		  "range_end %u, prev_found %u\n",
+		  kaddr, page_off, lower_index, upper_index,
+		  search_hash, range_start, range_end, prev_found);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_get_index_key_hash(node, kaddr, page_off,
+					lower_index, upper_index,
+					&hash_index, &hash);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get hash: "
+			  "lower_index %u, upper_index %u, err %d\n",
+			  lower_index, upper_index, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("hash_index %u, hash %llx\n",
+		  hash_index, hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (hash_index == lower_index) {
+		if (search_hash < hash) {
+			err = -ENOENT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find index: "
+				  "index %u, search_hash %llx, "
+				  "hash %llx\n",
+				  hash_index, search_hash,
+				  hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (prev_found < U16_MAX)
+				*found_index = prev_found;
+			else
+				*found_index = hash_index;
+		} else if (search_hash == hash) {
+			err = 0;
+			*found_index = hash_index;
+		} else {
+			prev_found = hash_index;
+			err = ssdfs_check_last_index(node, kaddr, page_off,
+						     upper_index, search_hash,
+						     range_start, range_end,
+						     prev_found, found_index);
+			if (err == -ENOENT) {
+				err = 0;
+				*found_index = prev_found;
+			}
+		}
+	} else if (hash_index == upper_index) {
+		if (search_hash > hash) {
+			err = -ENODATA;
+			*found_index = upper_index;
+		} else if (search_hash == hash) {
+			err = 0;
+			*found_index = upper_index;
+		} else {
+			prev_found = hash_index;
+			err = ssdfs_check_last_index(node, kaddr, page_off,
+						     lower_index, search_hash,
+						     range_start, range_end,
+						     prev_found, found_index);
+			if (err == -ENOENT) {
+				err = 0;
+				*found_index = prev_found;
+			}
+		}
+	} else {
+		SSDFS_ERR("invalid index: hash_index %u, "
+			  "lower_index %u, upper_index %u\n",
+			  hash_index, lower_index, upper_index);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("prev_found %u, found_index %u, err %d\n",
+		  prev_found, *found_index, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_find_index_in_memory_page() - find index record in memory page
+ * @node: node object
+ * @area: description of index area
+ * @start_offset: offset in the index area of the node
+ * @search_hash: hash for search in the index area
+ * @found_index: identification number of found index [out]
+ * @processed_bytes: amount of processed bytes into index area [out]
+ *
+ * This method tries to find the index record for the requested hash.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to find the node's index.
+ * %-ENOENT - index record is outside of this memory page.
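+ *
+ * Within one memory page the search is a binary search over the index
+ * keys (see the lower_index/upper_index loop below); when the record
+ * lies beyond the bytes covered by this page, %-ENOENT tells the caller
+ * to continue with the next page.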
+ */ +static +int ssdfs_find_index_in_memory_page(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u32 start_offset, + u64 search_hash, + u16 *found_index, + u32 *processed_bytes) +{ + struct page *page; + void *kaddr; + u32 page_index; + u32 page_off; + u32 search_bytes; + u32 index_count; + u32 cur_index, upper_index, lower_index; + u32 range_start, range_end; + u32 prev_found; + u64 hash; + u32 processed_indexes = 0; + int err = -ENODATA; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !processed_bytes || !found_index); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, node_type %#x, " + "start_offset %u, search_hash %llx\n", + node->node_id, atomic_read(&node->type), + start_offset, search_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_index = U16_MAX; + *processed_bytes = 0; + + if (start_offset >= (area->offset + area->area_size)) { + SSDFS_ERR("invalid start_offset: " + "offset %u, area_start %u, area_size %u\n", + start_offset, area->offset, area->area_size); + return -ERANGE; + } + + if (area->index_size != sizeof(struct ssdfs_btree_index_key)) { + SSDFS_ERR("invalid index size %u\n", + area->index_size); + return -ERANGE; + } + + page_index = start_offset >> PAGE_SHIFT; + page_off = start_offset % PAGE_SIZE; + + if ((page_off + area->index_size) > PAGE_SIZE) { + SSDFS_ERR("invalid offset into the page: " + "offset %u, index_size %u\n", + page_off, area->index_size); + return -ERANGE; + } + + if (page_index == 0 && page_off < area->offset) { + SSDFS_ERR("page_off %u < area->offset %u\n", + page_off, area->offset); + return -ERANGE; + } + + if (page_off % area->index_size) { + SSDFS_ERR("offset is not aligned: " + "page_off %u, index_size %u\n", + page_off, area->index_size); + return -ERANGE; + } + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page index: " + "page_index %u, pagevec_count %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + search_bytes = PAGE_SIZE - page_off; + search_bytes = min_t(u32, search_bytes, + (area->offset + area->area_size) - start_offset); + + index_count = search_bytes / area->index_size; + if (index_count == 0) { + SSDFS_ERR("invalid index_count %u\n", + index_count); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search_bytes %u, offset %u, area_size %u\n", + search_bytes, area->offset, area->area_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + processed_indexes = (start_offset - area->offset); + processed_indexes /= area->index_size; + + if (processed_indexes >= area->index_capacity) { + SSDFS_ERR("processed_indexes %u >= area->index_capacity %u\n", + processed_indexes, + area->index_capacity); + return -ERANGE; + } else if (processed_indexes >= area->index_count) { + err = -ENOENT; + *processed_bytes = search_bytes; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find an index: " + "processed_indexes %u, area->index_count %u\n", + processed_indexes, + area->index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area->index_count %u, area->index_capacity %u\n", + area->index_count, area->index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + index_count = min_t(u32, index_count, + area->index_count - processed_indexes); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("area->index_count %u, processed_indexes %u, " + "index_count %u\n", + area->index_count, processed_indexes, index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_index = 0; + range_start = 
lower_index = 0; + range_end = upper_index = index_count - 1; + + page = node->content.pvec.pages[page_index]; + kaddr = kmap_local_page(page); + + prev_found = *found_index; + while (lower_index <= upper_index) { + int diff = upper_index - lower_index; + u32 hash_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_index %u, upper_index %u, diff %d\n", + lower_index, upper_index, diff); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (diff < 0) { + err = -ERANGE; + SSDFS_ERR("invalid diff: " + "diff %d, lower_index %u, " + "upper_index %u\n", + diff, lower_index, upper_index); + goto finish_search; + } + + if (diff == 0) { + err = ssdfs_check_last_index(node, kaddr, page_off, + lower_index, search_hash, + range_start, range_end, + prev_found, found_index); + if (err == -ENOENT) { + if (prev_found < U16_MAX) + *found_index = prev_found; + } else if (err == -ENODATA) { + /* + * Nothing was found + */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to check the last index: " + "index %u, err %d\n", + lower_index, err); + } + + *processed_bytes = search_bytes; + goto finish_search; + } else if (diff == 1) { + err = ssdfs_check_last_index_pair(node, kaddr, + page_off, + lower_index, + upper_index, + search_hash, + range_start, + range_end, + prev_found, + found_index); + if (err == -ENOENT) { + if (prev_found < U16_MAX) + *found_index = prev_found; + } else if (err == -ENODATA) { + /* + * Nothing was found + */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to check the last index pair: " + "lower_index %u, upper_index %u, " + "err %d\n", + lower_index, upper_index, err); + } + + *processed_bytes = search_bytes; + goto finish_search; + } else + cur_index = lower_index + (diff / 2); + + + err = ssdfs_get_index_key_hash(node, kaddr, page_off, + cur_index, upper_index, + &hash_index, &hash); + if (unlikely(err)) { + SSDFS_WARN("fail to get hash: " + "cur_index %u, upper_index %u, err %d\n", + cur_index, upper_index, err); + goto finish_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search_hash %llx, hash %llx, " + "hash_index %u, range_start %u, range_end %u\n", + search_hash, hash, hash_index, + range_start, range_end); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search_hash < hash) { + if (hash_index == range_start) { + err = -ENOENT; + *found_index = hash_index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find index: " + "index %u, search_hash %llx, " + "hash %llx\n", + hash_index, search_hash, + hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else { + prev_found = lower_index; + upper_index = cur_index; + } + } else if (search_hash == hash) { + err = -EEXIST; + *found_index = cur_index; + *processed_bytes = search_bytes; + goto finish_search; + } else { + if (hash_index == range_end) { + err = -ENODATA; + *found_index = hash_index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find index: " + "index %u, search_hash %llx, " + "hash %llx\n", + hash_index, search_hash, + hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_search; + } else { + prev_found = lower_index; + lower_index = cur_index; + } + } + }; + +finish_search: + kunmap_local(kaddr); + + if (!err || err == -EEXIST) { + *found_index += processed_indexes; + if (*found_index >= area->index_capacity) { + SSDFS_ERR("found_index %u >= capacity %u\n", + *found_index, + area->index_capacity); + return -ERANGE; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_found %u, found_index %u\n", + prev_found, *found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * 
ssdfs_btree_common_node_find_index() - find index record
+ * @node: node object
+ * @area: description of index area
+ * @search_hash: hash for search in the index area
+ * @found_index: identification number of found index [out]
+ *
+ * This method tries to find the index record for the requested hash.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - unable to find the node's index.
+ * %-EEXIST - search hash has been found.
+ */
+static
+int ssdfs_btree_common_node_find_index(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_index_area *area,
+				u64 search_hash,
+				u16 *found_index)
+{
+	u32 start_offset, end_offset;
+	u32 processed_bytes = 0;
+	u16 prev_found = U16_MAX;
+	int err = -ENODATA;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area || !found_index);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("node_id %u, node_type %#x, search_hash %llx\n",
+		  node->node_id, atomic_read(&node->type),
+		  search_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_index = U16_MAX;
+
+	if (atomic_read(&area->state) != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+		SSDFS_ERR("invalid area state %#x\n",
+			  atomic_read(&area->state));
+		return -ERANGE;
+	}
+
+	if (area->index_count == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u hasn't any index\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	if (area->index_count > area->index_capacity) {
+		SSDFS_ERR("invalid area: "
+			  "index_count %u, index_capacity %u\n",
+			  area->index_count,
+			  area->index_capacity);
+		return -ERANGE;
+	}
+
+	if ((area->offset + area->area_size) > node->node_size) {
+		SSDFS_ERR("invalid area: "
+			  "offset %u, area_size %u, node_size %u\n",
+			  area->offset,
+			  area->area_size,
+			  node->node_size);
+		return -ERANGE;
+	}
+
+	if (area->index_size != sizeof(struct ssdfs_btree_index_key)) {
+		SSDFS_ERR("invalid index size %u\n",
+			  area->index_size);
+		return -ERANGE;
+	}
+
+	start_offset = area->offset;
+	end_offset = area->offset + area->area_size;
+
+	while (start_offset < end_offset) {
+		prev_found = *found_index;
+		err = ssdfs_find_index_in_memory_page(node, area,
+						      start_offset,
+						      search_hash,
+						      found_index,
+						      &processed_bytes);
+		if (err == -ENODATA) {
+			err = 0;
+
+			if (*found_index >= U16_MAX) {
+				/*
+				 * continue to search
+				 */
+			} else if ((*found_index + 1) >= area->index_count) {
+				/*
+				 * index has been found
+				 */
+				break;
+			}
+		} else if (err == -ENOENT) {
+			err = 0;
+
+			if (prev_found != U16_MAX) {
+				*found_index = prev_found;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("node_id %u, search_hash %llx, "
+					  "found_index %u\n",
+					  node->node_id, search_hash,
+					  *found_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			break;
+		} else if (err == -EEXIST) {
+			/*
+			 * index has been found
+			 */
+			break;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find index: err %d\n",
+				  err);
+			break;
+		} else
+			break;
+
+		start_offset += processed_bytes;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("prev_found %u, found_index %u, err %d\n",
+		  prev_found, *found_index, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * __ssdfs_btree_root_node_extract_index() - extract index from root node
+ * @node: node object
+ * @found_index: identification number of found index
+ * @ptr: pointer on the index key buffer [out]
+ *
+ * This method tries to extract index record from the index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
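+ *
+ * The root node keeps at most SSDFS_BTREE_ROOT_NODE_INDEX_COUNT inline
+ * indexes in its header; the child's node_type is not stored there but
+ * derived from the root's current height (see the switch statement below).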
+ */
+int __ssdfs_btree_root_node_extract_index(struct ssdfs_btree_node *node,
+					  u16 found_index,
+					  struct ssdfs_btree_index_key *ptr)
+{
+	size_t index_size = sizeof(struct ssdfs_btree_index);
+	__le32 node_id;
+	int node_height;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !ptr);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->tree->lock));
+
+	SSDFS_DBG("node_id %u, node_type %#x, found_index %u\n",
+		  node->node_id, atomic_read(&node->type),
+		  found_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (found_index >= SSDFS_BTREE_ROOT_NODE_INDEX_COUNT) {
+		SSDFS_ERR("invalid found_index %u\n",
+			  found_index);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("index 0: node_id %u; index 1: node_id %u\n",
+		  le32_to_cpu(node->raw.root_node.header.node_ids[0]),
+		  le32_to_cpu(node->raw.root_node.header.node_ids[1]));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->header_lock);
+	node_id = node->raw.root_node.header.node_ids[found_index];
+	ptr->node_id = node_id;
+	ssdfs_memcpy(&ptr->index, 0, index_size,
+		     &node->raw.root_node.indexes[found_index], 0, index_size,
+		     index_size);
+	up_read(&node->header_lock);
+
+	node_height = atomic_read(&node->height);
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(node_height < 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+	ptr->height = node_height - 1;
+
+	switch (node_height) {
+	case SSDFS_BTREE_LEAF_NODE_HEIGHT:
+	case SSDFS_BTREE_PARENT2LEAF_HEIGHT:
+		ptr->node_type = SSDFS_BTREE_LEAF_NODE;
+		break;
+
+	case SSDFS_BTREE_PARENT2HYBRID_HEIGHT:
+		ptr->node_type = SSDFS_BTREE_HYBRID_NODE;
+		break;
+
+	default:
+		ptr->node_type = SSDFS_BTREE_INDEX_NODE;
+		break;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_height %u, node_type %#x\n",
+		  node_height, ptr->node_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr->flags = cpu_to_le16(SSDFS_BTREE_INDEX_HAS_VALID_EXTENT);
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_root_node_extract_index() - extract index from root node
+ * @node: node object
+ * @found_index: identification number of found index
+ * @search: btree search object [out]
+ *
+ * This method tries to extract index record from the index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static inline
+int ssdfs_btree_root_node_extract_index(struct ssdfs_btree_node *node,
+					u16 found_index,
+					struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_index_key *ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->tree->lock));
+
+	SSDFS_DBG("node_id %u, node_type %#x, found_index %u\n",
+		  node->node_id, atomic_read(&node->type),
+		  found_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ptr = &search->node.found_index;
+	return __ssdfs_btree_root_node_extract_index(node, found_index, ptr);
+}
+
+/*
+ * ssdfs_btree_node_get_index() - extract index from node
+ * @pvec: pagevec object
+ * @area_offset: area offset from the node's beginning
+ * @area_size: size of the area
+ * @node_size: node size in bytes
+ * @position: position of index record in the node
+ * @ptr: pointer on index buffer [out]
+ *
+ * This method tries to extract index record from the index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
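+ *
+ * The record position is translated into a (page_index, page_off) pair
+ * by __ssdfs_define_memory_page() and the key is then copied out of the
+ * pagevec with ssdfs_memcpy_from_page().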
+ */
+int ssdfs_btree_node_get_index(struct pagevec *pvec,
+			       u32 area_offset, u32 area_size,
+			       u32 node_size, u16 position,
+			       struct ssdfs_btree_index_key *ptr)
+{
+	size_t index_size = sizeof(struct ssdfs_btree_index_key);
+	struct page *page;
+	u32 page_index;
+	u32 page_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pvec || !ptr);
+
+	SSDFS_DBG("area_offset %u, area_size %u, "
+		  "node_size %u, position %u\n",
+		  area_offset, area_size,
+		  node_size, position);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = __ssdfs_define_memory_page(area_offset, area_size,
+					 node_size, index_size, position,
+					 &page_index, &page_off);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define memory page: "
+			  "position %u, err %d\n",
+			  position, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(page_index >= U32_MAX);
+	BUG_ON(page_off >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (page_index >= pagevec_count(pvec)) {
+		SSDFS_ERR("page_index %u >= pvec_size %u\n",
+			  page_index,
+			  pagevec_count(pvec));
+		return -ERANGE;
+	}
+
+	page = pvec->pages[page_index];
+	err = ssdfs_memcpy_from_page(ptr, 0, index_size,
+				     page, page_off, PAGE_SIZE,
+				     index_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("invalid page_off %u, err %d\n",
+			  page_off, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, node_type %#x, hash %llx, "
+		  "seg_id %llu, logical_blk %u, len %u\n",
+		  le32_to_cpu(ptr->node_id),
+		  ptr->node_type,
+		  le64_to_cpu(ptr->index.hash),
+		  le64_to_cpu(ptr->index.extent.seg_id),
+		  le32_to_cpu(ptr->index.extent.logical_blk),
+		  le32_to_cpu(ptr->index.extent.len));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * __ssdfs_btree_common_node_extract_index() - extract index from node
+ * @node: node object
+ * @area: description of index area
+ * @found_index: identification number of found index
+ * @ptr: pointer on the index key buffer [out]
+ *
+ * This method tries to extract index record from the index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */ +int __ssdfs_btree_common_node_extract_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 found_index, + struct ssdfs_btree_index_key *ptr) +{ + struct page *page; + size_t index_key_len = sizeof(struct ssdfs_btree_index_key); + u32 page_index; + u32 page_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !ptr); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + + SSDFS_DBG("node_id %u, node_type %#x, found_index %u\n", + node->node_id, atomic_read(&node->type), + found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found_index == area->index_count) { + SSDFS_ERR("found_index %u == index_count %u\n", + found_index, area->index_count); + return -ERANGE; + } + + err = ssdfs_define_memory_page(node, area, found_index, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, found_index %u, err %d\n", + node->node_id, found_index, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + err = ssdfs_memcpy_from_page(ptr, 0, index_key_len, + page, page_off, PAGE_SIZE, + index_key_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + if (ptr->node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + ptr->node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_WARN("node_id %u, node_type %#x, found_index %u\n", + node->node_id, atomic_read(&node->type), + found_index); + SSDFS_ERR("page_index %u, page_off %u\n", + page_index, page_off); + SSDFS_ERR("FOUND_INDEX: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(ptr->node_id), + ptr->node_type, ptr->height, ptr->flags, + le64_to_cpu(ptr->index.hash), + le64_to_cpu(ptr->index.extent.seg_id), + le32_to_cpu(ptr->index.extent.logical_blk), + le32_to_cpu(ptr->index.extent.len)); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FOUND_INDEX: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(ptr->node_id), + ptr->node_type, ptr->height, ptr->flags, + le64_to_cpu(ptr->index.hash), + le64_to_cpu(ptr->index.extent.seg_id), + le32_to_cpu(ptr->index.extent.logical_blk), + le32_to_cpu(ptr->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_common_node_extract_index() - extract index from node + * @node: node object + * @area: description of index area + * @found_index: identification number of found index + * @search: btree search object [out] + * + * This method tries to extract index record from the index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static inline +int ssdfs_btree_common_node_extract_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 found_index, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_index_key *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + + SSDFS_DBG("node_id %u, node_type %#x, found_index %u\n", + node->node_id, atomic_read(&node->type), + found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = &search->node.found_index; + return __ssdfs_btree_common_node_extract_index(node, area, + found_index, ptr); +} + +/* + * ssdfs_find_index_by_hash() - find index record in the node by hash + * @node: node object + * @area: description of index area + * @hash: hash value for the search + * @found_index: found position of index record in the node [out] + * + * This method tries to find node's index for + * the requested hash value. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - unable to find the node's index. + * %-EEXIST - search hash has been found. + */ +int ssdfs_find_index_by_hash(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u64 hash, + u16 *found_index) +{ + int node_type; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !found_index); + + SSDFS_DBG("node_id %u, hash %llx, " + "area->start_hash %llx, area->end_hash %llx\n", + node->node_id, hash, + area->start_hash, area->end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + *found_index = U16_MAX; + + node_type = atomic_read(&node->type); + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_ERR("invalid node type %#x\n", + node_type); + return -ERANGE; + } + + if (atomic_read(&area->state) != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("index area hasn't been created: " + "node_id %u, node_type %#x\n", + node->node_id, + atomic_read(&node->type)); + return -ERANGE; + } + + if (area->index_count == 0) { + *found_index = 0; + SSDFS_DBG("index area is empty\n"); + return -ENODATA; + } + + if (area->index_count > area->index_capacity) { + SSDFS_ERR("index_count %u > index_capacity %u\n", + area->index_count, + area->index_capacity); + return -ERANGE; + } + + if (area->start_hash == U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash is invalid: node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (hash == U64_MAX) { + SSDFS_ERR("invalid requested hash\n"); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (hash < area->start_hash) { + err = 0; + *found_index = 0; + goto finish_hash_search; + } + + if (area->end_hash == U64_MAX) + *found_index = 0; + else if (hash >= area->end_hash) { + *found_index = area->index_count - 1; + } else { + if (node_type == SSDFS_BTREE_ROOT_NODE) { + err = ssdfs_btree_root_node_find_index(node, hash, + found_index); + if (err == -ENODATA) { + SSDFS_DBG("unable to find index\n"); + goto finish_hash_search; + } else if (err == -EEXIST) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index exists already: " + "hash %llx, index %u\n", + hash, *found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_hash_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find index in root node: " + "err %d\n", + err); + goto finish_hash_search; + } else if (*found_index == U16_MAX) { + err = 
-ERANGE; + SSDFS_ERR("invalid index was found\n"); + goto finish_hash_search; + } + } else { + err = ssdfs_btree_common_node_find_index(node, area, + hash, + found_index); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find index: " + "node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_hash_search; + } else if (err == -EEXIST) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index exists already: " + "hash %llx, index %u\n", + hash, *found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_hash_search; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find index in the node: " + "node_id %u, err %d\n", + node->node_id, err); + goto finish_hash_search; + } else if (*found_index == U16_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid index was found\n"); + goto finish_hash_search; + } + } + } + +finish_hash_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx, found_index %u, err %d\n", + hash, *found_index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_btree_node_find_index_position() - find index's position + * @node: node object + * @hash: hash value + * @found_position: pointer on returned value [out] + * + * This method tries to find node's index for + * the requested hash value. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - unable to find the node's index. + * %-ENOENT - node hasn't the index area. + * %-EACCES - node is under initialization yet. + */ +int ssdfs_btree_node_find_index_position(struct ssdfs_btree_node *node, + u64 hash, + u16 *found_position) +{ + struct ssdfs_btree_node_index_area area; + size_t desc_size = sizeof(struct ssdfs_btree_node_index_area); + int node_type; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !found_position); + BUG_ON(!node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + + SSDFS_DBG("node_id %u, node_type %#x, hash %llx\n", + node->node_id, + atomic_read(&node->type), + hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + *found_position = U16_MAX; + + down_read(&node->full_lock); + + node_type = atomic_read(&node->type); + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid node type %#x\n", + node_type); + goto finish_index_search; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&area, 0, desc_size, + &node->index_area, 0, desc_size, + desc_size); + err = ssdfs_find_index_by_hash(node, &area, hash, + found_position); + up_read(&node->header_lock); + + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find an index: " + "node_id %u, hash %llx\n", + node->node_id, hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } else if (unlikely(err)) { + 
SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, hash, err); + goto finish_index_search; + } + +finish_index_search: + up_read(&node->full_lock); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(*found_position == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_btree_node_find_index() - find node's index + * @search: btree search object + * + * This method tries to find node's index for + * the requested hash value. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - unable to find the node's index. + * %-ENOENT - node hasn't the index area. + * %-EACCES - node is under initialization yet. + */ +int ssdfs_btree_node_find_index(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + struct ssdfs_btree_node_index_area area; + size_t desc_size = sizeof(struct ssdfs_btree_node_index_area); + int tree_height; + int node_type; + u16 found_index = U16_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->node.parent) { + node = search->node.parent; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = atomic_read(&node->tree->height); + if (tree_height <= (SSDFS_BTREE_LEAF_NODE_HEIGHT + 1)) { + /* tree has only root node */ + return -ENODATA; + } + } + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + node = search->node.parent; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("try parent node %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + down_read(&node->full_lock); + + node_type = atomic_read(&node->type); + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid node type %#x\n", + node_type); + goto finish_index_search; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&area, 0, desc_size, + &node->index_area, 0, desc_size, + desc_size); + err = ssdfs_find_index_by_hash(node, &area, + search->request.start.hash, + &found_index); + up_read(&node->header_lock); + + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find an index: " + "node_id %u, hash %llx\n", + node->node_id, search->request.start.hash); +#endif /* 
CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } else if (unlikely(err)) { + SSDFS_WARN("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, search->request.start.hash, + err); + goto finish_index_search; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found_index == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + err = ssdfs_btree_root_node_extract_index(node, + found_index, + search); + } else { + err = ssdfs_btree_common_node_extract_index(node, &area, + found_index, + search); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to extract index: " + "node_id %u, node_type %#x, " + "found_index %u, err %d\n", + node->node_id, node_type, + found_index, err); + goto finish_index_search; + } + +finish_index_search: + up_read(&node->full_lock); + + if (unlikely(err)) + ssdfs_debug_show_btree_node_indexes(node->tree, node); + + return err; +} + +/* + * can_add_new_index() - check that index area has free space + * @node: node object + */ +bool can_add_new_index(struct ssdfs_btree_node *node) +{ + bool can_add = false; + u16 count, capacity; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + count = node->index_area.index_count; + capacity = node->index_area.index_capacity; + if (count > capacity) + err = -ERANGE; + else + can_add = count < capacity; + up_read(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("count %u, capacity %u, can_add %#x, err %d\n", + count, capacity, can_add, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_WARN("count %u > capacity %u\n", + count, capacity); + return false; + } + + return can_add; +} + +/* + * ssdfs_btree_root_node_add_index() - add index record into the root node + * @node: node object + * @position: position in the node for storing the new index record + * @ptr: pointer on storing index record [in] + * + * This method tries to add index record into the root node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - root node hasn't free space. + * %-EEXIST - root node contains such record already. 
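+ *
+ * NOTE: a minimal caller sketch (hypothetical values; the node type
+ * constant SSDFS_BTREE_LEAF_NODE is assumed here; node->full_lock and
+ * node->header_lock must already be taken, as the BUG_ON checks
+ * below require):
+ *
+ *	struct ssdfs_btree_index_key key = {
+ *		.node_id = cpu_to_le32(42),
+ *		.node_type = SSDFS_BTREE_LEAF_NODE,
+ *		.index.hash = cpu_to_le64(0x1000),
+ *	};
+ *	err = ssdfs_btree_root_node_add_index(node, position, &key);
+ *
+ * An equal hash at @position overwrites the slot, a bigger hash is
+ * stored into the next slot, and a smaller hash shifts the existing
+ * indexes to the right and takes @position itself.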
+ */ +static +int ssdfs_btree_root_node_add_index(struct ssdfs_btree_node *node, + u16 position, + struct ssdfs_btree_index_key *ptr) +{ + struct ssdfs_btree_index *found = NULL; + size_t index_size = sizeof(struct ssdfs_btree_index); + u64 hash1, hash2; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !ptr); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, position %u\n", + node->node_id, position); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (position >= SSDFS_BTREE_ROOT_NODE_INDEX_COUNT) { + SSDFS_ERR("invalid position %u\n", + position); + return -ERANGE; + } + + if (node->index_area.index_count > node->index_area.index_capacity) { + SSDFS_ERR("index_count %u > index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); + return -ERANGE; + } + + if (node->index_area.index_count == node->index_area.index_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add the index: " + "index_count %u, index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + found = &node->raw.root_node.indexes[position]; + + hash1 = le64_to_cpu(found->hash); + hash2 = le64_to_cpu(ptr->index.hash); + + if (hash1 == hash2) { + ssdfs_memcpy(&node->raw.root_node.indexes[position], + 0, index_size, + &ptr->index, 0, index_size, + index_size); + } else if (hash1 < hash2) { + ssdfs_memcpy(&node->raw.root_node.indexes[position + 1], + 0, index_size, + &ptr->index, 0, index_size, + index_size); + position++; + node->index_area.index_count++; + } else { + void *indexes = node->raw.root_node.indexes; + u32 dst_off = (u32)(position + 1) * index_size; + u32 src_off = (u32)position * index_size; + u32 array_size = index_size * SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; + u32 move_size = (u32)(node->index_area.index_count - position) * + index_size; + + err = ssdfs_memmove(indexes, dst_off, array_size, + indexes, src_off, array_size, + move_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + ssdfs_memcpy(&node->raw.root_node.indexes[position], + 0, index_size, + &ptr->index, 0, index_size, + index_size); + node->index_area.index_count++; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(ptr->node_id), + ptr->node_type, + le64_to_cpu(ptr->index.hash), + le64_to_cpu(ptr->index.extent.seg_id), + le32_to_cpu(ptr->index.extent.logical_blk), + le32_to_cpu(ptr->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + found = &node->raw.root_node.indexes[0]; + node->index_area.start_hash = le64_to_cpu(found->hash); + + found = &node->raw.root_node.indexes[node->index_area.index_count - 1]; + node->index_area.end_hash = le64_to_cpu(found->hash); + + ssdfs_memcpy(&node->raw.root_node.header.node_ids[position], + 0, sizeof(__le32), + &ptr->node_id, 0, sizeof(__le32), + sizeof(__le32)); + + return 0; +} + +/* + * __ssdfs_btree_common_node_add_index() - add index record into the node + * @node: node object + * @position: position in the node for storing the new index record + * @ptr: pointer on storing index record [in] + * + * This method tries to add index record into the common node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
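+ *
+ * NOTE: conceptually, the byte offset of the appended key is mapped
+ * to a memory page of the node's content and an offset inside that
+ * page (an illustrative sketch only; the real calculation lives in
+ * ssdfs_define_memory_page()):
+ *
+ *	offset = index_area_offset + (u32)position * index_size;
+ *	page_index = offset / PAGE_SIZE;
+ *	page_off = offset % PAGE_SIZE;
+ *
+ * The raw key is then copied into that page under the page lock.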
+ */ +static +int __ssdfs_btree_common_node_add_index(struct ssdfs_btree_node *node, + u16 position, + struct ssdfs_btree_index_key *ptr) +{ + struct page *page; + size_t index_key_len = sizeof(struct ssdfs_btree_index_key); + u32 page_index; + u32 page_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !ptr); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, position %u\n", + node->node_id, position); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (position != node->index_area.index_count) { + SSDFS_ERR("cannot add index: " + "position %u, index_count %u\n", + position, + node->index_area.index_count); + return -ERANGE; + } + + err = ssdfs_define_memory_page(node, &node->index_area, + position, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + + ssdfs_lock_page(page); + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + ptr, 0, index_key_len, + index_key_len); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %u, page_off %u\n", + page_index, page_off); + SSDFS_DBG("node_id %u, node_type %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(ptr->node_id), + ptr->node_type, + le64_to_cpu(ptr->index.hash), + le64_to_cpu(ptr->index.extent.seg_id), + le32_to_cpu(ptr->index.extent.logical_blk), + le32_to_cpu(ptr->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_root_node_change_index() - change index record into root node + * @node: node object + * @found_index: position in the node of the changing index record + * @new_index: pointer on new index record state [in] + * + * This method tries to change the index record into the root node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
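+ *
+ * NOTE: the root node keeps only two index slots, so @found_index is
+ * expected to be SSDFS_ROOT_NODE_LEFT_LEAF_NODE (the change updates
+ * start_hash) or SSDFS_ROOT_NODE_RIGHT_LEAF_NODE (the change updates
+ * end_hash); any other value hits the BUG() in the switch below.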
+ */ +static inline +int ssdfs_btree_root_node_change_index(struct ssdfs_btree_node *node, + u16 found_index, + struct ssdfs_btree_index_key *new_index) +{ + size_t index_size = sizeof(struct ssdfs_btree_index); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !new_index); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, node_type %#x, found_index %u\n", + node->node_id, atomic_read(&node->type), + found_index); + SSDFS_DBG("node_id %u, node_type %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(new_index->node_id), + new_index->node_type, + le64_to_cpu(new_index->index.hash), + le64_to_cpu(new_index->index.extent.seg_id), + le32_to_cpu(new_index->index.extent.logical_blk), + le32_to_cpu(new_index->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found_index >= SSDFS_BTREE_ROOT_NODE_INDEX_COUNT) { + SSDFS_ERR("invalid found_index %u\n", + found_index); + return -ERANGE; + } + + ssdfs_memcpy(&node->raw.root_node.indexes[found_index], + 0, index_size, + &new_index->index, 0, index_size, + index_size); + + switch (found_index) { + case SSDFS_ROOT_NODE_LEFT_LEAF_NODE: + node->index_area.start_hash = + le64_to_cpu(new_index->index.hash); + break; + + case SSDFS_ROOT_NODE_RIGHT_LEAF_NODE: + node->index_area.end_hash = + le64_to_cpu(new_index->index.hash); + break; + + default: + BUG(); + } + + ssdfs_memcpy(&node->raw.root_node.header.node_ids[found_index], + 0, sizeof(__le32), + &new_index->node_id, 0, sizeof(__le32), + sizeof(__le32)); + + return 0; +} + +/* + * ssdfs_btree_common_node_change_index() - change index record into common node + * @node: node object + * @found_index: position in the node of the changing index record + * @new_index: pointer on new index record state [in] + * + * This method tries to change the index record into the common node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
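+ *
+ * NOTE: a typical caller sketch (hypothetical old_hash and new_key
+ * variables; the node's locks are assumed to be taken already):
+ *
+ *	err = ssdfs_find_index_by_hash(node, &area, old_hash, &found);
+ *	if (err == -EEXIST) {
+ *		err = ssdfs_btree_common_node_change_index(node, &area,
+ *							   found, &new_key);
+ *	}
+ *
+ * -EEXIST from the lookup means the hash was matched exactly, which
+ * is precisely the case when an in-place change is legal.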
+ */ +static +int ssdfs_btree_common_node_change_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 found_index, + struct ssdfs_btree_index_key *new_index) +{ + struct page *page; + size_t index_key_len = sizeof(struct ssdfs_btree_index_key); + u32 page_index; + u32 page_off; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, node_type %#x, found_index %u\n", + node->node_id, atomic_read(&node->type), + found_index); + SSDFS_DBG("node_id %u, node_type %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(new_index->node_id), + new_index->node_type, + le64_to_cpu(new_index->index.hash), + le64_to_cpu(new_index->index.extent.seg_id), + le32_to_cpu(new_index->index.extent.logical_blk), + le32_to_cpu(new_index->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found_index == area->index_count) { + SSDFS_ERR("found_index %u == index_count %u\n", + found_index, area->index_count); + return -ERANGE; + } + + err = ssdfs_define_memory_page(node, area, found_index, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, found_index %u, err %d\n", + node->node_id, found_index, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + new_index, 0, index_key_len, + index_key_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + if (found_index == 0) + area->start_hash = le64_to_cpu(new_index->index.hash); + else if (found_index == (area->index_count - 1)) + area->end_hash = le64_to_cpu(new_index->index.hash); + + return 0; +} + +/* + * ssdfs_btree_common_node_insert_index() - insert index record into the node + * @node: node object + * @position: position in the node for storing the new index record + * @ptr: pointer on storing index record [in] + * + * This method tries to insert the index record into the common node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
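+ *
+ * NOTE: a sketch of the ripple insertion over the node's memory
+ * pages (pseudocode):
+ *
+ *	buffer[0] = *ptr;
+ *	do {
+ *		if the current page is full,
+ *			save its last key into buffer[1];
+ *		shift the keys at and after cur_pos right by one slot;
+ *		write buffer[0] at cur_pos;
+ *		buffer[0] = buffer[1];
+ *		advance cur_pos into the next page;
+ *	} while (a carried key remains in the buffer);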
+ */ +static +int ssdfs_btree_common_node_insert_index(struct ssdfs_btree_node *node, + u16 position, + struct ssdfs_btree_index_key *ptr) +{ + struct ssdfs_btree_index_key buffer[2]; + struct page *page; + void *kaddr; + u32 page_index; + u32 page_off; + u16 cur_pos = position; + u8 index_size; + bool is_valid_index_in_buffer = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !ptr); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, position %u\n", + node->node_id, position); + SSDFS_DBG("node_id %u, node_type %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(ptr->node_id), + ptr->node_type, + le64_to_cpu(ptr->index.hash), + le64_to_cpu(ptr->index.extent.seg_id), + le32_to_cpu(ptr->index.extent.logical_blk), + le32_to_cpu(ptr->index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(position < node->index_area.index_count)) { + SSDFS_ERR("cannot insert index: " + "position %u, index_count %u\n", + position, + node->index_area.index_count); + return -ERANGE; + } + + index_size = node->index_area.index_size; + if (index_size != sizeof(struct ssdfs_btree_index_key)) { + SSDFS_ERR("invalid index_size %u\n", + index_size); + return -ERANGE; + } + + ssdfs_memcpy(&buffer[0], 0, index_size, + ptr, 0, index_size, + index_size); + + do { + u32 rest_capacity; + u32 moving_count; + u32 moving_bytes; + + if (cur_pos > node->index_area.index_count) { + SSDFS_ERR("cur_pos %u, index_area.index_count %u\n", + cur_pos, node->index_area.index_count); + return -ERANGE; + } + + err = ssdfs_define_memory_page(node, &node->index_area, + cur_pos, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, position %u, err %d\n", + node->node_id, cur_pos, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + rest_capacity = PAGE_SIZE - page_off; + rest_capacity /= index_size; + + if (rest_capacity == 0) { + SSDFS_WARN("rest_capacity == 0\n"); + return -ERANGE; + } + + moving_count = node->index_area.index_count - cur_pos; + moving_count = min_t(u32, moving_count, rest_capacity); + + if (moving_count == rest_capacity) { + /* + * Latest item will be moved into + * temporary buffer (exclude from count) + */ + moving_bytes = (moving_count - 1) * index_size; + } else + moving_bytes = moving_count * index_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index %u, page_off %u, cur_pos %u, " + "moving_count %u, rest_capacity %u\n", + page_index, page_off, cur_pos, + moving_count, rest_capacity); + + if ((page_off + index_size) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "page_off %u, index_size %u\n", + page_off, index_size); + return -ERANGE; + } + + if ((page_off + moving_bytes + index_size) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "page_off %u, moving_bytes %u, " + "index_size %u\n", + page_off, moving_bytes, index_size); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + if (moving_count == 0) { + err = ssdfs_memcpy(kaddr, page_off, PAGE_SIZE, + &buffer[0], 0, index_size, + index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err 
%d\n", err); + goto finish_copy; + } + + is_valid_index_in_buffer = false; + cur_pos++; + } else { + if (moving_count == rest_capacity) { + err = ssdfs_memcpy(&buffer[1], + 0, index_size, + kaddr, + PAGE_SIZE - index_size, + PAGE_SIZE, + index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", + err); + goto finish_copy; + } + + is_valid_index_in_buffer = true; + } else + is_valid_index_in_buffer = false; + + err = ssdfs_memmove(kaddr, + page_off + index_size, + PAGE_SIZE, + kaddr, + page_off, + PAGE_SIZE, + moving_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", + err); + goto finish_copy; + } + + err = ssdfs_memcpy(kaddr, page_off, PAGE_SIZE, + &buffer[0], 0, index_size, + index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", + err); + goto finish_copy; + } + + if (is_valid_index_in_buffer) { + ssdfs_memcpy(&buffer[0], 0, index_size, + &buffer[1], 0, index_size, + index_size); + } + + cur_pos += moving_count; + } + +finish_copy: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_pos %u, index_area.index_count %u, " + "is_valid_index_in_buffer %#x\n", + cur_pos, node->index_area.index_count, + is_valid_index_in_buffer); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("cur_pos %u, index_area.index_count %u, " + "is_valid_index_in_buffer %#x\n", + cur_pos, node->index_area.index_count, + is_valid_index_in_buffer); + return err; + } + } while (is_valid_index_in_buffer); + + return 0; +} + +/* + * ssdfs_btree_common_node_add_index() - add index record into the node + * @node: node object + * @position: position in the node for storing the new index record + * @ptr: pointer on storing index record [in] + * + * This method tries to add the index record into the common node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free space. 
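+ *
+ * NOTE: decision sketch for the new key (hash1 is the key already
+ * stored at @position, hash2 is the new key's hash):
+ *
+ *	position == index_count -> append via __ssdfs_btree_common_node_add_index()
+ *	hash1 == hash2          -> rewrite the slot in place
+ *	hash2 > hash1           -> step one slot right, then append or
+ *	                           ripple insert there
+ *	hash2 < hash1           -> ripple insert at @position
+ *
+ * Afterwards start_hash and end_hash are refreshed from the first
+ * and the last stored keys.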
+ */ +static +int ssdfs_btree_common_node_add_index(struct ssdfs_btree_node *node, + u16 position, + struct ssdfs_btree_index_key *ptr) +{ + struct ssdfs_btree_index_key tmp_key; + u64 hash1, hash2; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !ptr); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, position %u, index_count %u\n", + node->node_id, position, + node->index_area.index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->index_area.index_count > node->index_area.index_capacity) { + SSDFS_ERR("index_count %u > index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); + return -ERANGE; + } + + if (node->index_area.index_count == node->index_area.index_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add the index: " + "index_count %u, index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + if (position > node->index_area.index_count) { + SSDFS_ERR("invalid index place: " + "position %u, index_count %u\n", + position, + node->index_area.index_count); + return -ERANGE; + } + + if (position == node->index_area.index_count) { + err = __ssdfs_btree_common_node_add_index(node, position, ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + + node->index_area.index_count++; + } else { + err = __ssdfs_btree_common_node_extract_index(node, + &node->index_area, + position, + &tmp_key); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the index: err %d\n", err); + return err; + } + + hash1 = le64_to_cpu(tmp_key.index.hash); + hash2 = le64_to_cpu(ptr->index.hash); + + if (hash1 == hash2) { + err = ssdfs_btree_common_node_change_index(node, + &node->index_area, + position, + ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + } else { + if (hash2 > hash1) + position++; + + if (position == node->index_area.index_count) { + err = __ssdfs_btree_common_node_add_index(node, + position, + ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: " + "node_id %u, position %u, " + "err %d\n", + node->node_id, position, err); + return err; + } + } else { + err = ssdfs_btree_common_node_insert_index(node, + position, + ptr); + if (unlikely(err)) { + SSDFS_ERR("fail to insert index: " + "node_id %u, position %u, " + "err %d\n", + node->node_id, position, err); + return err; + } + } + + node->index_area.index_count++; + } + } + + err = __ssdfs_btree_common_node_extract_index(node, + &node->index_area, + 0, + &tmp_key); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the index: err %d\n", err); + return err; + } + + node->index_area.start_hash = le64_to_cpu(tmp_key.index.hash); + + err = __ssdfs_btree_common_node_extract_index(node, + &node->index_area, + node->index_area.index_count - 1, + &tmp_key); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the index: err %d\n", err); + return err; + } + + node->index_area.end_hash = le64_to_cpu(tmp_key.index.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "index_count %u, index_capacity %u\n", + node->index_area.start_hash, + node->index_area.end_hash, + node->index_area.index_count, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * 
ssdfs_btree_node_add_index() - add index into node's index area + * @node: node object + * @index: new index + * + * This method tries to insert the index into node's index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - index area hasn't free space. + * %-ENOENT - node hasn't the index area. + * %-EFAULT - corrupted index or node's index area. + * %-EACCES - node is under initialization yet. + */ +int ssdfs_btree_node_add_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_index_key *index) +{ + struct ssdfs_fs_info *fsi; + u64 hash; + int node_type; + u16 found = U16_MAX; + u16 count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi || !index); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + hash = le64_to_cpu(index->index.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, hash %llx\n", + node->node_id, hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (hash == U64_MAX) { + SSDFS_ERR("invalid hash %llx\n", hash); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + node_type = atomic_read(&node->type); + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_ERR("invalid node type %#x\n", + node_type); + return -ERANGE; + } + + if (!can_add_new_index(node)) { + u32 new_size; + + down_read(&node->header_lock); + new_size = node->index_area.area_size * 2; + up_read(&node->header_lock); + + err = ssdfs_btree_node_resize_index_area(node, new_size); + if (err == -EACCES) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area cannot be resized: " + "node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to resize index area: " + "node_id %u, new_size %u, err %d\n", + node->node_id, new_size, err); + return err; + } + } + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + down_read(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_find_index_by_hash(node, &node->index_area, + hash, &found); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -ENODATA) { + /* node hasn't any index */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, hash, err); + goto finish_change_root_node; + } + + err = ssdfs_btree_root_node_add_index(node, found, + index); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: " + "node_id %u, node_type %#x, " + "found_index %u, err %d\n", + node->node_id, node_type, + found, err); + } 
+
+finish_change_root_node:
+	up_write(&node->header_lock);
+	up_read(&node->full_lock);
+
+	if (unlikely(err)) {
+		ssdfs_debug_show_btree_node_indexes(node->tree, node);
+		return err;
+	}
+	} else {
+		down_write(&node->full_lock);
+		down_write(&node->header_lock);
+
+		err = ssdfs_find_index_by_hash(node, &node->index_area,
+						hash, &found);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(found >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (err == -EEXIST) {
+			/* index exists already */
+			err = 0;
+		} else if (err == -ENODATA) {
+			/* node hasn't any index */
+			err = 0;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find an index: "
+				  "node_id %u, hash %llx, err %d\n",
+				  node->node_id, hash, err);
+			up_write(&node->header_lock);
+			up_write(&node->full_lock);
+			return err;
+		}
+
+		count = (node->index_area.index_count + 1) - found;
+		err = ssdfs_lock_index_range(node, found, count);
+		BUG_ON(err == -ENODATA);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to lock index range: "
+				  "start %u, count %u, err %d\n",
+				  found, count, err);
+			up_write(&node->header_lock);
+			up_write(&node->full_lock);
+			return err;
+		}
+
+		downgrade_write(&node->full_lock);
+
+		err = ssdfs_btree_common_node_add_index(node, found,
+							index);
+		ssdfs_unlock_index_range(node, found, count);
+
+		if (!err)
+			err = ssdfs_set_dirty_index_range(node, found, count);
+
+		up_write(&node->header_lock);
+		up_read(&node->full_lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add index: "
+				  "node_id %u, node_type %#x, "
+				  "found_index %u, err %d\n",
+				  node->node_id, node_type,
+				  found, err);
+			ssdfs_debug_show_btree_node_indexes(node->tree, node);
+			return err;
+		}
+	}
+
+	ssdfs_set_node_update_cno(node);
+	set_ssdfs_btree_node_dirty(node);
+
+	return 0;
+}
+
+static inline
+bool is_ssdfs_btree_index_key_identical(struct ssdfs_btree_index_key *index1,
+					struct ssdfs_btree_index_key *index2)
+{
+	u32 node_id1, node_id2;
+	u8 node_type1, node_type2;
+	u64 hash1, hash2;
+	u64 seg_id1, seg_id2;
+	u32 logical_blk1, logical_blk2;
+	u32 len1, len2;
+
+	node_id1 = le32_to_cpu(index1->node_id);
+	node_type1 = index1->node_type;
+	hash1 = le64_to_cpu(index1->index.hash);
+	seg_id1 = le64_to_cpu(index1->index.extent.seg_id);
+	logical_blk1 = le32_to_cpu(index1->index.extent.logical_blk);
+	len1 = le32_to_cpu(index1->index.extent.len);
+
+	node_id2 = le32_to_cpu(index2->node_id);
+	node_type2 = index2->node_type;
+	hash2 = le64_to_cpu(index2->index.hash);
+	seg_id2 = le64_to_cpu(index2->index.extent.seg_id);
+	logical_blk2 = le32_to_cpu(index2->index.extent.logical_blk);
+	len2 = le32_to_cpu(index2->index.extent.len);
+
+	return node_id1 == node_id2 && node_type1 == node_type2 &&
+		hash1 == hash2 && seg_id1 == seg_id2 &&
+		logical_blk1 == logical_blk2 && len1 == len2;
+}
+
+/*
+ * __ssdfs_btree_node_change_index() - change existing index
+ * @node: node object
+ * @old_index: old index
+ * @new_index: new index
+ *
+ * This method tries to change @old_index on @new_index into
+ * node's index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - node's index area doesn't contain @old_index.
+ * %-ENOENT - node hasn't the index area.
+ * %-EFAULT - corrupted index or node's index area.
+ * %-EACCES - node is under initialization yet.
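+ *
+ * NOTE: locking sketch for a common (non-root) node, as implemented
+ * below:
+ *
+ *	down_write(&node->full_lock);
+ *	down_write(&node->header_lock);
+ *	ssdfs_lock_index_range(node, found, 1);
+ *	downgrade_write(&node->full_lock);
+ *	... rewrite the key, mark the index dirty ...
+ *	ssdfs_unlock_index_range(node, found, 1);
+ *	up_write(&node->header_lock);
+ *	up_read(&node->full_lock);
+ *
+ * The downgrade lets concurrent readers of the node proceed while
+ * the modified range stays protected by the index-range lock.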
+ */ +static +int __ssdfs_btree_node_change_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_index_key *old_index, + struct ssdfs_btree_index_key *new_index) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node_index_area area; + size_t desc_size = sizeof(struct ssdfs_btree_node_index_area); + int node_type; + u64 old_hash, new_hash; + u16 found = U16_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + BUG_ON(!old_index || !new_index); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + old_hash = le64_to_cpu(old_index->index.hash); + new_hash = le64_to_cpu(new_index->index.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, old_hash %llx, new_hash %llx\n", + node->node_id, old_hash, new_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (old_hash == U64_MAX || new_hash == U64_MAX) { + SSDFS_ERR("invalid old_hash %llx or new_hash %llx\n", + old_hash, new_hash); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + node_type = atomic_read(&node->type); + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_ERR("invalid node type %#x\n", + node_type); + return -ERANGE; + } + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + down_read(&node->full_lock); + + err = ssdfs_find_index_by_hash(node, &node->index_area, + old_hash, &found); + if (err == -EEXIST) { + err = 0; + /* + * Index has been found. + * Continue logic. + */ + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find an index: " + "node_id %u, hash %llx\n", + node->node_id, old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_change_root_node; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, old_hash, err); + goto finish_change_root_node; + } else { + err = -ERANGE; + SSDFS_ERR("no index for the hash %llx\n", + old_hash); + goto finish_change_root_node; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&node->header_lock); + + err = ssdfs_btree_root_node_change_index(node, found, + new_index); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: " + "node_id %u, node_type %#x, " + "found %u, err %d\n", + node->node_id, node_type, + found, err); + } + + up_write(&node->header_lock); +finish_change_root_node: + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + } else { + down_read(&node->full_lock); + + down_read(&node->header_lock); + ssdfs_memcpy(&area, 0, desc_size, + &node->index_area, 0, desc_size, + desc_size); + up_read(&node->header_lock); + + err = ssdfs_find_index_by_hash(node, &area, + old_hash, &found); + if (err == -EEXIST) { + err = 0; + /* + * Index has been found. + * Continue logic. 
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, old_hash, err); + } else { + err = -ERANGE; + SSDFS_ERR("no index for the hash %llx\n", + old_hash); + } + + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_lock_index_range(node, found, 1); + BUG_ON(err == -ENODATA); + + if (unlikely(err)) { + SSDFS_ERR("fail to lock index %u, err %d\n", + found, err); + up_write(&node->header_lock); + up_write(&node->full_lock); + return err; + } + + downgrade_write(&node->full_lock); + + err = ssdfs_btree_common_node_change_index(node, + &node->index_area, + found, new_index); + ssdfs_unlock_index_range(node, found, 1); + + if (!err) + err = ssdfs_set_dirty_index_range(node, found, 1); + + up_write(&node->header_lock); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to change index: " + "node_id %u, node_type %#x, " + "found %u, err %d\n", + node->node_id, node_type, + found, err); + return err; + } + } + + ssdfs_set_node_update_cno(node); + set_ssdfs_btree_node_dirty(node); + + return 0; +} + +/* + * ssdfs_btree_node_change_index() - change existing index + * @node: node object + * @old_index: old index + * @new_index: new index + * + * This method tries to change @old_index on @new_index into + * node's index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node's index area doesn't contain @old_index. + * %-ENOENT - node hasn't the index area. + * %-EFAULT - corrupted index or node's index area. + * %-EACCES - node is under initialization yet. 
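+ *
+ * NOTE: sketch of the logic below (an in-place change is used only
+ * when the hash is unchanged, otherwise delete + add keeps the hash
+ * ordering intact):
+ *
+ *	if (old_hash == new_hash)
+ *		err = __ssdfs_btree_node_change_index(node, old_index,
+ *						      new_index);
+ *	else {
+ *		err = ssdfs_btree_node_delete_index(node, old_hash);
+ *		if (!err)
+ *			err = ssdfs_btree_node_add_index(node, new_index);
+ *	}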
+ */
+int ssdfs_btree_node_change_index(struct ssdfs_btree_node *node,
+				  struct ssdfs_btree_index_key *old_index,
+				  struct ssdfs_btree_index_key *new_index)
+{
+	u64 old_hash, new_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree || !node->tree->fsi);
+	BUG_ON(!old_index || !new_index);
+	BUG_ON(!rwsem_is_locked(&node->tree->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	old_hash = le64_to_cpu(old_index->index.hash);
+	new_hash = le64_to_cpu(new_index->index.hash);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, old_hash %llx, new_hash %llx\n",
+		  node->node_id, old_hash, new_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&node->state)) {
+	case SSDFS_BTREE_NODE_CREATED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u is under initialization\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EACCES;
+
+	case SSDFS_BTREE_NODE_INITIALIZED:
+	case SSDFS_BTREE_NODE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid node state %#x\n",
+			  atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	if (!is_ssdfs_btree_node_index_area_exist(node)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u hasn't index area\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOENT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (old_hash == U64_MAX || new_hash == U64_MAX) {
+		SSDFS_ERR("invalid old_hash %llx or new_hash %llx\n",
+			  old_hash, new_hash);
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_ssdfs_btree_index_key_identical(old_index, new_index)) {
+		SSDFS_DBG("old and new index are identical\n");
+		return 0;
+	}
+
+	if (old_hash == new_hash) {
+		err = __ssdfs_btree_node_change_index(node, old_index,
+						      new_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change index: "
+				  "old_hash %llx, err %d\n",
+				  old_hash, err);
+			goto fail_change_index;
+		}
+	} else {
+		err = ssdfs_btree_node_delete_index(node, old_hash);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete index: "
+				  "old_hash %llx, err %d\n",
+				  old_hash, err);
+			goto fail_change_index;
+		}
+
+		err = ssdfs_btree_node_add_index(node, new_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add index: "
+				  "new_hash %llx, err %d\n",
+				  new_hash, err);
+			goto fail_change_index;
+		}
+	}
+
+	return 0;
+
+fail_change_index:
+	SSDFS_ERR("node_id %u, node_type %#x\n",
+		  node->node_id,
+		  atomic_read(&node->type));
+	SSDFS_ERR("node_id %u, node_type %#x, old_hash %llx, "
+		  "seg_id %llu, logical_blk %u, len %u\n",
+		  le32_to_cpu(old_index->node_id),
+		  old_index->node_type,
+		  le64_to_cpu(old_index->index.hash),
+		  le64_to_cpu(old_index->index.extent.seg_id),
+		  le32_to_cpu(old_index->index.extent.logical_blk),
+		  le32_to_cpu(old_index->index.extent.len));
+	SSDFS_ERR("node_id %u, node_type %#x, new_hash %llx, "
+		  "seg_id %llu, logical_blk %u, len %u\n",
+		  le32_to_cpu(new_index->node_id),
+		  new_index->node_type,
+		  le64_to_cpu(new_index->index.hash),
+		  le64_to_cpu(new_index->index.extent.seg_id),
+		  le32_to_cpu(new_index->index.extent.logical_blk),
+		  le32_to_cpu(new_index->index.extent.len));
+
+	return err;
+}

From patchwork Sat Feb 25 01:09:04 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151961
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 53/76] ssdfs: search/allocate/insert b-tree node operations
Date: Fri, 24 Feb 2023 17:09:04 -0800
Message-Id: <20230225010927.813929-54-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

The b-tree node implements search, allocation, and insertion of an item
or a range of items in the node:
(1) find_item - find an item in the b-tree node
(2) find_range - find a range of items in the b-tree node
(3) allocate_item - allocate an item in the b-tree node
(4) allocate_range - allocate a range of items in the b-tree node
(5) insert_item - insert/add an item into the b-tree node
(6) insert_range - insert/add a range of items into the b-tree node

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_node.c | 3135 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 3135 insertions(+)

diff --git a/fs/ssdfs/btree_node.c b/fs/ssdfs/btree_node.c
index f4402cb8df64..aa9d90ba8598 100644
--- a/fs/ssdfs/btree_node.c
+++ b/fs/ssdfs/btree_node.c
@@ -8207,3 +8207,3138 @@ int ssdfs_btree_node_change_index(struct ssdfs_btree_node *node,
 	return err;
 }
+
+/*
+ * ssdfs_btree_root_node_delete_index() - delete index record from root node
+ * @node: node object
+ * @position: position in the node of the deleting index record
+ *
+ * This method tries to delete the index record from the root node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
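+ *
+ * NOTE: the root node keeps only two index slots; deleting the left
+ * slot shifts the right one into its place and wipes the freed slot
+ * with 0xFF bytes, while deleting the right slot only wipes it. The
+ * cached start_hash/end_hash collapse onto each other accordingly.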
+ */
+int ssdfs_btree_root_node_delete_index(struct ssdfs_btree_node *node,
+					u16 position)
+{
+	size_t index_size = sizeof(struct ssdfs_btree_index);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("index 0: node_id %u; index 1: node_id %u\n",
+		  le32_to_cpu(node->raw.root_node.header.node_ids[0]),
+		  le32_to_cpu(node->raw.root_node.header.node_ids[1]));
+	SSDFS_DBG("node_id %u, position %u\n",
+		  node->node_id, position);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (node->index_area.index_count > node->index_area.index_capacity) {
+		SSDFS_ERR("index_count %u > index_capacity %u\n",
+			  node->index_area.index_count,
+			  node->index_area.index_capacity);
+		return -ERANGE;
+	}
+
+	if (position >= node->index_area.index_count) {
+		SSDFS_ERR("invalid position %u, index_count %u\n",
+			  position,
+			  node->index_area.index_count);
+		return -ERANGE;
+	}
+
+	if (node->index_area.index_count == 0) {
+		SSDFS_WARN("index_count == 0\n");
+		return -ERANGE;
+	}
+
+	switch (position) {
+	case SSDFS_ROOT_NODE_LEFT_LEAF_NODE:
+		if ((position + 1) < node->index_area.index_count) {
+			node->index_area.start_hash = node->index_area.end_hash;
+			ssdfs_memcpy(&node->raw.root_node.indexes[position],
+				     0, index_size,
+				     &node->raw.root_node.indexes[position + 1],
+				     0, index_size,
+				     index_size);
+			memset(&node->raw.root_node.indexes[position + 1], 0xFF,
+				index_size);
+			node->raw.root_node.header.node_ids[position + 1] =
+							cpu_to_le32(U32_MAX);
+		} else {
+			node->index_area.start_hash = U64_MAX;
+			node->index_area.end_hash = U64_MAX;
+			memset(&node->raw.root_node.indexes[position], 0xFF,
+				index_size);
+			node->raw.root_node.header.node_ids[position] =
+							cpu_to_le32(U32_MAX);
+		}
+		break;
+
+	case SSDFS_ROOT_NODE_RIGHT_LEAF_NODE:
+		node->index_area.end_hash = node->index_area.start_hash;
+		memset(&node->raw.root_node.indexes[position], 0xFF,
+			index_size);
+		node->raw.root_node.header.node_ids[position] =
+						cpu_to_le32(U32_MAX);
+		break;
+
+	default:
+		BUG();
+	}
+
+	node->index_area.index_count--;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node->index_area.index_count %u\n",
+		  node->index_area.index_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_common_node_delete_tail_index() - delete the tail index record
+ * @node: node object
+ * @position: position in the node of the deleting index record
+ * @ptr: index record before @position [out]
+ *
+ * This method tries to delete the tail index record from the common node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
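+ *
+ * NOTE: the key at @position (the last one in the area) is wiped
+ * with 0xFF bytes; unless the area becomes empty, the key at
+ * @position - 1 is read back into @ptr so the caller can refresh
+ * the cached end_hash.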
+ */
+static
+int ssdfs_btree_common_node_delete_tail_index(struct ssdfs_btree_node *node,
+					      u16 position,
+					      struct ssdfs_btree_index_key *ptr)
+{
+	size_t index_size = sizeof(struct ssdfs_btree_index_key);
+	struct page *page;
+	u32 page_index;
+	u32 page_off;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !ptr);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u, position %u\n",
+		  node->node_id, position);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((position + 1) != node->index_area.index_count) {
+		SSDFS_ERR("cannot delete index: "
+			  "position %u, index_count %u\n",
+			  position,
+			  node->index_area.index_count);
+		return -ERANGE;
+	}
+
+	err = ssdfs_define_memory_page(node, &node->index_area,
+					position,
+					&page_index, &page_off);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define memory page: "
+			  "node_id %u, position %u, err %d\n",
+			  node->node_id, position, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(page_index >= U32_MAX);
+	BUG_ON(page_off >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (page_index >= pagevec_count(&node->content.pvec)) {
+		SSDFS_ERR("page_index %u > pvec_size %u\n",
+			  page_index,
+			  pagevec_count(&node->content.pvec));
+		return -ERANGE;
+	}
+
+	if ((page_off + index_size) > PAGE_SIZE) {
+		SSDFS_ERR("invalid page_off %u\n",
+			  page_off);
+		return -ERANGE;
+	}
+
+	page = node->content.pvec.pages[page_index];
+	ssdfs_lock_page(page);
+	ssdfs_memset_page(page, page_off, PAGE_SIZE,
+			  0xFF, index_size);
+	ssdfs_unlock_page(page);
+
+	if (position == 0)
+		memset(ptr, 0xFF, index_size);
+	else {
+		err = ssdfs_define_memory_page(node, &node->index_area,
+						position - 1,
+						&page_index, &page_off);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to define memory page: "
+				  "node_id %u, position %u, err %d\n",
+				  node->node_id, position - 1, err);
+			return err;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(page_index >= U32_MAX);
+		BUG_ON(page_off >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (page_index >= pagevec_count(&node->content.pvec)) {
+			SSDFS_ERR("page_index %u > pvec_size %u\n",
+				  page_index,
+				  pagevec_count(&node->content.pvec));
+			return -ERANGE;
+		}
+
+		page = node->content.pvec.pages[page_index];
+		ssdfs_lock_page(page);
+		err = ssdfs_memcpy_from_page(ptr, 0, index_size,
+					     page, page_off, PAGE_SIZE,
+					     index_size);
+		ssdfs_unlock_page(page);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_common_node_remove_index() - remove the index record
+ * @node: node object
+ * @position: position in the node of the deleting index record
+ * @ptr: index record on @position after deletion [out]
+ *
+ * This method tries to delete the index record from the common node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
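+ *
+ * NOTE: a sketch of the ripple removal (the mirror of the insertion
+ * path): inside every affected memory page the keys after @position
+ * are shifted left by one slot, and the first key of each following
+ * page is carried through a temporary buffer into the freed last
+ * slot of the preceding page.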
+ */ +static +int ssdfs_btree_common_node_remove_index(struct ssdfs_btree_node *node, + u16 position, + struct ssdfs_btree_index_key *ptr) +{ + struct ssdfs_btree_index_key buffer; + struct page *page; + void *kaddr; + u32 page_index; + u32 page_off; + u16 cur_pos = position; + u8 index_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !ptr); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, position %u\n", + node->node_id, position); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!((position + 1) < node->index_area.index_count)) { + SSDFS_ERR("cannot remove index: " + "position %u, index_count %u\n", + position, + node->index_area.index_count); + return -ERANGE; + } + + index_size = node->index_area.index_size; + if (index_size != sizeof(struct ssdfs_btree_index_key)) { + SSDFS_ERR("invalid index_size %u\n", + index_size); + return -ERANGE; + } + + do { + u32 rest_capacity; + u32 moving_count; + u32 moving_bytes; + + err = ssdfs_define_memory_page(node, &node->index_area, + cur_pos, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, position %u, err %d\n", + node->node_id, cur_pos, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_pos %u, page_index %u, page_off %u\n", + cur_pos, page_index, page_off); + + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + rest_capacity = PAGE_SIZE - (page_off + index_size); + rest_capacity /= index_size; + + moving_count = node->index_area.index_count - (cur_pos + 1); + moving_count = min_t(u32, moving_count, rest_capacity); + moving_bytes = moving_count * index_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("rest_capacity %u, index_count %u, " + "moving_count %u, moving_bytes %u\n", + rest_capacity, + node->index_area.index_count, + moving_count, moving_bytes); + + if ((page_off + index_size) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "page_off %u, index_size %u\n", + page_off, index_size); + return -ERANGE; + } + + if ((page_off + moving_bytes) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "page_off %u, moving_bytes %u\n", + page_off, moving_bytes); + return -ERANGE; + } + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + page = node->content.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + kaddr = kmap_local_page(page); + + if (moving_count == 0) { + err = ssdfs_memcpy(&buffer, 0, index_size, + kaddr, page_off, PAGE_SIZE, + index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_copy; + } + + memset((u8 *)kaddr + page_off, 0xFF, index_size); + } else { + err = ssdfs_memcpy(&buffer, 0, index_size, + kaddr, page_off, PAGE_SIZE, + index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_copy; + } + + err = ssdfs_memmove(kaddr, + page_off, PAGE_SIZE, + kaddr, + page_off + index_size, PAGE_SIZE, + moving_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + goto finish_copy; + } + + memset((u8 *)kaddr + page_off + moving_bytes, + 0xFF, index_size); + } + + if (cur_pos == position) { + err = ssdfs_memcpy(ptr, 0, index_size, + kaddr, page_off, PAGE_SIZE, + index_size); + if (unlikely(err)) { + 
SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_copy; + } + } + +finish_copy: + flush_dcache_page(page); + kunmap_local(kaddr); + ssdfs_unlock_page(page); + + if (unlikely(err)) + return err; + + if (cur_pos != position) { + err = ssdfs_define_memory_page(node, &node->index_area, + cur_pos - 1, + &page_index, &page_off); + if (unlikely(err)) { + SSDFS_ERR("fail to define memory page: " + "node_id %u, position %u, err %d\n", + node->node_id, cur_pos - 1, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_pos %u, page_index %u, page_off %u\n", + cur_pos, page_index, page_off); + + BUG_ON(page_index >= U32_MAX); + BUG_ON(page_off >= U32_MAX); + + if ((page_off + index_size) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "page_off %u, index_size %u\n", + page_off, index_size); + return -ERANGE; + } + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("page_index %u > pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + page = node->content.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_lock_page(page); + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + &buffer, 0, index_size, + index_size); + ssdfs_unlock_page(page); + + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + } + + cur_pos += moving_count + 1; + } while (cur_pos < node->index_area.index_count); + + return 0; +} + +/* + * ssdfs_btree_common_node_delete_index() - delete the index record + * @node: node object + * @position: position in the node of the deleting index record + * + * This method tries to delete the index record from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
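+ *
+ * NOTE: after the removal the cached hash range follows the removed
+ * position: deleting index 0 moves start_hash to the new first key,
+ * deleting the former last index moves end_hash back; the inodes
+ * b-tree is special-cased to keep its hash range unchanged.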
+ */ +int ssdfs_btree_common_node_delete_index(struct ssdfs_btree_node *node, + u16 position) +{ + struct ssdfs_btree_index_key buffer; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, position %u, index_count %u\n", + node->node_id, position, + node->index_area.index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->index_area.index_count > node->index_area.index_capacity) { + SSDFS_ERR("index_count %u > index_capacity %u\n", + node->index_area.index_count, + node->index_area.index_capacity); + return -ERANGE; + } + + if (node->index_area.index_count == 0) { + SSDFS_WARN("index_count == 0\n"); + return -ERANGE; + } + + if (position >= node->index_area.index_count) { + SSDFS_ERR("invalid index place: " + "position %u, index_count %u\n", + position, + node->index_area.index_count); + return -ERANGE; + } + + if ((position + 1) == node->index_area.index_count) { + err = ssdfs_btree_common_node_delete_tail_index(node, position, + &buffer); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + } else { + err = ssdfs_btree_common_node_remove_index(node, position, + &buffer); + if (unlikely(err)) { + SSDFS_ERR("fail to remove index: " + "node_id %u, position %u, err %d\n", + node->node_id, position, err); + return err; + } + } + + node->index_area.index_count--; + + switch (node->tree->type) { + case SSDFS_INODES_BTREE: + /* keep the index range unchanged */ + goto finish_common_node_delete_index; + + default: + /* continue logic */ + break; + } + + if (node->index_area.index_count == 0) { + node->index_area.start_hash = U64_MAX; + node->index_area.end_hash = U64_MAX; + } else { + if (position == 0) { + node->index_area.start_hash = + le64_to_cpu(buffer.index.hash); + } else if (position == node->index_area.index_count) { + node->index_area.end_hash = + le64_to_cpu(buffer.index.hash); + } + } + +finish_common_node_delete_index: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "index_count %u, index_capacity %u\n", + node->index_area.start_hash, + node->index_area.end_hash, + node->index_area.index_count, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * need_shrink_index_area() - check that index area should be shrinked + * @node: node object + * @new_size: new size of the node after shrinking [out] + */ +static +bool need_shrink_index_area(struct ssdfs_btree_node *node, u32 *new_size) +{ + u16 index_area_min_size; + u16 count, capacity; + u8 index_size; + bool need_check_size = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !new_size); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + *new_size = U32_MAX; + index_area_min_size = node->tree->index_area_min_size; + + down_read(&node->header_lock); + count = node->index_area.index_count; + capacity = node->index_area.index_capacity; + index_size = node->index_area.index_size; + if (capacity == 0) + err = -ERANGE; + if (count > capacity) + err = -ERANGE; + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_WARN("count %u > capacity %u\n", + count, capacity); + return false; + } + + if (index_area_min_size == 0 || index_area_min_size % index_size) { + SSDFS_WARN("invalid index size: " + "index_area_min_size %u, index_size %u\n", + index_area_min_size, index_size); + 
return false; + } + + if (count == 0) + need_check_size = true; + else + need_check_size = ((capacity / count) >= 2); + + if (need_check_size) { + *new_size = (capacity / 2) * index_size; + if (*new_size >= index_area_min_size) + return true; + else + *new_size = U32_MAX; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("count %u, capacity %u, index_size %u, " + "index_area_min_size %u, new_size %u\n", + count, capacity, index_size, + index_area_min_size, *new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + return false; +} + +/* + * ssdfs_btree_node_delete_index() - delete existing index + * @node: node object + * @hash: hash value + * + * This method tries to delete index for @hash from node's + * index area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node's index area doesn't contain index for @hash. + * %-ENOENT - node hasn't the index area. + * %-EFAULT - corrupted node's index area. + * %-EACCES - node is under initialization yet. + */ +int ssdfs_btree_node_delete_index(struct ssdfs_btree_node *node, + u64 hash) +{ + struct ssdfs_fs_info *fsi; + int node_type; + u16 found = U16_MAX; + u16 count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + + SSDFS_DBG("node_id %u, hash %llx\n", + node->node_id, hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + +#ifdef CONFIG_SSDFS_DEBUG + if (hash == U64_MAX) { + SSDFS_ERR("invalid hash %llx\n", hash); + return -ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + node_type = atomic_read(&node->type); + if (node_type <= SSDFS_BTREE_NODE_UNKNOWN_TYPE || + node_type >= SSDFS_BTREE_NODE_TYPE_MAX) { + SSDFS_ERR("invalid node type %#x\n", + node_type); + return -ERANGE; + } + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + down_read(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_find_index_by_hash(node, &node->index_area, + hash, &found); + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, hash, err); + goto finish_change_root_node; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found == U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_root_node_delete_index(node, found); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "node_id %u, node_type %#x, " + "found_index %u, err %d\n", + node->node_id, node_type, + found, err); + } + +finish_change_root_node: + up_write(&node->header_lock); + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + } else { + down_write(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_find_index_by_hash(node, &node->index_area, + hash, &found); + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if 
(unlikely(err)) { + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, hash, err); + up_write(&node->header_lock); + up_write(&node->full_lock); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found == U16_MAX); + + SSDFS_DBG("index_count %u, found %u\n", + node->index_area.index_count, found); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = node->index_area.index_count - found; + err = ssdfs_lock_index_range(node, found, count); + BUG_ON(err == -ENODATA); + if (unlikely(err)) { + SSDFS_ERR("fail to lock index range: " + "start %u, count %u, err %d\n", + found, count, err); + up_write(&node->header_lock); + up_write(&node->full_lock); + return err; + } + + downgrade_write(&node->full_lock); + + err = ssdfs_btree_common_node_delete_index(node, found); + ssdfs_unlock_index_range(node, found, count); + + if (!err) + err = ssdfs_set_dirty_index_range(node, found, count); + + up_write(&node->header_lock); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "node_id %u, node_type %#x, " + "found_index %u, err %d\n", + node->node_id, node_type, + found, err); + } + } + + ssdfs_set_node_update_cno(node); + set_ssdfs_btree_node_dirty(node); + + if (node_type != SSDFS_BTREE_ROOT_NODE) { + u32 new_size; + + if (need_shrink_index_area(node, &new_size)) { + err = ssdfs_btree_node_resize_index_area(node, + new_size); + if (err == -ENOSPC) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area cannot be resized: " + "node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to resize index area: " + "node_id %u, new_size %u, err %d\n", + node->node_id, new_size, err); + return err; + } + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "index_count %u, index_capacity %u\n", + node->index_area.start_hash, + node->index_area.end_hash, + node->index_area.index_count, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_move_root2common_node_index_range() - move index range (root -> common) + * @src: source node + * @src_start: starting index in the source node + * @dst: destination node + * @dst_start: starting index in the destination node + * @count: count of indexes in the range + * + * This method tries to move the index range from the source node + * into destination one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
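+ *
+ * Usage sketch (editorial illustration; @new_node and
+ * @root_index_count are hypothetical names): when the tree grows,
+ * the root's indexes are relocated into a freshly created index
+ * node, starting at slot 0 of the destination:
+ *
+ *	err = ssdfs_move_root2common_node_index_range(root, 0,
+ *						      new_node, 0,
+ *						      root_index_count);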
+ */ +static +int ssdfs_move_root2common_node_index_range(struct ssdfs_btree_node *src, + u16 src_start, + struct ssdfs_btree_node *dst, + u16 dst_start, u16 count) +{ + struct ssdfs_fs_info *fsi; + int i, j; + int upper_bound; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!src || !dst); + BUG_ON(!src->tree || !src->tree->fsi); + BUG_ON(!rwsem_is_locked(&src->tree->lock)); + + if (!is_ssdfs_btree_node_index_area_exist(src)) { + SSDFS_DBG("src node %u hasn't index area\n", + src->node_id); + return -EINVAL; + } + + if (!is_ssdfs_btree_node_index_area_exist(dst)) { + SSDFS_DBG("dst node %u hasn't index area\n", + dst->node_id); + return -EINVAL; + } + + if (atomic_read(&src->type) != SSDFS_BTREE_ROOT_NODE) { + SSDFS_ERR("invalid src node type %#x\n", + atomic_read(&src->type)); + return -EINVAL; + } + + SSDFS_DBG("src_node %u, src_start %u, " + "dst_node %u, dst_start %u, " + "count %u\n", + src->node_id, src_start, + dst->node_id, dst_start, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = src->tree->fsi; + + if (src_start >= SSDFS_BTREE_ROOT_NODE_INDEX_COUNT) { + SSDFS_ERR("invalid src_start %u\n", + src_start); + return -ERANGE; + } + + if (count == 0) { + SSDFS_ERR("count is zero\n"); + return -ERANGE; + } + + atomic_set(&src->state, SSDFS_BTREE_NODE_CREATED); + atomic_set(&dst->state, SSDFS_BTREE_NODE_CREATED); + + count = min_t(u16, count, + SSDFS_BTREE_ROOT_NODE_INDEX_COUNT - src_start); + + upper_bound = src_start + count; + for (i = src_start, j = dst_start; i < upper_bound; i++, j++) { + struct ssdfs_btree_index_key index; + + down_write(&src->full_lock); + + err = __ssdfs_btree_root_node_extract_index(src, i, + &index); + if (unlikely(err)) { + SSDFS_ERR("fail extract index: " + "index %u, err %d\n", + i, err); + } + + up_write(&src->full_lock); + + if (unlikely(err)) { + atomic_set(&src->state, SSDFS_BTREE_NODE_CORRUPTED); + atomic_set(&dst->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + + down_write(&dst->full_lock); + + down_write(&dst->header_lock); + err = ssdfs_btree_common_node_add_index(dst, j, &index); + up_write(&dst->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to insert index: " + "index %u, err %d\n", + j, err); + } + + up_write(&dst->full_lock); + + if (unlikely(err)) { + atomic_set(&src->state, SSDFS_BTREE_NODE_CORRUPTED); + atomic_set(&dst->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + } + + for (i = 0; i < count; i++) { + down_write(&src->full_lock); + + down_write(&src->header_lock); + err = ssdfs_btree_root_node_delete_index(src, src_start); + up_write(&src->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "index %u, err %d\n", + i, err); + } + + up_write(&src->full_lock); + + if (unlikely(err)) { + atomic_set(&src->state, SSDFS_BTREE_NODE_CORRUPTED); + atomic_set(&dst->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + } + + ssdfs_set_node_update_cno(src); + set_ssdfs_btree_node_dirty(src); + + ssdfs_set_node_update_cno(dst); + set_ssdfs_btree_node_dirty(dst); + + return 0; +} + +/* + * ssdfs_copy_index_range_in_buffer() - copy index range in buffer + * @node: node object + * @start: starting index in the node + * @count: requested count of indexes in the range + * @area_offset: offset of the index area in the node + * @index_size: size of the index in bytes + * @buf: pointer on buffer + * @range_len: pointer on value of count of indexes in the buffer [out] + * + * This method tries to copy the index range into the buffer. 
+ * If the current memory page of the node contains fewer indexes + * than requested, then @range_len will contain the real number of + * indexes in @buf. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_copy_index_range_in_buffer(struct ssdfs_btree_node *node, + u16 start, u16 count, + u32 area_offset, u16 index_size, + struct ssdfs_btree_index_key *buf, + u16 *range_len) +{ + struct page *page; + u32 offset; + u32 page_index; + u32 page_off; + u32 copy_bytes; +#ifdef CONFIG_SSDFS_DEBUG + int i; +#endif /* CONFIG_SSDFS_DEBUG */ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !buf || !range_len); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); + return -EINVAL; + } + + SSDFS_DBG("node %u, start %u, count %u\n", + node->node_id, start, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (count == 0) { + SSDFS_ERR("count is zero\n"); + return -ERANGE; + } + + *range_len = U16_MAX; + + offset = area_offset + (start * index_size); + page_index = offset / PAGE_SIZE; + page_off = offset % PAGE_SIZE; + + *range_len = PAGE_SIZE - page_off; + *range_len /= index_size; + *range_len = min_t(u32, *range_len, (u32)count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("offset %u, page_index %u, page_off %u\n", + offset, page_index, page_off); + SSDFS_DBG("start %u, count %u, range_len %u\n", + start, count, *range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*range_len == 0) { + SSDFS_ERR("range_len == 0\n"); + return -ERANGE; + } + + copy_bytes = *range_len * index_size; + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "page_index %u, pagevec %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + + if (!page) { + SSDFS_ERR("page is NULL\n"); + return -ERANGE; + } + + err = ssdfs_memcpy_from_page(buf, 0, PAGE_SIZE, + page, page_off, PAGE_SIZE, + copy_bytes); + if (unlikely(err)) { + SSDFS_ERR("buffer is too small: " + "range_len %u, index_size %u, " + "buf_size %lu\n", + *range_len, index_size, + PAGE_SIZE); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + for (i = 0; i < *range_len; i++) { + SSDFS_DBG("index %d, node_id %u, " + "node_type %#x, height %u, " + "flags %#x, hash %llx, seg_id %llu, " + "logical_blk %u, len %u\n", + i, + le32_to_cpu(buf[i].node_id), + buf[i].node_type, + buf[i].height, + le16_to_cpu(buf[i].flags), + le64_to_cpu(buf[i].index.hash), + le64_to_cpu(buf[i].index.extent.seg_id), + le32_to_cpu(buf[i].index.extent.logical_blk), + le32_to_cpu(buf[i].index.extent.len)); + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_save_index_range_in_node() - save index range in the node + * @node: node object + * @start: starting index in the node + * @count: requested count of indexes in the range + * @area_offset: offset of the index area in the node + * @index_size: size of the index in bytes + * @buf: pointer on buffer + * + * This method tries to save the index range from @buf into @node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
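+ *
+ * Positioning sketch (editorial, with illustrative numbers): each
+ * iteration maps the next index to a memory page; e.g. for
+ * area_offset == 0, index_size == 16, i == 300 and 4K pages:
+ *
+ *	offset = area_offset + (i * index_size);	-> 4800
+ *	page_index = offset / PAGE_SIZE;		-> 1
+ *	page_off = offset % PAGE_SIZE;			-> 704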
+ */ +static +int ssdfs_save_index_range_in_node(struct ssdfs_btree_node *node, + u16 start, u16 count, + u32 area_offset, u16 index_size, + struct ssdfs_btree_index_key *buf) +{ + struct page *page; + u32 offset; + u32 page_index; + u32 page_off; + int i; + u16 copied = 0; + u32 sub_range_len = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !buf); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); + return -EINVAL; + } + + SSDFS_DBG("node %u, start %u, count %u\n", + node->node_id, start, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (count == 0) { + SSDFS_ERR("count is zero\n"); + return -ERANGE; + } + + i = start; + + while (count > 0) { + offset = area_offset + (i * index_size); + page_index = offset / PAGE_SIZE; + page_off = offset % PAGE_SIZE; + + sub_range_len = PAGE_SIZE - page_off; + sub_range_len /= index_size; + sub_range_len = min_t(u32, sub_range_len, count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %d, offset %u, page_index %u, " + "page_off %u, sub_range_len %u\n", + i, offset, page_index, + page_off, sub_range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (sub_range_len == 0) { + SSDFS_ERR("invalid sub_range_len: " + "i %d, count %u, " + "page_index %u, page_off %u, " + "sub_range_len %u\n", + i, count, page_index, page_off, + sub_range_len); + return -ERANGE; + } + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "page_index %u, pagevec %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + + if (!page) { + SSDFS_ERR("page is NULL\n"); + return -ERANGE; + } + + if ((page_off + (sub_range_len * index_size)) > PAGE_SIZE) { + SSDFS_ERR("out of page: " + "page_off %u, sub_range_len %u, " + "index_size %u, page_size %lu\n", + page_off, sub_range_len, index_size, + PAGE_SIZE); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %u, count %u, page_index %u, " + "page_off %u, copied %u, sub_range_len %u\n", + i, count, page_index, + page_off, copied, sub_range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_memcpy_to_page(page, page_off, PAGE_SIZE, + buf, copied * index_size, PAGE_SIZE, + sub_range_len * index_size); + if (unlikely(err)) { + SSDFS_ERR("out of page: " + "sub_range_len %u, index_size %u, " + "page_size %lu\n", + sub_range_len, index_size, + PAGE_SIZE); + return err; + } + + err = ssdfs_set_dirty_index_range(node, i, + (u16)sub_range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to set dirty index range: " + "start %u, len %u, err %d\n", + i, sub_range_len, err); + return err; + } + + i += sub_range_len; + copied += sub_range_len; + count -= sub_range_len; + }; + + return 0; +} + +/* + * ssdfs_clear_index_range_in_node() - clear index range in the node + * @node: node object + * @start: starting index in the node + * @count: requested count of indexes in the range + * @area_offset: offset of the index area in the node + * @index_size: size of the index in bytes + * + * This method tries to clear the index range into @node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
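+ *
+ * Note (editorial): "clear" here means filling the range with the
+ * 0xFF pattern, so a cleared slot reads back as all-ones rather
+ * than zeros:
+ *
+ *	ssdfs_memset_page(page, page_off, PAGE_SIZE,
+ *			  0xFF, sub_range_len * index_size);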
+ */ +static +int ssdfs_clear_index_range_in_node(struct ssdfs_btree_node *node, + u16 start, u16 count, + u32 area_offset, u16 index_size) +{ + struct page *page; + u32 offset; + u32 page_index; + u32 page_off; + int i; + u32 sub_range_len = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + SSDFS_DBG("node %u hasn't index area\n", + node->node_id); + return -EINVAL; + } + + SSDFS_DBG("node %u, start %u, count %u\n", + node->node_id, start, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (count == 0) { + SSDFS_ERR("count is zero\n"); + return -ERANGE; + } + + i = start; + + while (count > 0) { + offset = area_offset + (i * index_size); + page_index = offset / PAGE_SIZE; + page_off = offset % PAGE_SIZE; + + sub_range_len = PAGE_SIZE - page_off; + sub_range_len /= index_size; + sub_range_len = min_t(u32, sub_range_len, count); + + if (sub_range_len == 0) { + SSDFS_ERR("invalid sub_range_len: " + "i %d, count %u, " + "page_index %u, page_off %u, " + "sub_range_len %u\n", + i, count, page_index, page_off, + sub_range_len); + return -ERANGE; + } + + if ((sub_range_len * index_size) > PAGE_SIZE) { + SSDFS_ERR("out of page: " + "sub_range_len %u, index_size %u, " + "page_size %lu\n", + sub_range_len, index_size, + PAGE_SIZE); + return -ERANGE; + } + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "page_index %u, pagevec %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + + if (!page) { + SSDFS_ERR("page is NULL\n"); + return -ERANGE; + } + + if ((page_off + (sub_range_len * index_size)) > PAGE_SIZE) { + SSDFS_ERR("out of page: " + "page_off %u, sub_range_len %u, " + "index_size %u, page_size %lu\n", + page_off, sub_range_len, index_size, + PAGE_SIZE); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, count %u, page_index %u, " + "page_off %u, sub_range_len %u\n", + start, count, page_index, + page_off, sub_range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memset_page(page, page_off, PAGE_SIZE, + 0xFF, sub_range_len * index_size); + + i += sub_range_len; + count -= sub_range_len; + }; + + return 0; +} + +/* + * ssdfs_move_common2common_node_index_range() - move index range + * @src: source node + * @src_start: starting index in the source node + * @dst: destination node + * @dst_start: starting index in the destination node + * @count: count of indexes in the range + * + * This method tries to move the index range from the common node + * @src into the common node @dst. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
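+ *
+ * Constraint sketch (editorial, derived from the checks in the body):
+ * @dst_start must be either 0 (existing indexes in @dst are shifted
+ * right by @count) or equal to the current index_count of @dst
+ * (plain append); any other value is rejected with -ERANGE:
+ *
+ *	err = ssdfs_move_common2common_node_index_range(src, src_start,
+ *							dst, 0, count);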
+ */ +static +int ssdfs_move_common2common_node_index_range(struct ssdfs_btree_node *src, + u16 src_start, + struct ssdfs_btree_node *dst, + u16 dst_start, u16 count) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_index_key *buf; + u16 i, j; + u32 src_offset, dst_offset; + u32 src_area_size, dst_area_size; + u16 index_size; + u16 src_index_count, dst_index_count; + u16 dst_index_capacity; + u64 src_start_hash, src_end_hash; + u64 dst_start_hash, dst_end_hash; + u16 processed = 0; + u16 copied = 0; + u16 rest_unmoved = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!src || !dst); + BUG_ON(!src->tree || !src->tree->fsi); + BUG_ON(!rwsem_is_locked(&src->tree->lock)); + + if (!is_ssdfs_btree_node_index_area_exist(src)) { + SSDFS_DBG("src node %u hasn't index area\n", + src->node_id); + return -EINVAL; + } + + if (!is_ssdfs_btree_node_index_area_exist(dst)) { + SSDFS_DBG("dst node %u hasn't index area\n", + dst->node_id); + return -EINVAL; + } + + SSDFS_DBG("src_node %u, src_start %u, " + "dst_node %u, dst_start %u, " + "count %u\n", + src->node_id, src_start, + dst->node_id, dst_start, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = src->tree->fsi; + + if (count == 0) { + SSDFS_ERR("count is zero\n"); + return -ERANGE; + } + + buf = ssdfs_btree_node_kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!buf) { + SSDFS_ERR("fail to allocate buffer\n"); + return -ERANGE; + } + + atomic_set(&src->state, SSDFS_BTREE_NODE_CREATED); + atomic_set(&dst->state, SSDFS_BTREE_NODE_CREATED); + + down_read(&src->header_lock); + src_offset = src->index_area.offset; + src_area_size = src->index_area.area_size; + index_size = src->index_area.index_size; + src_index_count = src->index_area.index_count; + src_start_hash = src->index_area.start_hash; + src_end_hash = src->index_area.end_hash; + up_read(&src->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_node %u, index_count %u, count %u\n", + src->node_id, src_index_count, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&dst->header_lock); + dst_offset = dst->index_area.offset; + dst_area_size = dst->index_area.area_size; + dst_index_count = dst->index_area.index_count; + dst_index_capacity = dst->index_area.index_capacity; + dst_start_hash = dst->index_area.start_hash; + dst_end_hash = dst->index_area.end_hash; + up_read(&dst->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dst_node %u, index_count %u, " + "count %u, dst_index_capacity %u\n", + dst->node_id, dst_index_count, + count, dst_index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((src_start + count) > src_index_count) { + err = -ERANGE; + SSDFS_ERR("invalid count: " + "src_start %u, count %u, " + "src_index_count %u\n", + src_start, count, src_index_count); + goto finish_index_moving; + } + + if ((dst_index_count + count) > dst_index_capacity) { + err = -ERANGE; + SSDFS_ERR("invalid count: " + "dst_index_count %u, count %u, " + "dst_index_capacity %u\n", + dst_index_count, count, + dst_index_capacity); + goto finish_index_moving; + } + + i = src_start; + j = dst_start; + + down_write(&src->full_lock); + err = ssdfs_lock_whole_index_area(src); + downgrade_write(&src->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to lock source's index area: err %d\n", + err); + goto unlock_src_node; + } + + down_write(&dst->full_lock); + err = ssdfs_lock_whole_index_area(dst); + downgrade_write(&dst->full_lock); + + if (unlikely(err)) { + ssdfs_unlock_whole_index_area(src); + SSDFS_ERR("fail to lock destination's index area: err %d\n", + err); + goto unlock_dst_node; + } + + 
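+	/*
+	 * Editorial note: at this point the whole index area of both
+	 * nodes is locked for exclusive modification, while full_lock
+	 * of @src and @dst is held shared (taken for write, then
+	 * downgraded), so the index pages below can be rearranged
+	 * without racing concurrent index readers.
+	 */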
if (dst_start == 0 && dst_start != dst_index_count) { + down_write(&dst->header_lock); + err = ssdfs_shift_range_right2(dst, &dst->index_area, + index_size, + 0, dst_index_count, + count); + up_write(&dst->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to shift index range right: " + "dst_node %u, index_count %u, " + "shift %u, err %d\n", + dst->node_id, dst_index_count, + count, err); + goto unlock_index_area; + } + } else if (dst_start != dst_index_count) { + err = -ERANGE; + SSDFS_ERR("dst_start %u != dst_index_count %u\n", + dst_start, dst_index_count); + SSDFS_ERR("source (start_hash %llx, end_hash %llx), " + "destination (start_hash %llx, end_hash %llx)\n", + src_start_hash, src_end_hash, + dst_start_hash, dst_end_hash); + goto unlock_index_area; + } + + while (processed < count) { + u16 range_len = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %u, j %u, processed %u, " + "count %u, range_len %u\n", + i, j, processed, count, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_copy_index_range_in_buffer(src, i, + count - processed, + src_offset, + index_size, + buf, + &range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy index range in buffer: " + "err %d\n", err); + goto unlock_index_area; + } + + err = ssdfs_save_index_range_in_node(dst, j, range_len, + dst_offset, index_size, + buf); + if (unlikely(err)) { + SSDFS_ERR("fail to save index range into node: " + "err %d\n", err); + goto unlock_index_area; + } + + i += range_len; + j += range_len; + processed += range_len; + } + + err = ssdfs_clear_index_range_in_node(src, src_start, count, + src_offset, index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to clear the source node's index range: " + "err %d\n", err); + goto unlock_index_area; + } + + down_write(&dst->header_lock); + dst->index_area.index_count += processed; + err = __ssdfs_init_index_area_hash_range(dst, + dst->index_area.index_count, + &dst->index_area.start_hash, + &dst->index_area.end_hash); + up_write(&dst->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to set the destination node's index range: " + "err %d\n", err); + goto unlock_index_area; + } + + if ((src_start + processed) < src_index_count) { + i = src_start + processed; + j = src_start; + + rest_unmoved = src_index_count - (src_start + processed); + copied = 0; + + while (copied < rest_unmoved) { + u16 range_len = 0; + + err = ssdfs_copy_index_range_in_buffer(src, i, + rest_unmoved - copied, + src_offset, + index_size, + buf, + &range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy index range in buffer: " + "err %d\n", err); + goto finish_source_correction; + } + + err = ssdfs_save_index_range_in_node(src, j, range_len, + src_offset, + index_size, + buf); + if (unlikely(err)) { + SSDFS_ERR("fail to save index range into node: " + "err %d\n", err); + goto finish_source_correction; + } + +finish_source_correction: + if (unlikely(err)) + goto unlock_index_area; + + i += range_len; + j += range_len; + copied += range_len; + } + + err = ssdfs_clear_index_range_in_node(src, + src_start + processed, + rest_unmoved, + src_offset, index_size); + if (unlikely(err)) { + SSDFS_ERR("fail to clear the src node's index range: " + "err %d\n", err); + goto unlock_index_area; + } + + err = ssdfs_set_dirty_index_range(src, src_start, + rest_unmoved); + if (unlikely(err)) { + SSDFS_ERR("fail to set dirty index range: " + "start %u, len %u, err %d\n", + src_start, rest_unmoved, err); + goto unlock_index_area; + } + } + + down_write(&src->header_lock); + src->index_area.index_count -= 
processed; + err = __ssdfs_init_index_area_hash_range(src, + src->index_area.index_count, + &src->index_area.start_hash, + &src->index_area.end_hash); + up_write(&src->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to set the source node's hash range: " + "err %d\n", err); + goto unlock_index_area; + } + +unlock_index_area: + ssdfs_unlock_whole_index_area(dst); + ssdfs_unlock_whole_index_area(src); + +unlock_dst_node: + up_read(&dst->full_lock); + +unlock_src_node: + up_read(&src->full_lock); + +finish_index_moving: + if (unlikely(err)) { + atomic_set(&src->state, SSDFS_BTREE_NODE_CORRUPTED); + atomic_set(&dst->state, SSDFS_BTREE_NODE_CORRUPTED); + } else { + ssdfs_set_node_update_cno(src); + set_ssdfs_btree_node_dirty(src); + + ssdfs_set_node_update_cno(dst); + set_ssdfs_btree_node_dirty(dst); + } + + ssdfs_btree_node_kfree(buf); + return err; +} + +/* + * ssdfs_btree_node_move_index_range() - move index range + * @src: source node + * @src_start: starting index in the source node + * @dst: destination node + * @dst_start: starting index in the destination node + * @count: count of indexes in the range + * + * This method tries to move the index range from @src into @dst. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - index area is absent. + */ +int ssdfs_btree_node_move_index_range(struct ssdfs_btree_node *src, + u16 src_start, + struct ssdfs_btree_node *dst, + u16 dst_start, u16 count) +{ + int src_type, dst_type; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!src || !dst); + BUG_ON(!rwsem_is_locked(&src->tree->lock)); + + SSDFS_DBG("src_node %u, src_start %u, " + "dst_node %u, dst_start %u, " + "count %u\n", + src->node_id, src_start, + dst->node_id, dst_start, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&src->state)) { + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&src->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(src)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src node %u hasn't index area\n", + src->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + switch (atomic_read(&dst->state)) { + case SSDFS_BTREE_NODE_CREATED: + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&dst->state)); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(dst)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dst node %u hasn't index area\n", + dst->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + src_type = atomic_read(&src->type); + switch (src_type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid src node type %#x\n", + src_type); + return -ERANGE; + } + + dst_type = atomic_read(&dst->type); + switch (dst_type) { + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dst node type %#x\n", + dst_type); + return -ERANGE; + } + + if (src_type == SSDFS_BTREE_ROOT_NODE) { + err = ssdfs_move_root2common_node_index_range(src, src_start, + dst, dst_start, + count); + } else { + err = ssdfs_move_common2common_node_index_range(src, src_start, + dst, dst_start, + count); + } + + if 
(unlikely(err)) { + SSDFS_ERR("fail to move index range: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_btree_node_check_result_for_search() - check search result for search + * @search: btree search object + */ +static +int ssdfs_btree_node_check_result_for_search(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + u64 update_cno; + u64 start_hash, end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + node = search->node.child; + + down_read(&node->header_lock); + update_cno = node->update_cno; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search (state %#x, search_cno %llu, " + "start_hash %llx, end_hash %llx), " + "node (update_cno %llu, " + "start_hash %llx, end_hash %llx)\n", + search->result.state, + search->result.search_cno, + search->request.start.hash, + search->request.end.hash, + update_cno, start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + if (search->result.search_cno < update_cno) { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + return -EAGAIN; + } + + if (search->request.start.hash < start_hash || + search->request.start.hash > end_hash) { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + return -EAGAIN; + } + + return 0; + + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + if (search->result.search_cno < update_cno) { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + return -EAGAIN; + } + + return 0; + + case SSDFS_BTREE_SEARCH_UNKNOWN_RESULT: + /* expected state */ + break; + + case SSDFS_BTREE_SEARCH_FAILURE: + case SSDFS_BTREE_SEARCH_EMPTY_RESULT: + case SSDFS_BTREE_SEARCH_OBSOLETE_RESULT: + search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT; + break; + + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + SSDFS_DBG("search result requests to add a node already\n"); + break; + + case SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE: + SSDFS_WARN("unexpected search result state\n"); + search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT; + break; + + default: + SSDFS_WARN("invalid search result state\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.state %#x\n", + search->result.state); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_node_check_hash_range() - check necessity to do search + * @node: pointer on node object + * @items_count: items count in the node + * @items_capacity: node's capacity for items + * @start_hash: items' area starting hash + * @end_hash: items' area ending hash + * @search: pointer on search request object + * + * This method tries to check the necessity to do + * the real search in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. + * %-ENOMEM - unable to allocate memory. 
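+ *
+ * Decision sketch (editorial, illustrative hash values): a request
+ * [0x10, 0x20] against a node covering [0x30, 0x40] falls into the
+ * "range1 < range2" case, so the result state becomes
+ * SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND at start_index 0 (or
+ * SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE when the node lacks room) and
+ * -ENODATA is returned; a request [0x50, 0x60] lands to the right
+ * of the node and gets start_index == items_count instead.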
+ */ +int ssdfs_btree_node_check_hash_range(struct ssdfs_btree_node *node, + u16 items_count, + u16 items_capacity, + u64 start_hash, + u64 end_hash, + struct ssdfs_btree_search *search) +{ + u16 vacant_items; + bool have_enough_space; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "search (start_hash %llx, end_hash %llx), " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + start_hash, end_hash, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + vacant_items = items_capacity - items_count; + have_enough_space = search->request.count <= vacant_items; + + switch (RANGE_WITHOUT_INTERSECTION(search->request.start.hash, + search->request.end.hash, + start_hash, end_hash)) { + case 0: + /* ranges have intersection */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ranges have intersection\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case -1: /* range1 < range2 */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range1 < range2\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + if (have_enough_space) { + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + } else { + search->result.state = + SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + } + + search->result.err = -ENODATA; + search->result.start_index = 0; + search->result.count = search->request.count; + search->result.search_cno = + ssdfs_current_cno(node->tree->fsi->sb); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u, " + "search->result.state %#x, " + "search->result.err %d\n", + search->result.start_index, + search->result.state, + search->result.err); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + + case 1: /* range1 > range2 */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range1 > range2\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (have_enough_space) { + search->result.state = + SSDFS_BTREE_SEARCH_OUT_OF_RANGE; + } else { + search->result.state = + SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE; + } + + search->result.err = -ENODATA; + search->result.start_index = items_count; + search->result.count = search->request.count; + search->result.search_cno = + ssdfs_current_cno(node->tree->fsi->sb); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u, " + "search->result.state %#x, " + "search->result.err %d\n", + search->result.start_index, + search->result.state, + search->result.err); +#endif /* 
CONFIG_SSDFS_DEBUG */ + + return -ENODATA; + + default: + BUG(); + } + + if (!RANGE_HAS_PARTIAL_INTERSECTION(search->request.start.hash, + search->request.end.hash, + start_hash, end_hash)) { + SSDFS_ERR("invalid request: " + "request (start_hash %llx, end_hash %llx), " + "node (start_hash %llx, end_hash %llx)\n", + search->request.start.hash, + search->request.end.hash, + start_hash, end_hash); + return -ERANGE; + } + + if (items_count == 0) { + search->result.state = + SSDFS_BTREE_SEARCH_OUT_OF_RANGE; + + search->result.err = -ENODATA; + search->result.start_index = 0; + search->result.count = search->request.count; + search->result.search_cno = + ssdfs_current_cno(node->tree->fsi->sb); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u, " + "search->result.state %#x, " + "search->result.err %d\n", + search->result.start_index, + search->result.state, + search->result.err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENODATA; + } + + return 0; +} + +/* + * ssdfs_btree_node_find_item() - find the item in the node + * @search: btree search object + * + * This method tries to find an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node doesn't contain item for the requested hash. + * %-ENOENT - node hasn't the items area. + * %-ENOSPC - node hasn't free space. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. 
+ * %-EOPNOTSUPP - specialized searching method hasn't been implemented + */ +int ssdfs_btree_node_find_item(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_search(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->find_item) { + SSDFS_WARN("unable to search in the node\n"); + return -EOPNOTSUPP; + } + + down_read(&node->full_lock); + err = node->node_ops->find_item(node, search); + up_read(&node->full_lock); + + if (err == -ENODATA) { + u16 items_count; + u16 items_capacity; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u " + "hasn't item for request " + "(start_hash %llx, end_hash %llx)\n", + node->node_id, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ALLOCATE_ITEM: + case SSDFS_BTREE_SEARCH_ADD_ITEM: + down_read(&node->header_lock); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + up_read(&node->header_lock); + + if (items_count >= items_capacity) { + err = -ENOSPC; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node hasn't free space: " + "items_count %u, " + "items_capacity %u\n", + items_count, + items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.err = -ENODATA; + } + break; + + default: + search->result.err = err; + break; + } + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u " + "hasn't all items for request " + "(start_hash %llx, end_hash %llx)\n", + node->node_id, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't items area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } else if (unlikely(err)) { + 
SSDFS_ERR("fail to find: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } + + return err; +} + +/* + * ssdfs_btree_node_find_range() - find the range in the node + * @search: btree search object + * + * This method tries to find a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node doesn't contain items for the requested range. + * %-ENOENT - node hasn't the items area. + * %-ENOSPC - node hasn't free space. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. + * %-EOPNOTSUPP - specialized searching method doesn't been implemented + */ +int ssdfs_btree_node_find_range(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_search(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->find_range) { + SSDFS_WARN("unable to search in the node\n"); + return -EOPNOTSUPP; + } + + down_read(&node->full_lock); + err = node->node_ops->find_range(node, search); + up_read(&node->full_lock); + + if (err == -ENODATA) { + u16 items_count; + u16 items_capacity; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u " + "hasn't item for request " + "(start_hash %llx, end_hash %llx)\n", + node->node_id, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ALLOCATE_ITEM: + case SSDFS_BTREE_SEARCH_ADD_ITEM: + down_read(&node->header_lock); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + up_read(&node->header_lock); + + if (items_count >= 
items_capacity) { + err = -ENOSPC; + search->result.err = -ENODATA; + } + break; + + default: + search->result.err = err; + break; + } + } else if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u " + "hasn't all items for request " + "(start_hash %llx, end_hash %llx)\n", + node->node_id, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't items area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } + + return err; +} + +/* + * ssdfs_btree_node_check_result_for_alloc() - check search result for alloc + * @search: btree search object + */ +static inline +int ssdfs_btree_node_check_result_for_alloc(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + return -EEXIST; + + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid search result state %#x\n", + search->result.state); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_btree_node_allocate_item() - allocate the item in the node + * @search: btree search object + * + * This method tries to allocate an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - item is used already. + * %-ENOSPC - item is out of node. + * %-ENOENT - node hasn't the items area. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. 
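+ *
+ * Caller sketch (hypothetical, editorial): allocation expects a
+ * preceding search that reported a possible place:
+ *
+ *	err = ssdfs_btree_node_find_item(search);
+ *	if (err == -ENODATA &&
+ *	    search->result.state == SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND)
+ *		err = ssdfs_btree_node_allocate_item(search);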
+ */ +int ssdfs_btree_node_allocate_item(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_alloc(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (node->node_ops && node->node_ops->allocate_item) { + err = node->node_ops->allocate_item(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate item: err %d\n", + err); + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.search_cno = U64_MAX; + search->result.start_index = U16_MAX; + search->result.count = U16_MAX; + return err; + } + } else + return -EOPNOTSUPP; + + spin_lock(&node->descriptor_lock); + search->result.search_cno = ssdfs_current_cno(fsi->sb); + node->update_cno = search->result.search_cno; + flags = le16_to_cpu(node->node_index.flags); + flags &= ~SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE; + node->node_index.flags = cpu_to_le16(flags); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node->update_cno %llu\n", + search->result.search_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_node_allocate_range() - allocate the range in the node + * @search: btree search object + * + * This method tries to 
allocate a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - range of items is used already. + * %-ENOSPC - range is out of node. + * %-ENOENT - node hasn't the items area. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. + */ +int ssdfs_btree_node_allocate_range(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_alloc(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (node->node_ops && node->node_ops->allocate_range) { + err = node->node_ops->allocate_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate item: err %d\n", + err); + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.search_cno = U64_MAX; + search->result.start_index = U16_MAX; + search->result.count = U16_MAX; + return err; + } + } else + return -EOPNOTSUPP; + + spin_lock(&node->descriptor_lock); + search->result.search_cno = ssdfs_current_cno(fsi->sb); + node->update_cno = search->result.search_cno; + flags = le16_to_cpu(node->node_index.flags); + flags &= ~SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE; + node->node_index.flags = cpu_to_le16(flags); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("node->update_cno %llu\n", + search->result.search_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_node_check_result_for_insert() - check search result for insert + * @search: btree search object + */ +static inline +int ssdfs_btree_node_check_result_for_insert(struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + return -EEXIST; + + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid search result state\n"); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_btree_node_insert_item() - insert the item in the node + * @search: btree search object + * + * This method tries to insert an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - item exists. + * %-ENOSPC - node hasn't free space. + * %-EFBIG - some items were pushed out from the node. + * %-ENOENT - node hasn't the items area. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. + * %-EOPNOTSUPP - specialized insert method doesn't been implemented + */ +int ssdfs_btree_node_insert_item(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return 
-ERANGE; + } + + err = ssdfs_btree_node_check_result_for_insert(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->insert_item) { + SSDFS_WARN("unable to insert item\n"); + return -EOPNOTSUPP; + } + + err = node->node_ops->insert_item(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + return err; + } + + spin_lock(&node->descriptor_lock); + search->result.search_cno = ssdfs_current_cno(fsi->sb); + node->update_cno = search->result.search_cno; + flags = le16_to_cpu(node->node_index.flags); + flags &= ~SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE; + node->node_index.flags = cpu_to_le16(flags); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node->update_cno %llu\n", + search->result.search_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_node_insert_range() - insert the range in the node + * @search: btree search object + * + * This method tries to insert a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free space. + * %-EFBIG - some items were pushed out from the node. + * %-ENOENT - node hasn't the items area. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. 
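+ *
+ * A minimal caller sketch (illustrative only; assumes the search
+ * object has already been positioned by a preceding find request):
+ *
+ *	search->request.type = SSDFS_BTREE_SEARCH_ADD_RANGE;
+ *	err = ssdfs_btree_node_insert_range(search);
+ *	if (err == -EAGAIN)
+ *		... the cached search result is obsolete:
+ *	            repeat the search and retry the insert ...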
+ */ +int ssdfs_btree_node_insert_range(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_insert(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->insert_range) { + SSDFS_WARN("unable to insert range\n"); + return -EOPNOTSUPP; + } + + err = node->node_ops->insert_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + return err; + } + + spin_lock(&node->descriptor_lock); + search->result.search_cno = ssdfs_current_cno(fsi->sb); + node->update_cno = search->result.search_cno; + flags = le16_to_cpu(node->node_index.flags); + flags &= ~SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE; + node->node_index.flags = cpu_to_le16(flags); + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node->update_cno %llu\n", + search->result.search_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} From patchwork Sat Feb 25 01:09:05 2023 Content-Type: 
text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151957
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 54/76] ssdfs: change/delete b-tree node operations
Date: Fri, 24 Feb 2023 17:09:05 -0800
Message-Id: <20230225010927.813929-55-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The b-tree node implements operations that change and delete an item or
a range of items in the node:
(1) change_item - change an item in the b-tree node
(2) delete_item - delete an item from the b-tree node
(3) delete_range - delete a range of items from the b-tree node

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_node.c | 2577 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2577 insertions(+)

diff --git a/fs/ssdfs/btree_node.c b/fs/ssdfs/btree_node.c
index aa9d90ba8598..8d939451de05 100644
--- a/fs/ssdfs/btree_node.c
+++ b/fs/ssdfs/btree_node.c
@@ -11342,3 +11342,2580 @@ int ssdfs_btree_node_insert_range(struct ssdfs_btree_search *search)
 	return 0;
 }
+
+/*
+ * ssdfs_btree_node_check_result_for_change() - check search result for change
+ * @search: btree search object
+ */
+static inline
+int ssdfs_btree_node_check_result_for_change(struct ssdfs_btree_search *search)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_WARN("invalid search result state\n");
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_node_change_item() - change the item in the node
+ * @search: btree search object
+ *
+ * This method tries to change an item in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - node doesn't contain the item.
+ * %-ENOSPC - the new item's state cannot be stored in the node.
+ * %-ENOENT - node hasn't the items area.
+ * %-EACCES - node is under initialization yet.
+ * %-EAGAIN - search object contains obsolete result.
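+ *
+ * A minimal caller sketch (illustrative only; assumes the search
+ * object already holds a found item, i.e. a result state of
+ * SSDFS_BTREE_SEARCH_VALID_ITEM, and a prepared item buffer):
+ *
+ *	err = ssdfs_btree_node_change_item(search);
+ *	if (err == -EACCES)
+ *		... node is still initializing: wait and retry ...
+ *	else if (err == -EAGAIN)
+ *		... search result is obsolete: repeat the search ...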
+ */ +int ssdfs_btree_node_change_item(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_change(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->change_item) { + SSDFS_WARN("unable to change item\n"); + return -EOPNOTSUPP; + } + + err = node->node_ops->change_item(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change item: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + return err; + } + + spin_lock(&node->descriptor_lock); + search->result.search_cno = ssdfs_current_cno(fsi->sb); + node->update_cno = search->result.search_cno; + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node->update_cno %llu\n", + search->result.search_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_node_check_result_for_delete() - check search result for delete + * @search: btree search object + */ +static inline +int ssdfs_btree_node_check_result_for_delete(struct 
ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid search result state\n"); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_btree_node_delete_item() - delete the item from the node + * @search: btree search object + * + * This method tries to delete an item from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node doesn't contain the item. + * %-ENOENT - node's items area is empty. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. + */ +int ssdfs_btree_node_delete_item(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + u16 items_count, index_count; + bool is_node_empty = false; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_delete(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->delete_item) { + SSDFS_WARN("unable to delete item\n"); + return -EOPNOTSUPP; + } + + err = node->node_ops->delete_item(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete item: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err 
%d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + return err; + } + + down_read(&node->header_lock); + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + items_count = node->items_area.items_count; + break; + + default: + items_count = 0; + break; + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + index_count = node->index_area.index_count; + break; + + default: + index_count = 0; + break; + } + + is_node_empty = index_count == 0 && items_count == 0; + + up_read(&node->header_lock); + + spin_lock(&node->descriptor_lock); + search->result.search_cno = ssdfs_current_cno(fsi->sb); + node->update_cno = search->result.search_cno; + if (is_node_empty) { + flags = le16_to_cpu(node->node_index.flags); + flags = SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE; + node->node_index.flags = cpu_to_le16(flags); + } + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node->update_cno %llu\n", + search->result.search_cno); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_ssdfs_btree_node_dirty(node); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_node_delete_range() - delete the range of items from the node + * @search: btree search object + * + * This method tries to delete a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node doesn't contain the range of items. + * %-ENOENT - node's items area is empty. + * %-EACCES - node is under initialization yet. + * %-EAGAIN - search object contains obsolete result. 
+ */ +int ssdfs_btree_node_delete_range(struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_node *node; + u16 items_count, index_count; + bool is_node_empty = false; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + BUG_ON(search->request.start.hash > search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#else + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + if (!is_btree_search_node_desc_consistent(search)) { + SSDFS_WARN("node descriptor is inconsistent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_check_result_for_delete(search); + if (err) + return err; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->delete_range) { + SSDFS_WARN("unable to delete item\n"); + return -EOPNOTSUPP; + } + + err = node->node_ops->delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete range: " + "node %u, " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + search->request.start.hash, + search->request.end.hash, + err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + return err; + } + + down_read(&node->header_lock); + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + items_count = node->items_area.items_count; + break; + + default: + items_count = 0; + break; + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + index_count = node->index_area.index_count; + break; + + default: + index_count = 0; + break; + } + + is_node_empty = index_count == 0 && items_count == 0; + + up_read(&node->header_lock); + + spin_lock(&node->descriptor_lock); + search->result.search_cno = 
ssdfs_current_cno(fsi->sb);
+	node->update_cno = search->result.search_cno;
+	if (is_node_empty) {
+		flags = le16_to_cpu(node->node_index.flags);
+		/* OR the flag in: the other index flags must survive */
+		flags |= SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE;
+		node->node_index.flags = cpu_to_le16(flags);
+	}
+	spin_unlock(&node->descriptor_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node->update_cno %llu\n",
+		  search->result.search_cno);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	set_ssdfs_btree_node_dirty(node);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * __ssdfs_btree_node_clear_range() - clear range of deleted items
+ * @node: pointer on node object
+ * @area: items area descriptor
+ * @item_size: size of item in bytes
+ * @start_index: starting index of the range
+ * @range_len: number of items in the range
+ *
+ * This method tries to clear the range of deleted items.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+int __ssdfs_btree_node_clear_range(struct ssdfs_btree_node *node,
+				   struct ssdfs_btree_node_items_area *area,
+				   size_t item_size,
+				   u16 start_index,
+				   unsigned int range_len)
+{
+	int page_index;
+	int dst_index;
+	struct page *page;
+	u32 item_offset;
+	u16 cleared_items = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("node_id %u, item_size %zu\n",
+		  node->node_id, item_size);
+	SSDFS_DBG("start_index %u, range_len %u\n",
+		  start_index, range_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range_len == 0) {
+		SSDFS_WARN("range_len == 0\n");
+		return -ERANGE;
+	}
+
+	if ((start_index + range_len) > area->items_capacity) {
+		SSDFS_ERR("range is out of capacity: "
+			  "start_index %u, range_len %u, items_capacity %u\n",
+			  start_index, range_len, area->items_capacity);
+		return -ERANGE;
+	}
+
+	dst_index = start_index;
+
+	do {
+		u32 clearing_items;
+		u32 vacant_positions;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start_index %u, dst_index %d\n",
+			  start_index, dst_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		item_offset = (u32)dst_index * item_size;
+		if (item_offset >= area->area_size) {
+			SSDFS_ERR("item_offset %u >= area_size %u\n",
+				  item_offset, area->area_size);
+			return -ERANGE;
+		}
+
+		item_offset += area->offset;
+		if (item_offset >= node->node_size) {
+			SSDFS_ERR("item_offset %u >= node_size %u\n",
+				  item_offset, node->node_size);
+			return -ERANGE;
+		}
+
+		page_index = item_offset >> PAGE_SHIFT;
+		if (page_index >= pagevec_count(&node->content.pvec)) {
+			SSDFS_ERR("invalid page_index: "
+				  "index %d, pvec_size %u\n",
+				  page_index,
+				  pagevec_count(&node->content.pvec));
+			return -ERANGE;
+		}
+
+		if (page_index > 0)
+			item_offset %= page_index * PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(start_index > dst_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		clearing_items = dst_index - start_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(clearing_items > range_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		clearing_items = range_len - clearing_items;
+
+		if (clearing_items == 0) {
+			SSDFS_WARN("no items for clearing\n");
+			return -ERANGE;
+		}
+
+		vacant_positions = PAGE_SIZE - item_offset;
+		vacant_positions /= item_size;
+
+		if (vacant_positions == 0) {
+			SSDFS_WARN("invalid vacant_positions %u\n",
+				   vacant_positions);
+			return -ERANGE;
+		}
+
+		clearing_items = min_t(u32, clearing_items, vacant_positions);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(clearing_items >= U16_MAX);
+
+		SSDFS_DBG("clearing_items %u, item_offset %u\n",
+			  clearing_items, item_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if
((item_offset + (clearing_items * item_size)) > PAGE_SIZE) { + SSDFS_ERR("invalid request: " + "item_offset %u, clearing_items %u, " + "item_size %zu\n", + item_offset, clearing_items, item_size); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + ssdfs_memset_page(page, item_offset, PAGE_SIZE, + 0x0, clearing_items * item_size); + + dst_index += clearing_items; + cleared_items += clearing_items; + } while (cleared_items < range_len); + + if (cleared_items != range_len) { + SSDFS_ERR("cleared_items %u != range_len %u\n", + cleared_items, range_len); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_btree_node_clear_range() - clear range of deleted items + * @node: pointer on node object + * @area: items area descriptor + * @item_size: size of item in bytes + * @search: search object + * + * This method tries to clear the range of deleted items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_btree_node_clear_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + struct ssdfs_btree_search *search) +{ + u16 start_index; + unsigned int range_len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_size %zu\n", + node->node_id, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_index = search->result.start_index; + range_len = search->request.count; + + return __ssdfs_btree_node_clear_range(node, area, item_size, + start_index, range_len); +} + +/* + * ssdfs_btree_node_extract_range() - extract the range from the node + * @start_index: starting index in the node + * @count: count of items in the range + * @search: btree search object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - node doesn't contain items for the requested range. + * %-ENOENT - node hasn't the items area. + * %-EACCES - node is under initialization yet. 
+ * %-EOPNOTSUPP - specialized extract method doesn't been implemented + */ +int ssdfs_btree_node_extract_range(u16 start_index, u16 count, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p, " + "start_index %u, count %u\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + search->node.state, search->node.id, + search->node.height, search->node.parent, + search->node.child, start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + node = search->node.child; + if (!node) { + SSDFS_WARN("child node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node->tree); + BUG_ON(!rwsem_is_locked(&node->tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is under initialization\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EACCES; + + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + search->node.id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return -ERANGE; + } + + if (!node->node_ops || !node->node_ops->extract_range) { + SSDFS_WARN("unable to extract the range from the node\n"); + return -EOPNOTSUPP; + } + + err = node->node_ops->extract_range(node, start_index, count, search); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u " + "hasn't item for request " + "(start_index %u, count %u)\n", + node->node_id, + start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.state = SSDFS_BTREE_SEARCH_EMPTY_RESULT; + search->result.err = err; + } else if (err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u hasn't items area\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract the range: " + "node %u, " + "request (start_index %u, count %u), " + "err %d\n", + node->node_id, + start_index, count, err); + + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } + + return err; +} + +/* + * __ssdfs_btree_node_move_items_range() - move range between nodes + * @src: source node + * @dst: destination node + * @start_item: starting index of the item + * @count: count of items in the range + * + * This method tries to move a range of items from @src node into + * @dst node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no such range in the node. 
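+ *
+ * The fallback implementation below is a three-step pipeline built on
+ * the node operations (a sketch of the control flow, not extra API):
+ *
+ *	extract_range(src) -> fills search->result with @count items
+ *	delete_range(src)  -> SSDFS_BTREE_SEARCH_DELETE_RANGE request
+ *	find_range(dst) + insert_range(dst)
+ *	                   -> SSDFS_BTREE_SEARCH_ADD_RANGE request
+ *
+ * A single search object carries the payload between the steps and is
+ * freed on every exit path.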
+ */
+static
+int __ssdfs_btree_node_move_items_range(struct ssdfs_btree_node *src,
+					struct ssdfs_btree_node *dst,
+					u16 start_item, u16 count)
+{
+	struct ssdfs_btree_search *search;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!src || !dst);
+
+	SSDFS_DBG("src node_id %u, dst node_id %u, "
+		  "start_item %u, count %u\n",
+		  src->node_id, dst->node_id,
+		  start_item, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ENOMEM;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	/*
+	 * Check all required node operations up front and leave through
+	 * the common exit path, so the search object is always freed.
+	 */
+	if (!src->node_ops || !src->node_ops->extract_range) {
+		SSDFS_WARN("unable to extract the items range\n");
+		err = -EOPNOTSUPP;
+		goto finish_move_items_range;
+	}
+
+	if (!src->node_ops->delete_range) {
+		SSDFS_WARN("unable to delete the items range\n");
+		err = -EOPNOTSUPP;
+		goto finish_move_items_range;
+	}
+
+	if (!dst->node_ops || !dst->node_ops->find_range) {
+		SSDFS_WARN("unable to find the items range\n");
+		err = -EOPNOTSUPP;
+		goto finish_move_items_range;
+	}
+
+	if (!dst->node_ops->insert_range) {
+		SSDFS_WARN("unable to insert the items range\n");
+		err = -EOPNOTSUPP;
+		goto finish_move_items_range;
+	}
+
+	err = src->node_ops->extract_range(src, start_item, count, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to extract range: "
+			  "node_id %u, start_item %u, "
+			  "count %u, err %d\n",
+			  src->node_id, start_item, count, err);
+		goto finish_move_items_range;
+	}
+
+	ssdfs_debug_btree_search_object(search);
+
+	if (count != search->result.count) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid count (request %u, result %u)\n",
+			  count, search->result.count);
+		goto finish_move_items_range;
+	}
+
+	switch (src->tree->type) {
+	case SSDFS_EXTENTS_BTREE:
+	case SSDFS_XATTR_BTREE:
+		search->request.flags |= SSDFS_BTREE_SEARCH_NOT_INVALIDATE;
+		break;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	search->request.type = SSDFS_BTREE_SEARCH_DELETE_RANGE;
+
+	err = src->node_ops->delete_range(src, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete range: "
+			  "node_id %u, start_item %u, "
+			  "count %u, err %d\n",
+			  src->node_id, start_item, count, err);
+		goto finish_move_items_range;
+	}
+
+	search->request.type = SSDFS_BTREE_SEARCH_ADD_RANGE;
+
+	err = dst->node_ops->find_range(dst, search);
+	if (err == -ENODATA) {
+		err = 0;
+		/*
+		 * Node is empty. We are ready to insert.
+		 */
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find range: "
+			  "node_id %u, err %d\n",
+			  dst->node_id, err);
+		goto finish_move_items_range;
+	}
+
+	err = dst->node_ops->insert_range(dst, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to insert range: "
+			  "node_id %u, err %d\n",
+			  dst->node_id, err);
+		goto finish_move_items_range;
+	}
+
+finish_move_items_range:
+	ssdfs_btree_search_free(search);
+	return err;
+}
+
+/*
+ * ssdfs_btree_node_move_items_range() - move items range
+ * @src: source node
+ * @dst: destination node
+ * @start_item: starting index of the item
+ * @count: count of items in the range
+ *
+ * This method tries to move the range of items from @src into @dst.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOENT - items area is absent.
+ * %-EOPNOTSUPP - btree doesn't support the items moving operation.
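+ *
+ * A minimal caller sketch (illustrative only; both nodes are expected
+ * to belong to a tree whose lock is already taken, see the debug
+ * assertions in the implementation):
+ *
+ *	err = ssdfs_btree_node_move_items_range(src, dst, start, count);
+ *	if (!err)
+ *		... both nodes are marked dirty now and carry fresh
+ *	            update_cno values ...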
+ */ +int ssdfs_btree_node_move_items_range(struct ssdfs_btree_node *src, + struct ssdfs_btree_node *dst, + u16 start_item, u16 count) +{ + struct ssdfs_fs_info *fsi; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!src || !dst); + BUG_ON(!src->tree); + BUG_ON(!rwsem_is_locked(&src->tree->lock)); + + SSDFS_DBG("src node_id %u, dst node_id %u, " + "start_item %u, count %u\n", + src->node_id, dst->node_id, + start_item, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = src->tree->fsi; + + switch (atomic_read(&src->state)) { + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid src node state %#x\n", + atomic_read(&src->state)); + return -ERANGE; + } + + switch (atomic_read(&dst->state)) { + case SSDFS_BTREE_NODE_CREATED: + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dst node state %#x\n", + atomic_read(&dst->state)); + return -ERANGE; + } + + switch (atomic_read(&src->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + src->node_id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + src->node_id); + return -ERANGE; + } + + switch (atomic_read(&dst->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + SSDFS_WARN("items area is absent: node_id %u\n", + dst->node_id); + return -ENOENT; + + default: + SSDFS_WARN("invalid items area state: node_id %u\n", + dst->node_id); + return -ERANGE; + } + + if (!src->node_ops) { + SSDFS_WARN("unable to move the items range\n"); + return -EOPNOTSUPP; + } else if (!src->node_ops->move_items_range) { + err = __ssdfs_btree_node_move_items_range(src, dst, + start_item, count); + } else { + err = src->node_ops->move_items_range(src, dst, + start_item, count); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to move the items range: " + "src node_id %u, dst node_id %u, " + "start_item %u, count %u\n", + src->node_id, dst->node_id, + start_item, count); + return err; + } + + ssdfs_set_node_update_cno(src); + ssdfs_set_node_update_cno(dst); + set_ssdfs_btree_node_dirty(src); + set_ssdfs_btree_node_dirty(dst); + + return 0; +} + +/* + * ssdfs_copy_item_in_buffer() - copy item from node into buffer + * @node: pointer on node object + * @index: item index + * @item_size: size of item in bytes + * @search: pointer on search request object [in|out] + * + * This method tries to copy item from the node into buffer. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
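+ *
+ * The item position is computed purely arithmetically; for instance,
+ * with an illustrative item_size of 64 bytes and index 100:
+ *
+ *	item_offset = 100 * 64 = 6400 bytes inside the items area;
+ *	adding the area offset and shifting by PAGE_SHIFT selects the
+ *	memory page, and the remainder locates the item in that page.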
+ */ +int ssdfs_copy_item_in_buffer(struct ssdfs_btree_node *node, + u16 index, + size_t item_size, + struct ssdfs_btree_search *search) +{ + DEFINE_WAIT(wait); + struct ssdfs_state_bitmap *bmap; + u32 area_offset; + u32 area_size; + u32 item_offset; + u32 buf_offset; + int page_index; + struct page *page; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("node_id %u, index %u\n", + node->node_id, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + area_offset = node->items_area.offset; + area_size = node->items_area.area_size; + up_read(&node->header_lock); + + item_offset = (u32)index * item_size; + if (item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + return -ERANGE; + } + + item_offset += area_offset; + if (item_offset >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node->node_size); + return -ERANGE; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + down_read(&node->full_lock); + + if (page_index >= pagevec_count(&node->content.pvec)) { + err = -ERANGE; + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + goto finish_copy_item; + } + + page = node->content.pvec.pages[page_index]; + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP]; + + down_read(&node->bmap_array.lock); + +try_lock_area: + spin_lock(&bmap->lock); + + err = bitmap_allocate_region(bmap->ptr, (unsigned int)index, 0); + if (err == -EBUSY) { + err = 0; + prepare_to_wait(&node->wait_queue, &wait, + TASK_UNINTERRUPTIBLE); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("waiting unlocked state of item %u\n", + index); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&bmap->lock); + + schedule(); + finish_wait(&node->wait_queue, &wait); + goto try_lock_area; + } + + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); + + if (err) { + SSDFS_ERR("fail to lock: index %u, err %d\n", + index, err); + goto finish_copy_item; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("buffer is not created\n"); + goto finish_copy_item; + } + + buf_offset = search->result.items_in_buffer * item_size; + + err = ssdfs_memcpy_from_page(search->result.buf, + buf_offset, search->result.buf_size, + page, item_offset, PAGE_SIZE, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto unlock_area; + } + + search->result.items_in_buffer++; + +unlock_area: + down_read(&node->bmap_array.lock); + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, (unsigned int)index, 1); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + + wake_up_all(&node->wait_queue); + +finish_copy_item: + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + + return 0; +} + +/* + * ssdfs_lock_items_range() - lock range of items in the node + * @node: pointer on node object + * @start_index: start index of the range + * @count: count of items in the range + * + * This method tries to lock range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ * %-ENOENT - unable to lock the node's header
+ * %-ENODATA - unable to lock the range of items
+ */
+int ssdfs_lock_items_range(struct ssdfs_btree_node *node,
+			   u16 start_index, u16 count)
+{
+	DEFINE_WAIT(wait);
+	struct ssdfs_state_bitmap *bmap;
+	unsigned long start_area;
+	int i = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+
+	start_area = node->bmap_array.item_start_bit;
+	if (start_area == ULONG_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid items_area_start\n");
+		goto finish_lock;
+	}
+
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP];
+	if (!bmap->ptr) {
+		err = -ERANGE;
+		SSDFS_WARN("lock bitmap is empty\n");
+		goto finish_lock;
+	}
+
+try_lock_area:
+	spin_lock(&bmap->lock);
+
+	for (; i < count; i++) {
+		err = bitmap_allocate_region(bmap->ptr,
+					     start_area + start_index + i, 0);
+		if (err)
+			break;
+	}
+
+	if (err == -EBUSY) {
+		err = 0;
+		bitmap_clear(bmap->ptr, start_area + start_index, i);
+		prepare_to_wait(&node->wait_queue, &wait,
+				TASK_UNINTERRUPTIBLE);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("waiting unlocked state of item %u\n",
+			  start_index + i);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		/*
+		 * The whole range was released above: restart the
+		 * locking attempt from the first item of the range.
+		 */
+		i = 0;
+
+		spin_unlock(&bmap->lock);
+
+		schedule();
+		finish_wait(&node->wait_queue, &wait);
+		goto try_lock_area;
+	}
+
+	spin_unlock(&bmap->lock);
+
+finish_lock:
+	up_read(&node->bmap_array.lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_unlock_items_range() - unlock range of items in the node
+ * @node: pointer on node object
+ * @start_index: start index of the range
+ * @count: count of items in the range
+ */
+void ssdfs_unlock_items_range(struct ssdfs_btree_node *node,
+			      u16 start_index, u16 count)
+{
+	struct ssdfs_state_bitmap *bmap;
+	unsigned long start_area;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("start_index %u, count %u\n",
+		  start_index, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->bmap_array.lock);
+
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP];
+	start_area = node->bmap_array.item_start_bit;
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bmap->ptr);
+	BUG_ON(start_area == ULONG_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&bmap->lock);
+	bitmap_clear(bmap->ptr, start_area + start_index, count);
+	spin_unlock(&bmap->lock);
+
+	up_read(&node->bmap_array.lock);
+	wake_up_all(&node->wait_queue);
+}
+
+/*
+ * ssdfs_allocate_items_range() - allocate range of items in bitmap
+ * @node: pointer on node object
+ * @search: pointer on search request object
+ * @items_capacity: items capacity in the node
+ * @start_index: start index of the range
+ * @count: count of items in the range
+ *
+ * This method tries to allocate range of items in bitmap.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EEXIST - range is allocated already.
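+ *
+ * The allocation state is tracked by the node's allocation bitmap;
+ * the sketch below restates the check done under bmap->lock:
+ *
+ *	found = bitmap_find_next_zero_area(bmap->ptr,
+ *					   start_area + items_capacity,
+ *					   start_area + start_index,
+ *					   count, 0);
+ *	found == start_area + start_index  ->  range is free, set it;
+ *	for requests with a valid hash range, any mismatch means the
+ *	range is already allocated and -EEXIST is returned.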
+ */ +int ssdfs_allocate_items_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 items_capacity, + u16 start_index, u16 count) +{ + struct ssdfs_state_bitmap *bmap; + unsigned long found = ULONG_MAX; + unsigned long start_area; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("items_capacity %u, start_index %u, count %u\n", + items_capacity, start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + start_area = node->bmap_array.item_start_bit; + if (start_area == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid items_area_start\n"); + goto finish_allocate_items_range; + } + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("alloc bitmap is empty\n"); + goto finish_allocate_items_range; + } + + spin_lock(&bmap->lock); + + found = bitmap_find_next_zero_area(bmap->ptr, + start_area + items_capacity, + start_area + start_index, + count, 0); + if (search->request.flags & SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE && + found != (start_area + start_index)) { + /* area is allocated already */ + err = -EEXIST; + } + + if (!err) + bitmap_set(bmap->ptr, found, count); + + spin_unlock(&bmap->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found %lu, start_area %lu, start_index %u\n", + found, start_area, start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("found %lu != start %lu\n", + found, start_area + start_index); + } + +finish_allocate_items_range: + up_read(&node->bmap_array.lock); + + return err; +} + +/* + * is_ssdfs_node_items_range_allocated() - check that range is allocated + * @node: pointer on node object + * @items_capacity: items capacity in the node + * @start_index: start index of the range + * @count: count of items in the range + */ +bool is_ssdfs_node_items_range_allocated(struct ssdfs_btree_node *node, + u16 items_capacity, + u16 start_index, u16 count) +{ + struct ssdfs_state_bitmap *bmap; + unsigned long found = ULONG_MAX; + unsigned long start_area; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("start_index %u, count %u\n", + start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + start_area = node->bmap_array.item_start_bit; + BUG_ON(start_area == ULONG_MAX); + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP]; + if (!bmap->ptr) + BUG(); + + spin_lock(&bmap->lock); + found = bitmap_find_next_zero_area(bmap->ptr, + start_area + items_capacity, + start_area + start_index, count, 0); + if (found != (start_area + start_index)) { + /* area is allocated already */ + err = -EEXIST; + } + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); + + if (err == -EEXIST) + return true; + + return false; +} + +/* + * ssdfs_free_items_range() - free range of items in bitmap + * @node: pointer on node object + * @start_index: start index of the range + * @count: count of items in the range + * + * This method tries to free the range of items in bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
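+ *
+ * This is the inverse of ssdfs_allocate_items_range(): the same bits
+ * of the allocation bitmap are simply cleared under bmap->lock:
+ *
+ *	bitmap_clear(bmap->ptr, start_area + start_index, count);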
+ */ +int ssdfs_free_items_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count) +{ + struct ssdfs_state_bitmap *bmap; + unsigned long start_area; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("start_index %u, count %u\n", + start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + start_area = node->bmap_array.item_start_bit; + if (start_area == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid items_area_start\n"); + goto finish_free_items_range; + } + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("alloc bitmap is empty\n"); + goto finish_free_items_range; + } + + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, start_area + start_index, count); + spin_unlock(&bmap->lock); + +finish_free_items_range: + up_read(&node->bmap_array.lock); + + return err; +} + +/* + * ssdfs_set_node_header_dirty() - mark the node's header as dirty + * @node: pointer on node object + * @items_capacity: items capacity in the node + * + * This method tries to mark the node's header as dirty. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_set_node_header_dirty(struct ssdfs_btree_node *node, + u16 items_capacity) +{ + struct ssdfs_state_bitmap *bmap; + unsigned long found = ULONG_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, items_capacity %u\n", + node->node_id, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_set_header_dirty; + } + + spin_lock(&bmap->lock); + + found = bitmap_find_next_zero_area(bmap->ptr, items_capacity, + SSDFS_BTREE_NODE_HEADER_INDEX, + 1, 0); + if (found == SSDFS_BTREE_NODE_HEADER_INDEX) + bitmap_set(bmap->ptr, found, 1); + + spin_unlock(&bmap->lock); + +finish_set_header_dirty: + up_read(&node->bmap_array.lock); + + return err; +} + +/* + * ssdfs_clear_node_header_dirty_state() - clear node's header dirty state + * @node: pointer on node object + * + * This method tries to clear the node's header dirty state. + */ +void ssdfs_clear_node_header_dirty_state(struct ssdfs_btree_node *node) +{ + struct ssdfs_state_bitmap *bmap; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + if (!bmap->ptr) + BUG(); + + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, SSDFS_BTREE_NODE_HEADER_INDEX, 1); + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); +} + +/* + * ssdfs_set_dirty_items_range() - mark the range of items as dirty + * @node: pointer on node object + * @items_capacity: items capacity in the node + * @start_index: start index of the range + * @count: count of items in the range + * + * This method tries to mark the range of items as dirty. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
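+ *
+ * A minimal usage sketch (illustrative only): a typical change/delete
+ * path locks the range, mutates the items, then marks them dirty so
+ * the flush logic knows what to write back:
+ *
+ *	err = ssdfs_lock_items_range(node, start_index, count);
+ *	... modify the items ...
+ *	err = ssdfs_set_dirty_items_range(node, items_capacity,
+ *					  start_index, count);
+ *	ssdfs_unlock_items_range(node, start_index, count);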
+ */ +int ssdfs_set_dirty_items_range(struct ssdfs_btree_node *node, + u16 items_capacity, + u16 start_index, u16 count) +{ + struct ssdfs_state_bitmap *bmap; + unsigned long found = ULONG_MAX; + unsigned long start_area; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("items_capacity %u, start_index %u, count %u\n", + items_capacity, start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + start_area = node->bmap_array.item_start_bit; + if (start_area == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid items_area_start\n"); + goto finish_set_dirty_items; + } + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + if (!bmap->ptr) { + err = -ERANGE; + SSDFS_WARN("dirty bitmap is empty\n"); + goto finish_set_dirty_items; + } + + spin_lock(&bmap->lock); + + found = bitmap_find_next_zero_area(bmap->ptr, + start_area + items_capacity, + start_area + start_index, + count, 0); + if (found != (start_area + start_index)) { + /* area is dirty already */ + err = -EEXIST; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("set bit: start_area %lu, start_index %u, len %u\n", + start_area, start_index, count); + + SSDFS_DBG("BMAP DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + bmap->ptr, + node->bmap_array.bmap_bytes); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + bitmap_set(bmap->ptr, start_area + start_index, count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BMAP DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + bmap->ptr, + node->bmap_array.bmap_bytes); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&bmap->lock); + + if (unlikely(err)) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found %lu != start %lu\n", + found, start_area + start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +finish_set_dirty_items: + up_read(&node->bmap_array.lock); + + if (!err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u, tree_type %#x, " + "start_index %u, count %u\n", + node->node_id, node->tree->type, + start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +/* + * ssdfs_clear_dirty_items_range_state() - clear items range's dirty state + * @node: pointer on node object + * @start_index: start index of the range + * @count: count of items in the range + * + * This method tries to clear the range of items' dirty state. 
+ */ +void ssdfs_clear_dirty_items_range_state(struct ssdfs_btree_node *node, + u16 start_index, u16 count) +{ + struct ssdfs_state_bitmap *bmap; + unsigned long start_area; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("start_index %u, count %u\n", + start_index, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->bmap_array.lock); + + start_area = node->bmap_array.item_start_bit; + BUG_ON(start_area == ULONG_MAX); + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + BUG_ON(!bmap->ptr); + + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, start_area + start_index, count); + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); +} + +/* + * is_last_leaf_node_found() - check that found leaf node is the last + * @search: pointer on search object + */ +bool is_last_leaf_node_found(struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *parent; + u64 leaf_end_hash; + u64 index_end_hash; + int node_type = SSDFS_BTREE_LEAF_NODE; + spinlock_t * lock; + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("start_hash %llx, end_hash %llx, node_id %u\n", + search->request.start.hash, + search->request.end.hash, + search->node.id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!search->node.child) { + SSDFS_WARN("empty child node pointer\n"); + return false; + } + + if (!search->node.parent) { + SSDFS_WARN("empty parent node pointer\n"); + return false; + } + + switch (atomic_read(&search->node.child->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid area state %#x\n", + atomic_read(&search->node.child->items_area.state)); + return false; + } + + down_read(&search->node.child->header_lock); + leaf_end_hash = search->node.child->items_area.end_hash; + up_read(&search->node.child->header_lock); + + if (leaf_end_hash >= U64_MAX) { + SSDFS_WARN("leaf node end_hash %llx\n", + leaf_end_hash); + return false; + } + + parent = search->node.parent; + + do { + if (!parent) { + SSDFS_WARN("empty parent node pointer\n"); + return false; + } + + node_type = atomic_read(&parent->type); + + switch (node_type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + state = atomic_read(&parent->index_area.state); + + switch (state) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_WARN("invalid area state %#x\n", + state); + return false; + } + + down_read(&parent->header_lock); + index_end_hash = parent->index_area.end_hash; + up_read(&parent->header_lock); + + if (index_end_hash >= U64_MAX) { + SSDFS_WARN("index area: end hash %llx\n", + index_end_hash); + return false; + } + + if (leaf_end_hash < index_end_hash) { + /* internal node */ + return false; + } + break; + + default: + SSDFS_WARN("invalid node type %#x\n", + node_type); + return false; + } + + lock = &parent->descriptor_lock; + spin_lock(lock); + parent = parent->parent_node; + spin_unlock(lock); + lock = NULL; + } while (node_type != SSDFS_BTREE_ROOT_NODE); + + return true; +} + +/* + * ssdfs_btree_node_find_lookup_index_nolock() - find lookup index + * @search: search object + * @lookup_table: lookup table + * @table_capacity: capacity of the lookup table + * @lookup_index: lookup index [out] + * + * This method tries to find a lookup index for requested items. + * It needs to lock the lookup table before calling this method. 
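+ *
+ * The lookup table is an ascending sequence of start hashes that this
+ * method binary-searches. An illustrative example (hashes invented):
+ *
+ *	lookup_table = { 0x10, 0x40, 0x80, U64_MAX, ... }
+ *	hash 0x40 -> -EEXIST,  *lookup_index = 1 (exact bound match)
+ *	hash 0x05 -> -ENODATA, *lookup_index = 0 (below the first bound)
+ *	hash 0x55 -> 0,        *lookup_index = 1 (inside [0x40, 0x80))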
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - lookup index doesn't exist for requested hash. + */ +int +ssdfs_btree_node_find_lookup_index_nolock(struct ssdfs_btree_search *search, + __le64 *lookup_table, + int table_capacity, + u16 *lookup_index) +{ + u64 hash; + u64 lower_bound, upper_bound; + int index; + int lower_index, upper_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !lookup_table || !lookup_index); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "lookup_table %p, table_capacity %d\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + lookup_table, table_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + *lookup_index = U16_MAX; + hash = search->request.start.hash; + + if (hash >= U64_MAX) { + SSDFS_ERR("invalid hash for search\n"); + return -ERANGE; + } + + lower_index = 0; + lower_bound = le64_to_cpu(lookup_table[lower_index]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lower_index %d, lower_bound %llu\n", + lower_index, lower_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (lower_bound >= U64_MAX) { + err = -ENODATA; + *lookup_index = lower_index; + goto finish_index_search; + } else if (hash < lower_bound) { + err = -ENODATA; + *lookup_index = lower_index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx < lower_bound %llx\n", + hash, lower_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } else if (hash == lower_bound) { + err = -EEXIST; + *lookup_index = lower_index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx == lower_bound %llx\n", + hash, lower_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } + + upper_index = table_capacity - 1; + upper_bound = le64_to_cpu(lookup_table[upper_index]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("upper_index %d, upper_bound %llu\n", + upper_index, upper_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (upper_bound >= U64_MAX) { + /* + * continue to search + */ + } else if (hash == upper_bound) { + err = -EEXIST; + *lookup_index = upper_index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx == upper_bound %llx\n", + hash, upper_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } else if (hash > upper_bound) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx > upper_bound %llx\n", + hash, upper_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + *lookup_index = upper_index; + goto finish_index_search; + } + + do { + int diff = upper_index - lower_index; + + index = lower_index + (diff / 2); + + lower_bound = le64_to_cpu(lookup_table[index]); + upper_bound = le64_to_cpu(lookup_table[index + 1]); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %d, lower_index %d, upper_index %d, " + "lower_bound %llx, upper_bound %llx\n", + index, lower_index, upper_index, + lower_bound, upper_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (lower_bound >= U64_MAX) + upper_index = index; + else if (hash < lower_bound) + upper_index = index; + else if (hash == lower_bound) { + err = -EEXIST; + *lookup_index = index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx == lower_bound %llx\n", + hash, lower_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } + + if (lower_bound < hash && upper_bound >= U64_MAX) { + err = 0; + *lookup_index = index; + goto finish_index_search; + } else if (lower_bound < hash && hash < upper_bound) { + err = 0; + lower_index = index; + } else if (hash == upper_bound) { + err 
= -EEXIST; + *lookup_index = index; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx == upper_bound %llx\n", + hash, upper_bound); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_index_search; + } else if (hash > upper_bound) + lower_index = index; + } while ((upper_index - lower_index) > 1); + + if ((upper_index - lower_index) > 1) { + err = -ERANGE; + SSDFS_ERR("lower_index %d, upper_index %d\n", + lower_index, upper_index); + goto finish_index_search; + } + + *lookup_index = lower_index; + +finish_index_search: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u\n", *lookup_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -EEXIST) { + /* index found */ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(*lookup_index >= table_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + return err; +} + +/* + * __ssdfs_extract_range_by_lookup_index() - extract a range of items + * @node: pointer on node object + * @lookup_index: lookup index for requested range + * @lookup_table_capacity: maximal number of items in lookup table + * @item_size: size of item in bytes + * @search: pointer on search request object + * @check_item: specialized method of checking item + * @prepare_buffer: specialized method of buffer preparing + * @extract_item: specialized method of extracting found item + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. + */ +int __ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node, + u16 lookup_index, + int lookup_table_capacity, + size_t item_size, + struct ssdfs_btree_search *search, + ssdfs_check_found_item check_item, + ssdfs_prepare_result_buffer prepare_buffer, + ssdfs_extract_found_item extract_item) +{ + DEFINE_WAIT(wait); + struct ssdfs_fs_info *fsi; + struct ssdfs_state_bitmap *bmap; + u16 index, found_index; + u16 items_count; + u32 area_offset; + u32 area_size; + u32 item_offset; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + int page_index; + struct page *page; + void *kaddr; + unsigned long start_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi || !search); + BUG_ON(lookup_index >= lookup_table_capacity); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %d, node_id %u, height %d, " + "lookup_index %u\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), + lookup_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + down_read(&node->header_lock); + area_offset = node->items_area.offset; + area_size = node->items_area.area_size; + items_count = node->items_area.items_count; + up_read(&node->header_lock); + + if (items_count == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u is empty\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + found_index = U16_MAX; + index = __ssdfs_convert_lookup2item_index(lookup_index, + node->node_size, + item_size, + lookup_table_capacity); + if (index >= items_count) { + err = -ERANGE; + SSDFS_ERR("index %u >= items_count %u\n", + index, items_count); + return err; + } + + down_read(&node->full_lock); + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP]; + + if (found_index != U16_MAX) + goto try_extract_range; + + for (; index < items_count; index++) { + item_offset = 
(u32)index * item_size; + if (item_offset >= area_size) { + err = -ERANGE; + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + goto finish_extract_range; + } + + item_offset += area_offset; + if (item_offset >= node->node_size) { + err = -ERANGE; + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node->node_size); + goto finish_extract_range; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(&node->content.pvec)) { + err = -ERANGE; + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + goto finish_extract_range; + } + + page = node->content.pvec.pages[page_index]; + + down_read(&node->bmap_array.lock); + +try_lock_checking_item: + spin_lock(&bmap->lock); + + start_index = node->bmap_array.item_start_bit + index; + err = bitmap_allocate_region(bmap->ptr, + (unsigned int)start_index, 0); + if (err == -EBUSY) { + err = 0; + prepare_to_wait(&node->wait_queue, &wait, + TASK_UNINTERRUPTIBLE); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("waiting unlocked state of item %lu\n", + start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&bmap->lock); + + schedule(); + finish_wait(&node->wait_queue, &wait); + goto try_lock_checking_item; + } + + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); + + if (err) { + SSDFS_ERR("fail to lock: index %lu, err %d\n", + start_index, err); + goto finish_extract_range; + } + + if ((item_offset + item_size) > PAGE_SIZE) { + err = -ERANGE; + SSDFS_ERR("invalid offset: " + "item_offset %u, item_size %zu\n", + item_offset, item_size); + goto finish_extract_range; + } + + kaddr = kmap_local_page(page); + err = check_item(fsi, search, + (u8 *)kaddr + item_offset, + index, + &start_hash, &end_hash, + &found_index); + kunmap_local(kaddr); + + down_read(&node->bmap_array.lock); + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, (unsigned int)start_index, 1); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + + wake_up_all(&node->wait_queue); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("requested (start_hash %llx, end_hash %llx), " + "start_hash %llx, end_hash %llx, " + "index %u, found_index %u, err %d\n", + search->request.start.hash, search->request.end.hash, + start_hash, end_hash, index, found_index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err == -EAGAIN) + continue; + else if (unlikely(err)) + goto finish_extract_range; + else if (found_index != U16_MAX) + break; + } + + if (err == -EAGAIN) { + if (found_index >= U16_MAX) { + SSDFS_ERR("fail to find index\n"); + goto finish_extract_range; + } else if (found_index == items_count) { + err = 0; + found_index = items_count - 1; + } else { + err = -ERANGE; + SSDFS_ERR("fail to find index\n"); + goto finish_extract_range; + } + } + + if (found_index > items_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_index %u, items_count %u\n", + found_index, items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = items_count; + search->result.count = 1; + goto finish_extract_range; + } else if (is_btree_search_contains_new_item(search)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_index %u, items_count %u\n", + found_index, items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = -ENODATA; + search->result.state = + 
SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = found_index; + search->result.count = 0; + goto finish_extract_range; + } else { + err = prepare_buffer(search, found_index, + search->request.start.hash, + search->request.end.hash, + items_count, item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare buffers: " + "requested (start_hash %llx, end_hash %llx), " + "found_index %u, start_hash %llx, " + "end_hash %llx, items_count %u, " + "item_size %zu, err %d\n", + search->request.start.hash, + search->request.end.hash, + found_index, start_hash, end_hash, + items_count, item_size, err); + goto finish_extract_range; + } + + search->result.start_index = found_index; + search->result.count = 0; + } + +try_extract_range: + for (; found_index < items_count; found_index++) { + item_offset = (u32)found_index * item_size; + if (item_offset >= area_size) { + err = -ERANGE; + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + goto finish_extract_range; + } + + item_offset += area_offset; + if (item_offset >= node->node_size) { + err = -ERANGE; + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node->node_size); + goto finish_extract_range; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(&node->content.pvec)) { + err = -ERANGE; + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + goto finish_extract_range; + } + + page = node->content.pvec.pages[page_index]; + + down_read(&node->bmap_array.lock); + +try_lock_extracting_item: + spin_lock(&bmap->lock); + + start_index = node->bmap_array.item_start_bit + found_index; + err = bitmap_allocate_region(bmap->ptr, + (unsigned int)start_index, 0); + if (err == -EBUSY) { + err = 0; + prepare_to_wait(&node->wait_queue, &wait, + TASK_UNINTERRUPTIBLE); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("waiting unlocked state of item %lu\n", + start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&bmap->lock); + + schedule(); + finish_wait(&node->wait_queue, &wait); + goto try_lock_extracting_item; + } + + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); + + if (err) { + SSDFS_ERR("fail to lock: index %lu, err %d\n", + start_index, err); + goto finish_extract_range; + } + + if ((item_offset + item_size) > PAGE_SIZE) { + err = -ERANGE; + SSDFS_ERR("invalid offset: " + "item_offset %u, item_size %zu\n", + item_offset, item_size); + goto finish_extract_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("item_offset %u, item_size %zu\n", + item_offset, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + err = extract_item(fsi, search, item_size, + (u8 *)kaddr + item_offset, + &start_hash, &end_hash); + kunmap_local(kaddr); + + down_read(&node->bmap_array.lock); + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, (unsigned int)start_index, 1); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + + wake_up_all(&node->wait_queue); + + if (err == -ENODATA && search->result.count > 0) { + err = 0; + search->result.err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("stop search: " + "found_index %u, start_hash %llx, " + "end_hash %llx, search->request.end.hash %llx\n", + found_index, start_hash, end_hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + goto finish_extract_range; + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract 
item: " + "kaddr %p, item_offset %u, err %d\n", + kaddr, item_offset, err); + goto finish_extract_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_index %u, start_hash %llx, " + "end_hash %llx, search->request.end.hash %llx\n", + found_index, start_hash, end_hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->request.end.hash <= end_hash) + break; + } + + if (search->request.end.hash > end_hash) + err = -EAGAIN; + +finish_extract_range: + up_read(&node->full_lock); + + if (err == -ENODATA || err == -EAGAIN) { + /* + * do nothing + */ + search->result.err = err; + } else if (unlikely(err)) { + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } + + return err; +} + +/* + * ssdfs_calculate_item_offset() - calculate item's offset + * @node: pointer on node object + * @area_offset: area offset in bytes from the node's beginning + * @area_size: area size in bytes + * @index: item's index in the node + * @item_size: size of item in bytes + * @page_index: index of a page in the node [out] + * @item_offset: offset in bytes from a page's beginning + * + * This method tries to calculate item's offset in a page. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_calculate_item_offset(struct ssdfs_btree_node *node, + u32 area_offset, u32 area_size, + int index, size_t item_size, + int *page_index, + u32 *item_offset) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !page_index || !item_offset); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area_offset %u, area_size %u, " + "item_size %zu, index %d\n", + node->node_id, area_offset, area_size, + item_size, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *item_offset = (u32)index * item_size; + if (*item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + *item_offset, area_size); + return -ERANGE; + } + + *item_offset += area_offset; + if (*item_offset >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + *item_offset, node->node_size); + return -ERANGE; + } + + *page_index = *item_offset >> PAGE_SHIFT; + if (*page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + *page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (*page_index != 0) + *item_offset %= PAGE_SIZE; + + return 0; +} From patchwork Sat Feb 25 01:09:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151959 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C392C7EE32 for ; Sat, 25 Feb 2023 01:19:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229760AbjBYBTr (ORCPT ); Fri, 24 Feb 2023 20:19:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229761AbjBYBRl (ORCPT ); Fri, 24 Feb 2023 20:17:41 -0500 Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5AACE22DD9 for ; Fri, 24 Feb 2023 17:17:33 -0800 (PST) Received: by mail-oi1-x22e.google.com with SMTP id e21so830339oie.1 for 
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 55/76] ssdfs: range operations of b-tree node Date: Fri, 24 Feb 2023 17:09:06 -0800 Message-Id: <20230225010927.813929-56-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org A hybrid b-tree node plays a very special role in the b-tree architecture. First of all, a hybrid node includes both index and items areas. The goal is to combine index and items records in one node for the case of small b-trees. A b-tree starts with the creation of the root node, which can contain only two index keys; it means that the root node keeps knowledge about two nodes only. At first, the b-tree logic creates leaf nodes until the root node contains two index keys for leaf nodes. The next step implies the creation of a hybrid node that contains index records for the two existing leaf nodes, and the root node contains the index key for the hybrid node. Now the hybrid node comes into play. New items are added into the items area of the hybrid node until this area becomes completely full. Then the b-tree logic allocates a new leaf node, all existing items in the hybrid node are moved into the newly created leaf node, and an index key is added into the hybrid node's index area.
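The overflow step above can be sketched with a tiny userspace model (purely illustrative: the structure, constants, and function names below are hypothetical and are not the SSDFS code):

	#include <stdio.h>

	#define ITEMS_CAPACITY	4	/* items area capacity of the hybrid node */
	#define INDEX_CAPACITY	8	/* index area capacity of the hybrid node */

	struct hybrid_node {
		int items_count;	/* items kept in the items area */
		int index_count;	/* index keys kept in the index area */
	};

	/* Add one item; on overflow move all items into a new leaf node. */
	static void hybrid_node_add_item(struct hybrid_node *node)
	{
		if (node->items_count < ITEMS_CAPACITY) {
			node->items_count++;
			return;
		}

		/*
		 * The items area is full: all existing items are moved
		 * into a newly allocated leaf node and one index key
		 * describing that leaf node is added into the index area.
		 */
		node->items_count = 1;	/* the new item opens the next batch */
		node->index_count++;
		printf("leaf created, index keys: %d\n", node->index_count);

		if (node->index_count == INDEX_CAPACITY)
			printf("index area full: resize or convert to index node\n");
	}

	int main(void)
	{
		struct hybrid_node node = {0, 0};

		for (int i = 0; i < 40; i++)
			hybrid_node_add_item(&node);

		return 0;
	}

The real logic operates on hashes and raw items rather than counters, but the shape of the growth is the same: every items-area overflow costs one new leaf node and one index key in the hybrid node.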
This operation repeats multiple times until the index area of the hybrid node becomes completely full. At that point the index area is resized by doubling it in size, again after the existing items have been moved into a newly created node. Finally, the hybrid node is converted into an index node. The important point is that a small b-tree has one hybrid node with index keys and items instead of two nodes (index + leaf). A hybrid node combines both index and items operations, which makes this type of node a "hot" type of metadata and provides a way to isolate/distinguish hot, warm, and cold data. As a result, it makes the b-tree more compact by decreasing the number of nodes, it makes GC operations unnecessary because update operations on the "hot" hybrid node(s) keep the migration scheme efficient, and it decreases write amplification. Hybrid nodes require range operations that are represented by: (1) extract_range - extract a range of items (or all items) from a node (2) insert_range - insert a range of items into a node (3) delete_range - remove a range of items from a node Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/btree_node.c | 3007 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 3007 insertions(+) diff --git a/fs/ssdfs/btree_node.c b/fs/ssdfs/btree_node.c index 8d939451de05..45a992064154 100644 --- a/fs/ssdfs/btree_node.c +++ b/fs/ssdfs/btree_node.c @@ -13919,3 +13919,3010 @@ int ssdfs_calculate_item_offset(struct ssdfs_btree_node *node, return 0; } + +/* + * __ssdfs_shift_range_right() - shift the items' range to the right + * @node: pointer on node object + * @area_offset: area offset in bytes from the node's beginning + * @area_size: area size in bytes + * @item_size: size of item in bytes + * @start_index: starting index of the range + * @range_len: number of items in the range + * @shift: number of position in the requested shift + * + * This method tries to shift the range of items to the right + * direction. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.
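+ * + * The move is performed from the tail of the range (index + * @start_index + @range_len - 1) towards @start_index, so the + * shifted bytes never overwrite items that have not been moved yet. + * Every iteration moves the biggest chunk of items that fits into + * the destination memory page (ssdfs_memmove_page() for cross-page + * moves, ssdfs_memmove() inside one page).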
+ */ +static +int __ssdfs_shift_range_right(struct ssdfs_btree_node *node, + u32 area_offset, u32 area_size, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift) +{ + int page_index1, page_index2; + int src_index, dst_index; + struct page *page1, *page2; + u32 item_offset1, item_offset2; + void *kaddr; + u32 moved_items = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area_offset %u, area_size %u, " + "item_size %zu, start_index %u, " + "range_len %u, shift %u\n", + node->node_id, area_offset, area_size, + item_size, start_index, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + src_index = start_index + range_len - 1; + dst_index = src_index + shift; + + if ((dst_index * item_size) > area_size) { + SSDFS_ERR("shift is out of area: " + "src_index %d, shift %u, " + "item_size %zu, area_size %u\n", + src_index, shift, item_size, area_size); + return -ERANGE; + } + + do { + u32 offset_diff; + u32 index_diff; + int moving_items; + u32 moving_bytes; + + item_offset2 = (u32)dst_index * item_size; + if (item_offset2 >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset2, area_size); + return -ERANGE; + } + + item_offset2 += area_offset; + if (item_offset2 >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset2, node->node_size); + return -ERANGE; + } + + page_index2 = item_offset2 >> PAGE_SHIFT; + if (page_index2 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index2, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index2 == 0) + offset_diff = item_offset2 - area_offset; + else + offset_diff = item_offset2 - (page_index2 * PAGE_SIZE); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(offset_diff % item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + index_diff = offset_diff / item_size; + index_diff++; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(index_diff >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (index_diff < shift) { + /* + * The shift moves data out of the node. + * This is the reason that index_diff is + * smaller than shift. Keep the index_diff + * the same. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_diff %u, shift %u\n", + index_diff, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (index_diff == shift) { + /* + * It's the case when the destination page + * has no items at all. Otherwise, the free + * space at the beginning of the page is equal + * to the @shift. This space was prepared + * by the previous move operation. Simply + * keep the index_diff the same. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_diff %u, shift %u\n", + index_diff, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + /* + * We need to know the number of items + * from the page's beginning or the area's beginning. + * So, exclude the shift from the account.
+ */ + index_diff -= shift; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moved_items > range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + moving_items = range_len - moved_items; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moving_items < 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + moving_items = min_t(int, moving_items, (int)index_diff); + + if (moving_items == 0) { + SSDFS_WARN("no items for moving\n"); + return -ERANGE; + } + + moving_bytes = moving_items * item_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moving_items >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + src_index -= moving_items - 1; + dst_index = src_index + shift; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("moving_items %d, src_index %d, dst_index %d\n", + moving_items, src_index, dst_index); + + BUG_ON(start_index > src_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_calculate_item_offset(node, area_offset, area_size, + src_index, item_size, + &page_index1, &item_offset1); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate item's offset: " + "item_index %d, err %d\n", + src_index, err); + return err; + } + + err = ssdfs_calculate_item_offset(node, area_offset, area_size, + dst_index, item_size, + &page_index2, &item_offset2); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate item's offset: " + "item_index %d, err %d\n", + dst_index, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_offset1 %u, item_offset2 %u\n", + item_offset1, item_offset2); + + if ((item_offset1 + moving_bytes) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "item_offset1 %u, moving_bytes %u\n", + item_offset1, moving_bytes); + return -ERANGE; + } + + if ((item_offset2 + moving_bytes) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "item_offset2 %u, moving_bytes %u\n", + item_offset2, moving_bytes); + return -ERANGE; + } + + SSDFS_DBG("pvec_size %u, page_index1 %d, item_offset1 %u, " + "page_index2 %d, item_offset2 %u, " + "moving_bytes %u\n", + pagevec_count(&node->content.pvec), + page_index1, item_offset1, + page_index2, item_offset2, + moving_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index1 != page_index2) { + page1 = node->content.pvec.pages[page_index1]; + page2 = node->content.pvec.pages[page_index2]; + ssdfs_lock_page(page1); + ssdfs_lock_page(page2); + err = ssdfs_memmove_page(page2, item_offset2, PAGE_SIZE, + page1, item_offset1, PAGE_SIZE, + moving_bytes); + ssdfs_unlock_page(page1); + ssdfs_unlock_page(page2); + + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } else { + page1 = node->content.pvec.pages[page_index1]; + ssdfs_lock_page(page1); + kaddr = kmap_local_page(page1); + err = ssdfs_memmove(kaddr, item_offset2, PAGE_SIZE, + kaddr, item_offset1, PAGE_SIZE, + moving_bytes); + flush_dcache_page(page1); + kunmap_local(kaddr); + ssdfs_unlock_page(page1); + + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } + + src_index--; + dst_index--; + moved_items += moving_items; + } while (src_index >= start_index); + + if (moved_items != range_len) { + SSDFS_ERR("moved_items %u != range_len %u\n", + moved_items, range_len); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_shift_range_right2() - shift the items' range to the right + * @node: pointer on node object + * @area: area descriptor + * @item_size: size of item in bytes + * @start_index: starting index of the range + * @range_len: number of items in the range + * @shift: number of position in the requested shift + * + * This method tries to shift the range of items to the 
right + * direction. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_shift_range_right2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_size %zu, " + "start_index %u, range_len %u, shift %u\n", + node->node_id, item_size, + start_index, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_index > area->index_count) { + SSDFS_ERR("invalid request: " + "start_index %u, index_count %u\n", + start_index, area->index_count); + return -ERANGE; + } else if (start_index == area->index_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u == index_count %u\n", + start_index, area->index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if ((start_index + range_len) > area->index_count) { + SSDFS_ERR("range is out of existing items: " + "start_index %u, range_len %u, index_count %u\n", + start_index, range_len, area->index_count); + return -ERANGE; + } else if ((start_index + range_len + shift) > area->index_capacity) { + SSDFS_ERR("shift is out of capacity: " + "start_index %u, range_len %u, " + "shift %u, index_capacity %u\n", + start_index, range_len, + shift, area->index_capacity); + return -ERANGE; + } + + return __ssdfs_shift_range_right(node, area->offset, area->area_size, + item_size, start_index, range_len, + shift); +} + +/* + * ssdfs_shift_range_right() - shift the items' range to the right + * @node: pointer on node object + * @area: items area descriptor + * @item_size: size of item in bytes + * @start_index: starting index of the range + * @range_len: number of items in the range + * @shift: number of position in the requested shift + * + * This method tries to shift the range of items to the right + * direction. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_shift_range_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_size %zu, " + "start_index %u, range_len %u, shift %u\n", + node->node_id, item_size, + start_index, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_index > area->items_count) { + SSDFS_ERR("invalid request: " + "start_index %u, items_count %u\n", + start_index, area->items_count); + return -ERANGE; + } else if (start_index == area->items_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u == items_count %u\n", + start_index, area->items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if ((start_index + range_len) > area->items_count) { + SSDFS_ERR("range is out of existing items: " + "start_index %u, range_len %u, items_count %u\n", + start_index, range_len, area->items_count); + return -ERANGE; + } else if ((start_index + range_len + shift) > area->items_capacity) { + SSDFS_ERR("shift is out of capacity: " + "start_index %u, range_len %u, " + "shift %u, items_capacity %u\n", + start_index, range_len, + shift, area->items_capacity); + return -ERANGE; + } + + return __ssdfs_shift_range_right(node, area->offset, area->area_size, + item_size, start_index, range_len, + shift); +} + +/* + * __ssdfs_shift_range_left() - shift the items' range to the left + * @node: pointer on node object + * @area_offset: area offset in bytes from the node's beginning + * @area_size: area size in bytes + * @item_size: size of item in bytes + * @start_index: starting index of the range + * @range_len: number of items in the range + * @shift: number of position in the requested shift + * + * This method tries to shift the range of items to the left + * direction. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
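+ * + * In contrast to __ssdfs_shift_range_right(), the move starts at + * @start_index and walks towards the tail of the range: a shift to + * the left cannot overwrite items that have not been moved yet. + * Every iteration moves a chunk that crosses neither the source nor + * the destination page boundary.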
+ */ +static +int __ssdfs_shift_range_left(struct ssdfs_btree_node *node, + u32 area_offset, u32 area_size, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift) +{ + int page_index1, page_index2; + int src_index, dst_index; + struct page *page1, *page2; + u32 item_offset1, item_offset2; + void *kaddr; + u16 moved_items = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area_offset %u, area_size %u, " + "item_size %zu, start_index %u, " + "range_len %u, shift %u\n", + node->node_id, area_offset, area_size, + item_size, start_index, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + src_index = start_index; + dst_index = start_index - shift; + + do { + u32 range_len1, range_len2; + u32 moving_items; + u32 moving_bytes; + + if (moved_items >= range_len) { + SSDFS_ERR("moved_items %u >= range_len %u\n", + moved_items, range_len); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_index %d, dst_index %d\n", + src_index, dst_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_offset1 = (u32)src_index * item_size; + if (item_offset1 >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset1, area_size); + return -ERANGE; + } + + item_offset1 += area_offset; + if (item_offset1 >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset1, node->node_size); + return -ERANGE; + } + + page_index1 = item_offset1 >> PAGE_SHIFT; + if (page_index1 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index1, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index1 > 0) + item_offset1 %= page_index1 * PAGE_SIZE; + + item_offset2 = (u32)dst_index * item_size; + if (item_offset2 >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset2, area_size); + return -ERANGE; + } + + item_offset2 += area_offset; + if (item_offset2 >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset2, node->node_size); + return -ERANGE; + } + + page_index2 = item_offset2 >> PAGE_SHIFT; + if (page_index2 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index2, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index2 > 0) + item_offset2 %= page_index2 * PAGE_SIZE; + + range_len1 = (PAGE_SIZE - item_offset1) / item_size; + range_len2 = (PAGE_SIZE - item_offset2) / item_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(range_len1 == 0); + BUG_ON(range_len2 == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + moving_items = min_t(u32, range_len1, range_len2); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moved_items > range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + moving_items = min_t(u32, moving_items, + (u32)range_len - moved_items); + + if (moving_items == 0) { + SSDFS_WARN("no items for moving\n"); + return -ERANGE; + } + + moving_bytes = moving_items * item_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page_index1 %d, item_offset1 %u, " + "page_index2 %d, item_offset2 %u\n", + page_index1, item_offset1, + page_index2, item_offset2); + + if ((item_offset1 + moving_bytes) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "item_offset1 %u, moving_bytes %u\n", + item_offset1, moving_bytes); + return -ERANGE; + } + + if ((item_offset2 + moving_bytes) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "item_offset2 %u, moving_bytes %u\n", + item_offset2, moving_bytes); + return 
-ERANGE; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index1 != page_index2) { + page1 = node->content.pvec.pages[page_index1]; + page2 = node->content.pvec.pages[page_index2]; + err = ssdfs_memmove_page(page2, item_offset2, PAGE_SIZE, + page1, item_offset1, PAGE_SIZE, + moving_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } else { + page1 = node->content.pvec.pages[page_index1]; + kaddr = kmap_local_page(page1); + err = ssdfs_memmove(kaddr, item_offset2, PAGE_SIZE, + kaddr, item_offset1, PAGE_SIZE, + moving_bytes); + flush_dcache_page(page1); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } + + src_index += moving_items; + dst_index += moving_items; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("moving_items %u, src_index %d, dst_index %d\n", + moving_items, src_index, dst_index); + + BUG_ON(moving_items >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + moved_items += moving_items; + } while (moved_items < range_len); + + return 0; +} + +/* + * ssdfs_shift_range_left2() - shift the items' range to the left + * @node: pointer on node object + * @area: area descriptor + * @item_size: size of item in bytes + * @start_index: starting index of the range + * @range_len: number of items in the range + * @shift: number of position in the requested shift + * + * This method tries to shift the range of items to the left + * direction. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_shift_range_left2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_size %zu, " + "start_index %u, range_len %u, shift %u\n", + node->node_id, item_size, + start_index, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_index > area->index_count) { + SSDFS_ERR("invalid request: " + "start_index %u, index_count %u\n", + start_index, area->index_count); + return -ERANGE; + } else if (start_index == area->index_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u == index_count %u\n", + start_index, area->index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } else if ((start_index + range_len) > area->index_count) { + SSDFS_ERR("range is out of existing items: " + "start_index %u, range_len %u, index_count %u\n", + start_index, range_len, area->index_count); + return -ERANGE; + } else if (shift > start_index) { + SSDFS_ERR("shift is out of node: " + "start_index %u, shift %u\n", + start_index, shift); + return -ERANGE; + } + + return __ssdfs_shift_range_left(node, area->offset, area->area_size, + item_size, start_index, range_len, + shift); +} + +/* + * ssdfs_shift_range_left() - shift the items' range to the left + * @node: pointer on node object + * @area: items area descriptor + * @item_size: size of item in bytes + * @start_index: starting index of the range + * @range_len: number of items in the range + * @shift: number of position in the requested shift + * + * This method tries to shift the range of items to the left + * direction. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_shift_range_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + u16 start_index, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_size %zu, " + "start_index %u, range_len %u, shift %u\n", + node->node_id, item_size, + start_index, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_index >= area->items_capacity) { + SSDFS_ERR("invalid request: " + "start_index %u, items_capacity %u\n", + start_index, area->items_capacity); + return -ERANGE; + } else if ((start_index + range_len) > area->items_capacity) { + SSDFS_ERR("range is out of capacity: " + "start_index %u, range_len %u, items_capacity %u\n", + start_index, range_len, area->items_capacity); + return -ERANGE; + } else if (shift > start_index) { + SSDFS_ERR("shift is out of node: " + "start_index %u, shift %u\n", + start_index, shift); + return -ERANGE; + } + + return __ssdfs_shift_range_left(node, area->offset, area->area_size, + item_size, start_index, range_len, + shift); +} + +/* + * __ssdfs_shift_memory_range_right() - shift the memory range to the right + * @node: pointer on node object + * @area_offset: area offset in bytes from the node's beginning + * @area_size: area size in bytes + * @offset: offset from the area's beginning to the range start + * @range_len: length of the range in bytes + * @shift: value of the shift in bytes + * + * This method tries to move the memory range (@offset; @range_len) + * in the @node for the @shift in bytes to the right. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_shift_memory_range_right(struct ssdfs_btree_node *node, + u32 area_offset, u32 area_size, + u16 offset, u16 range_len, + u16 shift) +{ + int page_index1, page_index2; + int src_offset, dst_offset; + struct page *page1, *page2; + u32 range_offset1, range_offset2; + void *kaddr; + u32 cur_range; + u32 moved_range = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area_offset %u, area_size %u, " + "offset %u, range_len %u, shift %u\n", + node->node_id, area_offset, area_size, + offset, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (((u32)offset + range_len + shift) > (area_offset + area_size)) { + SSDFS_ERR("invalid request: " + "offset %u, range_len %u, shift %u, " + "area_offset %u, area_size %u\n", + offset, range_len, shift, + area_offset, area_size); + return -ERANGE; + } + + src_offset = offset + range_len; + dst_offset = src_offset + shift; + + do { + u32 offset_diff; + u32 moving_range; + + range_offset1 = src_offset; + if (range_offset1 > area_size) { + SSDFS_ERR("range_offset1 %u > area_size %u\n", + range_offset1, area_size); + return -ERANGE; + } + + range_offset1 += area_offset; + if (range_offset1 > node->node_size) { + SSDFS_ERR("range_offset1 %u > node_size %u\n", + range_offset1, node->node_size); + return -ERANGE; + } + + page_index1 = (range_offset1 - 1) >> PAGE_SHIFT; + if (page_index1 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index1, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (range_len <= moved_range) { + SSDFS_ERR("range_len %u <= moved_range %u\n", + range_len, moved_range); + return -ERANGE; + } + + cur_range = range_len - moved_range; + offset_diff = 
range_offset1 - (page_index1 * PAGE_SIZE); + + moving_range = min_t(u32, cur_range, offset_diff); + range_offset1 -= moving_range; + + if (page_index1 > 0) + range_offset1 %= page_index1 * PAGE_SIZE; + + if ((range_offset1 + moving_range + shift) > PAGE_SIZE) { + range_offset1 += moving_range - shift; + moving_range = shift; + } + + range_offset2 = range_offset1 + shift; + + if (range_offset2 > area_size) { + SSDFS_ERR("range_offset2 %u > area_size %u\n", + range_offset2, area_size); + return -ERANGE; + } + + page_index2 = range_offset2 >> PAGE_SHIFT; + if (page_index2 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index2, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index2 > 0) + range_offset2 %= page_index2 * PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + if ((range_offset1 + moving_range) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "range_offset1 %u, moving_range %u\n", + range_offset1, moving_range); + return -ERANGE; + } + + if ((range_offset2 + moving_range) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "range_offset2 %u, moving_range %u\n", + range_offset2, moving_range); + return -ERANGE; + } + + SSDFS_DBG("page_index1 %d, page_index2 %d, " + "range_offset1 %u, range_offset2 %u\n", + page_index1, page_index2, + range_offset1, range_offset2); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index1 != page_index2) { + page1 = node->content.pvec.pages[page_index1]; + page2 = node->content.pvec.pages[page_index2]; + err = ssdfs_memmove_page(page2, + range_offset2, PAGE_SIZE, + page1, + range_offset1, PAGE_SIZE, + moving_range); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } else { + page1 = node->content.pvec.pages[page_index1]; + kaddr = kmap_local_page(page1); + err = ssdfs_memmove(kaddr, range_offset2, PAGE_SIZE, + kaddr, range_offset1, PAGE_SIZE, + moving_range); + flush_dcache_page(page1); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } + + src_offset -= moving_range; + dst_offset -= moving_range; + + if (src_offset < 0 || dst_offset < 0) { + SSDFS_ERR("src_offset %d, dst_offset %d\n", + src_offset, dst_offset); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moving_range >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + moved_range += moving_range; + } while (src_offset > offset); + + if (moved_range != range_len) { + SSDFS_ERR("moved_range %u != range_len %u\n", + moved_range, range_len); + return -ERANGE; + } + + if (src_offset != offset) { + SSDFS_ERR("src_offset %d != offset %u\n", + src_offset, offset); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_shift_memory_range_right() - shift the memory range to the right + * @node: pointer on node object + * @area: pointer on the area descriptor + * @offset: offset from the area's beginning to the range start + * @range_len: length of the range in bytes + * @shift: value of the shift in bytes + * + * This method tries to move the memory range (@offset; @range_len) + * in the @node for the @shift in bytes to the right. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
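+ * + * The memory range shift methods differ from the item range shift + * methods above in granularity: @offset, @range_len, and @shift are + * expressed in bytes instead of items.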
+ */ +int ssdfs_shift_memory_range_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 offset, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, offset %u, range_len %u, shift %u\n", + node->node_id, offset, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_shift_memory_range_right(node, + area->offset, area->area_size, + offset, range_len, + shift); +} + +/* + * ssdfs_shift_memory_range_right2() - shift the memory range to the right + * @node: pointer on node object + * @area: pointer on the area descriptor + * @offset: offset from the area's beginning to the range start + * @range_len: length of the range in bytes + * @shift: value of the shift in bytes + * + * This method tries to move the memory range (@offset; @range_len) + * in the @node for the @shift in bytes to the right. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_shift_memory_range_right2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 offset, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, offset %u, range_len %u, shift %u\n", + node->node_id, offset, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_shift_memory_range_right(node, + area->offset, area->area_size, + offset, range_len, + shift); +} + +/* + * __ssdfs_shift_memory_range_left() - shift the memory range to the left + * @node: pointer on node object + * @area_offset: offset area from the node's beginning + * @area_size: size of area in bytes + * @offset: offset from the area's beginning to the range start + * @range_len: length of the range in bytes + * @shift: value of the shift in bytes + * + * This method tries to move the memory range (@offset; @range_len) + * in the @node for the @shift in bytes to the left. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
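+ * + * The range is processed from its beginning towards its end in + * page-bounded chunks; moving to the left cannot overwrite bytes + * that have not been moved yet.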
+ */ +static +int __ssdfs_shift_memory_range_left(struct ssdfs_btree_node *node, + u32 area_offset, u32 area_size, + u16 offset, u16 range_len, + u16 shift) +{ + int page_index1, page_index2; + int src_offset, dst_offset; + struct page *page1, *page2; + u32 range_offset1, range_offset2; + void *kaddr; + u32 range_len1, range_len2; + u32 moved_range = 0; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area_offset %u, area_size %u, " + "offset %u, range_len %u, shift %u\n", + node->node_id, area_offset, area_size, + offset, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((offset + range_len) >= (area_offset + area_size)) { + SSDFS_ERR("invalid request: " + "offset %u, range_len %u, " + "area_offset %u, area_size %u\n", + offset, range_len, + area_offset, area_size); + return -ERANGE; + } else if (shift > offset) { + SSDFS_ERR("shift is out of area: " + "offset %u, shift %u\n", + offset, shift); + return -ERANGE; + } + + src_offset = offset; + dst_offset = offset - shift; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_offset %u, dst_offset %u\n", + src_offset, dst_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + do { + u32 moving_range; + + range_offset1 = src_offset; + if (range_offset1 > area_size) { + SSDFS_ERR("range_offset1 %u > area_size %u\n", + range_offset1, area_size); + return -ERANGE; + } + + range_offset1 += area_offset; + if (range_offset1 > node->node_size) { + SSDFS_ERR("range_offset1 %u > node_size %u\n", + range_offset1, node->node_size); + return -ERANGE; + } + + page_index1 = range_offset1 >> PAGE_SHIFT; + if (page_index1 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index1, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index1 > 0) + range_offset1 %= page_index1 * PAGE_SIZE; + + range_offset2 = dst_offset; + if (range_offset2 >= area_size) { + SSDFS_ERR("range_offset2 %u >= area_size %u\n", + range_offset2, area_size); + return -ERANGE; + } + + range_offset2 += area_offset; + if (range_offset2 >= node->node_size) { + SSDFS_ERR("range_offset2 %u >= node_size %u\n", + range_offset2, node->node_size); + return -ERANGE; + } + + page_index2 = range_offset2 >> PAGE_SHIFT; + if (page_index2 >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index2, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index2 > 0) + range_offset2 %= page_index2 * PAGE_SIZE; + + range_len1 = PAGE_SIZE - range_offset1; + range_len2 = PAGE_SIZE - range_offset2; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(range_len1 == 0); + BUG_ON(range_len2 == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + + moving_range = min_t(u32, range_len1, range_len2); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moved_range > range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + moving_range = min_t(u32, moving_range, + (u32)range_len - moved_range); + + if (moving_range == 0) { + SSDFS_WARN("no items for moving\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + if ((range_offset1 + moving_range) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "range_offset1 %u, moving_range %u\n", + range_offset1, moving_range); + return -ERANGE; + } + + if ((range_offset2 + moving_range) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "range_offset2 %u, moving_range %u\n", + range_offset2, moving_range); + return -ERANGE; + } + + SSDFS_DBG("page_index1 %d, page_index2 %d, " + "range_offset1 %u, 
range_offset2 %u\n", + page_index1, page_index2, + range_offset1, range_offset2); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (page_index1 != page_index2) { + page1 = node->content.pvec.pages[page_index1]; + page2 = node->content.pvec.pages[page_index2]; + err = ssdfs_memmove_page(page2, + range_offset2, PAGE_SIZE, + page1, + range_offset1, PAGE_SIZE, + moving_range); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } else { + page1 = node->content.pvec.pages[page_index1]; + kaddr = kmap_local_page(page1); + err = ssdfs_memmove(kaddr, range_offset2, PAGE_SIZE, + kaddr, range_offset1, PAGE_SIZE, + moving_range); + flush_dcache_page(page1); + kunmap_local(kaddr); + + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + } + + src_offset += moving_range; + dst_offset += moving_range; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(moving_range >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + moved_range += moving_range; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_offset %u, dst_offset %u, " + "moving_range %u, moved_range %u\n", + src_offset, dst_offset, + moving_range, moved_range); +#endif /* CONFIG_SSDFS_DEBUG */ + } while (moved_range < range_len); + + if (moved_range != range_len) { + SSDFS_ERR("moved_range %u != range_len %u\n", + moved_range, range_len); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_shift_memory_range_left() - shift the memory range to the left + * @node: pointer on node object + * @area: pointer on the area descriptor + * @offset: offset from the area's beginning to the range start + * @range_len: length of the range in bytes + * @shift: value of the shift in bytes + * + * This method tries to move the memory range (@offset; @range_len) + * in the @node for the @shift in bytes to the left. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_shift_memory_range_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 offset, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, offset %u, range_len %u, shift %u\n", + node->node_id, offset, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_shift_memory_range_left(node, + area->offset, area->area_size, + offset, range_len, shift); +} + +/* + * ssdfs_shift_memory_range_left2() - shift the memory range to the left + * @node: pointer on node object + * @area: pointer on the area descriptor + * @offset: offset from the area's beginning to the range start + * @range_len: length of the range in bytes + * @shift: value of the shift in bytes + * + * This method tries to move the memory range (@offset; @range_len) + * in the @node for the @shift in bytes to the left. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_shift_memory_range_left2(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_index_area *area, + u16 offset, u16 range_len, + u16 shift) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, offset %u, range_len %u, shift %u\n", + node->node_id, offset, range_len, shift); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_shift_memory_range_left(node, + area->offset, area->area_size, + offset, range_len, shift); +} + +/* + * ssdfs_generic_insert_range() - insert range of items into the node + * @node: pointer on node object + * @area: items area descriptor + * @item_size: size of item in bytes + * @search: search object + * + * This method tries to insert the range of items into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_generic_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + size_t item_size, + struct ssdfs_btree_search *search) +{ + int page_index; + int src_index, dst_index; + struct page *page; + u32 item_offset1, item_offset2; + u16 copied_items = 0; + u16 start_index; + unsigned int range_len; + u32 items; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_size %zu\n", + node->node_id, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + if (!search->result.buf) { + SSDFS_ERR("buffer pointer is NULL\n"); + return -ERANGE; + } + + items = search->result.items_in_buffer; + if (search->result.buf_size != (items * item_size)) { + SSDFS_ERR("buf_size %zu, items_in_buffer %u, " + "item_size %zu\n", + search->result.buf_size, + items, item_size); + return -ERANGE; + } + + start_index = search->result.start_index; + range_len = search->result.count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items %u, start_index %u, range_len %u\n", + items, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range_len == 0) { + SSDFS_WARN("search->result.count == 0\n"); + return -ERANGE; + } + + if (start_index > area->items_count) { + SSDFS_ERR("invalid request: " + "start_index %u, items_count %u\n", + start_index, area->items_count); + return -ERANGE; + } else if ((start_index + range_len) > area->items_capacity) { + SSDFS_ERR("range is out of capacity: " + "start_index %u, range_len %u, items_capacity %u\n", + start_index, range_len, area->items_capacity); + return -ERANGE; + } + + src_index = start_index; + dst_index = 0; + + do { + u32 copying_items; + u32 copying_bytes; + u32 vacant_positions; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u, src_index %d, dst_index %d\n", + start_index, src_index, dst_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_offset1 = (u32)src_index * item_size; + if (item_offset1 >= area->area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset1, area->area_size); + return -ERANGE; + } + + item_offset1 += area->offset; + if (item_offset1 >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset1, node->node_size); + return -ERANGE; + } + + page_index = item_offset1 >> PAGE_SHIFT; + if (page_index >= pagevec_count(&node->content.pvec)) { + 
SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + if (page_index > 0) + item_offset1 %= page_index * PAGE_SIZE; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start_index > src_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + copying_items = src_index - start_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(copying_items > range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + copying_items = range_len - copying_items; + + if (copying_items == 0) { + SSDFS_WARN("no items for moving\n"); + return -ERANGE; + } + + vacant_positions = PAGE_SIZE - item_offset1; + vacant_positions /= item_size; + + if (vacant_positions == 0) { + SSDFS_WARN("invalid vacant_positions %u\n", + vacant_positions); + return -ERANGE; + } + + copying_items = min_t(u32, copying_items, vacant_positions); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(copying_items >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + copying_bytes = copying_items * item_size; + + item_offset2 = (u32)dst_index * item_size; + if (item_offset2 >= search->result.buf_size) { + SSDFS_ERR("item_offset %u >= buf_size %zu\n", + item_offset2, search->result.buf_size); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("copying_items %u, item_offset1 %u, " + "item_offset2 %u\n", + copying_items, item_offset1, item_offset2); + + if ((item_offset1 + copying_bytes) > PAGE_SIZE) { + SSDFS_WARN("invalid offset: " + "item_offset1 %u, copying_bytes %u\n", + item_offset1, copying_bytes); + return -ERANGE; + } + + if ((item_offset2 + copying_bytes) > search->result.buf_size) { + SSDFS_WARN("invalid offset: " + "item_offset2 %u, copying_bytes %u, " + "result.buf_size %zu\n", + item_offset2, copying_bytes, + search->result.buf_size); + return -ERANGE; + } + + SSDFS_DBG("page_index %d, pvec_size %u, " + "item_offset1 %u, item_offset2 %u, " + "copying_bytes %u, result.buf_size %zu\n", + page_index, + pagevec_count(&node->content.pvec), + item_offset1, item_offset2, + copying_bytes, + search->result.buf_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = node->content.pvec.pages[page_index]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d, " + "flags %#lx, page_index %d\n", + page, page_ref_count(page), + page->flags, page_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_memcpy_to_page(page, + item_offset1, + PAGE_SIZE, + search->result.buf, + item_offset2, + search->result.buf_size, + copying_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + src_index += copying_items; + dst_index += copying_items; + copied_items += copying_items; + } while (copied_items < range_len); + + if (copied_items != range_len) { + SSDFS_ERR("copied_items %u != range_len %u\n", + copied_items, range_len); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_invalidate_root_node_hierarchy() - invalidate the whole hierarchy + * @node: pointer on node object + * + * This method tries to add the whole hierarchy of forks into + * pre-invalid queue of the shared extents tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
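+ * + * Flow of the implementation below: all root node indexes are + * extracted first under @full_lock, then, under @header_lock, + * every index is deleted from the root node and queued as + * pre-invalid in the shared extents tree for deferred + * invalidation of the child hierarchy.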
+ */ +int ssdfs_invalidate_root_node_hierarchy(struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree *tree; + struct ssdfs_btree_index_key indexes[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT]; + struct ssdfs_shared_extents_tree *shextree; + u16 index_count; + int index_type = SSDFS_EXTENT_INFO_UNKNOWN_TYPE; + u16 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + switch (tree->type) { + case SSDFS_EXTENTS_BTREE: + index_type = SSDFS_EXTENT_INFO_INDEX_DESCRIPTOR; + break; + + case SSDFS_DENTRIES_BTREE: + index_type = SSDFS_EXTENT_INFO_DENTRY_INDEX_DESCRIPTOR; + break; + + case SSDFS_XATTR_BTREE: + index_type = SSDFS_EXTENT_INFO_XATTR_INDEX_DESCRIPTOR; + break; + + case SSDFS_SHARED_DICTIONARY_BTREE: + index_type = SSDFS_EXTENT_INFO_SHDICT_INDEX_DESCRIPTOR; + break; + + default: + SSDFS_ERR("unsupported tree type %#x\n", + tree->type); + return -ERANGE; + } + + if (atomic_read(&node->type) != SSDFS_BTREE_ROOT_NODE) { + SSDFS_ERR("invalid node type %#x\n", + atomic_read(&node->type)); + return -ERANGE; + } + + fsi = tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + down_write(&node->full_lock); + + for (i = 0; i < SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; i++) { + err = __ssdfs_btree_root_node_extract_index(node, i, + &indexes[i]); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the index: " + "index_id %u, err %d\n", + i, err); + goto finish_invalidate_root_node_hierarchy; + } + } + + down_write(&node->header_lock); + + index_count = node->index_area.index_count; + + if (index_count == 0) { + err = -ERANGE; + SSDFS_ERR("invalid index_count %u\n", + index_count); + goto finish_process_root_node; + } + + for (i = 0; i < index_count; i++) { + if (le64_to_cpu(indexes[i].index.hash) >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("index %u has invalid hash\n", i); + goto finish_process_root_node; + } + + err = ssdfs_btree_root_node_delete_index(node, i); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to delete index: " + "index_id %u, err %d\n", + i, err); + goto finish_process_root_node; + } + + err = ssdfs_shextree_add_pre_invalid_index(shextree, + tree->owner_ino, + index_type, + &indexes[i]); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to pre-invalid index: " + "index_id %u, err %d\n", + i, err); + goto finish_process_root_node; + } + } + +finish_process_root_node: + up_write(&node->header_lock); + +finish_invalidate_root_node_hierarchy: + up_write(&node->full_lock); + + return err; +} + +/* + * __ssdfs_btree_node_extract_range() - extract range of items from node + * @node: pointer on node object + * @start_index: starting index of the range + * @count: count of items in the range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no such range in the node. 
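+ * + * Note on result buffers (see the buffer state switch below): + * for a single item the search object's inline buffer is reused + * (search->raw.inode/fork/dentry/xattr, depending on the tree + * type), while for several items an external buffer of + * @count * @item_size bytes is allocated or reallocated + * (@item_size is the item size in bytes passed by the caller).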
+ */ +int __ssdfs_btree_node_extract_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count, + size_t item_size, + struct ssdfs_btree_search *search) +{ + DEFINE_WAIT(wait); + struct ssdfs_btree *tree; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_state_bitmap *bmap; + struct page *page; + size_t desc_size = sizeof(struct ssdfs_btree_node_items_area); + size_t buf_size; + u32 item_offset; + int page_index; + u32 calculated; + unsigned long cur_index; + u16 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("type %#x, flags %#x, " + "start_index %u, count %u, " + "state %d, node_id %u, height %d\n", + search->request.type, search->request.flags, + start_index, count, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height)); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + search->result.start_index = U16_MAX; + search->result.count = 0; + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, 0, desc_size, + &node->items_area, 0, desc_size, + desc_size); + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, + items_area.items_capacity, + items_area.items_count); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, items_capacity %u, items_count %u\n", + search->node.id, + items_area.items_capacity, + items_area.items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (count == 0) { + SSDFS_ERR("empty request\n"); + return -ERANGE; + } + + if (start_index >= items_area.items_count) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u >= items_count %u\n", + start_index, items_area.items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + if ((start_index + count) > items_area.items_count) + count = items_area.items_count - start_index; + + buf_size = count * item_size; + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + if (count == 1) { + switch (tree->type) { + case SSDFS_INODES_BTREE: + search->result.buf = &search->raw.inode; + break; + + case SSDFS_EXTENTS_BTREE: + search->result.buf = &search->raw.fork; + break; + + case SSDFS_DENTRIES_BTREE: + search->result.buf = &search->raw.dentry; + break; + + case SSDFS_XATTR_BTREE: + search->result.buf = &search->raw.xattr; + break; + + default: + SSDFS_ERR("unsupported tree type %#x\n", + tree->type); + return -ERANGE; + } + + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf_size = buf_size; + search->result.items_in_buffer = 0; + } else { + err = ssdfs_btree_search_alloc_result_buf(search, + buf_size); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate buffer\n"); + return err; + } + } + break; + + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + if (count == 1) { + ssdfs_btree_search_free_result_buf(search); + + switch (tree->type) { + case SSDFS_INODES_BTREE: + search->result.buf = &search->raw.inode; + break; + + case SSDFS_EXTENTS_BTREE: + search->result.buf = &search->raw.fork; + break; + + case 
SSDFS_DENTRIES_BTREE: + search->result.buf = &search->raw.dentry; + break; + + case SSDFS_XATTR_BTREE: + search->result.buf = &search->raw.xattr; + break; + + default: + SSDFS_ERR("unsupported tree type %#x\n", + tree->type); + return -ERANGE; + } + + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf_size = buf_size; + search->result.items_in_buffer = 0; + } else { + void *new_buf; + + /* use a temporary pointer: krealloc() leaves the + * original allocation untouched on failure, so the + * old buffer must not be overwritten by NULL + */ + new_buf = krealloc(search->result.buf, + buf_size, GFP_KERNEL); + if (!new_buf) { + SSDFS_ERR("fail to allocate buffer\n"); + return -ENOMEM; + } + search->result.buf = new_buf; + search->result.buf_state = + SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER; + search->result.buf_size = buf_size; + search->result.items_in_buffer = 0; + } + break; + + default: + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_LOCK_BMAP]; + + for (i = start_index; i < (start_index + count); i++) { + item_offset = (u32)i * item_size; + if (item_offset >= items_area.area_size) { + err = -ERANGE; + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, items_area.area_size); + goto finish_extract_range; + } + + item_offset += items_area.offset; + if (item_offset >= node->node_size) { + err = -ERANGE; + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node->node_size); + goto finish_extract_range; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(&node->content.pvec)) { + err = -ERANGE; + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + goto finish_extract_range; + } + + calculated = search->result.items_in_buffer * item_size; + if (calculated >= search->result.buf_size) { + err = -ERANGE; + SSDFS_ERR("calculated %u >= buf_size %zu\n", + calculated, search->result.buf_size); + goto finish_extract_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = node->content.pvec.pages[page_index]; + + down_read(&node->bmap_array.lock); + +try_lock_item: + spin_lock(&bmap->lock); + + cur_index = node->bmap_array.item_start_bit + i; + err = bitmap_allocate_region(bmap->ptr, + (unsigned int)cur_index, 0); + if (err == -EBUSY) { + err = 0; + prepare_to_wait(&node->wait_queue, &wait, + TASK_UNINTERRUPTIBLE); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("waiting unlocked state of item %lu\n", + cur_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_unlock(&bmap->lock); + + schedule(); + finish_wait(&node->wait_queue, &wait); + goto try_lock_item; + } + + spin_unlock(&bmap->lock); + + up_read(&node->bmap_array.lock); + + if (err) { + SSDFS_ERR("fail to lock: index %lu, err %d\n", + cur_index, err); + goto finish_extract_range; + } + + err = ssdfs_memcpy_from_page(search->result.buf, + calculated, + search->result.buf_size, + page, + item_offset, + PAGE_SIZE, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", + err); + } else { + search->result.items_in_buffer++; + search->result.count++; + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + } + + down_read(&node->bmap_array.lock); + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, (unsigned int)cur_index, 1); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + + wake_up_all(&node->wait_queue); + + if (unlikely(err)) + goto finish_extract_range; + } + +finish_extract_range: + if (err == -ENODATA) { + /* + * do nothing + */ + } else if
(unlikely(err)) { + search->result.state = SSDFS_BTREE_SEARCH_FAILURE; + search->result.err = err; + } else + search->result.start_index = start_index; + + return err; +} + +/* + * __ssdfs_btree_node_resize_items_area() - resize items area of the node + * @node: node object + * @item_size: size of the item in bytes + * @index_size: size of the index in bytes + * @new_size: new size of the items area + * + * This method tries to resize the items area of the node. + * + * It makes sense to allocate the node's bitmap taking into account + * that the node can be resized. So the bitmap is allocated as if + * both the index area and the items area could grow up to the size + * of the whole node. This technique makes it unnecessary to resize + * or to shift the content of the bitmap. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +int __ssdfs_btree_node_resize_items_area(struct ssdfs_btree_node *node, + size_t item_size, + size_t index_size, + u32 new_size) +{ + size_t hdr_size = sizeof(struct ssdfs_extents_btree_node_header); + bool index_area_exist = false; + bool items_area_exist = false; + u32 indexes_offset, items_offset; + u32 indexes_size, items_size; + u32 indexes_free_space, items_free_space; + u32 space_capacity, used_space = 0; + u16 capacity, count; + u32 diff_size; + u16 start_index, range_len; + u32 shift; + unsigned long index_start_bit; + unsigned long item_start_bit; + unsigned long bits_count; + u16 index_capacity; + u16 items_capacity; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, item_size %zu, new_size %u\n", + node->node_id, item_size, new_size); + + ssdfs_debug_btree_node_object(node); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + down_write(&node->bmap_array.lock); + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + index_area_exist = true; + + indexes_offset = node->index_area.offset; + indexes_size = node->index_area.area_size; + + if (indexes_offset != hdr_size) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("corrupted index area: " + "offset %u, hdr_size %zu\n", + node->index_area.offset, + hdr_size); + goto finish_area_resize; + } + + if ((indexes_offset + indexes_size) > node->node_size) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("corrupted index area: " + "area_offset %u, area_size %u, " + "node_size %u\n", + node->index_area.offset, + node->index_area.area_size, + node->node_size); + goto finish_area_resize; + } + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + index_area_exist = false; + indexes_offset = 0; + indexes_size = 0; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid area state %#x\n", + atomic_read(&node->index_area.state)); + goto finish_area_resize; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + items_area_exist = true; + + items_offset = node->items_area.offset; + items_size =
node->items_area.area_size; + + if ((hdr_size + indexes_size) > items_offset) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("corrupted items area: " + "hdr_size %zu, index area_size %u, " + "offset %u\n", + hdr_size, + node->index_area.area_size, + node->items_area.offset); + goto finish_area_resize; + } + + if ((items_offset + items_size) > node->node_size) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("corrupted items area: " + "area_offset %u, area_size %u, " + "node_size %u\n", + node->items_area.offset, + node->items_area.area_size, + node->node_size); + goto finish_area_resize; + } + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + items_area_exist = false; + items_offset = 0; + items_size = 0; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid area state %#x\n", + atomic_read(&node->items_area.state)); + goto finish_area_resize; + } + + if ((hdr_size + indexes_size + items_size) > node->node_size) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("corrupted node: " + "hdr_size %zu, index area_size %u, " + "items area_size %u, node_size %u\n", + hdr_size, + node->index_area.area_size, + node->items_area.area_size, + node->node_size); + goto finish_area_resize; + } + + if (index_area_exist) { + space_capacity = node->index_area.index_size; + space_capacity *= node->index_area.index_capacity; + + if (space_capacity != indexes_size) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("space_capacity %u != indexes_size %u\n", + space_capacity, indexes_size); + goto finish_area_resize; + } + + used_space = node->index_area.index_size; + used_space *= node->index_area.index_count; + + if (used_space > space_capacity) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("used_space %u > space_capacity %u\n", + used_space, space_capacity); + goto finish_area_resize; + } + + indexes_free_space = space_capacity - used_space; + } else + indexes_free_space = 0; + + if (items_area_exist) { + space_capacity = item_size; + space_capacity *= node->items_area.items_capacity; + + if (space_capacity != items_size) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("space_capacity %u != items_size %u\n", + space_capacity, items_size); + goto finish_area_resize; + } + + used_space = item_size; + used_space *= node->items_area.items_count; + + if (used_space > space_capacity) { + err = -EFAULT; + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("used_space %u > space_capacity %u\n", + used_space, space_capacity); + goto finish_area_resize; + } + + items_free_space = space_capacity - used_space; + } else + items_free_space = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("indexes_offset %u, indexes_size %u, " + "items_offset %u, items_size %u, " + "indexes_free_space %u, items_free_space %u\n", + indexes_offset, indexes_size, + items_offset, items_size, + indexes_free_space, items_free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (new_size > items_size) { + /* increase items area */ + u32 unused_space; + + if ((hdr_size + indexes_size) > items_offset) { + err = -EFAULT; + SSDFS_ERR("corrupted node: " + "hdr_size %zu, indexes_size %u, " + "items_offset %u\n", + hdr_size, indexes_size, items_offset); + goto finish_area_resize; + } + + unused_space = items_offset - (hdr_size + indexes_size); + diff_size = new_size - items_size; + + if ((indexes_free_space + 
unused_space) < diff_size) { + err = -EFAULT; + SSDFS_ERR("corrupted_node: " + "indexes_free_space %u, unused_space %u, " + "diff_size %u\n", + indexes_free_space, + unused_space, + diff_size); + goto finish_area_resize; + } + + shift = diff_size / item_size; + + if (shift == 0 || shift >= U16_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid shift %u\n", shift); + goto finish_area_resize; + } + + start_index = (u16)shift; + range_len = node->items_area.items_count; + + if (unused_space >= diff_size) { + /* + * Do nothing. + * It doesn't need to correct index area. + */ + } else if (indexes_free_space >= diff_size) { + node->index_area.area_size -= diff_size; + node->index_area.index_capacity = + node->index_area.area_size / + node->index_area.index_size; + + if (node->index_area.area_size == 0) { + node->index_area.offset = U32_MAX; + node->index_area.start_hash = U64_MAX; + node->index_area.end_hash = U64_MAX; + atomic_set(&node->index_area.state, + SSDFS_BTREE_NODE_AREA_ABSENT); + } + } else { + err = -ERANGE; + SSDFS_ERR("node is corrupted: " + "indexes_free_space %u, " + "unused_space %u\n", + indexes_free_space, + unused_space); + goto finish_area_resize; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + node->items_area.offset -= diff_size; + node->items_area.area_size += diff_size; + node->items_area.free_space += diff_size; + node->items_area.items_capacity = + node->items_area.area_size / item_size; + + if (node->items_area.items_capacity == 0) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items_capacity %u\n", + node->items_area.items_capacity); + goto finish_area_resize; + } + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + node->items_area.offset = node->index_area.offset; + node->items_area.offset += node->index_area.area_size; + node->items_area.area_size = new_size; + node->items_area.free_space = new_size; + node->items_area.item_size = item_size; + if (item_size >= U8_MAX) + node->items_area.min_item_size = 0; + else + node->items_area.min_item_size = item_size; + node->items_area.max_item_size = item_size; + node->items_area.items_count = 0; + node->items_area.items_capacity = + node->items_area.area_size / item_size; + + if (node->items_area.items_capacity == 0) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items_capacity %u\n", + node->items_area.items_capacity); + goto finish_area_resize; + } + + node->items_area.start_hash = U64_MAX; + node->items_area.end_hash = U64_MAX; + + atomic_set(&node->items_area.state, + SSDFS_BTREE_NODE_ITEMS_AREA_EXIST); + break; + + default: + BUG(); + } + + if (range_len > 0) { + err = ssdfs_shift_range_left(node, &node->items_area, + item_size, + start_index, range_len, + (u16)shift); + if (unlikely(err)) { + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to shift range to left: " + "start_index %u, range_len %u, " + "shift %u, err %d\n", + start_index, range_len, + shift, err); + goto finish_area_resize; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items shift is not necessary: " + "range_len %u\n", + range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + /* + * It makes sense to allocate the bitmap with taking into + * account that we will resize the node. So, it needs + * to allocate the index area in bitmap is equal to + * the whole node and items area is equal to the whole node. 
+ * This technique provides opportunity not to resize or + * to shift the content of the bitmap. + */ + } else if (new_size < items_size) { + /* decrease items area */ + diff_size = items_size - new_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_size %u, used_space %u, " + "node->items_area.items_count %u\n", + items_size, used_space, + node->items_area.items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (items_free_space < diff_size) { + err = -EFAULT; + SSDFS_ERR("items_free_space %u < diff_size %u\n", + items_free_space, diff_size); + goto finish_area_resize; + } + + shift = diff_size / item_size; + + if (shift == 0 || shift >= U16_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid shift %u\n", shift); + goto finish_area_resize; + } + + if (node->items_area.items_count > 0) { + start_index = 0; + range_len = node->items_area.items_count; + + err = ssdfs_shift_range_right(node, &node->items_area, + item_size, + start_index, range_len, + (u16)shift); + if (unlikely(err)) { + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to shift range to right: " + "start_index %u, range_len %u, " + "shift %u, err %d\n", + start_index, range_len, + shift, err); + goto finish_area_resize; + } + } + + if (node->items_area.area_size < diff_size) + BUG(); + else if (node->items_area.area_size == diff_size) { + node->items_area.offset = U32_MAX; + node->items_area.area_size = 0; + node->items_area.free_space = 0; + node->items_area.items_count = 0; + node->items_area.items_capacity = 0; + node->items_area.start_hash = U64_MAX; + node->items_area.end_hash = U64_MAX; + atomic_set(&node->items_area.state, + SSDFS_BTREE_NODE_AREA_ABSENT); + } else { + node->items_area.offset += diff_size; + node->items_area.area_size -= diff_size; + node->items_area.free_space -= diff_size; + node->items_area.items_capacity = + node->items_area.area_size / item_size; + + capacity = node->items_area.items_capacity; + count = node->items_area.items_count; + if (capacity < count) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("capacity %u < count %u\n", + capacity, count); + goto finish_area_resize; + } + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + node->index_area.area_size += diff_size; + node->index_area.index_capacity = + node->index_area.area_size / + node->index_area.index_size; + + capacity = node->index_area.index_capacity; + count = node->index_area.index_count; + if (capacity < count) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("capacity %u < count %u\n", + capacity, count); + goto finish_area_resize; + } + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + node->index_area.offset = hdr_size; + node->index_area.area_size = diff_size; + node->index_area.index_size = index_size; + node->index_area.index_count = 0; + node->index_area.index_capacity = + node->index_area.area_size / + node->index_area.index_size; + + if (node->index_area.index_capacity == 0) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("capacity == 0\n"); + goto finish_area_resize; + } + + node->index_area.start_hash = U64_MAX; + node->index_area.end_hash = U64_MAX; + + atomic_set(&node->index_area.state, + SSDFS_BTREE_NODE_INDEX_AREA_EXIST); + break; + + default: + BUG(); + } + + /* + * It makes sense to allocate the bitmap with taking into + * account that we will resize the node.
So, it needs + * to allocate the index area in bitmap is equal to + * the whole node and items area is equal to the whole node. + * This technique provides opportunity not to resize or + * to shift the content of the bitmap. + */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no necessity to resize: " + "new_size %u, items_size %u\n", + new_size, items_size); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_area_resize; + } + + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + + node->index_area.index_capacity; + + index_capacity = node->index_area.index_capacity; + items_capacity = node->items_area.items_capacity; + index_start_bit = node->bmap_array.index_start_bit; + item_start_bit = node->bmap_array.item_start_bit; + bits_count = node->bmap_array.bits_count; + + if ((index_start_bit + index_capacity) > item_start_bit) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid shift: " + "index_start_bit %lu, index_capacity %u, " + "item_start_bit %lu\n", + index_start_bit, index_capacity, item_start_bit); + goto finish_area_resize; + } + + if ((index_start_bit + index_capacity) > bits_count) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid shift: " + "index_start_bit %lu, index_capacity %u, " + "bits_count %lu\n", + index_start_bit, index_capacity, bits_count); + goto finish_area_resize; + } + + if ((item_start_bit + items_capacity) > bits_count) { + err = -ERANGE; + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid shift: " + "item_start_bit %lu, items_capacity %u, " + "bits_count %lu\n", + item_start_bit, items_capacity, bits_count); + goto finish_area_resize; + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + index_area_exist = true; + break; + + default: + index_area_exist = false; + break; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + items_area_exist = true; + break; + + default: + items_area_exist = false; + break; + } + + if (index_area_exist && items_area_exist) { + atomic_set(&node->type, SSDFS_BTREE_HYBRID_NODE); + atomic_or(SSDFS_BTREE_NODE_HAS_INDEX_AREA, + &node->flags); + atomic_or(SSDFS_BTREE_NODE_HAS_ITEMS_AREA, + &node->flags); + } else if (index_area_exist) { + atomic_set(&node->type, SSDFS_BTREE_INDEX_NODE); + atomic_or(SSDFS_BTREE_NODE_HAS_INDEX_AREA, + &node->flags); + atomic_and(~SSDFS_BTREE_NODE_HAS_ITEMS_AREA, + &node->flags); + } else if (items_area_exist) { + atomic_set(&node->type, SSDFS_BTREE_LEAF_NODE); + atomic_and(~SSDFS_BTREE_NODE_HAS_INDEX_AREA, + &node->flags); + atomic_or(SSDFS_BTREE_NODE_HAS_ITEMS_AREA, + &node->flags); + } else + BUG(); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_area: offset %u, area_size %u, " + "free_space %u, capacity %u; " + "index_area: offset %u, area_size %u, " + "capacity %u\n", + node->items_area.offset, + node->items_area.area_size, + node->items_area.free_space, + node->items_area.items_capacity, + node->index_area.offset, + node->index_area.area_size, + node->index_area.index_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_set(&node->state, SSDFS_BTREE_NODE_DIRTY); + +finish_area_resize: + up_write(&node->bmap_array.lock); + + return err; +} + +/* + * ssdfs_btree_node_get_hash_range() - extract hash range + */ +int ssdfs_btree_node_get_hash_range(struct ssdfs_btree_search *search, + u64 *start_hash, u64 *end_hash, + u16 *items_count) +{ + struct ssdfs_btree_node *node = 
NULL; + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !start_hash || !end_hash || !items_count); + + SSDFS_DBG("search %p, start_hash %p, " + "end_hash %p, items_count %p\n", + search, start_hash, end_hash, items_count); + + ssdfs_debug_btree_search_object(search); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = *end_hash = U64_MAX; + *items_count = 0; + + switch (search->node.state) { + case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC: + node = search->node.child; + if (!node) { + SSDFS_ERR("node pointer is NULL\n"); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("unexpected node state %#x\n", + search->node.state); + return -ERANGE; + } + + state = atomic_read(&node->items_area.state); + switch (state) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected items area's state %#x\n", + state); + return -ERANGE; + } + + down_read(&node->header_lock); + *start_hash = node->items_area.start_hash; + *end_hash = node->items_area.end_hash; + *items_count = node->items_area.items_count; + up_read(&node->header_lock); + + return 0; +} + +void ssdfs_show_btree_node_info(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK + int i; + + BUG_ON(!node); + + SSDFS_ERR("STATIC DATA: node_id %u, height %d, " + "owner_ino %llu, " + "node_size %u, pages_per_node %u, " + "create_cno %llu, tree %p, " + "parent_node %p, node_ops %p\n", + node->node_id, atomic_read(&node->height), + node->tree->owner_ino, + node->node_size, node->pages_per_node, + node->create_cno, node->tree, + node->parent_node, node->node_ops); + + if (node->parent_node) { + SSDFS_ERR("PARENT_NODE: node_id %u, height %d, " + "state %#x, type %#x\n", + node->parent_node->node_id, + atomic_read(&node->parent_node->height), + atomic_read(&node->parent_node->state), + atomic_read(&node->parent_node->type)); + } + + SSDFS_ERR("MUTABLE DATA: refs_count %d, state %#x, " + "flags %#x, type %#x\n", + atomic_read(&node->refs_count), + atomic_read(&node->state), + atomic_read(&node->flags), + atomic_read(&node->type)); + + down_read(&node->header_lock); + + SSDFS_ERR("INDEX_AREA: state %#x, " + "offset %u, size %u, " + "index_size %u, index_count %u, " + "index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->index_area.state), + node->index_area.offset, + node->index_area.area_size, + node->index_area.index_size, + node->index_area.index_count, + node->index_area.index_capacity, + node->index_area.start_hash, + node->index_area.end_hash); + + SSDFS_ERR("ITEMS_AREA: state %#x, " + "offset %u, size %u, free_space %u, " + "item_size %u, min_item_size %u, " + "max_item_size %u, items_count %u, " + "items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->items_area.state), + node->items_area.offset, + node->items_area.area_size, + node->items_area.free_space, + node->items_area.item_size, + node->items_area.min_item_size, + node->items_area.max_item_size, + node->items_area.items_count, + node->items_area.items_capacity, + node->items_area.start_hash, + node->items_area.end_hash); + + SSDFS_ERR("LOOKUP_TBL_AREA: state %#x, " + "offset %u, size %u, " + "index_size %u, index_count %u, " + "index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->lookup_tbl_area.state), + node->lookup_tbl_area.offset, + node->lookup_tbl_area.area_size, + node->lookup_tbl_area.index_size, + node->lookup_tbl_area.index_count, + 
node->lookup_tbl_area.index_capacity, + node->lookup_tbl_area.start_hash, + node->lookup_tbl_area.end_hash); + + SSDFS_ERR("HASH_TBL_AREA: state %#x, " + "offset %u, size %u, " + "index_size %u, index_count %u, " + "index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->hash_tbl_area.state), + node->hash_tbl_area.offset, + node->hash_tbl_area.area_size, + node->hash_tbl_area.index_size, + node->hash_tbl_area.index_count, + node->hash_tbl_area.index_capacity, + node->hash_tbl_area.start_hash, + node->hash_tbl_area.end_hash); + + up_read(&node->header_lock); + + spin_lock(&node->descriptor_lock); + + SSDFS_ERR("NODE DESCRIPTOR: is_locked %d, " + "update_cno %llu, seg %p, " + "completion_done %d\n", + spin_is_locked(&node->descriptor_lock), + node->update_cno, node->seg, + completion_done(&node->init_end)); + + SSDFS_ERR("NODE_INDEX: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(node->node_index.node_id), + node->node_index.node_type, + node->node_index.height, + le16_to_cpu(node->node_index.flags), + le64_to_cpu(node->node_index.index.hash), + le64_to_cpu(node->node_index.index.extent.seg_id), + le32_to_cpu(node->node_index.index.extent.logical_blk), + le32_to_cpu(node->node_index.index.extent.len)); + + SSDFS_ERR("EXTENT: seg_id %llu, logical_blk %u, len %u\n", + le64_to_cpu(node->extent.seg_id), + le32_to_cpu(node->extent.logical_blk), + le32_to_cpu(node->extent.len)); + + if (node->seg) { + SSDFS_ERR("SEGMENT: seg_id %llu, seg_type %#x, " + "seg_state %#x, refs_count %d\n", + node->seg->seg_id, + node->seg->seg_type, + atomic_read(&node->seg->seg_state), + atomic_read(&node->seg->refs_count)); + } + + spin_unlock(&node->descriptor_lock); + + down_read(&node->bmap_array.lock); + + SSDFS_ERR("BITMAP ARRAY: bits_count %lu, " + "bmap_bytes %zu, index_start_bit %lu, " + "item_start_bit %lu\n", + node->bmap_array.bits_count, + node->bmap_array.bmap_bytes, + node->bmap_array.index_start_bit, + node->bmap_array.item_start_bit); + + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + struct ssdfs_state_bitmap *bmap; + + bmap = &node->bmap_array.bmap[i]; + + SSDFS_ERR("BITMAP: index %d, is_locked %d, " + "flags %#x, ptr %p\n", + i, spin_is_locked(&bmap->lock), + bmap->flags, bmap->ptr); + } + + SSDFS_ERR("WAIT_QUEUE: is_active %d\n", + waitqueue_active(&node->wait_queue)); + + up_read(&node->bmap_array.lock); +#endif /* CONFIG_SSDFS_BTREE_CONSISTENCY_CHECK */ +} + +void ssdfs_debug_btree_node_object(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i; + + BUG_ON(!node); + + SSDFS_DBG("STATIC DATA: node_id %u, height %d, " + "owner_ino %llu, " + "node_size %u, pages_per_node %u, " + "create_cno %llu, tree %p, " + "parent_node %p, node_ops %p\n", + node->node_id, atomic_read(&node->height), + node->tree->owner_ino, + node->node_size, node->pages_per_node, + node->create_cno, node->tree, + node->parent_node, node->node_ops); + + if (node->parent_node) { + SSDFS_DBG("PARENT_NODE: node_id %u, height %d, " + "state %#x, type %#x\n", + node->parent_node->node_id, + atomic_read(&node->parent_node->height), + atomic_read(&node->parent_node->state), + atomic_read(&node->parent_node->type)); + } + + SSDFS_DBG("MUTABLE DATA: refs_count %d, state %#x, " + "flags %#x, type %#x\n", + atomic_read(&node->refs_count), + atomic_read(&node->state), + atomic_read(&node->flags), + atomic_read(&node->type)); + + SSDFS_DBG("NODE HEADER: is_locked %d\n", + rwsem_is_locked(&node->header_lock)); + + 
SSDFS_DBG("RAW HEADER DUMP:\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + &node->raw, + sizeof(node->raw)); + SSDFS_DBG("\n"); + + SSDFS_DBG("INDEX_AREA: state %#x, " + "offset %u, size %u, " + "index_size %u, index_count %u, " + "index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->index_area.state), + node->index_area.offset, + node->index_area.area_size, + node->index_area.index_size, + node->index_area.index_count, + node->index_area.index_capacity, + node->index_area.start_hash, + node->index_area.end_hash); + + SSDFS_DBG("ITEMS_AREA: state %#x, " + "offset %u, size %u, free_space %u, " + "item_size %u, min_item_size %u, " + "max_item_size %u, items_count %u, " + "items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->items_area.state), + node->items_area.offset, + node->items_area.area_size, + node->items_area.free_space, + node->items_area.item_size, + node->items_area.min_item_size, + node->items_area.max_item_size, + node->items_area.items_count, + node->items_area.items_capacity, + node->items_area.start_hash, + node->items_area.end_hash); + + SSDFS_DBG("LOOKUP_TBL_AREA: state %#x, " + "offset %u, size %u, " + "index_size %u, index_count %u, " + "index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->lookup_tbl_area.state), + node->lookup_tbl_area.offset, + node->lookup_tbl_area.area_size, + node->lookup_tbl_area.index_size, + node->lookup_tbl_area.index_count, + node->lookup_tbl_area.index_capacity, + node->lookup_tbl_area.start_hash, + node->lookup_tbl_area.end_hash); + + SSDFS_DBG("HASH_TBL_AREA: state %#x, " + "offset %u, size %u, " + "index_size %u, index_count %u, " + "index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + atomic_read(&node->hash_tbl_area.state), + node->hash_tbl_area.offset, + node->hash_tbl_area.area_size, + node->hash_tbl_area.index_size, + node->hash_tbl_area.index_count, + node->hash_tbl_area.index_capacity, + node->hash_tbl_area.start_hash, + node->hash_tbl_area.end_hash); + + SSDFS_DBG("NODE DESCRIPTOR: is_locked %d, " + "update_cno %llu, seg %p, " + "completion_done %d\n", + spin_is_locked(&node->descriptor_lock), + node->update_cno, node->seg, + completion_done(&node->init_end)); + + SSDFS_DBG("NODE_INDEX: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(node->node_index.node_id), + node->node_index.node_type, + node->node_index.height, + le16_to_cpu(node->node_index.flags), + le64_to_cpu(node->node_index.index.hash), + le64_to_cpu(node->node_index.index.extent.seg_id), + le32_to_cpu(node->node_index.index.extent.logical_blk), + le32_to_cpu(node->node_index.index.extent.len)); + + SSDFS_DBG("EXTENT: seg_id %llu, logical_blk %u, len %u\n", + le64_to_cpu(node->extent.seg_id), + le32_to_cpu(node->extent.logical_blk), + le32_to_cpu(node->extent.len)); + + if (node->seg) { + SSDFS_DBG("SEGMENT: seg_id %llu, seg_type %#x, " + "seg_state %#x, refs_count %d\n", + node->seg->seg_id, + node->seg->seg_type, + atomic_read(&node->seg->seg_state), + atomic_read(&node->seg->refs_count)); + } + + SSDFS_DBG("BITMAP ARRAY: is_locked %d, bits_count %lu, " + "bmap_bytes %zu, index_start_bit %lu, " + "item_start_bit %lu\n", + rwsem_is_locked(&node->bmap_array.lock), + node->bmap_array.bits_count, + node->bmap_array.bmap_bytes, + node->bmap_array.index_start_bit, + node->bmap_array.item_start_bit); + + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + struct ssdfs_state_bitmap *bmap; + + bmap = 
&node->bmap_array.bmap[i]; + + SSDFS_DBG("BITMAP: index %d, is_locked %d, " + "flags %#x, ptr %p\n", + i, spin_is_locked(&bmap->lock), + bmap->flags, bmap->ptr); + + if (bmap->ptr) { + SSDFS_DBG("BMAP DUMP: "); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + bmap->ptr, + node->bmap_array.bmap_bytes); + SSDFS_DBG("\n"); + } + } + + SSDFS_DBG("WAIT_QUEUE: is_active %d\n", + waitqueue_active(&node->wait_queue)); + + SSDFS_DBG("NODE CONTENT: is_locked %d, pvec_size %u\n", + rwsem_is_locked(&node->full_lock), + pagevec_count(&node->content.pvec)); + + for (i = 0; i < pagevec_count(&node->content.pvec); i++) { + struct page *page; + void *kaddr; + + page = node->content.pvec.pages[i]; + + if (!page) + continue; + + kaddr = kmap_local_page(page); + SSDFS_DBG("PAGE DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); + kunmap_local(kaddr); + } +#endif /* CONFIG_SSDFS_DEBUG */ +} From patchwork Sat Feb 25 01:09:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151960 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72F8BC6FA8E for ; Sat, 25 Feb 2023 01:19:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229754AbjBYBTu (ORCPT ); Fri, 24 Feb 2023 20:19:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49448 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229762AbjBYBRm (ORCPT ); Fri, 24 Feb 2023 20:17:42 -0500 Received: from mail-oi1-x229.google.com (mail-oi1-x229.google.com [IPv6:2607:f8b0:4864:20::229]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 377EB126E3 for ; Fri, 24 Feb 2023 17:17:35 -0800 (PST) Received: by mail-oi1-x229.google.com with SMTP id bi17so822773oib.3 for ; Fri, 24 Feb 2023 17:17:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=65VcY9PTZgStBDfKE4CcvrEd85eg0uLz589qSj5TVBs=; b=sO9CjC8ASbLqbpcTZ9MsfDsOwLI/nDIpQV1llSRBLKA4coH94xeKI+VofBY7bqwfRd dLOqF1X/mfcCe5bPosROCBUBbikle4t/086MALMz0JWRTIeXTLG/IqxDsrfBP70j8BKI xVFUIaRUWY6p9ME1G0LT8WFwODOE02Oq3LcLt+vObPN236O0o7/dleJ+lmwnAFiFp/rP khi6NYv0cnEXbyt56T8l4sR46rVk3VHwbUEJ/Hn0FNTz6CAJKNhi5HEmTsanRDYnQjgQ elGTm/QB565B93ZoZ6CWYPEFAVKOimd16Wemtto9dUh6pw2DygSwnEfOPVv2PyIQHE6P OMnw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=65VcY9PTZgStBDfKE4CcvrEd85eg0uLz589qSj5TVBs=; b=6VAT/dncIicp8Eemkdmkaf9/zSJrZSjNdJeFmHIsr21P5k6Hk+XE9lYXB/XuiV+a4c 5lxMS1CRWCBQowMdK58l7rZDR5KnoHfolkPu8BPUAP/Rw4Ns/bBLL6el4n5AWzk7U+JQ A/LYvKzETNkiTvssbzOc0mRVz6I9UwsNFAR+7tb5jaWoCUD2yD2sfM20k0CnHPoGoKCi m4CVtcVBJa3Sbh2Gq8S4Ki17jie1ccriZwr53vV+ysy1xvZykx8/krj/Wc523RB6j6u+ nOTR5QSJr4q48ZtjF+sNtegDpcF/aSfedvrTlgWu++Otyndbr2/j+JoGGa0FWqGLC4XG ofdQ== X-Gm-Message-State: AO0yUKXp8i/Jn+0cFHZp4FoGfME/UhDfDyYvm3jYhOBmRqJyi4jYvb7+ MtwrM/o/RQxOeI8lW0hQE3ys13AZDdKhOSjH X-Google-Smtp-Source: 
AK7set+lJT/xi5dvkCm8SAYZ9mT/hJkpvyE73pb8W6TGCJv4nolGSM7Ir/pojzkFX1TsgEmK5xspKA== X-Received: by 2002:aca:1901:0:b0:383:ee1d:f492 with SMTP id l1-20020aca1901000000b00383ee1df492mr2438900oii.9.1677287853613; Fri, 24 Feb 2023 17:17:33 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.17.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:17:32 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 56/76] ssdfs: introduce b-tree hierarchy object Date: Fri, 24 Feb 2023 17:09:07 -0800 Message-Id: <20230225010927.813929-57-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org A b-tree needs to serve the operations of adding, inserting, and deleting items. These operations could require modification of the b-tree structure (adding and deleting nodes). Also, indexes need to be updated in parent nodes. The SSDFS file system uses a special b-tree hierarchy object to manage the b-tree structure. For every b-tree modification request, the file system logic creates the hierarchy object and executes the b-tree hierarchy check. The checking logic defines the actions that should be done for every level of the b-tree to execute a node add or delete operation. Finally, the b-tree hierarchy object represents the action plan that the modification logic has to execute. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/btree_hierarchy.c | 2908 ++++++++++++++++++++++++++++++++++++ fs/ssdfs/btree_hierarchy.h | 284 ++++ 2 files changed, 3192 insertions(+) create mode 100644 fs/ssdfs/btree_hierarchy.c create mode 100644 fs/ssdfs/btree_hierarchy.h diff --git a/fs/ssdfs/btree_hierarchy.c b/fs/ssdfs/btree_hierarchy.c new file mode 100644 index 000000000000..cba502e6f3a6 --- /dev/null +++ b/fs/ssdfs/btree_hierarchy.c @@ -0,0 +1,2908 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/btree_hierarchy.c - btree hierarchy functionality implementation. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" +#include "extents_queue.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "btree_hierarchy.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_btree_hierarchy_page_leaks; +atomic64_t ssdfs_btree_hierarchy_memory_leaks; +atomic64_t ssdfs_btree_hierarchy_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_btree_hierarchy_cache_leaks_increment(void *kaddr) + * void ssdfs_btree_hierarchy_cache_leaks_decrement(void *kaddr) + * void *ssdfs_btree_hierarchy_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_btree_hierarchy_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_btree_hierarchy_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_btree_hierarchy_kfree(void *kaddr) + * struct page *ssdfs_btree_hierarchy_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_btree_hierarchy_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_btree_hierarchy_free_page(struct page *page) + * void ssdfs_btree_hierarchy_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(btree_hierarchy) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(btree_hierarchy) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_btree_hierarchy_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_btree_hierarchy_page_leaks, 0); + atomic64_set(&ssdfs_btree_hierarchy_memory_leaks, 0); + atomic64_set(&ssdfs_btree_hierarchy_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_btree_hierarchy_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_btree_hierarchy_page_leaks) != 0) { + SSDFS_ERR("BTREE HIERARCHY: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_btree_hierarchy_page_leaks)); + } + + if (atomic64_read(&ssdfs_btree_hierarchy_memory_leaks) != 0) { + SSDFS_ERR("BTREE HIERARCHY: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_btree_hierarchy_memory_leaks)); + } + + if (atomic64_read(&ssdfs_btree_hierarchy_cache_leaks) != 0) { + SSDFS_ERR("BTREE HIERARCHY: " + "caches suffer from %lld leaks\n", + atomic64_read(&ssdfs_btree_hierarchy_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_define_hierarchy_height() - define hierarchy's height + * @tree: btree object + * + * This method tries to define the hierarchy's height.
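+ * + * The returned value covers the worst case of the modification + * request: an empty tree (height 0) yields 2 levels (root node + + * child node); otherwise one additional level is pre-allocated + * above the current height (e.g. a tree of height 3 yields 4 + * levels).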
+ */ +static +int ssdfs_define_hierarchy_height(struct ssdfs_btree *tree) +{ + int tree_height; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = atomic_read(&tree->height); + if (tree_height < 0) { + SSDFS_WARN("invalid tree_height %d\n", + tree_height); + tree_height = 0; + } + + if (tree_height == 0) { + /* root node + child node */ + tree_height = 2; + } else { + /* pre-allocate additional level */ + tree_height += 1; + } + + return tree_height; +} + +/* + * ssdfs_btree_hierarchy_init() - init hierarchy object + * @tree: btree object + * + * This method tries to init the memory for the hierarchy object. + */ +void ssdfs_btree_hierarchy_init(struct ssdfs_btree *tree, + struct ssdfs_btree_hierarchy *ptr) +{ + int tree_height; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr); + + SSDFS_DBG("hierarchy %p\n", ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = ssdfs_define_hierarchy_height(tree); + + ptr->desc.height = tree_height; + ptr->desc.increment_height = false; + ptr->desc.node_size = tree->node_size; + ptr->desc.index_size = tree->index_size; + ptr->desc.min_item_size = tree->min_item_size; + ptr->desc.max_item_size = tree->max_item_size; + ptr->desc.index_area_min_size = tree->index_area_min_size; + + for (i = 0; i < tree_height; i++) { + ptr->array_ptr[i]->flags = 0; + + ptr->array_ptr[i]->index_area.area_size = U32_MAX; + ptr->array_ptr[i]->index_area.free_space = U32_MAX; + ptr->array_ptr[i]->index_area.hash.start = U64_MAX; + ptr->array_ptr[i]->index_area.hash.end = U64_MAX; + ptr->array_ptr[i]->index_area.add.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + ptr->array_ptr[i]->index_area.add.hash.start = U64_MAX; + ptr->array_ptr[i]->index_area.add.hash.end = U64_MAX; + ptr->array_ptr[i]->index_area.add.pos.state = + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + ptr->array_ptr[i]->index_area.add.pos.start = U16_MAX; + ptr->array_ptr[i]->index_area.add.pos.count = 0; + ptr->array_ptr[i]->index_area.insert.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + ptr->array_ptr[i]->index_area.insert.hash.start = U64_MAX; + ptr->array_ptr[i]->index_area.insert.hash.end = U64_MAX; + ptr->array_ptr[i]->index_area.insert.pos.state = + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + ptr->array_ptr[i]->index_area.insert.pos.start = U16_MAX; + ptr->array_ptr[i]->index_area.insert.pos.count = 0; + ptr->array_ptr[i]->index_area.move.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + ptr->array_ptr[i]->index_area.move.direction = + SSDFS_BTREE_MOVE_NOWHERE; + ptr->array_ptr[i]->index_area.move.pos.state = + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + ptr->array_ptr[i]->index_area.move.pos.start = U16_MAX; + ptr->array_ptr[i]->index_area.move.pos.count = 0; + ptr->array_ptr[i]->index_area.delete.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + memset(&ptr->array_ptr[i]->index_area.delete.node_index, + 0xFF, sizeof(struct ssdfs_btree_index_key)); + + ptr->array_ptr[i]->items_area.area_size = U32_MAX; + ptr->array_ptr[i]->items_area.free_space = U32_MAX; + ptr->array_ptr[i]->items_area.hash.start = U64_MAX; + ptr->array_ptr[i]->items_area.hash.end = U64_MAX; + ptr->array_ptr[i]->items_area.add.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + ptr->array_ptr[i]->items_area.add.hash.start = U64_MAX; + ptr->array_ptr[i]->items_area.add.hash.end = U64_MAX; + ptr->array_ptr[i]->items_area.add.pos.state = + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + ptr->array_ptr[i]->items_area.add.pos.start = U16_MAX; + ptr->array_ptr[i]->items_area.add.pos.count = 0; 
+ ptr->array_ptr[i]->items_area.insert.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + ptr->array_ptr[i]->items_area.insert.hash.start = U64_MAX; + ptr->array_ptr[i]->items_area.insert.hash.end = U64_MAX; + ptr->array_ptr[i]->items_area.insert.pos.state = + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + ptr->array_ptr[i]->items_area.insert.pos.start = U16_MAX; + ptr->array_ptr[i]->items_area.insert.pos.count = 0; + ptr->array_ptr[i]->items_area.move.op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + ptr->array_ptr[i]->items_area.move.direction = + SSDFS_BTREE_MOVE_NOWHERE; + ptr->array_ptr[i]->items_area.move.pos.state = + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + ptr->array_ptr[i]->items_area.move.pos.start = U16_MAX; + ptr->array_ptr[i]->items_area.move.pos.count = 0; + + ptr->array_ptr[i]->nodes.old_node.type = + SSDFS_BTREE_NODE_UNKNOWN_TYPE; + ptr->array_ptr[i]->nodes.old_node.index_hash.start = U64_MAX; + ptr->array_ptr[i]->nodes.old_node.index_hash.end = U64_MAX; + ptr->array_ptr[i]->nodes.old_node.items_hash.start = U64_MAX; + ptr->array_ptr[i]->nodes.old_node.items_hash.end = U64_MAX; + ptr->array_ptr[i]->nodes.old_node.ptr = NULL; + ptr->array_ptr[i]->nodes.new_node.type = + SSDFS_BTREE_NODE_UNKNOWN_TYPE; + ptr->array_ptr[i]->nodes.new_node.index_hash.start = U64_MAX; + ptr->array_ptr[i]->nodes.new_node.index_hash.end = U64_MAX; + ptr->array_ptr[i]->nodes.new_node.items_hash.start = U64_MAX; + ptr->array_ptr[i]->nodes.new_node.items_hash.end = U64_MAX; + ptr->array_ptr[i]->nodes.new_node.ptr = NULL; + } +} + +/* + * ssdfs_btree_hierarchy_allocate() - allocate hierarchy object + * @tree: btree object + * + * This method tries to allocate the memory for the hierarchy object. + * + * RETURN: + * [success] - pointer on the allocated hierarchy object. + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate the memory. 
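+ * + * A minimal usage sketch (illustrative only; the check and + * execution of the prepared action plan are elided): + * + *	hierarchy = ssdfs_btree_hierarchy_allocate(tree); + *	if (IS_ERR(hierarchy)) + *		return PTR_ERR(hierarchy); + *	(check the hierarchy and execute the action plan here) + *	ssdfs_btree_hierarchy_free(hierarchy);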
+ */ +struct ssdfs_btree_hierarchy * +ssdfs_btree_hierarchy_allocate(struct ssdfs_btree *tree) +{ + struct ssdfs_btree_hierarchy *ptr; + size_t desc_size = sizeof(struct ssdfs_btree_hierarchy); + size_t ptr_size = sizeof(struct ssdfs_btree_level *); + size_t level_size = sizeof(struct ssdfs_btree_level); + int tree_height; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_height = ssdfs_define_hierarchy_height(tree); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %d\n", + tree_height); + return ERR_PTR(-ERANGE); + } + + ptr = ssdfs_btree_hierarchy_kzalloc(desc_size, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate btree hierarchy object\n"); + return ERR_PTR(-ENOMEM); + } + + ptr->array_ptr = ssdfs_btree_hierarchy_kzalloc(ptr_size * tree_height, + GFP_KERNEL); + if (!ptr->array_ptr) { + ssdfs_btree_hierarchy_kfree(ptr); + SSDFS_ERR("fail to allocate tree levels' array\n"); + return ERR_PTR(-ENOMEM); + } + + for (i = 0; i < tree_height; i++) { + ptr->array_ptr[i] = ssdfs_btree_hierarchy_kzalloc(level_size, + GFP_KERNEL); + if (!ptr->array_ptr[i]) { + for (--i; i >= 0; i--) { + ssdfs_btree_hierarchy_kfree(ptr->array_ptr[i]); + ptr->array_ptr[i] = NULL; + } + + ssdfs_btree_hierarchy_kfree(ptr->array_ptr); + ptr->array_ptr = NULL; + ssdfs_btree_hierarchy_kfree(ptr); + SSDFS_ERR("fail to allocate tree level object\n"); + return ERR_PTR(-ENOMEM); + } + } + + ssdfs_btree_hierarchy_init(tree, ptr); + + return ptr; +} + +/* + * ssdfs_btree_hierarchy_free() - free the hierarchy object + * @hierarchy: pointer on the hierarchy object + */ +void ssdfs_btree_hierarchy_free(struct ssdfs_btree_hierarchy *hierarchy) +{ + int tree_height; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hierarchy %p\n", hierarchy); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!hierarchy) + return; + + tree_height = hierarchy->desc.height; + + for (i = 0; i < tree_height; i++) { + ssdfs_btree_hierarchy_kfree(hierarchy->array_ptr[i]); + hierarchy->array_ptr[i] = NULL; + } + + ssdfs_btree_hierarchy_kfree(hierarchy->array_ptr); + hierarchy->array_ptr = NULL; + + ssdfs_btree_hierarchy_kfree(hierarchy); +} + +/* + * ssdfs_btree_prepare_add_node() - prepare the level for adding node + * @tree: btree object + * @node_type: type of adding node + * @start_hash: starting hash value + * @end_hash: ending hash value + * @level: level object [out] + * @node: node object [in] + */ +void ssdfs_btree_prepare_add_node(struct ssdfs_btree *tree, + int node_type, + u64 start_hash, u64 end_hash, + struct ssdfs_btree_level *level, + struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !level); + + SSDFS_DBG("tree %p, level %p, node_type %#x, " + "start_hash %llx, end_hash %llx\n", + tree, level, node_type, start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + level->flags |= SSDFS_BTREE_LEVEL_ADD_NODE; + level->nodes.new_node.type = node_type; + level->nodes.old_node.ptr = node; + + level->index_area.area_size = tree->index_area_min_size; + level->index_area.free_space = tree->index_area_min_size; + level->items_area.area_size = + tree->node_size - tree->index_area_min_size; + level->items_area.free_space = + tree->node_size - tree->index_area_min_size; + level->items_area.hash.start = start_hash; + level->items_area.hash.end = end_hash; +} + +/* + * ssdfs_btree_prepare_add_index() - prepare the level for adding index + * @level: level object [out] + * @start_hash: starting hash value + * @end_hash: ending hash value + * @node: node object [in]
+ * + * This method tries to prepare the @level for adding the index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_btree_prepare_add_index(struct ssdfs_btree_level *level, + u64 start_hash, u64 end_hash, + struct ssdfs_btree_node *node) +{ + struct ssdfs_btree_node_insert *add; + int index_area_state; + int items_area_state; + u32 free_space; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level || !node); + + SSDFS_DBG("level %p, node %p, " + "start_hash %llx, end_hash %llx\n", + level, node, start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (level->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + level->flags |= SSDFS_BTREE_LEVEL_ADD_INDEX; + + add = &level->index_area.add; + add->hash.start = start_hash; + add->hash.end = end_hash; + add->pos.start = 0; + add->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT; + add->pos.count = 1; + add->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + return 0; + } + + index_area_state = atomic_read(&node->index_area.state); + items_area_state = atomic_read(&node->items_area.state); + + if (index_area_state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("index area is absent: " + "node_id %u, height %u\n", + node->node_id, + atomic_read(&node->height)); + return -ERANGE; + } + + if (can_add_new_index(node)) { + level->flags |= SSDFS_BTREE_LEVEL_ADD_INDEX; + level->nodes.old_node.type = atomic_read(&node->type); + level->nodes.old_node.ptr = node; + } else if (atomic_read(&node->type) == SSDFS_BTREE_ROOT_NODE) { + level->flags |= SSDFS_BTREE_LEVEL_ADD_INDEX; + level->nodes.new_node.type = atomic_read(&node->type); + level->nodes.new_node.ptr = node; + } else if (level->flags & SSDFS_BTREE_TRY_RESIZE_INDEX_AREA) { + level->flags |= SSDFS_BTREE_LEVEL_ADD_INDEX; + level->nodes.new_node.type = atomic_read(&node->type); + level->nodes.new_node.ptr = node; + } else { + SSDFS_ERR("fail to add a new index: " + "node_id %u, height %u\n", + node->node_id, + atomic_read(&node->height)); + return -ERANGE; + } + + down_read(&node->header_lock); + + free_space = node->index_area.index_capacity; + + if (node->index_area.index_count > free_space) { + err = -ERANGE; + SSDFS_ERR("index_count %u > index_capacity %u\n", + node->index_area.index_count, + free_space); + goto finish_prepare_level; + } + + free_space -= node->index_area.index_count; + free_space *= node->index_area.index_size; + + level->index_area.free_space = free_space; + level->index_area.area_size = node->index_area.area_size; + level->index_area.hash.start = node->index_area.start_hash; + level->index_area.hash.end = node->index_area.end_hash; + + if (items_area_state == SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + if (node->items_area.free_space > node->node_size) { + err = -ERANGE; + SSDFS_ERR("free_space %u > node_size %u\n", + node->items_area.free_space, + node->node_size); + goto finish_prepare_level; + } + + level->items_area.free_space = node->items_area.free_space; + level->items_area.area_size = node->items_area.area_size; + level->items_area.hash.start = node->items_area.start_hash; + level->items_area.hash.end = node->items_area.end_hash; + } + +finish_prepare_level: + up_read(&node->header_lock); + + if (unlikely(err)) + return err; + + if (start_hash > end_hash) { + SSDFS_ERR("invalid requested hash range: " + "start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + return -ERANGE; + } + + add = &level->index_area.add; + + add->hash.start = start_hash; + add->hash.end = end_hash; + + err = 
ssdfs_btree_node_find_index_position(node, start_hash,
+						   &add->pos.start);
+	if (err == -ENODATA) {
+		if (add->pos.start >= U16_MAX) {
+			SSDFS_ERR("fail to find the index position: "
+				  "start_hash %llx, err %d\n",
+				  start_hash, err);
+			return err;
+		} else
+			err = 0;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find the index position: "
+			  "start_hash %llx, err %d\n",
+			  start_hash, err);
+		return err;
+	} else if (level->index_area.hash.start != start_hash) {
+		/*
+		 * We've received the position of an existing
+		 * index record. Shift it so that the new record
+		 * is inserted after that position.
+		 */
+		add->pos.start++;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, end_hash %llx, "
+		  "level->index_area.hash.start %llx, "
+		  "level->index_area.hash.end %llx\n",
+		  start_hash, end_hash,
+		  level->index_area.hash.start,
+		  level->index_area.hash.end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (end_hash < level->index_area.hash.start)
+		add->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT;
+	else if (start_hash > level->index_area.hash.end)
+		add->pos.state = SSDFS_HASH_RANGE_RIGHT_ADJACENT;
+	else
+		add->pos.state = SSDFS_HASH_RANGE_INTERSECTION;
+
+	add->pos.count = 1;
+	add->op_state = SSDFS_BTREE_AREA_OP_REQUESTED;
+	return 0;
+}
+
+static inline
+void ssdfs_btree_cancel_add_index(struct ssdfs_btree_level *level)
+{
+	struct ssdfs_btree_node_insert *add;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level);
+
+	SSDFS_DBG("level %p\n", level);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	level->flags &= ~SSDFS_BTREE_LEVEL_ADD_INDEX;
+
+	add = &level->index_area.add;
+
+	add->op_state = SSDFS_BTREE_AREA_OP_UNKNOWN;
+	add->hash.start = U64_MAX;
+	add->hash.end = U64_MAX;
+	add->pos.state = SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED;
+	add->pos.start = U16_MAX;
+	add->pos.count = 0;
+}
+
+/*
+ * ssdfs_btree_prepare_update_index() - prepare the level for index update
+ * @level: level object [out]
+ * @start_hash: starting hash value
+ * @end_hash: ending hash value
+ * @node: node object [in]
+ *
+ * This method tries to prepare the @level for updating the index.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
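+ *
+ * Sketch of the expected usage (illustrative only): after a child
+ * node's hash range has changed, the caller asks this helper to
+ * schedule a rewrite of the corresponding index key in the parent:
+ *
+ *	err = ssdfs_btree_prepare_update_index(parent_level,
+ *						start_hash, end_hash,
+ *						parent_node);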
+ */
+static
+int ssdfs_btree_prepare_update_index(struct ssdfs_btree_level *level,
+				     u64 start_hash, u64 end_hash,
+				     struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree_node_insert *insert;
+	int index_area_state;
+	int items_area_state;
+	u32 free_space;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level || !node);
+
+	SSDFS_DBG("level %p, start_hash %llx, "
+		  "end_hash %llx, node %p\n",
+		  level, start_hash, end_hash, node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	level->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX;
+	level->nodes.old_node.type = atomic_read(&node->type);
+	level->nodes.old_node.ptr = node;
+
+	index_area_state = atomic_read(&node->index_area.state);
+	items_area_state = atomic_read(&node->items_area.state);
+
+	if (index_area_state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+		SSDFS_ERR("index area is absent: "
+			  "node_id %u, height %u\n",
+			  node->node_id,
+			  atomic_read(&node->height));
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+
+	free_space = node->index_area.index_capacity;
+
+	if (node->index_area.index_count > free_space) {
+		err = -ERANGE;
+		SSDFS_ERR("index_count %u > index_capacity %u\n",
+			  node->index_area.index_count,
+			  free_space);
+		goto finish_prepare_level;
+	}
+
+	free_space -= node->index_area.index_count;
+	free_space *= node->index_area.index_size;
+
+	level->index_area.free_space = free_space;
+	level->index_area.area_size = node->index_area.area_size;
+	level->index_area.hash.start = node->index_area.start_hash;
+	level->index_area.hash.end = node->index_area.end_hash;
+
+	if (start_hash > end_hash) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid range: start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		goto finish_prepare_level;
+	}
+
+	if (!(level->index_area.hash.start <= start_hash &&
+	      end_hash <= level->index_area.hash.end)) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid hash range "
+			  "(start_hash %llx, end_hash %llx), "
+			  "node (start_hash %llx, end_hash %llx)\n",
+			  start_hash, end_hash,
+			  level->index_area.hash.start,
+			  level->index_area.hash.end);
+		goto finish_prepare_level;
+	}
+
+	if (items_area_state == SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		if (node->items_area.free_space > node->node_size) {
+			err = -ERANGE;
+			SSDFS_ERR("free_space %u > node_size %u\n",
+				  node->items_area.free_space,
+				  node->node_size);
+			goto finish_prepare_level;
+		}
+
+		level->items_area.free_space = node->items_area.free_space;
+		level->items_area.area_size = node->items_area.area_size;
+		level->items_area.hash.start = node->items_area.start_hash;
+		level->items_area.hash.end = node->items_area.end_hash;
+	}
+
+finish_prepare_level:
+	up_read(&node->header_lock);
+
+	if (unlikely(err))
+		return err;
+
+	insert = &level->index_area.insert;
+	err = ssdfs_btree_node_find_index_position(node, start_hash,
+						   &insert->pos.start);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find the index position: "
+			  "start_hash %llx, err %d\n",
+			  start_hash, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, end_hash %llx, "
+		  "level->index_area.hash.start %llx, "
+		  "level->index_area.hash.end %llx\n",
+		  start_hash, end_hash,
+		  level->index_area.hash.start,
+		  level->index_area.hash.end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (end_hash < level->index_area.hash.start)
+		insert->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT;
+	else if (start_hash > level->index_area.hash.end)
+		insert->pos.state = SSDFS_HASH_RANGE_RIGHT_ADJACENT;
+	else
+		insert->pos.state = SSDFS_HASH_RANGE_INTERSECTION;
+
+	insert->pos.count = 1;
+	insert->op_state = SSDFS_BTREE_AREA_OP_REQUESTED;
+	return 0;
+}
+
+/*
+ * ssdfs_btree_prepare_do_nothing() - prepare the level to do nothing
+ * @level: level object [out]
+ * @node: node object [in]
+ *
+ * This method tries to prepare the @level to do nothing.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_btree_prepare_do_nothing(struct ssdfs_btree_level *level,
+				   struct ssdfs_btree_node *node)
+{
+	int index_area_state;
+	int items_area_state;
+	u32 free_space;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level || !node);
+
+	SSDFS_DBG("level %p, node %p\n",
+		  level, node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	level->flags = 0;
+	level->nodes.old_node.type = atomic_read(&node->type);
+	level->nodes.old_node.ptr = node;
+
+	index_area_state = atomic_read(&node->index_area.state);
+	items_area_state = atomic_read(&node->items_area.state);
+
+	if (index_area_state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+		SSDFS_ERR("index area is absent: "
+			  "node_id %u, height %u\n",
+			  node->node_id,
+			  atomic_read(&node->height));
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+
+	free_space = node->index_area.index_capacity;
+
+	if (node->index_area.index_count > free_space) {
+		err = -ERANGE;
+		SSDFS_ERR("index_count %u > index_capacity %u\n",
+			  node->index_area.index_count,
+			  free_space);
+		goto finish_prepare_level;
+	}
+
+	free_space -= node->index_area.index_count;
+	free_space *= node->index_area.index_size;
+
+	level->index_area.free_space = free_space;
+	level->index_area.area_size = node->index_area.area_size;
+	level->index_area.hash.start = node->index_area.start_hash;
+	level->index_area.hash.end = node->index_area.end_hash;
+
+	if (items_area_state == SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		if (node->items_area.free_space > node->node_size) {
+			err = -ERANGE;
+			SSDFS_ERR("free_space %u > node_size %u\n",
+				  node->items_area.free_space,
+				  node->node_size);
+			goto finish_prepare_level;
+		}
+
+		level->items_area.free_space = node->items_area.free_space;
+		level->items_area.area_size = node->items_area.area_size;
+		level->items_area.hash.start = node->items_area.start_hash;
+		level->items_area.hash.end = node->items_area.end_hash;
+	}
+
+finish_prepare_level:
+	up_read(&node->header_lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_btree_prepare_insert_item() - prepare the level to insert item
+ * @level: level object [out]
+ * @search: search object
+ * @node: node object [in]
+ *
+ * This method tries to prepare the @level to insert the item.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_btree_prepare_insert_item(struct ssdfs_btree_level *level,
+				    struct ssdfs_btree_search *search,
+				    struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree_node *parent;
+	struct ssdfs_btree_node_insert *add;
+	struct ssdfs_btree_node_move *move;
+	int index_area_state;
+	int items_area_state;
+	u32 free_space;
+	u8 index_size;
+	u64 start_hash, end_hash;
+	u16 items_count;
+	u16 min_item_size, max_item_size;
+	u32 insert_size;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level || !search || !node);
+
+	SSDFS_DBG("level %p, node %p\n",
+		  level, node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_ROOT_NODE:
+		/*
+		 * Item will be added into a new node.
+		 * The tree will grow.
+		 * No logic is necessary in this case.
+		 */
+		return 0;
+
+	case SSDFS_BTREE_INDEX_NODE:
+		/*
+		 * Item will be added into a new hybrid node.
+		 * No logic is necessary in this case.
+		 */
+		return 0;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	level->flags |= SSDFS_BTREE_LEVEL_ADD_ITEM;
+	level->nodes.old_node.type = atomic_read(&node->type);
+	level->nodes.old_node.ptr = node;
+
+	index_area_state = atomic_read(&node->index_area.state);
+	items_area_state = atomic_read(&node->items_area.state);
+
+	if (items_area_state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		SSDFS_ERR("items area is absent: "
+			  "node_id %u, height %u\n",
+			  node->node_id,
+			  atomic_read(&node->height));
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+
+	if (node->items_area.free_space > node->node_size) {
+		err = -ERANGE;
+		SSDFS_ERR("free_space %u > node_size %u\n",
+			  node->items_area.free_space,
+			  node->node_size);
+		goto finish_prepare_level;
+	}
+
+	level->items_area.free_space = node->items_area.free_space;
+	level->items_area.area_size = node->items_area.area_size;
+	level->items_area.hash.start = node->items_area.start_hash;
+	level->items_area.hash.end = node->items_area.end_hash;
+	min_item_size = node->items_area.min_item_size;
+	max_item_size = node->items_area.max_item_size;
+	items_count = node->items_area.items_count;
+
+	if (index_area_state == SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+		free_space = node->index_area.index_capacity;
+
+		if (node->index_area.index_count > free_space) {
+			err = -ERANGE;
+			SSDFS_ERR("index_count %u > index_capacity %u\n",
+				  node->index_area.index_count,
+				  free_space);
+			goto finish_prepare_level;
+		}
+
+		free_space -= node->index_area.index_count;
+		free_space *= node->index_area.index_size;
+
+		index_size = node->index_area.index_size;
+
+		level->index_area.free_space = free_space;
+		level->index_area.area_size = node->index_area.area_size;
+		level->index_area.hash.start = node->index_area.start_hash;
+		level->index_area.hash.end = node->index_area.end_hash;
+	}
+
+finish_prepare_level:
+	up_read(&node->header_lock);
+
+	if (unlikely(err))
+		return err;
+
+	start_hash = search->request.start.hash;
+	end_hash = search->request.end.hash;
+
+	if (start_hash > end_hash) {
+		SSDFS_ERR("invalid requested hash range: "
+			  "start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		return -ERANGE;
+	}
+
+	add = &level->items_area.add;
+
+	add->hash.start = start_hash;
+	add->hash.end = end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, end_hash %llx, "
+		  "level->items_area.hash.start %llx, "
+		  "level->items_area.hash.end %llx\n",
+		  start_hash, end_hash,
+		  level->items_area.hash.start,
+		  level->items_area.hash.end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (items_count == 0) {
+		add->pos.state = SSDFS_HASH_RANGE_OUT_OF_NODE;
+		add->pos.start = 0;
+		add->pos.count = search->request.count;
+		add->op_state = SSDFS_BTREE_AREA_OP_REQUESTED;
+		return 0;
+	} else if (end_hash < level->items_area.hash.start)
+		add->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT;
+	else if (start_hash > level->items_area.hash.end)
+		add->pos.state = SSDFS_HASH_RANGE_RIGHT_ADJACENT;
+	else
+		add->pos.state = SSDFS_HASH_RANGE_INTERSECTION;
+
+	add->pos.start = search->result.start_index;
+	add->pos.count = search->request.count;
+	add->op_state = SSDFS_BTREE_AREA_OP_REQUESTED;
+
+	switch (node->tree->type) {
+	case SSDFS_INODES_BTREE:
+		/* Inodes tree doesn't need rebalancing */
+		return 0;
+
+	case SSDFS_EXTENTS_BTREE:
+		switch (add->pos.state) {
+		case SSDFS_HASH_RANGE_RIGHT_ADJACENT:
+			/* skip rebalancing */
+			return 0;
+
+		default:
+			/* continue rebalancing */
+			break;
+		}
+		break;
+
+	default:
+		/* continue logic 
*/ + break; + } + + insert_size = max_item_size * search->request.count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("insert_size %u, free_space %u\n", + insert_size, level->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (insert_size == 0) { + SSDFS_ERR("search->result.start_index %u, " + "search->request.count %u, " + "max_item_size %u, " + "insert_size %u\n", + search->result.start_index, + search->request.count, + max_item_size, + insert_size); + return -ERANGE; + } + + spin_lock(&node->descriptor_lock); + parent = node->parent_node; + spin_unlock(&node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (level->items_area.free_space < insert_size) { + u16 moving_items; + + if (can_add_new_index(parent)) + moving_items = items_count / 2; + else + moving_items = search->request.count; + + move = &level->items_area.move; + + switch (add->pos.state) { + case SSDFS_HASH_RANGE_LEFT_ADJACENT: + level->flags |= SSDFS_BTREE_ITEMS_AREA_NEED_MOVE; + move->direction = SSDFS_BTREE_MOVE_TO_LEFT; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION; + move->pos.start = 0; + move->pos.count = moving_items; + move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MOVE_TO_LEFT: start %u, count %u\n", + move->pos.start, move->pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_HASH_RANGE_INTERSECTION: + level->flags |= SSDFS_BTREE_ITEMS_AREA_NEED_MOVE; + move->direction = SSDFS_BTREE_MOVE_TO_RIGHT; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION; + move->pos.start = items_count - moving_items; + move->pos.count = moving_items; + move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MOVE_TO_RIGHT: start %u, count %u\n", + move->pos.start, move->pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + case SSDFS_HASH_RANGE_RIGHT_ADJACENT: + /* do nothing */ + break; + + default: + SSDFS_ERR("invalid insert position's state %#x\n", + add->pos.state); + return -ERANGE; + } + } + + return 0; +} + +static inline +void ssdfs_btree_cancel_insert_item(struct ssdfs_btree_level *level) +{ + struct ssdfs_btree_node_insert *add; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level); + + SSDFS_DBG("level %p\n", level); +#endif /* CONFIG_SSDFS_DEBUG */ + + level->flags &= ~SSDFS_BTREE_LEVEL_ADD_ITEM; + + add = &level->items_area.add; + + add->op_state = SSDFS_BTREE_AREA_OP_UNKNOWN; + add->hash.start = U64_MAX; + add->hash.end = U64_MAX; + add->pos.state = SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + add->pos.start = U16_MAX; + add->pos.count = 0; +} + +/* + * ssdfs_need_move_items_to_sibling() - does it need to move items? + * @level: level object + * + * This method tries to check that the tree needs + * to be rebalanced. 
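+ *
+ * Sketch of the intended pairing (illustrative only):
+ *
+ *	if (ssdfs_need_move_items_to_sibling(level)) {
+ *		err = ssdfs_check_capability_move_to_sibling(level);
+ *		...
+ *	}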
+ */
+static inline
+bool ssdfs_need_move_items_to_sibling(struct ssdfs_btree_level *level)
+{
+	struct ssdfs_btree_node_move *move;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level);
+
+	SSDFS_DBG("level %p\n", level);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	move = &level->items_area.move;
+
+	if (level->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) {
+		switch (move->direction) {
+		case SSDFS_BTREE_MOVE_TO_LEFT:
+		case SSDFS_BTREE_MOVE_TO_RIGHT:
+			return true;
+		}
+	}
+
+	return false;
+}
+
+static inline
+void ssdfs_btree_cancel_move_items_to_sibling(struct ssdfs_btree_level *level)
+{
+	struct ssdfs_btree_node_move *move;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level);
+
+	SSDFS_DBG("level %p\n", level);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	move = &level->items_area.move;
+
+	level->flags &= ~SSDFS_BTREE_ITEMS_AREA_NEED_MOVE;
+
+	move->direction = SSDFS_BTREE_MOVE_NOWHERE;
+	move->pos.state = SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED;
+	move->pos.start = U16_MAX;
+	move->pos.count = 0;
+	move->op_state = SSDFS_BTREE_AREA_OP_UNKNOWN;
+}
+
+/*
+ * can_index_area_being_increased() - does the items area have enough free space?
+ * @node: node object
+ */
+static inline
+bool can_index_area_being_increased(struct ssdfs_btree_node *node)
+{
+	int flags;
+	int items_area_state;
+	int index_area_state;
+	u32 items_area_free_space;
+	u32 index_area_min_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	flags = atomic_read(&node->tree->flags);
+
+	if (!(flags & SSDFS_BTREE_DESC_INDEX_AREA_RESIZABLE)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index area cannot be resized: "
+			  "node %u\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	items_area_state = atomic_read(&node->items_area.state);
+	index_area_state = atomic_read(&node->index_area.state);
+
+	if (index_area_state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST)
+		return false;
+
+	if (items_area_state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST)
+		return false;
+
+	index_area_min_size = node->tree->index_area_min_size;
+
+	down_read(&node->header_lock);
+	items_area_free_space = node->items_area.free_space;
+	up_read(&node->header_lock);
+
+	return items_area_free_space >= index_area_min_size;
+}
+
+/*
+ * ssdfs_check_capability_move_to_sibling() - check capability to rebalance tree
+ * @level: level object
+ *
+ * This method tries to define the presence of free space in
+ * a sibling node with the goal to rebalance the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOSPC - the node has no free space.
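+ *
+ * Expected handling of the result (sketch; the concrete caller is
+ * assumed to be the add-item preparation path):
+ *
+ *	err = ssdfs_check_capability_move_to_sibling(level);
+ *	if (err == -ENOSPC) {
+ *		[both siblings are full; the move direction may have
+ *		 been changed to SSDFS_BTREE_MOVE_TO_PARENT]
+ *	}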
+ */
+static
+int ssdfs_check_capability_move_to_sibling(struct ssdfs_btree_level *level)
+{
+	struct ssdfs_btree *tree;
+	struct ssdfs_btree_node *node, *parent_node;
+	struct ssdfs_btree_node_move *move;
+	struct ssdfs_btree_node_index_area area;
+	struct ssdfs_btree_index_key index_key;
+	u64 hash = U64_MAX;
+	int items_area_state;
+	int index_area_state;
+	u16 index_count = 0;
+	u16 index_capacity = 0;
+	u16 vacant_indexes = 0;
+	u16 index_position;
+	u16 items_count;
+	u16 items_capacity;
+	u16 moving_items;
+	int node_type;
+	u32 node_id;
+	bool is_resize_possible = false;
+	spinlock_t *lock;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!level);
+
+	SSDFS_DBG("level %p\n", level);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(level->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE)) {
+		SSDFS_DBG("no items should be moved\n");
+		return 0;
+	}
+
+	move = &level->items_area.move;
+
+	switch (move->direction) {
+	case SSDFS_BTREE_MOVE_TO_LEFT:
+	case SSDFS_BTREE_MOVE_TO_RIGHT:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_DBG("nothing should be done\n");
+		return 0;
+	}
+
+	if (!level->nodes.old_node.ptr) {
+		SSDFS_ERR("node pointer is empty\n");
+		return -ERANGE;
+	}
+
+	node = level->nodes.old_node.ptr;
+	tree = node->tree;
+
+	spin_lock(&node->descriptor_lock);
+	hash = le64_to_cpu(node->node_index.index.hash);
+	spin_unlock(&node->descriptor_lock);
+
+	items_area_state = atomic_read(&node->items_area.state);
+
+	down_read(&node->header_lock);
+	if (items_area_state == SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		items_count = node->items_area.items_count;
+		items_capacity = node->items_area.items_capacity;
+	} else
+		err = -ERANGE;
+	up_read(&node->header_lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("items area is absent\n");
+		return -ERANGE;
+	}
+
+	lock = &level->nodes.old_node.ptr->descriptor_lock;
+	spin_lock(lock);
+	node = level->nodes.old_node.ptr->parent_node;
+	spin_unlock(lock);
+	lock = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	is_resize_possible = can_index_area_being_increased(node);
+
+	index_area_state = atomic_read(&node->index_area.state);
+
+	down_read(&node->header_lock);
+
+	if (index_area_state == SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+		index_count = node->index_area.index_count;
+		index_capacity = node->index_area.index_capacity;
+		vacant_indexes = index_capacity - index_count;
+	} else
+		err = -ERANGE;
+
+	up_read(&node->header_lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("index area is absent\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_btree_node_find_index_position(node, hash,
+						   &index_position);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find the index position: "
+			  "hash %llx, err %d\n",
+			  hash, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("hash %llx, index_position %u\n",
+		  hash, index_position);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (index_position >= index_count) {
+		SSDFS_ERR("index_position %u >= index_count %u\n",
+			  index_position, index_count);
+		return -ERANGE;
+	}
+
+	switch (move->direction) {
+	case SSDFS_BTREE_MOVE_TO_LEFT:
+		if (index_position == 0) {
+			SSDFS_DBG("no siblings on the left\n");
+
+			if (vacant_indexes == 0 && !is_resize_possible) {
+				/*
+				 * Try to move to parent node
+				 
*/ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MOVE_TO_PARENT: start %u, count %u\n", + move->pos.start, move->pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + move->direction = SSDFS_BTREE_MOVE_TO_PARENT; + moving_items = items_capacity / 4; + move->pos.start = 0; + move->pos.count = moving_items; + } + + return -ENOSPC; + } + + index_position--; + break; + + case SSDFS_BTREE_MOVE_TO_RIGHT: + if ((index_position + 1) == index_count) { + SSDFS_DBG("no siblings on the right\n"); + + if (vacant_indexes == 0 && !is_resize_possible) { + /* + * Try to move to parent node + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MOVE_TO_PARENT: start %u, count %u\n", + move->pos.start, move->pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + move->direction = SSDFS_BTREE_MOVE_TO_PARENT; + moving_items = items_capacity / 4; + move->pos.start = items_capacity - moving_items; + move->pos.count = moving_items; + } + + return -ENOSPC; + } + + index_position++; + break; + + default: + BUG(); + } + + node_type = atomic_read(&node->type); + + down_read(&node->full_lock); + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(node, + index_position, + &index_key); + } else { + down_read(&node->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &node->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + up_read(&node->header_lock); + + err = __ssdfs_btree_common_node_extract_index(node, &area, + index_position, + &index_key); + } + + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract index key: " + "index_position %u, err %d\n", + index_position, err); + ssdfs_debug_show_btree_node_indexes(node->tree, node); + return err; + } + + parent_node = node; + node_id = le32_to_cpu(index_key.node_id); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_position %u, node_id %u\n", + index_position, node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_radix_tree_find(tree, node_id, &node); + if (err == -ENOENT) { + err = 0; + node = __ssdfs_btree_read_node(tree, parent_node, + &index_key, + index_key.node_type, + node_id); + if (unlikely(IS_ERR_OR_NULL(node))) { + err = !node ? 
-ENOMEM : PTR_ERR(node); + SSDFS_ERR("fail to read: " + "node %llu, err %d\n", + (u64)node_id, err); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find node in radix tree: " + "node_id %llu, err %d\n", + (u64)node_id, err); + return err; + } else if (!node) { + SSDFS_WARN("empty node pointer\n"); + return -ERANGE; + } + + items_area_state = atomic_read(&node->items_area.state); + + down_read(&node->header_lock); + if (items_area_state == SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + } else + err = -ERANGE; + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("items area is absent\n"); + return -ERANGE; + } else if (items_count >= items_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items area hasn't free space: " + "items_count %u, items_capacity %u\n", + items_count, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (vacant_indexes == 0 && !is_resize_possible) { + /* + * Try to move to parent node + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MOVE_TO_PARENT: start %u, count %u\n", + move->pos.start, move->pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + move->direction = SSDFS_BTREE_MOVE_TO_PARENT; + moving_items = items_capacity / 4; + move->pos.start = items_capacity - moving_items; + move->pos.count = moving_items; + } + + return -ENOSPC; + } + + moving_items = items_capacity - items_count; + moving_items /= 2; + if (moving_items == 0) + moving_items = 1; + + switch (move->direction) { + case SSDFS_BTREE_MOVE_TO_LEFT: + move->pos.count = moving_items; + break; + + case SSDFS_BTREE_MOVE_TO_RIGHT: + items_count = move->pos.start + move->pos.count; + moving_items = min_t(u16, moving_items, move->pos.count); + move->pos.start = items_count - moving_items; + move->pos.count = moving_items; + break; + + default: + BUG(); + } + + level->nodes.new_node.type = atomic_read(&node->type); + level->nodes.new_node.ptr = node; + + return 0; +} + +/* + * ssdfs_need_move_items_to_parent() - does it need to move items? + * @level: level object + * + * This method tries to check that items need to move + * to the parent node. + */ +static inline +bool ssdfs_need_move_items_to_parent(struct ssdfs_btree_level *level) +{ + struct ssdfs_btree_node_move *move; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level); + + SSDFS_DBG("level %p\n", level); +#endif /* CONFIG_SSDFS_DEBUG */ + + move = &level->items_area.move; + + if (level->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) { + switch (move->direction) { + case SSDFS_BTREE_MOVE_TO_PARENT: + return true; + } + } + + return false; +} + +static inline +void ssdfs_btree_cancel_move_items_to_parent(struct ssdfs_btree_level *level) +{ + struct ssdfs_btree_node_move *move; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level); + + SSDFS_DBG("level %p\n", level); +#endif /* CONFIG_SSDFS_DEBUG */ + + move = &level->items_area.move; + + level->flags &= ~SSDFS_BTREE_ITEMS_AREA_NEED_MOVE; + + move->direction = SSDFS_BTREE_MOVE_NOWHERE; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED; + move->pos.start = U16_MAX; + move->pos.count = 0; + move->op_state = SSDFS_BTREE_AREA_OP_UNKNOWN; +} + +/* + * ssdfs_prepare_move_items_to_parent() - prepare tree rebalance + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to prepare the moving items from child + * to parent with the goal to rebalance the tree. 
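+ *
+ * Sketch of the expected call (illustrative only):
+ *
+ *	err = ssdfs_prepare_move_items_to_parent(search, parent, child);
+ *	if (unlikely(err))
+ *		return err;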
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_prepare_move_items_to_parent(struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *node; + struct ssdfs_btree_node_insert *insert; + struct ssdfs_btree_node_move *move; + int items_area_state; + u64 start_hash, end_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !parent || !child); + + SSDFS_DBG("search %p, parent %p, child %p\n", + search, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE)) { + SSDFS_DBG("no items should be moved\n"); + return 0; + } + + move = &child->items_area.move; + + switch (move->direction) { + case SSDFS_BTREE_MOVE_TO_PARENT: + /* expected state */ + break; + + default: + SSDFS_DBG("nothing should be done\n"); + return 0; + } + + if (!(parent->flags & SSDFS_BTREE_LEVEL_ADD_NODE)) { + SSDFS_DBG("items can be copied only into a new node\n"); + ssdfs_btree_cancel_move_items_to_parent(child); + return 0; + } + + node = child->nodes.old_node.ptr; + + if (!node) { + SSDFS_WARN("node pointer is empty\n"); + return -ERANGE; + } + + items_area_state = atomic_read(&node->items_area.state); + + down_read(&node->header_lock); + if (items_area_state == SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + } else + err = -ERANGE; + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("items area is absent\n"); + return -ERANGE; + } + + if (search->request.start.hash > end_hash || + search->request.end.hash < start_hash) { + ssdfs_btree_cancel_move_items_to_parent(child); + return 0; + } + + insert = &parent->items_area.insert; + + insert->hash.start = search->request.start.hash; + insert->hash.end = child->items_area.hash.end; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "child->items_area.hash.start %llx, " + "child->items_area.hash.end %llx\n", + insert->hash.start, + insert->hash.end, + child->items_area.hash.start, + child->items_area.hash.end); +#endif /* CONFIG_SSDFS_DEBUG */ + + insert->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT; + insert->pos.start = 0; + insert->pos.count = move->pos.count; + insert->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + return 0; +} + +/* + * ssdfs_btree_define_moving_indexes() - define moving index range + * @parent: parent level object [in|out] + * @child: child level object [in|out] + * + * This method tries to define what index range should be moved + * between @parent and @child levels. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
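+ *
+ * Note: the only combination below that actually schedules an index
+ * move is the completely full root node case: all index records of
+ * the root are moved into the newly added node
+ * (SSDFS_BTREE_MOVE_TO_CHILD); every other parent/child pairing
+ * leaves the levels untouched.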
+ */
+static
+int ssdfs_btree_define_moving_indexes(struct ssdfs_btree_level *parent,
+				      struct ssdfs_btree_level *child)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int state;
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_btree_node *child_node;
+	u8 index_size;
+	u16 index_count;
+	u16 index_capacity;
+	u64 start_hash;
+	u64 end_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!parent || !child);
+
+	SSDFS_DBG("parent: node_type %#x, node %p, "
+		  "child: node_type %#x, node %p\n",
+		  parent->nodes.old_node.type,
+		  parent->nodes.old_node.ptr,
+		  child->nodes.old_node.type,
+		  child->nodes.old_node.ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (parent->nodes.old_node.type) {
+	case SSDFS_BTREE_ROOT_NODE:
+	case SSDFS_BTREE_INDEX_NODE:
+	case SSDFS_BTREE_HYBRID_NODE:
+		switch (child->nodes.old_node.type) {
+		case SSDFS_BTREE_INDEX_NODE:
+		case SSDFS_BTREE_HYBRID_NODE:
+		case SSDFS_BTREE_LEAF_NODE:
+			/*
+			 * Nothing should be done for the case of
+			 * adding the node.
+			 */
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid child node's type %#x\n",
+				  child->nodes.old_node.type);
+			break;
+		}
+		break;
+
+	default:
+		switch (child->nodes.old_node.type) {
+		case SSDFS_BTREE_ROOT_NODE:
+			child_node = child->nodes.old_node.ptr;
+#ifdef CONFIG_SSDFS_DEBUG
+			if (!child_node) {
+				SSDFS_ERR("child node is NULL\n");
+				return -ERANGE;
+			}
+
+			state = atomic_read(&child_node->index_area.state);
+			if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+				SSDFS_ERR("index area is absent\n");
+				return -ERANGE;
+			}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			down_read(&child_node->header_lock);
+			index_size = child_node->index_area.index_size;
+			index_count = child_node->index_area.index_count;
+			index_capacity = child_node->index_area.index_capacity;
+			start_hash = child_node->index_area.start_hash;
+			end_hash = child_node->index_area.end_hash;
+			up_read(&child_node->header_lock);
+
+			if (index_count != index_capacity) {
+				SSDFS_ERR("count %u != capacity %u\n",
+					  index_count, index_capacity);
+				return -ERANGE;
+			}
+
+			parent->nodes.new_node.type = SSDFS_BTREE_ROOT_NODE;
+			parent->nodes.new_node.ptr = child_node;
+
+			parent->flags |= SSDFS_BTREE_INDEX_AREA_NEED_MOVE;
+			parent->index_area.move.direction =
+						SSDFS_BTREE_MOVE_TO_CHILD;
+			parent->index_area.move.pos.state =
+						SSDFS_HASH_RANGE_INTERSECTION;
+			parent->index_area.move.pos.start = 0;
+			parent->index_area.move.pos.count = index_count;
+			parent->index_area.move.op_state =
+						SSDFS_BTREE_AREA_OP_REQUESTED;
+
+			child->index_area.insert.pos.state =
+						SSDFS_HASH_RANGE_LEFT_ADJACENT;
+			child->index_area.insert.hash.start = start_hash;
+			child->index_area.insert.hash.end = end_hash;
+			child->index_area.insert.pos.start = 0;
+			child->index_area.insert.pos.count = index_count;
+			child->index_area.insert.op_state =
+						SSDFS_BTREE_AREA_OP_REQUESTED;
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid child node's type %#x\n",
+				  child->nodes.old_node.type);
+			break;
+		}
+		break;
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_prepare_move_indexes_right() - prepare to move indexes (right)
+ * @parent: parent level object [in|out]
+ * @parent_node: parent node
+ *
+ * This method tries to define what index range should be moved
+ * from @parent_node to a new node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
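+ *
+ * Sketch of a call (illustrative only): used when @parent_node is
+ * completely full and a right sibling is being added:
+ *
+ *	err = ssdfs_prepare_move_indexes_right(parent, parent_node);
+ *	if (unlikely(err))
+ *		return err;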
+ */
+static
+int ssdfs_prepare_move_indexes_right(struct ssdfs_btree_level *parent,
+				     struct ssdfs_btree_node *parent_node)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int state;
+#endif /* CONFIG_SSDFS_DEBUG */
+	struct ssdfs_btree_node_move *move;
+	u16 index_count;
+	u16 index_capacity;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!parent || !parent_node);
+
+	SSDFS_DBG("parent: node_type %#x, node %p, node_id %u\n",
+		  parent->nodes.old_node.type,
+		  parent->nodes.old_node.ptr,
+		  parent_node->node_id);
+
+	state = atomic_read(&parent_node->index_area.state);
+	if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) {
+		SSDFS_ERR("index area is absent\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&parent_node->header_lock);
+	index_count = parent_node->index_area.index_count;
+	index_capacity = parent_node->index_area.index_capacity;
+	up_read(&parent_node->header_lock);
+
+	if (index_count != index_capacity) {
+		SSDFS_ERR("count %u != capacity %u\n",
+			  index_count, index_capacity);
+		return -ERANGE;
+	}
+
+	if (index_count == 0) {
+		SSDFS_ERR("invalid count %u\n",
+			  index_count);
+		return -ERANGE;
+	}
+
+	move = &parent->index_area.move;
+
+	parent->flags |= SSDFS_BTREE_INDEX_AREA_NEED_MOVE;
+	move->direction = SSDFS_BTREE_MOVE_TO_RIGHT;
+	move->pos.state = SSDFS_HASH_RANGE_INTERSECTION;
+	move->pos.start = index_count / 2;
+	move->pos.count = index_count - move->pos.start;
+	move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("MOVE_TO_RIGHT: start %u, count %u\n",
+		  move->pos.start, move->pos.count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_check_capability_move_indexes_to_sibling() - check ability to rebalance
+ * @level: level object
+ *
+ * This method tries to define the presence of free space in
+ * a sibling index node with the goal to rebalance the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOSPC - the node has no free space.
+ */ +static int +ssdfs_check_capability_move_indexes_to_sibling(struct ssdfs_btree_level *level) +{ + struct ssdfs_btree *tree; + struct ssdfs_btree_node *node, *parent_node; + struct ssdfs_btree_node_move *move; + struct ssdfs_btree_node_index_area area; + struct ssdfs_btree_index_key index_key; + u64 hash = U64_MAX; + int index_area_state; + u16 index_count = 0; + u16 index_capacity = 0; + u16 vacant_indexes = 0; + u16 src_index_count = 0; + u16 index_position; + int node_type; + u32 node_id; + spinlock_t *lock; + u16 moving_indexes = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level); + + SSDFS_DBG("level %p\n", level); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!level->nodes.old_node.ptr) { + SSDFS_ERR("node pointer is empty\n"); + return -ERANGE; + } + + node = level->nodes.old_node.ptr; + tree = node->tree; + + spin_lock(&node->descriptor_lock); + hash = le64_to_cpu(node->node_index.index.hash); + spin_unlock(&node->descriptor_lock); + + index_area_state = atomic_read(&node->index_area.state); + + down_read(&node->header_lock); + + if (index_area_state == SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + src_index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + vacant_indexes = index_capacity - src_index_count; + } else + err = -ERANGE; + + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("index area is absent\n"); + return -ERANGE; + } else if (vacant_indexes != 0) { + SSDFS_ERR("node %u is not exhausted: " + "index_count %u, index_capacity %u\n", + node->node_id, src_index_count, + index_capacity); + return -ERANGE; + } + + lock = &level->nodes.old_node.ptr->descriptor_lock; + spin_lock(lock); + node = level->nodes.old_node.ptr->parent_node; + spin_unlock(lock); + lock = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + index_area_state = atomic_read(&node->index_area.state); + + down_read(&node->header_lock); + + if (index_area_state == SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + vacant_indexes = index_capacity - index_count; + } else + err = -ERANGE; + + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("index area is absent\n"); + return -ERANGE; + } + + err = ssdfs_btree_node_find_index_position(node, hash, + &index_position); + if (unlikely(err)) { + SSDFS_ERR("fail to find the index position: " + "hash %llx, err %d\n", + hash, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash %llx, index_position %u\n", + hash, index_position); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (index_position >= index_count) { + SSDFS_ERR("index_position %u >= index_count %u\n", + index_position, index_count); + return -ERANGE; + } + + if ((index_position + 1) == index_count) { + SSDFS_DBG("no siblings on the right\n"); + + if (vacant_indexes == 0) { + SSDFS_DBG("cannot add index\n"); + return -ENOSPC; + } else { + SSDFS_DBG("need add empty index node\n"); + return -ENOENT; + } + } + + index_position++; + + node_type = atomic_read(&node->type); + + down_read(&node->full_lock); + + if (node_type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(node, + index_position, + &index_key); + } else { + down_read(&node->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &node->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + up_read(&node->header_lock); + + err = 
__ssdfs_btree_common_node_extract_index(node, &area, + index_position, + &index_key); + } + + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract index key: " + "index_position %u, err %d\n", + index_position, err); + ssdfs_debug_show_btree_node_indexes(node->tree, node); + return err; + } + + parent_node = node; + node_id = le32_to_cpu(index_key.node_id); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_position %u, node_id %u\n", + index_position, node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_radix_tree_find(tree, node_id, &node); + if (err == -ENOENT) { + err = 0; + node = __ssdfs_btree_read_node(tree, parent_node, + &index_key, + index_key.node_type, + node_id); + if (unlikely(IS_ERR_OR_NULL(node))) { + err = !node ? -ENOMEM : PTR_ERR(node); + SSDFS_ERR("fail to read: " + "node %llu, err %d\n", + (u64)node_id, err); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find node in radix tree: " + "node_id %llu, err %d\n", + (u64)node_id, err); + return err; + } else if (!node) { + SSDFS_WARN("empty node pointer\n"); + return -ERANGE; + } + + index_area_state = atomic_read(&node->index_area.state); + + down_read(&node->header_lock); + + if (index_area_state == SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + index_count = node->index_area.index_count; + index_capacity = node->index_area.index_capacity; + vacant_indexes = index_capacity - index_count; + } else + err = -ERANGE; + + up_read(&node->header_lock); + + if (unlikely(err)) { + SSDFS_ERR("index area is absent\n"); + return -ERANGE; + } else if (index_count >= index_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area hasn't free space: " + "index_count %u, index_capacity %u\n", + index_count, index_capacity); + SSDFS_DBG("cannot move indexes\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOENT; + } + + moving_indexes = vacant_indexes / 2; + if (moving_indexes == 0) + moving_indexes = 1; + + move = &level->index_area.move; + + level->flags |= SSDFS_BTREE_INDEX_AREA_NEED_MOVE; + move->direction = SSDFS_BTREE_MOVE_TO_RIGHT; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION; + move->pos.start = src_index_count - moving_indexes; + move->pos.count = src_index_count - move->pos.start; + move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("MOVE_TO_RIGHT: start %u, count %u\n", + move->pos.start, move->pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + level->nodes.new_node.type = atomic_read(&node->type); + level->nodes.new_node.ptr = node; + + return 0; +} + +/* + * ssdfs_define_hybrid_node_moving_items() - define moving items range + * @tree: btree object + * @start_hash: starting hash + * @end_hash: ending hash + * @parent_node: pointer on parent node + * @child_node: pointer on child node + * @parent: parent level object [in|out] + * @child: child level object [in|out] + * + * This method tries to define what items range should be moved + * between @parent and @child levels. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
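+ *
+ * Sketch of a call (illustrative only; @child_node may be NULL when
+ * the child node still has to be added):
+ *
+ *	err = ssdfs_define_hybrid_node_moving_items(tree,
+ *						start_hash, end_hash,
+ *						parent_node, NULL,
+ *						parent, child);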
+ */ +static +int ssdfs_define_hybrid_node_moving_items(struct ssdfs_btree *tree, + u64 start_hash, u64 end_hash, + struct ssdfs_btree_node *parent_node, + struct ssdfs_btree_node *child_node, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node_move *move; + struct ssdfs_btree_node_insert *insert; + int state; + u32 free_space = 0; + u16 item_size; + u16 items_count; + u16 items_capacity; + u64 parent_start_hash; + u64 parent_end_hash; + u32 index_area_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent || !child || !parent_node); + + SSDFS_DBG("parent: node_type %#x, node %p\n", + parent->nodes.old_node.type, + parent->nodes.old_node.ptr); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&parent_node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("items area is absent\n"); + return -ERANGE; + } + + down_read(&parent_node->header_lock); + item_size = parent_node->items_area.item_size; + items_count = parent_node->items_area.items_count; + items_capacity = parent_node->items_area.items_capacity; + parent_start_hash = parent_node->items_area.start_hash; + parent_end_hash = parent_node->items_area.end_hash; + index_area_size = parent_node->index_area.area_size; + up_read(&parent_node->header_lock); + + if (child_node) { + down_read(&child_node->header_lock); + free_space = child_node->items_area.free_space; + up_read(&child_node->header_lock); + } else { + /* it needs to add a child node */ + free_space = (u32)items_capacity * item_size; + } + + if (items_count == 0) { + SSDFS_DBG("no items to move\n"); + return 0; + } + + if (free_space < ((u32)item_size * items_count)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to move items: " + "free_space %u, item_size %u, items_count %u\n", + free_space, item_size, items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + move = &parent->items_area.move; + insert = &child->items_area.insert; + + switch (tree->type) { + case SSDFS_INODES_BTREE: + case SSDFS_EXTENTS_BTREE: + case SSDFS_SHARED_EXTENTS_BTREE: + /* no additional check is necessary */ + break; + + case SSDFS_DENTRIES_BTREE: + case SSDFS_XATTR_BTREE: + case SSDFS_SHARED_DICTIONARY_BTREE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, " + "end_hash %llx, " + "parent: (start_hash %llx, " + "end_hash %llx)\n", + start_hash, end_hash, + parent_start_hash, + parent_end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((index_area_size * 2) < parent_node->node_size) { + /* parent node will be still hybrid one */ + items_count /= 2; + } + break; + + default: + BUG(); + } + + parent->flags |= SSDFS_BTREE_ITEMS_AREA_NEED_MOVE; + move->direction = SSDFS_BTREE_MOVE_TO_CHILD; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION; + move->pos.start = 0; + move->pos.count = items_count; + move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + insert->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT; + insert->hash.start = start_hash; + insert->hash.end = end_hash; + insert->pos.start = 0; + insert->pos.count = items_count; + insert->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + child->flags |= SSDFS_BTREE_LEVEL_ADD_NODE; + + return 0; +} + +/* + * ssdfs_btree_define_moving_items() - define moving items range + * @tree: btree object + * @search: search object + * @parent: parent level object [in|out] + * @child: child level object [in|out] + * + * This method tries to define what items range should be moved + * between @parent and @child levels. 
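+ *
+ * Note: for the dentries, xattrs and shared dictionary trees the
+ * method can return -EAGAIN when the requested range neither
+ * intersects the parent's items nor allows adding a new index, so
+ * the addition has to be done in two phases (see the code below).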
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_btree_define_moving_items(struct ssdfs_btree *tree,
+				    struct ssdfs_btree_search *search,
+				    struct ssdfs_btree_level *parent,
+				    struct ssdfs_btree_level *child)
+{
+	struct ssdfs_btree_node *parent_node, *child_node = NULL;
+	struct ssdfs_btree_node_move *move;
+	struct ssdfs_btree_node_insert *insert;
+	int state;
+	u32 free_space = 0;
+	u16 item_size;
+	u16 items_count;
+	u16 items_capacity;
+	int child_node_type;
+	u64 start_hash1, end_hash1;
+	u64 start_hash2, end_hash2;
+	bool has_intersection = false;
+	bool can_add_index = true;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search || !parent || !child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) {
+		child_node_type = child->nodes.new_node.type;
+	} else {
+		child_node_type = child->nodes.old_node.type;
+		child_node = child->nodes.old_node.ptr;
+
+		if (!child_node) {
+			SSDFS_ERR("child node is empty\n");
+			return -EINVAL;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("parent: node_type %#x, node %p, "
+		  "child: node_type %#x, node %p\n",
+		  parent->nodes.old_node.type,
+		  parent->nodes.old_node.ptr,
+		  child_node_type,
+		  child_node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (parent->nodes.old_node.type) {
+	case SSDFS_BTREE_ROOT_NODE:
+		switch (child_node_type) {
+		case SSDFS_BTREE_INDEX_NODE:
+		case SSDFS_BTREE_HYBRID_NODE:
+		case SSDFS_BTREE_LEAF_NODE:
+			/*
+			 * Nothing should be done.
+			 * The root node is a pure index node.
+			 */
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid child node's type %#x\n",
+				  child->nodes.old_node.type);
+			break;
+		}
+		break;
+
+	case SSDFS_BTREE_INDEX_NODE:
+		switch (child_node_type) {
+		case SSDFS_BTREE_INDEX_NODE:
+		case SSDFS_BTREE_HYBRID_NODE:
+		case SSDFS_BTREE_LEAF_NODE:
+			/*
+			 * Nothing should be done.
+			 * The index node has no items at all.
+			 */
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid child node's type %#x\n",
+				  child->nodes.old_node.type);
+			break;
+		}
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		switch (child_node_type) {
+		case SSDFS_BTREE_INDEX_NODE:
+			/*
+			 * Nothing should be done.
+			 * The index node has no items at all.
+ */ + break; + + case SSDFS_BTREE_HYBRID_NODE: + parent_node = parent->nodes.old_node.ptr; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent_node); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&parent_node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("items area is absent\n"); + return -ERANGE; + } + + if (!(child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE)) + return 0; + + down_read(&parent_node->header_lock); + free_space = parent_node->items_area.area_size; + item_size = parent_node->items_area.item_size; + items_count = parent_node->items_area.items_count; + items_capacity = parent_node->items_area.items_capacity; + start_hash1 = parent_node->items_area.start_hash; + end_hash1 = parent_node->items_area.end_hash; + up_read(&parent_node->header_lock); + + if (child_node) { + down_read(&child_node->header_lock); + free_space = child_node->items_area.free_space; + up_read(&child_node->header_lock); + } else { + /* it needs to add a child node */ + free_space = (u32)items_capacity * item_size; + } + + if (items_count == 0) { + SSDFS_DBG("no items to move\n"); + return 0; + } + + if (free_space < ((u32)item_size * items_count)) { + SSDFS_WARN("unable to move items: " + "items_area.free_space %u, " + "items_area.item_size %u, " + "items_count %u\n", + free_space, item_size, + items_count); + return 0; + } + + start_hash2 = search->request.start.hash; + end_hash2 = search->request.end.hash; + + has_intersection = + RANGE_HAS_PARTIAL_INTERSECTION(start_hash1, + end_hash1, + start_hash2, + end_hash2); + can_add_index = can_add_new_index(parent_node); + + move = &parent->items_area.move; + insert = &child->items_area.insert; + + switch (tree->type) { + case SSDFS_INODES_BTREE: + case SSDFS_EXTENTS_BTREE: + case SSDFS_SHARED_EXTENTS_BTREE: + /* no additional check is necessary */ + break; + + case SSDFS_DENTRIES_BTREE: + case SSDFS_XATTR_BTREE: + case SSDFS_SHARED_DICTIONARY_BTREE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search: (start_hash %llx, " + "end_hash %llx), " + "items_area: (start_hash %llx, " + "end_hash %llx)\n", + start_hash2, end_hash2, + start_hash1, end_hash1); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (has_intersection) + items_count /= 2; + else if (can_add_index) { + SSDFS_DBG("no need to move items\n"); + + move->op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + move->direction = + SSDFS_BTREE_MOVE_NOWHERE; + move->pos.start = U16_MAX; + move->pos.count = 0; + return 0; + } else { + SSDFS_DBG("need two phase adding\n"); + return -EAGAIN; + } + break; + + default: + BUG(); + } + + parent->flags |= SSDFS_BTREE_ITEMS_AREA_NEED_MOVE; + move->direction = SSDFS_BTREE_MOVE_TO_CHILD; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION; + move->pos.start = 0; + move->pos.count = items_count; + move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + insert->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT; + insert->hash.start = start_hash1; + insert->hash.end = end_hash1; + insert->pos.start = 0; + insert->pos.count = items_count; + insert->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + child->flags |= SSDFS_BTREE_LEVEL_ADD_NODE; + break; + + case SSDFS_BTREE_LEAF_NODE: + parent_node = parent->nodes.old_node.ptr; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent_node); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&parent_node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("items area is absent\n"); + return -ERANGE; + } + + down_read(&parent_node->header_lock); + free_space = parent_node->items_area.area_size; + 
item_size = parent_node->items_area.item_size; + items_count = parent_node->items_area.items_count; + items_capacity = parent_node->items_area.items_capacity; + start_hash1 = parent_node->items_area.start_hash; + end_hash1 = parent_node->items_area.end_hash; + up_read(&parent_node->header_lock); + + if (child_node) { + down_read(&child_node->header_lock); + free_space = child_node->items_area.free_space; + up_read(&child_node->header_lock); + } else { + /* it needs to add a child node */ + free_space = (u32)items_capacity * item_size; + } + + if (items_count == 0) { + SSDFS_DBG("no items to move\n"); + return 0; + } + + if (free_space < ((u32)item_size * items_count)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to move items: " + "items_area.free_space %u, " + "items_area.item_size %u, " + "items_count %u\n", + free_space, item_size, + items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + start_hash2 = search->request.start.hash; + end_hash2 = search->request.end.hash; + + has_intersection = + RANGE_HAS_PARTIAL_INTERSECTION(start_hash1, + end_hash1, + start_hash2, + end_hash2); + can_add_index = can_add_new_index(parent_node); + + move = &parent->items_area.move; + insert = &child->items_area.insert; + + switch (tree->type) { + case SSDFS_INODES_BTREE: + case SSDFS_EXTENTS_BTREE: + case SSDFS_SHARED_EXTENTS_BTREE: + /* no additional check is necessary */ + break; + + case SSDFS_DENTRIES_BTREE: + case SSDFS_XATTR_BTREE: + case SSDFS_SHARED_DICTIONARY_BTREE: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search: (start_hash %llx, " + "end_hash %llx), " + "items_area: (start_hash %llx, " + "end_hash %llx)\n", + start_hash2, end_hash2, + start_hash1, end_hash1); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (has_intersection) + items_count /= 2; + else if (can_add_index) { + SSDFS_DBG("no need to move items\n"); + + move->op_state = + SSDFS_BTREE_AREA_OP_UNKNOWN; + move->direction = + SSDFS_BTREE_MOVE_NOWHERE; + move->pos.start = U16_MAX; + move->pos.count = 0; + return 0; + } else { + SSDFS_DBG("need two phase adding\n"); + return -EAGAIN; + } + break; + + default: + BUG(); + } + + parent->flags |= SSDFS_BTREE_ITEMS_AREA_NEED_MOVE; + move->direction = SSDFS_BTREE_MOVE_TO_CHILD; + move->pos.state = SSDFS_HASH_RANGE_INTERSECTION; + move->pos.start = 0; + move->pos.count = items_count; + move->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + insert->pos.state = SSDFS_HASH_RANGE_LEFT_ADJACENT; + insert->hash.start = start_hash1; + insert->hash.end = end_hash1; + insert->pos.start = 0; + insert->pos.count = items_count; + insert->op_state = SSDFS_BTREE_AREA_OP_REQUESTED; + + child->flags |= SSDFS_BTREE_LEVEL_ADD_NODE; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid child node's type %#x\n", + child->nodes.old_node.type); + break; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid parent node's type %#x\n", + parent->nodes.old_node.type); + break; + } + + return err; +} + +/* + * need_update_parent_index_area() - does it need to update parent's index area + * @start_hash: starting hash value + * @child: btree node object + */ +bool need_update_parent_index_area(u64 start_hash, + struct ssdfs_btree_node *child) +{ + int state; + u64 child_start_hash = U64_MAX; + bool need_update = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!child); + + SSDFS_DBG("start_hash %llx, node_id %u, " + "node type %#x, tree type %#x\n", + start_hash, child->node_id, + atomic_read(&child->type), + child->tree->type); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&child->type)) { + case 
SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + state = atomic_read(&child->index_area.state); + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("invalid index area's state %#x\n", + state); + return false; + } + + down_read(&child->header_lock); + child_start_hash = child->index_area.start_hash; + up_read(&child->header_lock); + break; + + case SSDFS_BTREE_LEAF_NODE: + state = atomic_read(&child->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid items area's state %#x\n", + state); + return false; + } + + down_read(&child->header_lock); + child_start_hash = child->items_area.start_hash; + up_read(&child->header_lock); + break; + + default: + SSDFS_ERR("unexpected node's type %#x\n", + atomic_read(&child->type)); + return false; + } + + if (child_start_hash >= U64_MAX) { + SSDFS_WARN("invalid start_hash %llx\n", + child_start_hash); + return false; + } + + if (start_hash <= child_start_hash) + need_update = true; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, child_start_hash %llx, " + "need_update %#x\n", + start_hash, child_start_hash, + need_update); +#endif /* CONFIG_SSDFS_DEBUG */ + + return need_update; +} + +/* + * is_index_area_resizable() - is it possible to resize the index area? + * @node: btree node object + */ +static inline +bool is_index_area_resizable(struct ssdfs_btree_node *node) +{ + int flags; + int state; + u32 node_size; + u32 index_area_size, items_area_size; + size_t hdr_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + flags = atomic_read(&node->tree->flags); + + if (!(flags & SSDFS_BTREE_DESC_INDEX_AREA_RESIZABLE)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index area cannot be resized: " + "node %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + node_size = node->node_size; + hdr_size = sizeof(node->raw); + + down_read(&node->header_lock); + index_area_size = node->index_area.area_size; + items_area_size = node->items_area.area_size; + up_read(&node->header_lock); + + state = atomic_read(&node->index_area.state); + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) + index_area_size = 0; + + state = atomic_read(&node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) + items_area_size = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_size %u, hdr_size %zu, " + "index_area_size %u, items_area_size %u\n", + node->node_id, node_size, hdr_size, + index_area_size, items_area_size); + + BUG_ON(node_size < (hdr_size + index_area_size + items_area_size)); +#else + if (node_size < (hdr_size + index_area_size + items_area_size)) { + SSDFS_WARN("node_size %u < " + "(hdr_size %zu + index_area %u + items_area %u)\n", + node_size, hdr_size, + index_area_size, items_area_size); + return false; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + return items_area_size != 0; +} + +/* + * ssdfs_btree_prepare_index_area_resize() - prepare index area resize + * @level: level object + * @node: node object + * + * This method tries to prepare the index area for a resize operation. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.
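+ * + * The level is marked with %SSDFS_BTREE_TRY_RESIZE_INDEX_AREA; the resize is prepared only when the node's items area exists and contains a consistent number of items.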
+ */ +static +int ssdfs_btree_prepare_index_area_resize(struct ssdfs_btree_level *level, + struct ssdfs_btree_node *node) +{ + int state; + u16 items_count; + u16 items_capacity; + u32 index_area_size, items_area_size; + u32 index_area_min_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level); + BUG_ON(!node || !node->tree); + + SSDFS_DBG("node_id %u\n", node->node_id); + + BUG_ON(!is_index_area_resizable(node)); +#endif /* CONFIG_SSDFS_DEBUG */ + + level->flags |= SSDFS_BTREE_TRY_RESIZE_INDEX_AREA; + + down_read(&node->header_lock); + index_area_size = node->index_area.area_size; + items_area_size = node->items_area.area_size; + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + up_read(&node->header_lock); + + index_area_min_size = node->tree->index_area_min_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_area_size %u, index_area_min_size %u, " + "items_area_size %u, items_count %u, " + "items_capacity %u\n", + index_area_size, index_area_min_size, + items_area_size, items_count, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("items area doesn't exist: " + "node_id %u\n", + node->node_id); + return -ERANGE; + } + + if (items_count == 0 || items_count > items_capacity) { + SSDFS_ERR("corrupted items area: " + "items_count %u, items_capacity %u\n", + items_count, items_capacity); + return -ERANGE; + } + + return 0; +} diff --git a/fs/ssdfs/btree_hierarchy.h b/fs/ssdfs/btree_hierarchy.h new file mode 100644 index 000000000000..b431be941e46 --- /dev/null +++ b/fs/ssdfs/btree_hierarchy.h @@ -0,0 +1,284 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/btree_hierarchy.h - btree hierarchy declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#ifndef _SSDFS_BTREE_HIERARCHY_H +#define _SSDFS_BTREE_HIERARCHY_H + +/* + * struct ssdfs_hash_range - hash range + * @start: start hash + * @end: end hash + */ +struct ssdfs_hash_range { + u64 start; + u64 end; +}; + +/* + * struct ssdfs_btree_node_position - node's position range + * @state: intersection state + * @start: starting node's position + * @count: number of positions in the range + */ +struct ssdfs_btree_node_position { + int state; + u16 start; + u16 count; +}; + +/* Intersection states */ +enum { + SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED, + SSDFS_HASH_RANGE_LEFT_ADJACENT, + SSDFS_HASH_RANGE_INTERSECTION, + SSDFS_HASH_RANGE_RIGHT_ADJACENT, + SSDFS_HASH_RANGE_OUT_OF_NODE, + SSDFS_HASH_RANGE_INTERSECTION_STATE_MAX +}; + +/* + * struct ssdfs_btree_node_insert - insert position + * @op_state: operation state + * @hash: hash range of insertion + * @pos: position descriptor + */ +struct ssdfs_btree_node_insert { + int op_state; + struct ssdfs_hash_range hash; + struct ssdfs_btree_node_position pos; +}; + +/* + * struct ssdfs_btree_node_move - moving range descriptor + * @op_state: operation state + * @direction: moving direction + * @pos: position descriptor + */ +struct ssdfs_btree_node_move { + int op_state; + int direction; + struct ssdfs_btree_node_position pos; +}; + +/* + * struct ssdfs_btree_node_delete - deleting node's index descriptor + * @op_state: operation state + * @node_index: node index for deletion + */ +struct ssdfs_btree_node_delete { + int op_state; + struct ssdfs_btree_index_key node_index; +}; + +/* Possible operation states */ +enum { + SSDFS_BTREE_AREA_OP_UNKNOWN, + SSDFS_BTREE_AREA_OP_REQUESTED, + SSDFS_BTREE_AREA_OP_DONE, + SSDFS_BTREE_AREA_OP_FAILED, + SSDFS_BTREE_AREA_OP_STATE_MAX +}; + +/* Possible moving directions */ +enum { + SSDFS_BTREE_MOVE_NOWHERE, + SSDFS_BTREE_MOVE_TO_PARENT, + SSDFS_BTREE_MOVE_TO_CHILD, + SSDFS_BTREE_MOVE_TO_LEFT, + SSDFS_BTREE_MOVE_TO_RIGHT, + SSDFS_BTREE_MOVE_DIRECTION_MAX +}; + +/* Btree level's flags */ +#define SSDFS_BTREE_LEVEL_ADD_NODE (1 << 0) +#define SSDFS_BTREE_LEVEL_ADD_INDEX (1 << 1) +#define SSDFS_BTREE_LEVEL_UPDATE_INDEX (1 << 2) +#define SSDFS_BTREE_LEVEL_ADD_ITEM (1 << 3) +#define SSDFS_BTREE_INDEX_AREA_NEED_MOVE (1 << 4) +#define SSDFS_BTREE_ITEMS_AREA_NEED_MOVE (1 << 5) +#define SSDFS_BTREE_TRY_RESIZE_INDEX_AREA (1 << 6) +#define SSDFS_BTREE_LEVEL_DELETE_NODE (1 << 7) +#define SSDFS_BTREE_LEVEL_DELETE_INDEX (1 << 8) +#define SSDFS_BTREE_LEVEL_FLAGS_MASK 0x1FF + +#define SSDFS_BTREE_ADD_NODE_MASK \ + (SSDFS_BTREE_LEVEL_ADD_NODE | SSDFS_BTREE_LEVEL_ADD_INDEX | \ + SSDFS_BTREE_LEVEL_UPDATE_INDEX | SSDFS_BTREE_LEVEL_ADD_ITEM | \ + SSDFS_BTREE_INDEX_AREA_NEED_MOVE | \ + SSDFS_BTREE_ITEMS_AREA_NEED_MOVE | \ + SSDFS_BTREE_TRY_RESIZE_INDEX_AREA) + +#define SSDFS_BTREE_DELETE_NODE_MASK \ + (SSDFS_BTREE_LEVEL_UPDATE_INDEX | SSDFS_BTREE_LEVEL_DELETE_NODE | \ + SSDFS_BTREE_LEVEL_DELETE_INDEX) + +/* + * struct ssdfs_btree_level_node - node descriptor + * @type: node's type + * @index_hash: old index area's hash pair + * @items_hash: old items area's hash pair + * @ptr: pointer on node's object + */ +struct ssdfs_btree_level_node { + int type; + struct ssdfs_hash_range index_hash; + struct ssdfs_hash_range items_hash; + struct ssdfs_btree_node *ptr; +}; + +/* + * struct ssdfs_btree_level_node_desc - descriptor of level's nodes + * @old_node: old 
node of the level + * @new_node: created empty node + */ +struct ssdfs_btree_level_node_desc { + struct ssdfs_btree_level_node old_node; + struct ssdfs_btree_level_node new_node; +}; + +/* + * struct ssdfs_btree_level - btree level descriptor + * @flags: level's flags + * @index_area.area_size: size of the index area + * @index_area.free_space: free space in index area + * @index_area.hash: hash range of index area + * @index_area.add: adding index descriptor + * @index_area.insert: insert position descriptor + * @index_area.move: move range descriptor + * @index_area.delete: delete index descriptor + * @items_area.area_size: size of the items area + * @items_area.free_space: free space in items area + * @items_area.hash: hash range of items area + * @items_area.add: adding item descriptor + * @items_area.insert: insert position descriptor + * @items_area.move: move range descriptor + * @nodes: descriptor of level's nodes + */ +struct ssdfs_btree_level { + u32 flags; + + struct { + u32 area_size; + u32 free_space; + struct ssdfs_hash_range hash; + struct ssdfs_btree_node_insert add; + struct ssdfs_btree_node_insert insert; + struct ssdfs_btree_node_move move; + struct ssdfs_btree_node_delete delete; + } index_area; + + struct { + u32 area_size; + u32 free_space; + struct ssdfs_hash_range hash; + struct ssdfs_btree_node_insert add; + struct ssdfs_btree_node_insert insert; + struct ssdfs_btree_node_move move; + } items_area; + + struct ssdfs_btree_level_node_desc nodes; +}; + +/* + * struct ssdfs_btree_state_descriptor - btree's state descriptor + * @height: btree height + * @increment_height: request to increment tree's height + * @node_size: size of the node in bytes + * @index_size: size of the index record in bytes + * @min_item_size: minimum item size in bytes + * @max_item_size: maximum item size in bytes + * @index_area_min_size: minimum size of index area in bytes + */ +struct ssdfs_btree_state_descriptor { + int height; + bool increment_height; + u32 node_size; + u16 index_size; + u16 min_item_size; + u16 max_item_size; + u16 index_area_min_size; +}; + +/* + * struct ssdfs_btree_hierarchy - btree's hierarchy descriptor + * @desc: btree state's descriptor + * @array_ptr: btree level's array + */ +struct ssdfs_btree_hierarchy { + struct ssdfs_btree_state_descriptor desc; + struct ssdfs_btree_level **array_ptr; +}; + +/* Btree hierarchy inline methods */ +static inline +bool need_add_node(struct ssdfs_btree_level *level) +{ + return level->flags & SSDFS_BTREE_LEVEL_ADD_NODE; +} + +static inline +bool need_delete_node(struct ssdfs_btree_level *level) +{ + return level->flags & SSDFS_BTREE_LEVEL_DELETE_NODE; +} + +/* Btree hierarchy API */ +struct ssdfs_btree_hierarchy * +ssdfs_btree_hierarchy_allocate(struct ssdfs_btree *tree); +void ssdfs_btree_hierarchy_init(struct ssdfs_btree *tree, + struct ssdfs_btree_hierarchy *hierarchy); +void ssdfs_btree_hierarchy_free(struct ssdfs_btree_hierarchy *hierarchy); + +bool need_update_parent_index_area(u64 start_hash, + struct ssdfs_btree_node *child); +int ssdfs_btree_check_hierarchy_for_add(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_hierarchy *ptr); +int ssdfs_btree_process_level_for_add(struct ssdfs_btree_hierarchy *ptr, + int cur_height, + struct ssdfs_btree_search *search); +int ssdfs_btree_check_hierarchy_for_delete(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_hierarchy *ptr); +int ssdfs_btree_process_level_for_delete(struct ssdfs_btree_hierarchy *ptr, + int
cur_height, + struct ssdfs_btree_search *search); +int ssdfs_btree_check_hierarchy_for_update(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_hierarchy *ptr); +int ssdfs_btree_process_level_for_update(struct ssdfs_btree_hierarchy *ptr, + int cur_height, + struct ssdfs_btree_search *search); + +/* Btree hierarchy internal API */ +void ssdfs_btree_prepare_add_node(struct ssdfs_btree *tree, + int node_type, + u64 start_hash, u64 end_hash, + struct ssdfs_btree_level *level, + struct ssdfs_btree_node *node); +int ssdfs_btree_prepare_add_index(struct ssdfs_btree_level *level, + u64 start_hash, u64 end_hash, + struct ssdfs_btree_node *node); + +void ssdfs_show_btree_hierarchy_object(struct ssdfs_btree_hierarchy *ptr); +void ssdfs_debug_btree_hierarchy_object(struct ssdfs_btree_hierarchy *ptr); + +#endif /* _SSDFS_BTREE_HIERARCHY_H */ From patchwork Sat Feb 25 01:09:08 2023 X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151962
From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 57/76] ssdfs: check b-tree hierarchy for add operation Date: Fri, 24 Feb 2023 17:09:08 -0800 Message-Id: <20230225010927.813929-58-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The main goal of checking the b-tree hierarchy for an add operation is to define how many new nodes, and of which type(s), should be added. Any b-tree starts with the creation of the root node. The root node can store only two index keys. Initially, SSDFS logic adds leaf nodes into the empty b-tree. If the root node already contains two index keys pointing to leaf nodes, then a hybrid node needs to be added into the b-tree. A hybrid node contains both an index area and an items area. New items are added into the items area of the hybrid node until this area becomes completely full. Then the b-tree logic allocates a new leaf node, all existing items of the hybrid node are moved into the newly created leaf node, and an index key is added into the hybrid node's index area. This operation repeats until the index area of the hybrid node becomes completely full. At that point the index area is doubled in size after the existing items have been moved into a newly created node. Finally, the hybrid node is converted into an index node. If the root node contains two index keys pointing to hybrid nodes, then an index node is added into the b-tree. Generally speaking, leaf nodes are always allocated for the lowest level, the next level contains hybrid nodes, and the rest of the b-tree levels contain index nodes. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli ---
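For illustration, here is a rough, hypothetical sketch of the node-type decision described above. It is not code from this patch: can_add_new_index() and the SSDFS_BTREE_*_NODE constants exist in the patch set, while choose_root_child_type() is an illustrative helper only.

static int choose_root_child_type(struct ssdfs_btree_node *root,
				  int cur_child_type)
{
	if (can_add_new_index(root)) {
		/*
		 * The root still has a free index key:
		 * one more node of the current child type fits.
		 */
		return cur_child_type;
	}

	switch (cur_child_type) {
	case SSDFS_BTREE_LEAF_NODE:
		/* Both root keys reference leaf nodes: add a hybrid node. */
		return SSDFS_BTREE_HYBRID_NODE;
	case SSDFS_BTREE_HYBRID_NODE:
		/* Both root keys reference hybrid nodes: add an index node. */
		return SSDFS_BTREE_INDEX_NODE;
	default:
		/* Upper levels grow with index nodes only. */
		return SSDFS_BTREE_INDEX_NODE;
	}
}

fs/ssdfs/btree_hierarchy.c | 2747 ++++++++++++++++++++++++++++++++++++ 1 file changed, 2747 insertions(+) diff --git a/fs/ssdfs/btree_hierarchy.c b/fs/ssdfs/btree_hierarchy.c index cba502e6f3a6..6e9f91ed4541 100644 --- a/fs/ssdfs/btree_hierarchy.c +++ b/fs/ssdfs/btree_hierarchy.c @@ -2906,3 +2906,2750 @@ int ssdfs_btree_prepare_index_area_resize(struct ssdfs_btree_level *level, return 0; } + +/* + * ssdfs_btree_check_nothing_root_pair() - check pair of nothing and root nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the nothing and root nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error.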
+ */ +static +int ssdfs_btree_check_nothing_root_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int child_type; + struct ssdfs_btree_node *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (child_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + if (!(child->flags & SSDFS_BTREE_LEVEL_ADD_NODE)) + return 0; + + down_read(&child_node->header_lock); + start_hash = child_node->index_area.start_hash; + end_hash = U64_MAX; + up_read(&child_node->header_lock); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + err = ssdfs_btree_define_moving_indexes(parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving indexes: " + "err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_btree_check_root_nothing_pair() - check pair of root and nothing nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the root and nothing nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
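+ * + * The pair is checked for a one-level tree: the root node plays the parent role and no child node exists yet, so a new leaf node is prepared and an index key for it is added into the root.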
+ */ +static +int ssdfs_btree_check_root_nothing_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int tree_height; + int parent_type; + struct ssdfs_btree_node *parent_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, type %#x\n", + tree, search, parent, child, + parent_node->node_id, + parent_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0 || tree_height > 1) { + SSDFS_WARN("unexpected tree_height %u\n", + tree_height); + return -EINVAL; + } + + if (!can_add_new_index(parent_node)) { + SSDFS_ERR("unable to add index into the root\n"); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + ssdfs_btree_prepare_add_node(tree, SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, NULL); + + err = ssdfs_btree_prepare_add_index(parent, start_hash, + end_hash, parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + return 0; +} + +/* + * ssdfs_btree_check_root_index_pair() - check pair of root and index nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the root and index nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - needs to increase the tree's height.
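+ * + * If the root node cannot accept one more index key, a new index node is prepared and -ENOSPC is returned to request the increase of the tree's height.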
+ */ +static +int ssdfs_btree_check_root_index_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_INDEX_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u\n", + parent_node->node_id); + return err; + } + } else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_INDEX_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + /* + * it needs to prepare increasing + * the tree's height + */ + return -ENOSPC; + } + } else if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare root node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_root_hybrid_pair() - check pair of root and hybrid nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the root and hybrid nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - needs to increase the tree's height. 
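+ * + * For a two-level tree, the insertion into the hybrid child is prepared together with a new leaf node and an index key, and -ENOSPC then requests the increase of the tree's height.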
+ */ +static +int ssdfs_btree_check_root_hybrid_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int tree_height; + int parent_type, child_type; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + u16 items_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + tree_height = atomic_read(&tree->height); + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (tree_height < 2) { + SSDFS_ERR("invalid tree height %d\n", + tree_height); + return -ERANGE; + } + + down_read(&child_node->header_lock); + items_count = child_node->items_area.items_count; + up_read(&child_node->header_lock); + + if (tree_height >= 3 && items_count == 0) { + err = ssdfs_btree_prepare_insert_item(child, search, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + } else if (tree_height == 2) { + err = ssdfs_btree_prepare_insert_item(child, search, child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + ssdfs_btree_prepare_add_node(tree, SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, child_node); + + err = ssdfs_btree_prepare_add_index(child, + start_hash, + end_hash, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + err = ssdfs_btree_define_moving_items(tree, search, + child, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving items: " + "err %d\n", err); + return err; + } + + /* + * it needs to prepare increasing + * the tree's height + */ + return -ENOSPC; + } else if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u\n", + parent_node->node_id); + return err; + } + 
} else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_INDEX_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u\n", + parent_node->node_id); + return err; + } + + /* + * it needs to prepare increasing + * the tree's height + */ + return -ENOSPC; + } + } else if (need_update_parent_index_area(start_hash, + child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare root node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_root_leaf_pair() - check pair of root and leaf nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the root and leaf nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - needs to increase the tree's height. + */ +static +int ssdfs_btree_check_root_leaf_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int tree_height; + int parent_type, child_type; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_LEAF_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + tree_height = atomic_read(&tree->height); + if (tree_height > 2) { + SSDFS_WARN("unexpected tree_height %u\n", + tree_height); + return -EINVAL; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (can_add_new_index(parent_node)) { + /* tree has only one leaf node */ + err = ssdfs_btree_prepare_insert_item(child, + search, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + if (ssdfs_need_move_items_to_sibling(child)) { + err = ssdfs_check_capability_move_to_sibling(child); + if (err == -ENOSPC) { + /* + * It needs to add a node. 
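+ * The sibling node cannot absorb the items, so fall through and prepare the addition of a new leaf node below.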
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to check moving to sibling: " + "err %d\n", err); + return err; + } + } else + err = -ENOSPC; + + if (err == -ENOSPC) { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, child_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + } else { + err = ssdfs_btree_prepare_insert_item(child, + search, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + if (ssdfs_need_move_items_to_sibling(child)) { + err = ssdfs_check_capability_move_to_sibling(child); + if (err == -ENOSPC) { + /* + * It needs to add a node. + */ + ssdfs_btree_cancel_insert_item(child); + } else if (unlikely(err)) { + SSDFS_ERR("fail to check moving to sibling: " + "err %d\n", err); + return err; + } else { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update: " + "err %d\n", err); + return err; + } + + return 0; + } + } else + ssdfs_btree_cancel_insert_item(child); + + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_HYBRID_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_insert_item(parent, + search, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + if (ssdfs_need_move_items_to_parent(child)) { + err = ssdfs_prepare_move_items_to_parent(search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check moving to parent: " + "err %d\n", err); + return err; + } + } + + /* it needs to prepare increasing the tree's height */ + return -ENOSPC; + } + + return 0; +} + +/* + * ssdfs_btree_check_index_index_pair() - check pair of index and index nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the index and index nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error.
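+ * + * The parent index node either receives one more index key for the new child or, when it is full, a new index node is prepared on the parent level.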
+ */ +static +int ssdfs_btree_check_index_index_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_INDEX_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_INDEX_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_INDEX_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } + } else if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare index node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_index_hybrid_pair() - check pair of index and hybrid nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the index and hybrid nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
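+ * + * Besides adding or updating an index key, this pair can serve the resize of the hybrid child's index area: the indexes to move are defined and the parent's index is prepared for update.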
+ */ +static +int ssdfs_btree_check_index_hybrid_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_INDEX_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (child->flags & SSDFS_BTREE_TRY_RESIZE_INDEX_AREA) { + err = ssdfs_btree_define_moving_indexes(parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving indexes: " + "err %d\n", err); + return err; + } + + if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + } else if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_INDEX_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } + } else if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare index node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_index_leaf_pair() - check 
pair of index and leaf nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the index and leaf nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_check_index_leaf_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_INDEX_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_LEAF_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_insert_item(child, + search, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + if (ssdfs_need_move_items_to_sibling(child)) { + err = ssdfs_check_capability_move_to_sibling(child); + if (err == -ENOSPC) { + /* + * It needs to add a node. 
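+ * The sibling node cannot absorb the items, so fall through and prepare the addition of a new leaf node below.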
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to check moving to sibling: " + "err %d\n", err); + return err; + } + } else + err = -ENOSPC; + + if (err == -ENOSPC) { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, child_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + } else { + err = ssdfs_check_capability_move_indexes_to_sibling(parent); + if (err == -ENOENT) { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_INDEX_NODE, + U64_MAX, U64_MAX, + parent, parent_node); + + err = ssdfs_prepare_move_indexes_right(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare to move indexes: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else if (err == -ENOSPC) { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_INDEX_NODE, + U64_MAX, U64_MAX, + parent, parent_node); + + err = ssdfs_prepare_move_indexes_right(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare to move indexes: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare to move indexes: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } else { + /* + * Do nothing. + * The index moving has been prepared already. + */ + } + + /* make first phase of transformation */ + return -EAGAIN; + } + + return 0; +} + +/* + * ssdfs_btree_check_hybrid_nothing_pair() - check pair of hybrid and nothing + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the hybrid and nothing nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
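+ * + * Depending on the state of the hybrid parent, either a new leaf node is prepared (with items moving down from the parent's items area), the parent's index area is resized first, or a new hybrid node is prepared on the parent level.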
+ */ +static +int ssdfs_btree_check_hybrid_nothing_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type; + struct ssdfs_btree_node *parent_node; + u64 start_hash, end_hash; + u16 items_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, type %#x\n", + tree, search, parent, child, + parent_node->node_id, + parent_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + down_read(&parent_node->header_lock); + items_count = parent_node->items_area.items_count; + start_hash = parent_node->items_area.start_hash; + end_hash = parent_node->items_area.end_hash; + up_read(&parent_node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %u, start_hash %llx, end_hash %llx\n", + items_count, start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (items_count == 0) { + /* + * Do nothing. + */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("parent id %u, type %#x, items_count == 0\n", + parent_node->node_id, + parent_type); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (can_add_new_index(parent_node)) { + ssdfs_btree_prepare_add_node(tree, SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, NULL); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + start_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare add index: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_define_moving_items(tree, search, + parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving items: " + "err %d\n", err); + return err; + } + } else if (is_index_area_resizable(parent_node)) { + err = ssdfs_btree_prepare_index_area_resize(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare resize of index area: " + "err %d\n", err); + return err; + } + + ssdfs_btree_prepare_add_node(tree, SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, NULL); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + start_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare add index: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_define_moving_items(tree, search, + parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving items: " + "err %d\n", err); + return err; + } + } else { + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + ssdfs_btree_prepare_add_node(tree, SSDFS_BTREE_HYBRID_NODE, + start_hash, end_hash, + parent, parent_node); + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare hybrid node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_hybrid_index_pair() - check pair of hybrid and index nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the hybrid and index nodes pair.
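+ * The hybrid parent can receive one more index key, have its index area resized, or be supplemented by one more hybrid node on the parent level.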
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_check_hybrid_index_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_INDEX_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else if (is_index_area_resizable(parent_node)) { + err = ssdfs_btree_prepare_index_area_resize(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare resize of index area: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_HYBRID_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } + } else if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare 
hybrid node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_hybrid_hybrid_pair() - check pair of hybrid + hybrid nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the hybrid and hybrid nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_check_hybrid_hybrid_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + err = ssdfs_btree_define_moving_items(tree, search, + parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving items: " + "err %d\n", err); + return err; + } + + if (child->flags & SSDFS_BTREE_TRY_RESIZE_INDEX_AREA) { + err = ssdfs_btree_define_moving_indexes(parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to define moving indexes: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } else if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else if (is_index_area_resizable(parent_node)) { + err = ssdfs_btree_prepare_index_area_resize(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare resize of index area: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if 
(unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_HYBRID_NODE, + start_hash, end_hash, + parent, parent_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } + } else if (need_update_parent_index_area(start_hash, child_node)) { + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update index: " + "err %d\n", err); + return err; + } + } + + if (!parent->flags) { + err = ssdfs_btree_prepare_do_nothing(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare index node: " + "err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_hybrid_leaf_pair() - check pair of hybrid and leaf nodes + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the hybrid and leaf nodes pair. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_check_hybrid_leaf_pair(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int parent_type, child_type; + int parent_height, child_height; + struct ssdfs_btree_node *parent_node, *child_node; + u64 start_hash, end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("parent is NULL\n"); + return -ERANGE; + } + + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("child is NULL\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree %p, search %p, " + "parent %p, child %p, " + "parent id %u, parent_type %#x, " + "child id %u, child_type %#x\n", + tree, search, parent, child, + parent_node->node_id, parent_type, + child_node->node_id, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_WARN("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + if (child_type != SSDFS_BTREE_LEAF_NODE) { + SSDFS_WARN("invalid child node's type %#x\n", + child_type); + return -ERANGE; + } + + parent_height = atomic_read(&parent_node->height); + child_height = atomic_read(&child_node->height); + + if ((child_height + 1) != parent_height) { + SSDFS_ERR("invalid pair: " + "parent_height %u, child_height %u\n", + parent_height, child_height); + return -ERANGE; + } + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + err = ssdfs_btree_prepare_insert_item(child, search, child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the insert: " + "node_id %u, height %u\n", + child_node->node_id, + atomic_read(&child_node->height)); + return err; + } + + if (ssdfs_need_move_items_to_sibling(child)) { + err = 
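/*
 * Every *_pair check in this file falls back through the same three
 * steps once a child level requests SSDFS_BTREE_LEVEL_ADD_NODE:
 * (1) if the parent still has a free index slot (can_add_new_index()),
 * prepare an index add; (2) else, if the parent's index area can grow
 * (is_index_area_resizable()), prepare the resize and then the index
 * add; (3) otherwise, plan a brand new node on the parent's level.
 * A minimal userspace sketch of that policy (illustrative only; the
 * names below are hypothetical, not SSDFS API):
 *
 *	enum add_plan { PLAN_ADD_INDEX, PLAN_RESIZE_THEN_ADD, PLAN_ADD_NODE };
 *
 *	static enum add_plan choose_add_plan(unsigned int index_count,
 *					     unsigned int index_capacity,
 *					     int area_is_resizable)
 *	{
 *		if (index_count < index_capacity)
 *			return PLAN_ADD_INDEX;		// free index slot
 *		if (area_is_resizable)
 *			return PLAN_RESIZE_THEN_ADD;	// grow, then add
 *		return PLAN_ADD_NODE;			// no room at all
 *	}
 */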
ssdfs_check_capability_move_to_sibling(child); + if (err == -ENOSPC) { + /* + * It needs to add a node. + */ + goto try_add_node; + } else if (unlikely(err)) { + SSDFS_ERR("fail to check moving to sibling: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_prepare_update_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare update: " + "err %d\n", err); + return err; + } + + return 0; + } + +try_add_node: + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, child_node); + + if (can_add_new_index(parent_node)) { + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } else if (is_index_area_resizable(parent_node)) { + err = ssdfs_btree_prepare_index_area_resize(parent, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare resize: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + err = ssdfs_btree_define_moving_items(tree, search, + parent, child); + if (err == -EAGAIN) { + ssdfs_btree_cancel_insert_item(child); + ssdfs_btree_cancel_move_items_to_sibling(child); + ssdfs_btree_cancel_add_index(parent); + + down_read(&parent_node->header_lock); + start_hash = parent_node->items_area.start_hash; + up_read(&parent_node->header_lock); + + end_hash = start_hash; + + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_LEAF_NODE, + start_hash, end_hash, + child, child_node); + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + err = ssdfs_define_hybrid_node_moving_items(tree, + start_hash, + end_hash, + parent_node, + NULL, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + + /* make first phase of transformation */ + return -EAGAIN; + } else if (unlikely(err)) { + SSDFS_ERR("fail to define moving items: " + "err %d\n", err); + return err; + } + } else { + ssdfs_btree_prepare_add_node(tree, + SSDFS_BTREE_HYBRID_NODE, + start_hash, end_hash, + parent, parent_node); + + if (ssdfs_need_move_items_to_parent(child)) { + err = ssdfs_prepare_move_items_to_parent(search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check moving to parent: " + "err %d\n", err); + return err; + } + } + + err = ssdfs_btree_prepare_add_index(parent, + start_hash, + end_hash, + parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare level: " + "node_id %u, height %u\n", + parent_node->node_id, + atomic_read(&parent_node->height)); + return err; + } + } + + return 0; +} + +/* + * ssdfs_btree_check_level_for_add() - check btree's level for adding a node + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the level of btree for adding a node. 
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - needs to increase the tree's height. + */ +int ssdfs_btree_check_level_for_add(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *node = NULL; + int parent_type, child_type; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !tree || !search); + BUG_ON(!parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, parent %p, child %p\n", + tree, search, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_type = parent->nodes.old_node.type; + if (parent_type != SSDFS_BTREE_NODE_UNKNOWN_TYPE) { + BUG_ON(!parent->nodes.old_node.ptr); + parent_type = atomic_read(&parent->nodes.old_node.ptr->type); + } + + child_type = child->nodes.old_node.type; + if (child_type != SSDFS_BTREE_NODE_UNKNOWN_TYPE) { + BUG_ON(!child->nodes.old_node.ptr); + child_type = atomic_read(&child->nodes.old_node.ptr->type); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("parent_type %#x, child_type %#x\n", + parent_type, child_type); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (parent_type) { + case SSDFS_BTREE_NODE_UNKNOWN_TYPE: + switch (child_type) { + case SSDFS_BTREE_ROOT_NODE: + err = ssdfs_btree_check_nothing_root_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check nothing-root pair: " + "err %d\n", err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid child node's type %#x\n", + child_type); + }; + break; + + case SSDFS_BTREE_ROOT_NODE: + switch (child_type) { + case SSDFS_BTREE_NODE_UNKNOWN_TYPE: + err = ssdfs_btree_check_root_nothing_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check root-nothing pair: " + "err %d\n", err); + } + + node = parent->nodes.old_node.ptr; + if (is_ssdfs_btree_node_index_area_empty(node)) { + /* root node should be moved on upper level */ + desc->increment_height = true; + SSDFS_DBG("need to grow the tree height\n"); + } + break; + + case SSDFS_BTREE_INDEX_NODE: + err = ssdfs_btree_check_root_index_pair(tree, + search, + parent, + child); + if (err == -ENOSPC) { + /* root node should be moved on upper level */ + desc->increment_height = true; + SSDFS_DBG("need to grow the tree height\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to check root-index pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + err = ssdfs_btree_check_root_hybrid_pair(tree, + search, + parent, + child); + if (err == -ENOSPC) { + /* root node should be moved on upper level */ + desc->increment_height = true; + SSDFS_DBG("need to grow the tree height\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to check root-hybrid pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_LEAF_NODE: + err = ssdfs_btree_check_root_leaf_pair(tree, + search, + parent, + child); + if (err == -ENOSPC) { + /* root node should be moved on upper level */ + desc->increment_height = true; + SSDFS_DBG("need to grow the tree height\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to check root-leaf pair: " + "err %d\n", err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid child node's type %#x\n", + child_type); + }; + break; + + case SSDFS_BTREE_INDEX_NODE: + switch (child_type) { + case SSDFS_BTREE_INDEX_NODE: + err = 
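/*
 * This switch is effectively a dispatch over the pair
 * (parent_type, child_type): every legal combination has a dedicated
 * *_pair checker and anything else is rejected with -ERANGE. The same
 * dispatch could be written as a lookup table; a compact sketch,
 * assuming hypothetical names (not SSDFS API):
 *
 *	#include <errno.h>
 *
 *	typedef int (*pair_check_fn)(void *tree, void *search,
 *				     void *parent, void *child);
 *
 *	struct pair_rule {
 *		int parent_type;
 *		int child_type;
 *		pair_check_fn check;
 *	};
 *
 *	// first matching (parent, child) rule wins; no match is -ERANGE
 *	static int dispatch_pair(const struct pair_rule *rules, int count,
 *				 int ptype, int ctype, void *t, void *s,
 *				 void *p, void *c)
 *	{
 *		int i;
 *
 *		for (i = 0; i < count; i++) {
 *			if (rules[i].parent_type == ptype &&
 *			    rules[i].child_type == ctype)
 *				return rules[i].check(t, s, p, c);
 *		}
 *		return -ERANGE;
 *	}
 */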
ssdfs_btree_check_index_index_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check index-index pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + err = ssdfs_btree_check_index_hybrid_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check index-hybrid pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_LEAF_NODE: + err = ssdfs_btree_check_index_leaf_pair(tree, + search, + parent, + child); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to prepare hierarchy: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to check index-leaf pair: " + "err %d\n", err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid child node's type %#x\n", + child_type); + }; + break; + + case SSDFS_BTREE_HYBRID_NODE: + switch (child_type) { + case SSDFS_BTREE_NODE_UNKNOWN_TYPE: + err = ssdfs_btree_check_hybrid_nothing_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check hybrid-nothing pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_INDEX_NODE: + err = ssdfs_btree_check_hybrid_index_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check hybrid-index pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + err = ssdfs_btree_check_hybrid_hybrid_pair(tree, + search, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("fail to check hybrid-hybrid pair: " + "err %d\n", err); + } + break; + + case SSDFS_BTREE_LEAF_NODE: + err = ssdfs_btree_check_hybrid_leaf_pair(tree, + search, + parent, + child); + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to prepare hierarchy: " + "err %d\n", err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to check hybrid-leaf pair: " + "err %d\n", err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid child node's type %#x\n", + child_type); + }; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid parent node's type %#x\n", + parent_type); + } + + return err; +} + +/* + * ssdfs_btree_descend_to_leaf_node() - descend to a leaf node + * @tree: btree object + * @search: search object + * + * This method tries to descend from the current level till a leaf node. + * + * RETURN: + * [success] - pointer on a leaf node. + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+static
+struct ssdfs_btree_node *
+ssdfs_btree_descend_to_leaf_node(struct ssdfs_btree *tree,
+				 struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_node *node = NULL;
+	int type;
+	u64 upper_hash;
+	u64 start_item_hash;
+	u16 items_count;
+	u32 prev_node_id;
+	int counter = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, search %p\n",
+		  tree, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (search->node.height == SSDFS_BTREE_LEAF_NODE_HEIGHT) {
+		SSDFS_DBG("search object contains leaf node\n");
+		/* the search object already points to a leaf node */
+		return search->node.child;
+	}
+
+	if (!search->node.child) {
+		err = -ERANGE;
+		SSDFS_ERR("child node object is NULL\n");
+		return ERR_PTR(err);
+	}
+
+	type = atomic_read(&search->node.child->type);
+	if (type != SSDFS_BTREE_HYBRID_NODE) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid search object: "
+			  "height %u, node_type %#x\n",
+			  atomic_read(&search->node.child->height),
+			  type);
+		return ERR_PTR(err);
+	}
+
+	if (!is_ssdfs_btree_node_index_area_exist(search->node.child)) {
+		err = -ERANGE;
+		SSDFS_ERR("index area is absent: "
+			  "node_id %u\n",
+			  search->node.child->node_id);
+		return ERR_PTR(err);
+	}
+
+	down_read(&search->node.child->header_lock);
+	items_count = search->node.child->items_area.items_count;
+	start_item_hash = search->node.child->items_area.start_hash;
+	upper_hash = search->node.child->index_area.end_hash;
+	up_read(&search->node.child->header_lock);
+
+	if (upper_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid upper hash\n");
+		return ERR_PTR(err);
+	}
+
+	node = search->node.child;
+	prev_node_id = node->node_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, items_count %u, "
+		  "start_item_hash %llx, end_index_hash %llx\n",
+		  node->node_id, items_count,
+		  start_item_hash, upper_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (type == SSDFS_BTREE_HYBRID_NODE) {
+		if (items_count == 0)
+			return node;
+		else if (start_item_hash >= U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid start_item_hash %llx\n",
+				  start_item_hash);
+			return ERR_PTR(err);
+		}
+
+		if (start_item_hash == upper_hash)
+			return node;
+	}
+
+	do {
+		node = ssdfs_btree_get_child_node_for_hash(tree, node,
+							   upper_hash);
+		if (IS_ERR_OR_NULL(node)) {
+			err = !node ?
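/*
 * The loop below keeps following the child node that covers the
 * current upper hash until it reaches a leaf, and it defends against
 * a corrupted tree by counting repeated visits to the same node_id.
 * A minimal userspace model of that cycle guard (illustrative only;
 * struct toy_node is hypothetical):
 *
 *	struct toy_node {
 *		unsigned int id;
 *		struct toy_node *child;
 *		int is_leaf;
 *	};
 *
 *	static struct toy_node *toy_descend(struct toy_node *node)
 *	{
 *		unsigned int prev_id = node->id;
 *		int counter = 0;
 *
 *		while (!node->is_leaf) {
 *			node = node->child;		// one level down
 *			if (!node)
 *				return NULL;		// broken link
 *			if (node->id == prev_id) {
 *				if (++counter > 3)
 *					return NULL;	// suspected cycle
 *			} else {
 *				prev_id = node->id;
 *			}
 *		}
 *		return node;
 *	}
 */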
-ERANGE : PTR_ERR(node); + SSDFS_ERR("fail to get the child node: err %d\n", + err); + return node; + } + + if (prev_node_id == node->node_id) { + counter++; + + if (counter > 3) { + SSDFS_ERR("infinite cycle suspected: " + "node_id %u, counter %d\n", + node->node_id, + counter); + return ERR_PTR(-ERANGE); + } + } else + prev_node_id = node->node_id; + + type = atomic_read(&node->type); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, type %#x, " + "end_index_hash %llx\n", + node->node_id, atomic_read(&node->type), + upper_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + /* do nothing */ + break; + + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + if (!is_ssdfs_btree_node_index_area_exist(node)) { + err = -ERANGE; + SSDFS_ERR("index area is absent: " + "node_id %u\n", + node->node_id); + return ERR_PTR(err); + } + + down_read(&node->header_lock); + upper_hash = node->index_area.end_hash; + up_read(&node->header_lock); + + if (upper_hash == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid upper hash\n"); + return ERR_PTR(err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid node type: " + "node_id %u, height %u, type %#x\n", + node->node_id, + atomic_read(&node->height), + type); + return ERR_PTR(err); + } + } while (type != SSDFS_BTREE_LEAF_NODE); + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_INITIALIZED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return ERR_PTR(err); + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid items area state: node_id %u\n", + search->node.id); + return ERR_PTR(err); + } + + return node; +} + +static +void ssdfs_btree_hierarchy_init_hash_range(struct ssdfs_btree_level *level, + struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!level); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!node) + return; + + down_read(&node->header_lock); + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + level->nodes.old_node.index_hash.start = + node->index_area.start_hash; + level->nodes.old_node.index_hash.end = + node->index_area.end_hash; + break; + + case SSDFS_BTREE_HYBRID_NODE: + level->nodes.old_node.index_hash.start = + node->index_area.start_hash; + level->nodes.old_node.index_hash.end = + node->index_area.end_hash; + level->nodes.old_node.items_hash.start = + node->items_area.start_hash; + level->nodes.old_node.items_hash.end = + node->items_area.end_hash; + break; + + case SSDFS_BTREE_LEAF_NODE: + level->nodes.old_node.items_hash.start = + node->items_area.start_hash; + level->nodes.old_node.items_hash.end = + node->items_area.end_hash; + break; + + default: + SSDFS_WARN("unexpected node type %#x\n", + atomic_read(&node->type)); + break; + } + up_read(&node->header_lock); +} + +/* + * ssdfs_btree_check_hierarchy_for_add() - check the btree for add node + * @tree: btree object + * @search: search object + * @hierarchy: btree's hierarchy object + * + * This method tries to check the btree's hierarchy for operation of + * node addition. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_btree_check_hierarchy_for_add(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_hierarchy *hierarchy) +{ + struct ssdfs_btree_level *level; + struct ssdfs_btree_node *parent_node, *child_node; + int child_node_height, cur_height, tree_height; + int parent_node_type, child_node_type; + spinlock_t *lock = NULL; + int err; + int res = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !hierarchy); + BUG_ON(!rwsem_is_locked(&tree->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, search %p, hierarchy %p\n", + tree, search, hierarchy); +#else + SSDFS_DBG("tree %p, search %p, hierarchy %p\n", + tree, search, hierarchy); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tree_height = atomic_read(&tree->height); + if (tree_height <= 0) { + SSDFS_ERR("invalid tree_height %d\n", + tree_height); + return -ERANGE; + } + + if (search->node.id == SSDFS_BTREE_ROOT_NODE_ID) { + if (tree_height <= 0 || tree_height > 1) { + SSDFS_ERR("invalid search object state: " + "tree_height %u, node_id %u\n", + tree_height, + search->node.id); + return -ERANGE; + } + + child_node = search->node.child; + parent_node = search->node.parent; + + if (child_node || !parent_node) { + SSDFS_ERR("invalid search object state: " + "child_node %p, parent_node %p\n", + child_node, parent_node); + return -ERANGE; + } + + parent_node_type = atomic_read(&parent_node->type); + child_node_type = SSDFS_BTREE_NODE_UNKNOWN_TYPE; + + if (parent_node_type != SSDFS_BTREE_ROOT_NODE) { + SSDFS_ERR("invalid parent node's type %#x\n", + parent_node_type); + return -ERANGE; + } + + child_node_height = search->node.height; + } else { + child_node = search->node.child; + parent_node = search->node.parent; + + if (!child_node || !parent_node) { + SSDFS_ERR("invalid search object state: " + "child_node %p, parent_node %p\n", + child_node, parent_node); + return -ERANGE; + } + + switch (atomic_read(&child_node->type)) { + case SSDFS_BTREE_LEAF_NODE: + /* do nothing */ + break; + + case SSDFS_BTREE_HYBRID_NODE: + child_node = ssdfs_btree_descend_to_leaf_node(tree, + search); + if (unlikely(IS_ERR_OR_NULL(child_node))) { + err = !child_node ? 
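/*
 * The NULL-vs-ERR_PTR unwrapping in this function is the standard
 * <linux/err.h> pattern: a NULL result carries no encoded errno, so
 * it is mapped to a generic -ERANGE, while a real error pointer is
 * decoded with PTR_ERR(). In isolation:
 *
 *	ptr = lookup();			// node pointer, NULL or ERR_PTR()
 *	if (IS_ERR_OR_NULL(ptr))
 *		err = !ptr ? -ERANGE : PTR_ERR(ptr);
 */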
+ -ERANGE : PTR_ERR(child_node); + SSDFS_ERR("fail to descend to leaf node: " + "err %d\n", err); + return err; + } + + lock = &child_node->descriptor_lock; + spin_lock(lock); + parent_node = child_node->parent_node; + spin_unlock(lock); + lock = NULL; + + if (!child_node || !parent_node) { + SSDFS_ERR("invalid search object state: " + "child_node %p, parent_node %p\n", + child_node, parent_node); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid child node's type %#x\n", + atomic_read(&child_node->type)); + return -ERANGE; + } + + parent_node_type = atomic_read(&parent_node->type); + child_node_type = atomic_read(&child_node->type); + + switch (child_node_type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid child node's type %#x\n", + child_node_type); + return -ERANGE; + } + + child_node_height = atomic_read(&child_node->height); + } + + cur_height = child_node_height; + if (cur_height > tree_height) { + SSDFS_ERR("cur_height %u > tree_height %u\n", + cur_height, tree_height); + return -ERANGE; + } + + if ((cur_height + 1) >= hierarchy->desc.height) { + SSDFS_ERR("invalid hierarchy: " + "tree_height %u, cur_height %u, " + "hierarchy->desc.height %u\n", + tree_height, cur_height, + hierarchy->desc.height); + return -ERANGE; + } + + level = hierarchy->array_ptr[cur_height]; + level->nodes.old_node.type = child_node_type; + level->nodes.old_node.ptr = child_node; + ssdfs_btree_hierarchy_init_hash_range(level, child_node); + + cur_height++; + level = hierarchy->array_ptr[cur_height]; + level->nodes.old_node.type = parent_node_type; + level->nodes.old_node.ptr = parent_node; + ssdfs_btree_hierarchy_init_hash_range(level, parent_node); + + cur_height++; + lock = &parent_node->descriptor_lock; + spin_lock(lock); + parent_node = parent_node->parent_node; + spin_unlock(lock); + lock = NULL; + for (; cur_height < tree_height; cur_height++) { + if (!parent_node) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + parent_node_type = atomic_read(&parent_node->type); + level = hierarchy->array_ptr[cur_height]; + level->nodes.old_node.type = parent_node_type; + level->nodes.old_node.ptr = parent_node; + ssdfs_btree_hierarchy_init_hash_range(level, parent_node); + + lock = &parent_node->descriptor_lock; + spin_lock(lock); + parent_node = parent_node->parent_node; + spin_unlock(lock); + lock = NULL; + } + + cur_height = child_node_height; + + if (child_node_type == SSDFS_BTREE_HYBRID_NODE) { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(cur_height < 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_height--; + } + + for (; cur_height <= tree_height; cur_height++) { + struct ssdfs_btree_level *parent; + struct ssdfs_btree_level *child; + + parent = hierarchy->array_ptr[cur_height + 1]; + child = hierarchy->array_ptr[cur_height]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_height %d, tree_height %d\n", + cur_height, tree_height); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_check_level_for_add(&hierarchy->desc, + tree, search, + parent, child); + if (err == -EAGAIN) { + res = -EAGAIN; + err = 0; + ssdfs_debug_btree_hierarchy_object(hierarchy); + SSDFS_DBG("need to prepare btree hierarchy for add\n"); + /* continue logic for upper layers */ + continue; + } else if (err == -ENOSPC) { + if ((cur_height + 1) != (tree_height - 1)) { + ssdfs_debug_btree_hierarchy_object(hierarchy); + SSDFS_ERR("invalid current height: " + "cur_height %u, tree_height %u\n", + cur_height, tree_height); + return -ERANGE; + } 
else {
+				err = 0;
+				continue;
+			}
+		} else if (unlikely(err)) {
+			ssdfs_debug_btree_hierarchy_object(hierarchy);
+			SSDFS_ERR("fail to check btree's level: "
+				  "cur_height %u, tree_height %u, "
+				  "err %d\n",
+				  cur_height, tree_height, err);
+			return err;
+		} else if ((cur_height + 1) >= tree_height)
+			break;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	ssdfs_debug_btree_hierarchy_object(hierarchy);
+
+	return res;
+}
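The add-path check above works in two passes: the hierarchy object is first populated bottom-up with the existing chain of nodes (child, parent, and every further ancestor up to the root), and the pairwise level check is then run from the child's height toward the root. During that second pass -EAGAIN requests another preparation phase on the upper levels, while -ENOSPC is only legal directly under the root, where it translates into growing the tree's height. A minimal userspace sketch of this control flow, assuming a simplified level array (all names below are hypothetical, not SSDFS API):

	#include <errno.h>

	#define SKETCH_MAX_HEIGHT	12

	struct sketch_level {
		void *old_node;
		int flags;
	};

	typedef int (*sketch_check_fn)(struct sketch_level *parent,
				       struct sketch_level *child);

	/* pairwise bottom-up check; mirrors the loop structure above */
	static int sketch_check_for_add(struct sketch_level *levels,
					int child_height, int tree_height,
					sketch_check_fn check)
	{
		int h, err, res = 0;

		for (h = child_height; h + 1 < SKETCH_MAX_HEIGHT; h++) {
			err = check(&levels[h + 1], &levels[h]);
			if (err == -EAGAIN) {
				res = -EAGAIN;	/* prepare upper levels too */
				continue;
			} else if (err == -ENOSPC) {
				if (h + 1 != tree_height - 1)
					return -ERANGE;	/* only under root */
				continue;
			} else if (err) {
				return err;
			} else if (h + 1 >= tree_height) {
				break;
			}
		}
		return res;
	}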
From patchwork Sat Feb 25 01:09:09 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151963
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 58/76] ssdfs: check b-tree hierarchy for update/delete operation
Date: Fri, 24 Feb 2023 17:09:09 -0800
Message-Id: <20230225010927.813929-59-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org

Every b-tree node has an associated hash value that represents the starting hash of the record sequence kept in the node. This hash value is stored in an index key in the parent node, and these hash values drive item lookup in the b-tree. If a modification operation changes the starting hash in a node, then the index key in the parent node has to be updated. The checking logic identifies every parent node that requires an index key update, and the modification logic then executes the update in all parent nodes selected by the checking logic. Likewise, a delete operation has to identify which nodes become empty and should be deleted/invalidated; this invalidation plan is then executed by the modification logic.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_hierarchy.c | 1896 ++++++++++++++++++++++++++++++++++++
 1 file changed, 1896 insertions(+)

diff --git a/fs/ssdfs/btree_hierarchy.c b/fs/ssdfs/btree_hierarchy.c
index 6e9f91ed4541..3c1444732019 100644
--- a/fs/ssdfs/btree_hierarchy.c
+++ b/fs/ssdfs/btree_hierarchy.c
@@ -5653,3 +5653,1899 @@ int ssdfs_btree_check_hierarchy_for_add(struct ssdfs_btree *tree,
 	return res;
 }
+
+/*
+ * ssdfs_btree_check_level_for_delete() - check btree's level for node deletion
+ * @tree: btree object
+ * @search: search object
+ * @parent: parent level object
+ * @child: child level object
+ *
+ * This method tries to check the level of btree for node deletion.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */ +static +int ssdfs_btree_check_level_for_delete(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node, *child_node; + u16 index_count, items_count; + u64 hash; + u64 parent_start_hash, parent_end_hash; + u64 child_start_hash, child_end_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, parent %p, child %p\n", + tree, search, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + child_node = child->nodes.old_node.ptr; + + if (!child_node) { + SSDFS_ERR("node is NULL\n"); + return -ERANGE; + } + + switch (atomic_read(&child_node->type)) { + case SSDFS_BTREE_ROOT_NODE: + /* do nothing */ + return 0; + + default: + if (!parent_node) { + SSDFS_ERR("node is NULL\n"); + return -ERANGE; + } + } + + if (child->flags & SSDFS_BTREE_LEVEL_DELETE_NODE) { + parent->flags |= SSDFS_BTREE_LEVEL_DELETE_INDEX; + + switch (atomic_read(&parent_node->type)) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("invalid parent node type %#x\n", + atomic_read(&parent_node->type)); + return -ERANGE; + } + + parent->index_area.delete.op_state = + SSDFS_BTREE_AREA_OP_REQUESTED; + + spin_lock(&child_node->descriptor_lock); + ssdfs_memcpy(&parent->index_area.delete.node_index, + 0, sizeof(struct ssdfs_btree_index_key), + &child_node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&child_node->descriptor_lock); + + down_read(&parent_node->header_lock); + index_count = parent_node->index_area.index_count; + items_count = parent_node->items_area.items_count; + if (index_count <= 1 && items_count == 0) + parent->flags |= SSDFS_BTREE_LEVEL_DELETE_NODE; + up_read(&parent_node->header_lock); + } else if (child->flags & SSDFS_BTREE_LEVEL_DELETE_INDEX) { + struct ssdfs_btree_node_delete *delete; + + delete = &child->index_area.delete; + + if (delete->op_state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + delete->op_state); + return -ERANGE; + } + + hash = le64_to_cpu(delete->node_index.index.hash); + + down_read(&child_node->header_lock); + child_start_hash = child_node->index_area.start_hash; + child_end_hash = child_node->index_area.end_hash; + up_read(&child_node->header_lock); + + if (hash == child_start_hash || hash == child_end_hash) { + parent->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX; + + /* + * Simply add flag. + * Maybe it will need to add additional code. 
+			 */
+		}
+	} else if (child->flags & SSDFS_BTREE_LEVEL_UPDATE_INDEX) {
+		down_read(&parent_node->header_lock);
+		parent_start_hash = parent_node->index_area.start_hash;
+		parent_end_hash = parent_node->index_area.end_hash;
+		up_read(&parent_node->header_lock);
+
+		down_read(&child_node->header_lock);
+		child_start_hash = child_node->index_area.start_hash;
+		child_end_hash = child_node->index_area.end_hash;
+		up_read(&child_node->header_lock);
+
+		if (child_start_hash == parent_start_hash ||
+		    child_start_hash == parent_end_hash) {
+			/* set update index flag */
+			parent->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX;
+		} else if (child_end_hash == parent_start_hash ||
+			   child_end_hash == parent_end_hash) {
+			/* set update index flag */
+			parent->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX;
+		} else {
+			err = ssdfs_btree_prepare_do_nothing(parent,
+							     parent_node);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to prepare index node: "
+					  "err %d\n", err);
+				return err;
+			}
+		}
+	} else {
+		err = ssdfs_btree_prepare_do_nothing(parent,
+						     parent_node);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare index node: "
+				  "err %d\n", err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_check_hierarchy_for_delete() - check the btree for node deletion
+ * @tree: btree object
+ * @search: search object
+ * @hierarchy: btree's hierarchy object
+ *
+ * This method tries to check the btree's hierarchy for operation of
+ * node deletion.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+int ssdfs_btree_check_hierarchy_for_delete(struct ssdfs_btree *tree,
+					   struct ssdfs_btree_search *search,
+					   struct ssdfs_btree_hierarchy *hierarchy)
+{
+	struct ssdfs_btree_level *level;
+	struct ssdfs_btree_node *parent_node, *child_node;
+	int child_node_height, cur_height, tree_height;
+	int parent_node_type, child_node_type;
+	spinlock_t *lock = NULL;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search || !hierarchy);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("tree %p, search %p, hierarchy %p\n",
+		  tree, search, hierarchy);
+#else
+	SSDFS_DBG("tree %p, search %p, hierarchy %p\n",
+		  tree, search, hierarchy);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	tree_height = atomic_read(&tree->height);
+	if (tree_height <= 0) {
+		SSDFS_ERR("invalid tree_height %d\n",
+			  tree_height);
+		return -ERANGE;
+	}
+
+	if (search->node.id == SSDFS_BTREE_ROOT_NODE_ID) {
+		SSDFS_ERR("root node cannot be deleted\n");
+		return -ERANGE;
+	} else {
+		child_node = search->node.child;
+
+		if (!child_node) {
+			SSDFS_ERR("invalid search object state: "
+				  "child_node %p\n",
+				  child_node);
+			return -ERANGE;
+		}
+
+		lock = &child_node->descriptor_lock;
+		spin_lock(lock);
+		parent_node = child_node->parent_node;
+		spin_unlock(lock);
+		lock = NULL;
+
+		if (!parent_node) {
+			SSDFS_ERR("invalid search object state: "
+				  "child_node %p, parent_node %p\n",
+				  child_node, parent_node);
+			return -ERANGE;
+		}
+
+		parent_node_type = atomic_read(&parent_node->type);
+		child_node_type = atomic_read(&child_node->type);
+		child_node_height = atomic_read(&child_node->height);
+	}
+
+	cur_height = child_node_height;
+	if (cur_height >= tree_height) {
+		SSDFS_ERR("cur_height %u >= tree_height %u\n",
+			  cur_height, tree_height);
+		return -ERANGE;
+	}
+
+	if ((cur_height + 1) >= hierarchy->desc.height ||
+	    (cur_height + 1) >= tree_height) {
+		SSDFS_ERR("invalid hierarchy: "
+			  "tree_height %u, cur_height %u, "
+			  "hierarchy->desc.height %u\n",
+			  tree_height, cur_height,
+			  hierarchy->desc.height);
+		return -ERANGE;
+	}
+
+	level =
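/*
 * The per-level rule applied by ssdfs_btree_check_level_for_delete()
 * above cascades upward: deleting a child always schedules a
 * DELETE_INDEX in its parent, and if removing that index would leave
 * the parent without content (index_count <= 1 and items_count == 0),
 * the parent itself gets scheduled for deletion on the next level.
 * The rule in isolation (illustrative only; flag names shortened):
 *
 *	#define LEVEL_DELETE_NODE	0x1
 *	#define LEVEL_DELETE_INDEX	0x2
 *
 *	static void propagate_delete(int child_flags, int *parent_flags,
 *				     unsigned int index_count,
 *				     unsigned int items_count)
 *	{
 *		if (!(child_flags & LEVEL_DELETE_NODE))
 *			return;
 *		*parent_flags |= LEVEL_DELETE_INDEX;
 *		// dropping the last index empties the parent too
 *		if (index_count <= 1 && items_count == 0)
 *			*parent_flags |= LEVEL_DELETE_NODE;
 *	}
 */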
hierarchy->array_ptr[cur_height]; + level->nodes.old_node.type = child_node_type; + level->nodes.old_node.ptr = child_node; + ssdfs_btree_hierarchy_init_hash_range(level, child_node); + level->flags |= SSDFS_BTREE_LEVEL_DELETE_NODE; + + cur_height++; + level = hierarchy->array_ptr[cur_height]; + level->nodes.old_node.type = parent_node_type; + level->nodes.old_node.ptr = parent_node; + ssdfs_btree_hierarchy_init_hash_range(level, parent_node); + + cur_height++; + lock = &parent_node->descriptor_lock; + spin_lock(lock); + parent_node = parent_node->parent_node; + spin_unlock(lock); + lock = NULL; + for (; cur_height < tree_height; cur_height++) { + if (!parent_node) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + parent_node_type = atomic_read(&parent_node->type); + level = hierarchy->array_ptr[cur_height]; + level->nodes.old_node.type = parent_node_type; + level->nodes.old_node.ptr = parent_node; + ssdfs_btree_hierarchy_init_hash_range(level, parent_node); + + lock = &parent_node->descriptor_lock; + spin_lock(lock); + parent_node = parent_node->parent_node; + spin_unlock(lock); + lock = NULL; + } + + cur_height = child_node_height; + for (; cur_height < tree_height; cur_height++) { + struct ssdfs_btree_level *parent; + struct ssdfs_btree_level *child; + + parent = hierarchy->array_ptr[cur_height + 1]; + child = hierarchy->array_ptr[cur_height]; + + err = ssdfs_btree_check_level_for_delete(tree, search, + parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to check btree's level: " + "cur_height %u, tree_height %u, " + "err %d\n", + cur_height, tree_height, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_btree_check_level_for_update() - check btree's level for index update + * @tree: btree object + * @search: search object + * @parent: parent level object + * @child: child level object + * + * This method tries to check the level of btree for index update. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_btree_check_level_for_update(struct ssdfs_btree *tree, + struct ssdfs_btree_search *search, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node, *child_node; + int state; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !parent || !child); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, parent %p, child %p\n", + tree, search, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + child_node = child->nodes.old_node.ptr; + if (!child_node) { + SSDFS_ERR("child node is NULL\n"); + return -ERANGE; + } + + if (child_node->node_id == SSDFS_BTREE_ROOT_NODE_ID) { + SSDFS_DBG("nothing should be done for the root node\n"); + return 0; + } + + parent_node = parent->nodes.old_node.ptr; + if (!parent_node) { + SSDFS_ERR("parent node is NULL\n"); + return -ERANGE; + } + + state = atomic_read(&parent_node->index_area.state); + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("parent node %u hasn't index area\n", + parent_node->node_id); + return -ERANGE; + } + + switch (atomic_read(&child_node->type)) { + case SSDFS_BTREE_LEAF_NODE: + state = atomic_read(&child_node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("child node %u hasn't items area\n", + child_node->node_id); + return -ERANGE; + } + + /* set necessity to update the parent's index */ + parent->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX; + break; + + default: + state = atomic_read(&child_node->index_area.state); + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("child node %u hasn't index area\n", + child_node->node_id); + return -ERANGE; + } + + /* set necessity to update the parent's index */ + parent->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX; + break; + } + + return 0; +} + +/* + * ssdfs_btree_check_hierarchy_for_update() - check the btree for index update + * @tree: btree object + * @search: search object + * @hierarchy: btree's hierarchy object + * + * This method tries to check the btree's hierarchy for operation of + * index update. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+int ssdfs_btree_check_hierarchy_for_update(struct ssdfs_btree *tree,
+					   struct ssdfs_btree_search *search,
+					   struct ssdfs_btree_hierarchy *hierarchy)
+{
+	struct ssdfs_btree_level *level;
+	struct ssdfs_btree_node *parent_node, *child_node;
+	int child_node_height, cur_height, tree_height;
+	int parent_node_type, child_node_type;
+	spinlock_t *lock;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search || !hierarchy);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("tree %p, search %p, hierarchy %p\n",
+		  tree, search, hierarchy);
+#else
+	SSDFS_DBG("tree %p, search %p, hierarchy %p\n",
+		  tree, search, hierarchy);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	tree_height = atomic_read(&tree->height);
+	if (tree_height <= 0) {
+		SSDFS_ERR("invalid tree_height %d\n",
+			  tree_height);
+		return -ERANGE;
+	}
+
+	if (search->node.id == SSDFS_BTREE_ROOT_NODE_ID) {
+		SSDFS_ERR("parent node is absent\n");
+		return -ERANGE;
+	} else {
+		child_node = search->node.child;
+
+		if (!child_node) {
+			SSDFS_ERR("invalid search object state: "
+				  "child_node %p\n",
+				  child_node);
+			return -ERANGE;
+		}
+
+		lock = &child_node->descriptor_lock;
+		spin_lock(lock);
+		parent_node = child_node->parent_node;
+		spin_unlock(lock);
+		lock = NULL;
+
+		if (!parent_node) {
+			SSDFS_ERR("invalid search object state: "
+				  "child_node %p, parent_node %p\n",
+				  child_node, parent_node);
+			return -ERANGE;
+		}
+
+		parent_node_type = atomic_read(&parent_node->type);
+		child_node_type = atomic_read(&child_node->type);
+		child_node_height = atomic_read(&child_node->height);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("child_node %px, child_node_id %u, child_node_type %#x\n",
+		  child_node, child_node->node_id, child_node_type);
+	SSDFS_DBG("parent_node %px, parent_node_id %u, parent_node_type %#x\n",
+		  parent_node, parent_node->node_id, parent_node_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cur_height = child_node_height;
+	if (cur_height >= tree_height) {
+		SSDFS_ERR("cur_height %u >= tree_height %u\n",
+			  cur_height, tree_height);
+		return -ERANGE;
+	}
+
+	if ((cur_height + 1) >= hierarchy->desc.height ||
+	    (cur_height + 1) >= tree_height) {
+		SSDFS_ERR("invalid hierarchy: "
+			  "tree_height %u, cur_height %u, "
+			  "hierarchy->desc.height %u\n",
+			  tree_height, cur_height,
+			  hierarchy->desc.height);
+		return -ERANGE;
+	}
+
+	level = hierarchy->array_ptr[cur_height];
+	level->nodes.old_node.type = child_node_type;
+	level->nodes.old_node.ptr = child_node;
+	ssdfs_btree_hierarchy_init_hash_range(level, child_node);
+
+	cur_height++;
+	level = hierarchy->array_ptr[cur_height];
+	level->nodes.old_node.type = parent_node_type;
+	level->nodes.old_node.ptr = parent_node;
+	ssdfs_btree_hierarchy_init_hash_range(level, parent_node);
+	level->flags |= SSDFS_BTREE_LEVEL_UPDATE_INDEX;
+
+	cur_height++;
+	lock = &parent_node->descriptor_lock;
+	spin_lock(lock);
+	parent_node =
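/*
 * Each hop to the next ancestor is made under the current node's
 * descriptor_lock so that a concurrent re-parenting cannot be
 * observed half-way; the lock pointer is saved first because
 * parent_node is overwritten inside the critical section:
 *
 *	lock = &node->descriptor_lock;
 *	spin_lock(lock);		// lock the current node
 *	node = node->parent_node;	// hop upward
 *	spin_unlock(lock);		// unlock via the saved pointer
 */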
parent_node->parent_node; + spin_unlock(lock); + lock = NULL; + } + + cur_height = child_node_height; + for (; cur_height < tree_height; cur_height++) { + struct ssdfs_btree_level *parent; + struct ssdfs_btree_level *child; + + parent = hierarchy->array_ptr[cur_height + 1]; + child = hierarchy->array_ptr[cur_height]; + + err = ssdfs_btree_check_level_for_update(tree, search, + parent, child); + if (unlikely(err)) { + SSDFS_ERR("fail to check btree's level: " + "cur_height %u, tree_height %u, " + "err %d\n", + cur_height, tree_height, err); + return err; + } + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +static +int ssdfs_btree_update_index_after_move(struct ssdfs_btree_level *child, + struct ssdfs_btree_node *parent_node); + +/* + * ssdfs_btree_move_items_left() - move head items from old to new node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move the head items from the old node into + * new one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_move_items_left(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node; + struct ssdfs_btree_node *old_node; + struct ssdfs_btree_node *new_node; + int type; + u32 calculated; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + if (!(child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE && + child->items_area.move.direction == SSDFS_BTREE_MOVE_TO_LEFT)) { + SSDFS_WARN("invalid move request: " + "flags %#x, direction %#x\n", + child->flags, + child->items_area.move.direction); + return -ERANGE; + } + + SSDFS_DBG("desc %p, child %p\n", + desc, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("fail to move items: " + "parent_node %p\n", + parent_node); + return -ERANGE; + } + + type = atomic_read(&parent_node->type); + + switch (type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("parent node has improper type: " + "node_id %u, type %#x\n", + parent_node->node_id, type); + return -ERANGE; + } + + if (child->items_area.move.op_state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + child->items_area.move.op_state); + return -ERANGE; + } else + child->items_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED; + + old_node = child->nodes.old_node.ptr; + new_node = child->nodes.new_node.ptr; + + if (!old_node || !new_node) { + SSDFS_ERR("fail to move items: " + "old_node %p, new_node %p\n", + old_node, new_node); + return -ERANGE; + } + + type = atomic_read(&old_node->type); + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("old node is not leaf node: " + "node_id %u, type %#x\n", + old_node->node_id, type); + return -ERANGE; + } + + type = atomic_read(&new_node->type); + + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("new node is not leaf node: " + "node_id %u, type %#x\n", + new_node->node_id, type); + return -ERANGE; + } + + switch (child->items_area.move.pos.state) { + case SSDFS_HASH_RANGE_INTERSECTION: + 
case SSDFS_HASH_RANGE_OUT_OF_NODE: + if (child->items_area.move.pos.start != 0) { + SSDFS_ERR("invalid position's start %u\n", + child->items_area.move.pos.start); + return -ERANGE; + } + + if (child->items_area.move.pos.count == 0) { + SSDFS_ERR("invalid position's count %u\n", + child->items_area.move.pos.count); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + child->items_area.move.pos.state); + return -ERANGE; + } + + calculated = child->items_area.move.pos.count * desc->min_item_size; + if (calculated >= desc->node_size) { + SSDFS_ERR("invalid position: " + "count %u, min_item_size %u, node_size %u\n", + child->items_area.move.pos.count, + desc->min_item_size, + desc->node_size); + return -ERANGE; + } + + err = ssdfs_btree_node_move_items_range(old_node, new_node, + child->items_area.move.pos.start, + child->items_area.move.pos.count); + if (unlikely(err)) { + SSDFS_ERR("fail to move items range: " + "src_node %u, dst_node %u, " + "start_item %u, count %u, " + "err %d\n", + old_node->node_id, + new_node->node_id, + child->items_area.move.pos.start, + child->items_area.move.pos.count, + err); + return err; + } + + down_read(&old_node->header_lock); + child->index_area.hash.start = old_node->index_area.start_hash; + child->index_area.hash.end = old_node->index_area.end_hash; + child->items_area.hash.start = old_node->items_area.start_hash; + child->items_area.hash.end = old_node->items_area.end_hash; + up_read(&old_node->header_lock); + + err = ssdfs_btree_update_index_after_move(child, parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update indexes in parent: err %d\n", + err); + return err; + } + + child->items_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_node %u, dst_node %u, " + "start_item %u, count %u\n", + old_node->node_id, + new_node->node_id, + child->items_area.move.pos.start, + child->items_area.move.pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_move_items_right() - move tail items from old to new node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move the tail items from the old node into + * new one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_btree_move_items_right(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node; + struct ssdfs_btree_node *old_node; + struct ssdfs_btree_node *new_node; + int type; + u32 calculated; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + if (!(child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE && + child->items_area.move.direction == SSDFS_BTREE_MOVE_TO_RIGHT)) { + SSDFS_WARN("invalid move request: " + "flags %#x, direction %#x\n", + child->flags, + child->items_area.move.direction); + return -ERANGE; + } + + SSDFS_DBG("desc %p, child %p\n", + desc, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + + if (!parent_node) { + SSDFS_ERR("fail to move items: " + "parent_node %p\n", + parent_node); + return -ERANGE; + } + + type = atomic_read(&parent_node->type); + switch (type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("parent node has improper type: " + "node_id %u, type %#x\n", + parent_node->node_id, type); + return -ERANGE; + } + + if (child->items_area.move.op_state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + child->items_area.move.op_state); + return -ERANGE; + } else + child->items_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED; + + old_node = child->nodes.old_node.ptr; + new_node = child->nodes.new_node.ptr; + + if (!old_node || !new_node) { + SSDFS_ERR("fail to move items: " + "old_node %p, new_node %p\n", + old_node, new_node); + return -ERANGE; + } + + type = atomic_read(&old_node->type); + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("old node is not leaf node: " + "node_id %u, type %#x\n", + old_node->node_id, type); + return -ERANGE; + } + + type = atomic_read(&new_node->type); + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("new node is not leaf node: " + "node_id %u, type %#x\n", + new_node->node_id, type); + return -ERANGE; + } + + switch (child->items_area.move.pos.state) { + case SSDFS_HASH_RANGE_INTERSECTION: + case SSDFS_HASH_RANGE_OUT_OF_NODE: + if (child->items_area.move.pos.count == 0) { + SSDFS_ERR("invalid position's count %u\n", + child->items_area.move.pos.count); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + child->items_area.move.pos.state); + return -ERANGE; + } + + calculated = child->items_area.move.pos.count * desc->min_item_size; + if (calculated >= desc->node_size) { + SSDFS_ERR("invalid position: " + "count %u, min_item_size %u, node_size %u\n", + child->items_area.move.pos.count, + desc->min_item_size, + desc->node_size); + return -ERANGE; + } + + err = ssdfs_btree_node_move_items_range(old_node, new_node, + child->items_area.move.pos.start, + child->items_area.move.pos.count); + if (unlikely(err)) { + SSDFS_ERR("fail to move items range: " + "src_node %u, dst_node %u, " + "start_item %u, count %u, " + "err %d\n", + old_node->node_id, + new_node->node_id, + child->items_area.move.pos.start, + child->items_area.move.pos.count, + err); + return err; + } + + down_read(&old_node->header_lock); + child->index_area.hash.start = old_node->index_area.start_hash; + child->index_area.hash.end = 
old_node->index_area.end_hash; + child->items_area.hash.start = old_node->items_area.start_hash; + child->items_area.hash.end = old_node->items_area.end_hash; + up_read(&old_node->header_lock); + + err = ssdfs_btree_update_index_after_move(child, parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update indexes in parent: err %d\n", + err); + return err; + } + + child->items_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_node %u, dst_node %u, " + "start_item %u, count %u\n", + old_node->node_id, + new_node->node_id, + child->items_area.move.pos.start, + child->items_area.move.pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_move_items_parent2child() - move items from parent to child node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move items from the parent node into + * child one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static int +ssdfs_btree_move_items_parent2child(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node; + struct ssdfs_btree_node *child_node; + int type; + u32 calculated; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + if (!(parent->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE && + parent->items_area.move.direction == SSDFS_BTREE_MOVE_TO_CHILD)) { + SSDFS_WARN("invalid move request: " + "flags %#x, direction %#x\n", + parent->flags, + parent->items_area.move.direction); + return -ERANGE; + } + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent->items_area.move.op_state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + parent->items_area.move.op_state); + return -ERANGE; + } else + parent->items_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED; + + parent_node = parent->nodes.old_node.ptr; + + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) + child_node = child->nodes.new_node.ptr; + else + child_node = child->nodes.old_node.ptr; + + if (!parent_node || !child_node) { + SSDFS_ERR("fail to move items: " + "parent_node %p, child_node %p\n", + parent_node, child_node); + return -ERANGE; + } + + type = atomic_read(&parent_node->type); + if (type != SSDFS_BTREE_HYBRID_NODE) { + SSDFS_ERR("parent node has improper type: " + "node_id %u, type %#x\n", + parent_node->node_id, type); + return -ERANGE; + } + + type = atomic_read(&child_node->type); + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("child node has improper type: " + "node_id %u, type %#x\n", + child_node->node_id, type); + return -ERANGE; + } + + switch (parent->items_area.move.pos.state) { + case SSDFS_HASH_RANGE_INTERSECTION: + if (parent->items_area.move.pos.count == 0) { + SSDFS_ERR("invalid position's count %u\n", + parent->items_area.move.pos.count); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + parent->items_area.move.pos.state); + return -ERANGE; + } + + calculated = parent->items_area.move.pos.count * desc->min_item_size; + + if (calculated >= desc->node_size) { + SSDFS_ERR("invalid position: " + "count %u, min_item_size %u, node_size %u\n", + parent->items_area.move.pos.count, + desc->min_item_size, + desc->node_size); 
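/*
 * This size guard, like its twins in the other move helpers, bounds a
 * move request before any item is touched: the byte estimate is
 * count * min_item_size, it must stay below the node size and, unless
 * the destination is a freshly added node, it must also fit into the
 * destination's free space. As a standalone predicate (illustrative
 * only; the helper name is hypothetical):
 *
 *	static int move_request_fits(unsigned int count,
 *				     unsigned int min_item_size,
 *				     unsigned int node_size,
 *				     unsigned int free_space,
 *				     int dst_is_new_node)
 *	{
 *		unsigned long long calculated =
 *			(unsigned long long)count * min_item_size;
 *
 *		if (calculated >= node_size)
 *			return 0;	// cannot exceed one node
 *		if (!dst_is_new_node && calculated > free_space)
 *			return 0;	// destination too full
 *		return 1;
 *	}
 */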
+ return -ERANGE; + } + + if (!(child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) && + calculated > child->items_area.free_space) { + SSDFS_ERR("child has not enough free space: " + "calculated %u, free_space %u\n", + calculated, + child->items_area.free_space); + return -ERANGE; + } + + err = ssdfs_btree_node_move_items_range(parent_node, child_node, + parent->items_area.move.pos.start, + parent->items_area.move.pos.count); + if (unlikely(err)) { + SSDFS_ERR("fail to move items range: " + "src_node %u, dst_node %u, " + "start_item %u, count %u, " + "err %d\n", + parent_node->node_id, + child_node->node_id, + parent->items_area.move.pos.start, + parent->items_area.move.pos.count, + err); + return err; + } + + down_read(&child_node->header_lock); + + switch (atomic_read(&child_node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + child->index_area.area_size = child_node->index_area.area_size; + calculated = child_node->index_area.index_capacity; + calculated -= child_node->index_area.index_count; + calculated *= child_node->index_area.index_size; + child->index_area.free_space = calculated; + child->index_area.hash.start = + child_node->index_area.start_hash; + child->index_area.hash.end = + child_node->index_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + switch (atomic_read(&child_node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + child->items_area.area_size = child_node->items_area.area_size; + child->items_area.free_space = + child_node->items_area.free_space; + child->items_area.hash.start = + child_node->items_area.start_hash; + child->items_area.hash.end = + child_node->items_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + up_read(&child_node->header_lock); + + down_read(&parent_node->header_lock); + + switch (atomic_read(&parent_node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + parent->index_area.area_size = + parent_node->index_area.area_size; + calculated = parent_node->index_area.index_capacity; + calculated -= parent_node->index_area.index_count; + calculated *= parent_node->index_area.index_size; + parent->index_area.free_space = calculated; + parent->index_area.hash.start = + parent_node->index_area.start_hash; + parent->index_area.hash.end = + parent_node->index_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + switch (atomic_read(&parent_node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + parent->items_area.area_size = + parent_node->items_area.area_size; + parent->items_area.free_space = + parent_node->items_area.free_space; + parent->items_area.hash.start = + parent_node->items_area.start_hash; + parent->items_area.hash.end = + parent_node->items_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + up_read(&parent_node->header_lock); + + parent->items_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_node %u, dst_node %u, " + "start_item %u, count %u\n", + parent_node->node_id, + child_node->node_id, + parent->items_area.move.pos.start, + parent->items_area.move.pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_move_items_child2parent() - move items from child to parent node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move items from the child node into + * parent one. 
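+ *
+ * The parent is expected to be a hybrid node; unless a new parent node
+ * is added on this level, the moved range must fit into the free space
+ * of the parent's items area.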
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static int
+ssdfs_btree_move_items_child2parent(struct ssdfs_btree_state_descriptor *desc,
+				    struct ssdfs_btree_level *parent,
+				    struct ssdfs_btree_level *child)
+{
+	struct ssdfs_btree_node *parent_node;
+	struct ssdfs_btree_node *child_node;
+	int type;
+	u32 calculated;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc || !parent || !child);
+
+	if (!(child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE &&
+	    child->items_area.move.direction == SSDFS_BTREE_MOVE_TO_PARENT)) {
+		SSDFS_WARN("invalid move request: "
+			   "flags %#x, direction %#x\n",
+			   child->flags,
+			   child->items_area.move.direction);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("desc %p, parent %p, child %p\n",
+		  desc, parent, child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (child->items_area.move.op_state != SSDFS_BTREE_AREA_OP_REQUESTED) {
+		SSDFS_ERR("invalid operation state %#x\n",
+			  child->items_area.move.op_state);
+		return -ERANGE;
+	} else
+		child->items_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED;
+
+	if (parent->flags & SSDFS_BTREE_LEVEL_ADD_NODE)
+		parent_node = parent->nodes.new_node.ptr;
+	else
+		parent_node = parent->nodes.old_node.ptr;
+
+	child_node = child->nodes.old_node.ptr;
+
+	if (!parent_node || !child_node) {
+		SSDFS_ERR("fail to move items: "
+			  "parent_node %p, child_node %p\n",
+			  parent_node, child_node);
+		return -ERANGE;
+	}
+
+	type = atomic_read(&parent_node->type);
+	if (type != SSDFS_BTREE_HYBRID_NODE) {
+		SSDFS_ERR("parent node has improper type: "
+			  "node_id %u, type %#x\n",
+			  parent_node->node_id, type);
+		return -ERANGE;
+	}
+
+	type = atomic_read(&child_node->type);
+	switch (type) {
+	case SSDFS_BTREE_LEAF_NODE:
+	case SSDFS_BTREE_HYBRID_NODE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("child node has improper type: "
+			  "node_id %u, type %#x\n",
+			  child_node->node_id, type);
+		return -ERANGE;
+	}
+
+	switch (child->items_area.move.pos.state) {
+	case SSDFS_HASH_RANGE_INTERSECTION:
+		if (child->items_area.move.pos.count == 0) {
+			SSDFS_ERR("invalid position's count %u\n",
+				  child->items_area.move.pos.count);
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid position's state %#x\n",
+			  child->items_area.move.pos.state);
+		return -ERANGE;
+	}
+
+	calculated = child->items_area.move.pos.count * desc->min_item_size;
+
+	if (calculated >= desc->node_size) {
+		SSDFS_ERR("invalid position: "
+			  "count %u, min_item_size %u, node_size %u\n",
+			  child->items_area.move.pos.count,
+			  desc->min_item_size,
+			  desc->node_size);
+		return -ERANGE;
+	}
+
+	if (!(parent->flags & SSDFS_BTREE_LEVEL_ADD_NODE) &&
+	    calculated > parent->items_area.free_space) {
+		SSDFS_ERR("parent has not enough free space: "
+			  "calculated %u, free_space %u\n",
+			  calculated,
+			  parent->items_area.free_space);
+		return -ERANGE;
+	}
+
+	err = ssdfs_btree_node_move_items_range(child_node, parent_node,
+					child->items_area.move.pos.start,
+					child->items_area.move.pos.count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move items range: "
+			  "src_node %u, dst_node %u, "
+			  "start_item %u, count %u, "
+			  "err %d\n",
+			  child_node->node_id,
+			  parent_node->node_id,
+			  child->items_area.move.pos.start,
+			  child->items_area.move.pos.count,
+			  err);
+		return err;
+	}
+
+	down_read(&child_node->header_lock);
+
+	switch (atomic_read(&child_node->index_area.state)) {
+	case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+		child->index_area.area_size = child_node->index_area.area_size;
+		calculated = child_node->index_area.index_capacity;
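+		/*
+		 * Free space left in the index area:
+		 * (index_capacity - index_count) * index_size.
+		 */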
calculated -= child_node->index_area.index_count; + calculated *= child_node->index_area.index_size; + child->index_area.free_space = calculated; + child->index_area.hash.start = + child_node->index_area.start_hash; + child->index_area.hash.end = + child_node->index_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + switch (atomic_read(&child_node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + child->items_area.area_size = child_node->items_area.area_size; + child->items_area.free_space = + child_node->items_area.free_space; + child->items_area.hash.start = + child_node->items_area.start_hash; + child->items_area.hash.end = + child_node->items_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + up_read(&child_node->header_lock); + + down_read(&parent_node->header_lock); + + switch (atomic_read(&parent_node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + parent->index_area.area_size = + parent_node->index_area.area_size; + calculated = parent_node->index_area.index_capacity; + calculated -= parent_node->index_area.index_count; + calculated *= parent_node->index_area.index_size; + parent->index_area.free_space = calculated; + parent->index_area.hash.start = + parent_node->index_area.start_hash; + parent->index_area.hash.end = + parent_node->index_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + switch (atomic_read(&parent_node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + parent->items_area.area_size = + parent_node->items_area.area_size; + parent->items_area.free_space = + parent_node->items_area.free_space; + parent->items_area.hash.start = + parent_node->items_area.start_hash; + parent->items_area.hash.end = + parent_node->items_area.end_hash; + break; + + default: + /* do nothing */ + break; + } + + up_read(&parent_node->header_lock); + + child->items_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("src_node %u, dst_node %u, " + "start_item %u, count %u\n", + child_node->node_id, + parent_node->node_id, + child->items_area.move.pos.start, + child->items_area.move.pos.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_btree_move_items() - move items between nodes + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move items between nodes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
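+ *
+ * The move direction chosen during the hierarchy check selects the
+ * particular helper: parent2child, child2parent, or the move to the
+ * left/right sibling node.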
+ */ +static +int ssdfs_btree_move_items(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int op_state; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) { + switch (child->items_area.move.direction) { + case SSDFS_BTREE_MOVE_TO_CHILD: + op_state = child->items_area.move.op_state; + if (op_state != SSDFS_BTREE_AREA_OP_DONE) { + SSDFS_ERR("invalid op_state %#x\n", + op_state); + return -ERANGE; + } + break; + + case SSDFS_BTREE_MOVE_TO_PARENT: + err = ssdfs_btree_move_items_child2parent(desc, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("failed to move items: err %d\n", + err); + return err; + } + break; + + case SSDFS_BTREE_MOVE_TO_LEFT: + err = ssdfs_btree_move_items_left(desc, parent, child); + if (unlikely(err)) { + SSDFS_ERR("failed to move items: err %d\n", + err); + return err; + } + break; + + case SSDFS_BTREE_MOVE_TO_RIGHT: + err = ssdfs_btree_move_items_right(desc, parent, child); + if (unlikely(err)) { + SSDFS_ERR("failed to move items: err %d\n", + err); + return err; + } + break; + + default: + SSDFS_ERR("invalid move direction %#x\n", + child->items_area.move.direction); + return -ERANGE; + } + } + + if (parent->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) { + switch (parent->items_area.move.direction) { + case SSDFS_BTREE_MOVE_TO_CHILD: + err = ssdfs_btree_move_items_parent2child(desc, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("failed to move items: err %d\n", + err); + return err; + } + break; + + default: + SSDFS_ERR("invalid move direction %#x\n", + parent->items_area.move.direction); + return -ERANGE; + } + } + + return 0; +} + +/* + * ssdfs_btree_move_indexes_to_parent() - move indexes from child to parent node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move indexes from the child to the parent node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
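+ *
+ * The move is used when a hybrid child node delegates a range of its
+ * indexes to an index-type parent node with enough free space.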
+ */
+static int
+ssdfs_btree_move_indexes_to_parent(struct ssdfs_btree_state_descriptor *desc,
+				   struct ssdfs_btree_level *parent,
+				   struct ssdfs_btree_level *child)
+{
+	struct ssdfs_btree_node *parent_node;
+	struct ssdfs_btree_node *child_node;
+	int type;
+	u16 start, count;
+	u32 calculated;
+	int state;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc || !parent || !child);
+
+	if (!(child->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE &&
+	    child->index_area.move.direction == SSDFS_BTREE_MOVE_TO_PARENT)) {
+		SSDFS_WARN("invalid move request: "
+			   "flags %#x, direction %#x\n",
+			   child->flags,
+			   child->index_area.move.direction);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("desc %p, parent %p, child %p\n",
+		  desc, parent, child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state = child->index_area.move.op_state;
+	if (state != SSDFS_BTREE_AREA_OP_REQUESTED) {
+		SSDFS_ERR("invalid operation state %#x\n",
+			  state);
+		return -ERANGE;
+	} else
+		child->index_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED;
+
+	parent_node = parent->nodes.old_node.ptr;
+	child_node = child->nodes.old_node.ptr;
+
+	if (!parent_node || !child_node) {
+		SSDFS_ERR("fail to move indexes: "
+			  "parent_node %p, child_node %p\n",
+			  parent_node, child_node);
+		return -ERANGE;
+	}
+
+	type = atomic_read(&parent_node->type);
+	switch (type) {
+	case SSDFS_BTREE_INDEX_NODE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("parent node has improper type: "
+			  "node_id %u, type %#x\n",
+			  parent_node->node_id, type);
+		return -ERANGE;
+	}
+
+	type = atomic_read(&child_node->type);
+	switch (type) {
+	case SSDFS_BTREE_HYBRID_NODE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("child node has improper type: "
+			  "node_id %u, type %#x\n",
+			  child_node->node_id, type);
+		return -ERANGE;
+	}
+
+	start = child->index_area.move.pos.start;
+	count = child->index_area.move.pos.count;
+
+	switch (child->index_area.move.pos.state) {
+	case SSDFS_HASH_RANGE_INTERSECTION:
+		if (count == 0) {
+			SSDFS_ERR("invalid position's count %u\n",
+				  count);
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid position's state %#x\n",
+			  child->index_area.move.pos.state);
+		return -ERANGE;
+	}
+
+	calculated = (start + count) * desc->index_size;
+	if (calculated >= desc->node_size) {
+		SSDFS_ERR("invalid position: "
+			  "start %u, count %u, "
+			  "index_size %u, node_size %u\n",
+			  child->index_area.move.pos.start,
+			  child->index_area.move.pos.count,
+			  desc->index_size,
+			  desc->node_size);
+		return -ERANGE;
+	}
+
+	calculated = count * desc->index_size;
+	if (calculated > parent->index_area.free_space) {
+		SSDFS_ERR("parent has not enough free space: "
+			  "calculated %u, free_space %u\n",
+			  calculated,
+			  parent->index_area.free_space);
+		return -ERANGE;
+	}
+
+	state = parent->index_area.insert.op_state;
+	if (state != SSDFS_BTREE_AREA_OP_REQUESTED) {
+		SSDFS_ERR("invalid operation state %#x\n",
+			  state);
+		return -ERANGE;
+	} else
+		parent->index_area.insert.op_state = SSDFS_BTREE_AREA_OP_FAILED;
+
+	switch (parent->index_area.insert.pos.state) {
+	case SSDFS_HASH_RANGE_RIGHT_ADJACENT:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid position's state %#x\n",
+			  parent->index_area.insert.pos.state);
+		return -ERANGE;
+	}
+
+	if (count != parent->index_area.insert.pos.count) {
+		SSDFS_ERR("inconsistent state: "
+			  "child->index_area.move.pos.count %u, "
+			  "parent->index_area.insert.pos.count %u\n",
+			  child->index_area.move.pos.count,
+			  parent->index_area.insert.pos.count);
+		return -ERANGE;
+	}
+
+	err =
ssdfs_btree_node_move_index_range(child_node, + child->index_area.move.pos.start, + parent_node, + parent->index_area.insert.pos.start, + parent->index_area.insert.pos.count); + if (unlikely(err)) { + SSDFS_ERR("fail to move index range: " + "src_node %u, dst_node %u, " + "src_start %u, dst_start %u, count %u, " + "err %d\n", + child_node->node_id, + parent_node->node_id, + child->index_area.move.pos.start, + parent->index_area.insert.pos.start, + parent->index_area.insert.pos.count, + err); + return err; + } + + down_read(&parent_node->header_lock); + parent->index_area.hash.start = parent_node->index_area.start_hash; + parent->index_area.hash.end = parent_node->index_area.end_hash; + parent->items_area.hash.start = parent_node->items_area.start_hash; + parent->items_area.hash.end = parent_node->items_area.end_hash; + up_read(&parent_node->header_lock); + + down_read(&child_node->header_lock); + child->index_area.hash.start = child_node->index_area.start_hash; + child->index_area.hash.end = child_node->index_area.end_hash; + child->items_area.hash.start = child_node->items_area.start_hash; + child->items_area.hash.end = child_node->items_area.end_hash; + up_read(&child_node->header_lock); + + parent->index_area.insert.op_state = SSDFS_BTREE_AREA_OP_DONE; + child->index_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE; + + return 0; +} + +/* + * ssdfs_btree_move_indexes_to_child() - move indexes from parent to child node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move indexes from the parent to the child node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_move_indexes_to_child(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node; + struct ssdfs_btree_node *child_node; + int type; + u16 start, count; + u32 calculated; + int state; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + if (!(parent->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE && + parent->index_area.move.direction == SSDFS_BTREE_MOVE_TO_CHILD)) { + SSDFS_WARN("invalid move request: " + "flags %#x, direction %#x\n", + parent->flags, + parent->index_area.move.direction); + return -ERANGE; + } + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = parent->index_area.move.op_state; + if (state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + state); + return -ERANGE; + } else + parent->index_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED; + + if (parent->nodes.new_node.type == SSDFS_BTREE_ROOT_NODE) + parent_node = parent->nodes.new_node.ptr; + else + parent_node = parent->nodes.old_node.ptr; + + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) + child_node = child->nodes.new_node.ptr; + else + child_node = child->nodes.old_node.ptr; + + if (!parent_node || !child_node) { + SSDFS_ERR("fail to move items: " + "parent_node %p, child_node %p\n", + parent_node, child_node); + return -ERANGE; + } + + type = atomic_read(&parent_node->type); + switch (type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("parent node has improper type: " + "node_id %u, type %#x\n", + parent_node->node_id, type); + return -ERANGE; + } + + type = atomic_read(&child_node->type); + switch (type) { + case 
SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("child node has improper type: " + "node_id %u, type %#x\n", + child_node->node_id, type); + return -ERANGE; + } + + start = parent->index_area.move.pos.start; + count = parent->index_area.move.pos.count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, count %u, state %#x\n", + parent->index_area.move.pos.start, + parent->index_area.move.pos.count, + parent->index_area.move.pos.state); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (parent->index_area.move.pos.state) { + case SSDFS_HASH_RANGE_INTERSECTION: + if (count == 0) { + SSDFS_ERR("invalid position's count %u\n", + count); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + parent->index_area.move.pos.state); + return -ERANGE; + } + + calculated = (start + count) * desc->index_size; + if (calculated >= desc->node_size) { + SSDFS_ERR("invalid position: " + "start %u, count %u, " + "index_size %u, node_size %u\n", + parent->index_area.move.pos.start, + parent->index_area.move.pos.count, + desc->index_size, + desc->node_size); + return -ERANGE; + } + + calculated = count * desc->index_size; + if (calculated > child->index_area.free_space) { + SSDFS_ERR("child has not enough free space: " + "calculated %u, free_space %u\n", + calculated, + child->index_area.free_space); + return -ERANGE; + } + + state = child->index_area.insert.op_state; + if (state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + state); + return -ERANGE; + } else + child->index_area.insert.op_state = SSDFS_BTREE_AREA_OP_FAILED; + + switch (child->index_area.insert.pos.state) { + case SSDFS_HASH_RANGE_LEFT_ADJACENT: + case SSDFS_HASH_RANGE_INTERSECTION: + case SSDFS_HASH_RANGE_RIGHT_ADJACENT: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + child->index_area.insert.pos.state); + return -ERANGE; + } + + if (count != child->index_area.insert.pos.count) { + SSDFS_ERR("inconsistent state: " + "parent->index_area.move.pos.count %u, " + "child->index_area.insert.pos.count %u\n", + parent->index_area.move.pos.count, + child->index_area.insert.pos.count); + return -ERANGE; + } + + err = ssdfs_btree_node_move_index_range(parent_node, + parent->index_area.move.pos.start, + child_node, + child->index_area.insert.pos.start, + child->index_area.insert.pos.count); + if (unlikely(err)) { + SSDFS_ERR("fail to move index range: " + "src_node %u, dst_node %u, " + "src_start %u, dst_start %u, count %u, " + "err %d\n", + parent_node->node_id, + child_node->node_id, + parent->index_area.move.pos.start, + child->index_area.insert.pos.start, + child->index_area.insert.pos.count, + err); + SSDFS_ERR("child_node %u\n", child_node->node_id); + ssdfs_debug_show_btree_node_indexes(child_node->tree, + child_node); + SSDFS_ERR("parent_node %u\n", parent_node->node_id); + ssdfs_debug_show_btree_node_indexes(parent_node->tree, + parent_node); + return err; + } + + down_read(&parent_node->header_lock); + parent->index_area.hash.start = parent_node->index_area.start_hash; + parent->index_area.hash.end = parent_node->index_area.end_hash; + parent->items_area.hash.start = parent_node->items_area.start_hash; + parent->items_area.hash.end = parent_node->items_area.end_hash; + up_read(&parent_node->header_lock); + + down_read(&child_node->header_lock); + child->index_area.hash.start = child_node->index_area.start_hash; + child->index_area.hash.end = 
child_node->index_area.end_hash;
+	child->items_area.hash.start = child_node->items_area.start_hash;
+	child->items_area.hash.end = child_node->items_area.end_hash;
+	up_read(&child_node->header_lock);
+
+	parent->index_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE;
+	child->index_area.insert.op_state = SSDFS_BTREE_AREA_OP_DONE;
+
+	return 0;
+}
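The index- and item-moving helpers above share one shape: check that the
operation was requested, mark it failed up front, verify that the
destination can absorb the payload, perform the move, and only then mark
the operation done. Below is a minimal, self-contained sketch of that
state machine in plain C; the type and function names are simplified
stand-ins for illustration, not the SSDFS ones.

#include <stdio.h>

enum op_state { OP_UNKNOWN, OP_REQUESTED, OP_FAILED, OP_DONE };

struct move_request {
	enum op_state op_state;
	unsigned int count;	/* records to move */
	unsigned int rec_size;	/* bytes per record */
};

struct dest_area {
	unsigned int free_space;	/* bytes free in the destination */
};

/* Refuse the move unless it was requested and the destination can
 * absorb count records; mirror the pessimistic FAILED -> DONE flow. */
static int try_move(struct move_request *req, struct dest_area *dst)
{
	if (req->op_state != OP_REQUESTED)
		return -1;
	req->op_state = OP_FAILED;	/* pessimistic until proven done */

	if (req->count * req->rec_size > dst->free_space)
		return -1;		/* destination too small */

	/* ... copy the records and update the hash ranges here ... */
	dst->free_space -= req->count * req->rec_size;

	req->op_state = OP_DONE;
	return 0;
}

int main(void)
{
	struct move_request req = { OP_REQUESTED, 4, 16 };
	struct dest_area dst = { 128 };

	printf("move %s\n", try_move(&req, &dst) ? "rejected" : "done");
	return 0;
}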
From patchwork Sat Feb 25 01:09:10 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151964
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 59/76] ssdfs: execute b-tree hierarchy modification
Date: Fri, 24 Feb 2023 17:09:10 -0800
Message-Id: <20230225010927.813929-60-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

For every b-tree modification request, the file system logic creates a
hierarchy object and runs the b-tree hierarchy check. The checking logic
defines the actions that have to be executed on every level of the
b-tree to carry out the node add or delete operation. As a result, the
hierarchy object represents the action plan that the modification logic
has to execute. The execution logic simply starts from the bottom of the
hierarchy and performs the planned action for every level of the b-tree.
The planned actions can include adding a new empty node, moving items
from a hybrid parent node into a leaf one, rebalancing the b-tree, and
updating indexes. Afterwards, the b-tree is ready to receive the new
items/indexes and remains consistent.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/btree_hierarchy.c | 1869 ++++++++++++++++++++++++++++++++++++
 1 file changed, 1869 insertions(+)

diff --git a/fs/ssdfs/btree_hierarchy.c b/fs/ssdfs/btree_hierarchy.c
index 3c1444732019..a6a42833d57f 100644
--- a/fs/ssdfs/btree_hierarchy.c
+++ b/fs/ssdfs/btree_hierarchy.c
@@ -7549,3 +7549,1872 @@ int ssdfs_btree_move_indexes_to_child(struct ssdfs_btree_state_descriptor *desc,
 	return 0;
 }
+
+/*
+ * ssdfs_btree_move_indexes_right() - move indexes from old to new node
+ * @desc: btree state descriptor
+ * @parent: parent level descriptor
+ * @child: child level descriptor
+ *
+ * This method tries to move indexes from the old node to the new one.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
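+ *
+ * The helper is used when an index node is split: the selected range is
+ * copied into position 0 of the newly added right sibling, and then the
+ * indexes of both siblings are refreshed in the parent node.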
+ */ +static +int ssdfs_btree_move_indexes_right(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node; + struct ssdfs_btree_node *old_node; + struct ssdfs_btree_node *new_node; + int type; + u16 start, count; + u32 calculated; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + if (!(child->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE && + child->index_area.move.direction == SSDFS_BTREE_MOVE_TO_RIGHT)) { + SSDFS_WARN("invalid move request: " + "flags %#x, direction %#x\n", + child->flags, + child->index_area.move.direction); + return -ERANGE; + } + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_node = parent->nodes.old_node.ptr; + if (!parent_node) { + SSDFS_ERR("fail to move indexes: " + "parent_node %p\n", + parent_node); + return -ERANGE; + } + + type = atomic_read(&parent_node->type); + + switch (type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("parent node has improper type: " + "node_id %u, type %#x\n", + parent_node->node_id, type); + return -ERANGE; + } + + if (child->index_area.move.op_state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + child->index_area.move.op_state); + return -ERANGE; + } else + child->index_area.move.op_state = SSDFS_BTREE_AREA_OP_FAILED; + + old_node = child->nodes.old_node.ptr; + new_node = child->nodes.new_node.ptr; + + if (!old_node || !new_node) { + SSDFS_ERR("fail to move indexes: " + "old_node %p, new_node %p\n", + old_node, new_node); + return -ERANGE; + } + + type = atomic_read(&old_node->type); + switch (type) { + case SSDFS_BTREE_INDEX_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("old node is not index node: " + "node_id %u, type %#x\n", + old_node->node_id, type); + return -ERANGE; + } + + type = atomic_read(&new_node->type); + switch (type) { + case SSDFS_BTREE_INDEX_NODE: + /* expected type */ + break; + + default: + SSDFS_ERR("new node is not index node: " + "node_id %u, type %#x\n", + new_node->node_id, type); + return -ERANGE; + } + + switch (child->index_area.move.pos.state) { + case SSDFS_HASH_RANGE_INTERSECTION: + case SSDFS_HASH_RANGE_OUT_OF_NODE: + if (child->index_area.move.pos.count == 0) { + SSDFS_ERR("invalid position's count %u\n", + child->index_area.move.pos.count); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + child->index_area.move.pos.state); + return -ERANGE; + } + + start = child->index_area.move.pos.start; + count = child->index_area.move.pos.count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, count %u, state %#x\n", + child->index_area.move.pos.start, + child->index_area.move.pos.count, + child->index_area.move.pos.state); +#endif /* CONFIG_SSDFS_DEBUG */ + + calculated = (start + count) * desc->index_size; + if (calculated >= desc->node_size) { + SSDFS_ERR("invalid position: " + "start %u, count %u, " + "index_size %u, node_size %u\n", + child->index_area.move.pos.start, + child->index_area.move.pos.count, + desc->index_size, + desc->node_size); + return -ERANGE; + } + + err = ssdfs_btree_node_move_index_range(old_node, + child->index_area.move.pos.start, + new_node, + 0, + child->index_area.move.pos.count); + if (unlikely(err)) { + SSDFS_ERR("fail to move index range: " + "src_node %u, dst_node %u, " + "src_start 
%u, dst_start %u, count %u, " + "err %d\n", + old_node->node_id, + new_node->node_id, + child->index_area.move.pos.start, + 0, + child->index_area.move.pos.count, + err); + goto fail_move_indexes_right; + } + + err = ssdfs_btree_update_index_after_move(child, parent_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update indexes in parent: err %d\n", + err); + goto fail_move_indexes_right; + } + + err = ssdfs_btree_update_parent_node_pointer(old_node->tree, old_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update parent pointer: " + "node_id %u, err %d\n", + old_node->node_id, err); + goto fail_move_indexes_right; + } + + err = ssdfs_btree_update_parent_node_pointer(new_node->tree, new_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update parent pointer: " + "node_id %u, err %d\n", + new_node->node_id, err); + goto fail_move_indexes_right; + } + + child->index_area.move.op_state = SSDFS_BTREE_AREA_OP_DONE; + return 0; + +fail_move_indexes_right: + SSDFS_ERR("old_node %u\n", old_node->node_id); + ssdfs_debug_show_btree_node_indexes(old_node->tree, old_node); + + SSDFS_ERR("new_node %u\n", new_node->node_id); + ssdfs_debug_show_btree_node_indexes(new_node->tree, new_node); + + SSDFS_ERR("parent_node %u\n", parent_node->node_id); + ssdfs_debug_show_btree_node_indexes(parent_node->tree, parent_node); + + return err; +} + +/* + * ssdfs_btree_move_indexes() - move indexes between parent and child nodes + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to move indexes between parent and child nodes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_move_indexes(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + int op_state; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) { + switch (parent->index_area.move.direction) { + case SSDFS_BTREE_MOVE_TO_PARENT: + /* do nothing */ + break; + + case SSDFS_BTREE_MOVE_TO_CHILD: + err = ssdfs_btree_move_indexes_to_child(desc, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("failed to move indexes: err %d\n", + err); + return err; + } + break; + + case SSDFS_BTREE_MOVE_TO_RIGHT: + /* do nothing */ + break; + + default: + SSDFS_ERR("invalid move direction %#x\n", + parent->index_area.move.direction); + return -ERANGE; + } + } + + if (child->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) { + switch (child->index_area.move.direction) { + case SSDFS_BTREE_MOVE_TO_PARENT: + err = ssdfs_btree_move_indexes_to_parent(desc, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("failed to move indexes: err %d\n", + err); + return err; + } + break; + + case SSDFS_BTREE_MOVE_TO_CHILD: + op_state = child->index_area.move.op_state; + if (op_state != SSDFS_BTREE_AREA_OP_DONE) { + SSDFS_ERR("invalid op_state %#x\n", + op_state); + return -ERANGE; + } + break; + + case SSDFS_BTREE_MOVE_TO_RIGHT: + err = ssdfs_btree_move_indexes_right(desc, + parent, + child); + if (unlikely(err)) { + SSDFS_ERR("failed to move indexes: err %d\n", + err); + return err; + } + break; + + default: + SSDFS_ERR("invalid move direction %#x\n", + child->index_area.move.direction); + return -ERANGE; + } + } + + return 0; +} + +/* + * ssdfs_btree_resize_index_area() - resize index 
area of the node + * @desc: btree state descriptor + * @child: child level descriptor + * + * This method tries to resize the index area of the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - unable to resize the index area. + */ +static +int ssdfs_btree_resize_index_area(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *node; + u32 index_area_size, index_free_area; + u32 items_area_size, items_free_area; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !child); + + SSDFS_DBG("desc %p, child %p\n", + desc, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + node = child->nodes.old_node.ptr; + + if (!node) { + SSDFS_ERR("node is NULL\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(child->flags & SSDFS_BTREE_TRY_RESIZE_INDEX_AREA)) { + SSDFS_WARN("resize hasn't been requested\n"); + return 0; + } + + if (child->index_area.free_space >= desc->node_size) { + SSDFS_ERR("invalid index area's free space: " + "free_space %u, node_size %u\n", + child->index_area.free_space, + desc->node_size); + return -ERANGE; + } + + if (child->items_area.free_space >= desc->node_size) { + SSDFS_ERR("invalid items area's free space: " + "free_space %u, node_size %u\n", + child->items_area.free_space, + desc->node_size); + return -ERANGE; + } + + if (child->index_area.free_space % desc->index_size) { + SSDFS_ERR("invalid index area's free space: " + "free_space %u, index_size %u\n", + child->index_area.free_space, + desc->index_size); + return -ERANGE; + } + + if (desc->index_size >= desc->index_area_min_size) { + SSDFS_ERR("corrupted descriptor: " + "index_size %u, index_area_min_size %u\n", + desc->index_size, + desc->index_area_min_size); + return -ERANGE; + } + + if (desc->index_area_min_size % desc->index_size) { + SSDFS_ERR("corrupted descriptor: " + "index_size %u, index_area_min_size %u\n", + desc->index_size, + desc->index_area_min_size); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("INITIAL_STATE: " + "items_area: area_size %u, free_space %u; " + "index_area: area_size %u, free_space %u\n", + child->items_area.area_size, + child->items_area.free_space, + child->index_area.area_size, + child->index_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + index_area_size = child->index_area.area_size << 1; + index_free_area = index_area_size - child->index_area.area_size; + + if (index_area_size > desc->node_size) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to resize the index area: " + "requested_size %u, node_size %u\n", + index_free_area, desc->node_size); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } else if (index_area_size == desc->node_size) { + index_area_size = desc->node_size; + index_free_area = child->index_area.free_space; + index_free_area += child->items_area.free_space; + + items_area_size = 0; + items_free_area = 0; + } else if (child->items_area.free_space < index_free_area) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to resize the index area: " + "free_space %u, requested_size %u\n", + child->items_area.free_space, + index_free_area); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } else { + items_area_size = child->items_area.area_size; + items_area_size -= index_free_area; + items_free_area = child->items_area.free_space; + items_free_area -= index_free_area; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NEW_STATE: " + 
"items_area: area_size %u, free_space %u; " + "index_area: area_size %u, free_space %u\n", + items_area_size, items_free_area, + index_area_size, index_free_area); + + BUG_ON(index_area_size == 0); + BUG_ON(index_area_size > desc->node_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_resize_index_area(node, index_area_size); + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to resize the index area: " + "node_id %u, new_size %u\n", + node->node_id, index_area_size); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to resize the index area: " + "node_id %u, new_size %u\n", + node->node_id, index_area_size); + } else { + child->index_area.area_size = index_area_size; + child->index_area.free_space = index_free_area; + child->items_area.area_size = items_area_size; + child->items_area.free_space = items_free_area; + } + + return err; +} + +/* + * ssdfs_btree_prepare_add_item() - prepare to add an item into the node + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to prepare the node for adding an item. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_prepare_add_item(struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node_insert *add; + struct ssdfs_btree_node *node = NULL; + struct ssdfs_btree_node *left_node = NULL, *right_node = NULL; + u64 start_hash, end_hash; + u16 count; + u8 min_item_size; + u32 free_space; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent || !child); + + SSDFS_DBG("parent %p, child %p\n", + parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(child->flags & SSDFS_BTREE_LEVEL_ADD_ITEM)) { + SSDFS_WARN("add item hasn't been requested\n"); + return 0; + } + + add = &child->items_area.add; + + if (add->op_state != SSDFS_BTREE_AREA_OP_REQUESTED) { + SSDFS_ERR("invalid operation state %#x\n", + add->op_state); + return -ERANGE; + } else + add->op_state = SSDFS_BTREE_AREA_OP_FAILED; + + switch (add->pos.state) { + case SSDFS_HASH_RANGE_OUT_OF_NODE: + node = child->nodes.old_node.ptr; + + if (!node) { + SSDFS_ERR("node %p\n", node); + return -ERANGE; + } + + start_hash = child->items_area.add.hash.start; + end_hash = child->items_area.add.hash.end; + count = child->items_area.add.pos.count; + + down_write(&node->header_lock); + + if (node->items_area.items_count == 0) { + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + } + + free_space = node->items_area.free_space; + min_item_size = node->items_area.min_item_size; + + if (((u32)count * min_item_size) > free_space) { + err = -ERANGE; + SSDFS_ERR("free_space %u is too small\n", + free_space); + } + + up_write(&node->header_lock); + break; + + case SSDFS_HASH_RANGE_LEFT_ADJACENT: + left_node = child->nodes.new_node.ptr; + right_node = child->nodes.old_node.ptr; + + if (!left_node || !right_node) { + SSDFS_ERR("left_node %p, right_node %p\n", + left_node, right_node); + return -ERANGE; + } + + start_hash = child->items_area.add.hash.start; + end_hash = child->items_area.add.hash.end; + count = child->items_area.add.pos.count; + + down_write(&left_node->header_lock); + + if (left_node->items_area.items_count == 0) { + left_node->items_area.start_hash = start_hash; + left_node->items_area.end_hash = end_hash; + } + + free_space = left_node->items_area.free_space; + min_item_size = left_node->items_area.min_item_size; + + if (((u32)count * 
min_item_size) > free_space) { + err = -ERANGE; + SSDFS_ERR("free_space %u is too small\n", + free_space); + goto finish_left_adjacent_check; + } + +finish_left_adjacent_check: + up_write(&left_node->header_lock); + break; + + case SSDFS_HASH_RANGE_INTERSECTION: + left_node = child->nodes.old_node.ptr; + right_node = child->nodes.new_node.ptr; + + if (!left_node) { + SSDFS_ERR("left_node %p, right_node %p\n", + left_node, right_node); + return -ERANGE; + } + + count = child->items_area.add.pos.count; + + down_write(&left_node->header_lock); + + free_space = left_node->items_area.free_space; + min_item_size = left_node->items_area.min_item_size; + + if (((u32)count * min_item_size) > free_space) { + err = -ERANGE; + SSDFS_ERR("free_space %u is too small\n", + free_space); + goto finish_intersection_check; + } + +finish_intersection_check: + up_write(&left_node->header_lock); + break; + + case SSDFS_HASH_RANGE_RIGHT_ADJACENT: + left_node = child->nodes.old_node.ptr; + right_node = child->nodes.new_node.ptr; + + if (!left_node || !right_node) { + SSDFS_ERR("left_node %p, right_node %p\n", + left_node, right_node); + return -ERANGE; + } + + start_hash = child->items_area.add.hash.start; + end_hash = child->items_area.add.hash.end; + count = child->items_area.add.pos.count; + + down_write(&right_node->header_lock); + + if (right_node->items_area.items_count == 0) { + right_node->items_area.start_hash = start_hash; + right_node->items_area.end_hash = end_hash; + } + + free_space = right_node->items_area.free_space; + min_item_size = right_node->items_area.min_item_size; + + if (((u32)count * min_item_size) > free_space) { + err = -ERANGE; + SSDFS_ERR("free_space %u is too small\n", + free_space); + goto finish_right_adjacent_check; + } + +finish_right_adjacent_check: + up_write(&right_node->header_lock); + break; + + default: + SSDFS_ERR("invalid position's state %#x\n", + add->pos.state); + return -ERANGE; + } + + if (!err) + add->op_state = SSDFS_BTREE_AREA_OP_DONE; + + return err; +} + +/* + * __ssdfs_btree_update_index() - update index in the parent node + * @parent_node: parent node + * @child_node: child node + * + * This method tries to update an index into the parent node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
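+ *
+ * The child's index key is rebuilt from the child's current start hash
+ * and raw extent; for a hybrid node that references itself, the old
+ * items area hash identifies the index record being replaced.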
+ */ +static +int __ssdfs_btree_update_index(struct ssdfs_btree_node *parent_node, + struct ssdfs_btree_node *child_node) +{ + struct ssdfs_btree_index_key old_key, new_key; + int parent_type, child_type; + u64 start_hash = U64_MAX; + u64 old_hash = U64_MAX; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!parent_node || !child_node); + + SSDFS_DBG("parent_node %p, child_node %p\n", + parent_node, child_node); +#endif /* CONFIG_SSDFS_DEBUG */ + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + + switch (parent_type) { + case SSDFS_BTREE_ROOT_NODE: + switch (child_type) { + case SSDFS_BTREE_LEAF_NODE: + down_read(&child_node->header_lock); + start_hash = child_node->items_area.start_hash; + up_read(&child_node->header_lock); + break; + + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + down_read(&child_node->header_lock); + start_hash = child_node->index_area.start_hash; + up_read(&child_node->header_lock); + break; + + default: + SSDFS_ERR("unexpected child type %#x\n", + child_type); + return -ERANGE; + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + switch (child_type) { + case SSDFS_BTREE_LEAF_NODE: + down_read(&child_node->header_lock); + start_hash = child_node->items_area.start_hash; + up_read(&child_node->header_lock); + break; + + case SSDFS_BTREE_HYBRID_NODE: + if (parent_node == child_node) { + down_read(&child_node->header_lock); + start_hash = child_node->items_area.start_hash; + up_read(&child_node->header_lock); + } else { + down_read(&child_node->header_lock); + start_hash = child_node->index_area.start_hash; + up_read(&child_node->header_lock); + } + break; + + case SSDFS_BTREE_INDEX_NODE: + down_read(&child_node->header_lock); + start_hash = child_node->index_area.start_hash; + up_read(&child_node->header_lock); + break; + + default: + SSDFS_ERR("unexpected child type %#x\n", + child_type); + return -ERANGE; + } + + break; + + case SSDFS_BTREE_INDEX_NODE: + switch (child_type) { + case SSDFS_BTREE_LEAF_NODE: + down_read(&child_node->header_lock); + start_hash = child_node->items_area.start_hash; + up_read(&child_node->header_lock); + break; + + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_INDEX_NODE: + down_read(&child_node->header_lock); + start_hash = child_node->index_area.start_hash; + up_read(&child_node->header_lock); + break; + + default: + SSDFS_ERR("unexpected child type %#x\n", + child_type); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("unexpected parent type %#x\n", + parent_type); + return -ERANGE; + } + + if (parent_type == SSDFS_BTREE_HYBRID_NODE && + child_type == SSDFS_BTREE_HYBRID_NODE && + parent_node == child_node) { + down_read(&parent_node->header_lock); + old_hash = parent_node->items_area.start_hash; + up_read(&parent_node->header_lock); + } + + spin_lock(&child_node->descriptor_lock); + + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &child_node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + + if (parent_type == SSDFS_BTREE_HYBRID_NODE && + child_type == SSDFS_BTREE_HYBRID_NODE && + parent_node == child_node) { + if (old_hash == U64_MAX) { + err = -ERANGE; + SSDFS_WARN("invalid old hash\n"); + goto finish_update_index; + } + + old_key.index.hash = cpu_to_le64(old_hash); + } + + ssdfs_memcpy(&child_node->node_index.index.extent, + 0, sizeof(struct ssdfs_raw_extent), + &child_node->extent, + 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + ssdfs_memcpy(&new_key, + 0, 
sizeof(struct ssdfs_btree_index_key), + &child_node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + new_key.index.hash = cpu_to_le64(start_hash); + ssdfs_memcpy(&child_node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + &new_key, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + +finish_update_index: + spin_unlock(&child_node->descriptor_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "node_height %u, hash %llx\n", + le32_to_cpu(new_key.node_id), + new_key.node_type, + new_key.height, + le64_to_cpu(new_key.index.hash)); + SSDFS_DBG("seg_id %llu, logical_blk %u, len %u\n", + le64_to_cpu(new_key.index.extent.seg_id), + le32_to_cpu(new_key.index.extent.logical_blk), + le32_to_cpu(new_key.index.extent.len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) + return err; + + err = ssdfs_btree_node_change_index(parent_node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to update index: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_btree_update_index() - update the index in the parent node + * @desc: btree state descriptor + * @parent: parent level descriptor + * @child: child level descriptor + * + * This method tries to update the index into the parent node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_update_index(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node = NULL, *child_node = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(parent->flags & SSDFS_BTREE_LEVEL_UPDATE_INDEX)) { + SSDFS_WARN("update index hasn't been requested\n"); + return 0; + } + + if (parent->flags & SSDFS_BTREE_LEVEL_ADD_NODE) + parent_node = parent->nodes.new_node.ptr; + else if (parent->nodes.old_node.ptr) + parent_node = parent->nodes.old_node.ptr; + else + parent_node = parent->nodes.new_node.ptr; + + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) + child_node = child->nodes.new_node.ptr; + else + child_node = child->nodes.old_node.ptr; + + if (!parent_node || !child_node) { + SSDFS_ERR("invalid pointer: " + "parent_node %p, child_node %p\n", + parent_node, child_node); + return -ERANGE; + } + + err = __ssdfs_btree_update_index(parent_node, child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update index: err %d\n", err); + return err; + } + + return 0; +} + +/* + * __ssdfs_btree_add_index() - add index in the parent node + * @parent_node: parent node + * @child_node: child node + * + * This method tries to add an index into the parent node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
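+ *
+ * The child's node_index key is refreshed with the current start hash
+ * and raw extent before it is inserted into the parent node.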
+ */
+static
+int __ssdfs_btree_add_index(struct ssdfs_btree_node *parent_node,
+			    struct ssdfs_btree_node *child_node)
+{
+	struct ssdfs_btree_index_key key;
+	int type;
+	u64 start_hash = U64_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!parent_node || !child_node);
+
+	SSDFS_DBG("parent_node %p, child_node %p\n",
+		  parent_node, child_node);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	type = atomic_read(&child_node->type);
+
+	switch (type) {
+	case SSDFS_BTREE_LEAF_NODE:
+		down_read(&child_node->header_lock);
+		start_hash = child_node->items_area.start_hash;
+		up_read(&child_node->header_lock);
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+	case SSDFS_BTREE_INDEX_NODE:
+		down_read(&child_node->header_lock);
+		start_hash = child_node->index_area.start_hash;
+		up_read(&child_node->header_lock);
+		break;
+	}
+
+	spin_lock(&child_node->descriptor_lock);
+	if (start_hash != U64_MAX) {
+		child_node->node_index.index.hash =
+					cpu_to_le64(start_hash);
+	}
+	ssdfs_memcpy(&child_node->node_index.index.extent,
+		     0, sizeof(struct ssdfs_raw_extent),
+		     &child_node->extent,
+		     0, sizeof(struct ssdfs_raw_extent),
+		     sizeof(struct ssdfs_raw_extent));
+	ssdfs_memcpy(&key,
+		     0, sizeof(struct ssdfs_btree_index_key),
+		     &child_node->node_index,
+		     0, sizeof(struct ssdfs_btree_index_key),
+		     sizeof(struct ssdfs_btree_index_key));
+	spin_unlock(&child_node->descriptor_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, node_type %#x, "
+		  "node_height %u, hash %llx\n",
+		  le32_to_cpu(key.node_id),
+		  key.node_type,
+		  key.height,
+		  le64_to_cpu(key.index.hash));
+	SSDFS_DBG("seg_id %llu, logical_blk %u, len %u\n",
+		  le64_to_cpu(key.index.extent.seg_id),
+		  le32_to_cpu(key.index.extent.logical_blk),
+		  le32_to_cpu(key.index.extent.len));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_btree_node_add_index(parent_node, &key);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add index: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_add_index() - add an index into parent node
+ * @desc: btree state descriptor
+ * @parent: parent level descriptor
+ * @child: child level descriptor
+ *
+ * This method tries to add an index into parent node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
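+ *
+ * The index describes the newly created child node; when a node is
+ * being added on the parent level too, the index is added into the new
+ * parent node.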
+ */ +static +int ssdfs_btree_add_index(struct ssdfs_btree_state_descriptor *desc, + struct ssdfs_btree_level *parent, + struct ssdfs_btree_level *child) +{ + struct ssdfs_btree_node *parent_node = NULL, *child_node = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!desc || !parent || !child); + + SSDFS_DBG("desc %p, parent %p, child %p\n", + desc, parent, child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(parent->flags & SSDFS_BTREE_LEVEL_ADD_INDEX)) { + SSDFS_WARN("add index hasn't been requested\n"); + return -ERANGE; + } + + if (parent->flags & SSDFS_BTREE_LEVEL_ADD_NODE) + parent_node = parent->nodes.new_node.ptr; + else if (parent->nodes.old_node.ptr) + parent_node = parent->nodes.old_node.ptr; + else + parent_node = parent->nodes.new_node.ptr; + + child_node = child->nodes.new_node.ptr; + + if (!parent_node || !child_node) { + SSDFS_ERR("invalid pointer: " + "parent_node %p, child_node %p\n", + parent_node, child_node); + return -ERANGE; + } + + err = __ssdfs_btree_add_index(parent_node, child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_btree_update_index_after_move() - update sibling nodes' indexes + * @child: child level descriptor + * @parent_node: parent node + * + * This method tries to update the sibling nodes' indexes + * after operation of moving items/indexes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_btree_update_index_after_move(struct ssdfs_btree_level *child, + struct ssdfs_btree_node *parent_node) +{ + struct ssdfs_btree_node *child_node = NULL; + int parent_type, child_type; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!child || !parent_node); + + SSDFS_DBG("child %p, parent_node %p\n", + child, parent_node); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE || + child->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) { + struct ssdfs_btree_node_move *move; + + if (child->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) + move = &child->items_area.move; + else if (child->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) + move = &child->index_area.move; + else + BUG(); + + switch (move->direction) { + case SSDFS_BTREE_MOVE_TO_LEFT: + case SSDFS_BTREE_MOVE_TO_RIGHT: + /* expected state */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("nothing should be done: " + "direction %#x\n", + move->direction); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + child_node = child->nodes.old_node.ptr; + if (!child_node) { + SSDFS_ERR("invalid child pointer\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + + if (parent_type == SSDFS_BTREE_HYBRID_NODE && + child_type == SSDFS_BTREE_HYBRID_NODE && + parent_node == child_node) { + /* + * The hybrid node has been updated already. + */ + } else { + err = __ssdfs_btree_update_index(parent_node, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update index: err %d\n", + err); + return err; + } + } + + child_node = child->nodes.new_node.ptr; + if (!child_node) { + SSDFS_ERR("invalid child pointer\n"); + return -ERANGE; + } + + parent_type = atomic_read(&parent_node->type); + child_type = atomic_read(&child_node->type); + + if (parent_type == SSDFS_BTREE_HYBRID_NODE && + child_type == SSDFS_BTREE_HYBRID_NODE && + parent_node == child_node) { + /* + * The hybrid node has been updated already. 
+ */ + } else { + if (child->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + /* + * Do nothing. Index will be added later. + */ + SSDFS_DBG("nothing should be done: " + "index will be added later\n"); + + /*err = __ssdfs_btree_add_index(parent_node, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: " + "err %d\n", + err); + return err; + }*/ + } else { + err = __ssdfs_btree_update_index(parent_node, + child_node); + if (unlikely(err)) { + SSDFS_ERR("fail to update index: " + "err %d\n", + err); + return err; + } + } + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("nothing should be done: " + "flags %#x\n", child->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return 0; +} + +/* + * ssdfs_btree_process_level_for_add() - process a level of btree's hierarchy + * @hierarchy: btree's hierarchy + * @cur_height: current height + * @search: search object + * + * This method tries to process the level of btree's hierarchy. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - unable to resize the index area. + */ +int ssdfs_btree_process_level_for_add(struct ssdfs_btree_hierarchy *hierarchy, + int cur_height, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_state_descriptor *desc; + struct ssdfs_btree_level *cur_level; + struct ssdfs_btree_level *parent; + struct ssdfs_btree_node *node; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!hierarchy || !search); + + SSDFS_DBG("hierarchy %p, cur_height %d\n", + hierarchy, cur_height); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cur_height >= hierarchy->desc.height) { + SSDFS_ERR("invalid hierarchy: " + "cur_height %d, tree_height %d\n", + cur_height, hierarchy->desc.height); + return -ERANGE; + } + + desc = &hierarchy->desc; + cur_level = hierarchy->array_ptr[cur_height]; + + if (!cur_level->flags) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("nothing to do: cur_height %d\n", + cur_height); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + if (cur_height == (hierarchy->desc.height - 1)) + goto check_necessity_increase_tree_height; + + parent = hierarchy->array_ptr[cur_height + 1]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_height %d, tree_height %d, " + "cur_level->flags %#x, parent->flags %#x\n", + cur_height, hierarchy->desc.height, + cur_level->flags, parent->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (cur_level->flags & ~SSDFS_BTREE_ADD_NODE_MASK || + parent->flags & ~SSDFS_BTREE_ADD_NODE_MASK) { + SSDFS_ERR("invalid flags: cur_level %#x, parent %#x\n", + cur_level->flags, + parent->flags); + return -ERANGE; + } + + if (cur_level->flags & SSDFS_BTREE_LEVEL_ADD_NODE) { + if (!cur_level->nodes.new_node.ptr) { + SSDFS_ERR("new node hasn't been created\n"); + return -ERANGE; + } + } + + if (parent->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) { + err = ssdfs_btree_move_items(desc, parent, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to move items: err %d\n", + err); + return err; + } + } + + if (cur_level->flags & SSDFS_BTREE_ITEMS_AREA_NEED_MOVE) { + err = ssdfs_btree_move_items(desc, parent, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to move items: err %d\n", + err); + return err; + } + } + + if (parent->flags & SSDFS_BTREE_TRY_RESIZE_INDEX_AREA) { + err = ssdfs_btree_resize_index_area(desc, parent); + if (unlikely(err)) { + SSDFS_ERR("fail to resize index area: err %d\n", + err); + return err; + } + } + + if (parent->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) { + err = ssdfs_btree_move_indexes(desc, parent, cur_level); + if (err == -ENOSPC) { + err 
= ssdfs_btree_resize_index_area(desc, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to resize index area: err %d\n", + err); + return err; + } + + err = ssdfs_btree_move_indexes(desc, parent, cur_level); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to move indexes: err %d\n", + err); + return err; + } + } + + if (cur_level->flags & SSDFS_BTREE_INDEX_AREA_NEED_MOVE) { + err = ssdfs_btree_move_indexes(desc, parent, cur_level); + if (err == -ENOSPC) { + err = ssdfs_btree_resize_index_area(desc, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to resize index area: err %d\n", + err); + return err; + } + + err = ssdfs_btree_move_indexes(desc, parent, cur_level); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to move indexes: err %d\n", + err); + return err; + } + } + + if (cur_level->flags & SSDFS_BTREE_LEVEL_ADD_ITEM) { + err = ssdfs_btree_prepare_add_item(parent, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare node for add: err %d\n", + err); + return err; + } + } + + if (parent->flags & SSDFS_BTREE_LEVEL_UPDATE_INDEX) { + err = ssdfs_btree_update_index(desc, parent, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to update the index: err %d\n", + err); + return err; + } + } + + if (parent->flags & SSDFS_BTREE_LEVEL_ADD_INDEX) { + err = ssdfs_btree_add_index(desc, parent, cur_level); + if (unlikely(err)) { + SSDFS_ERR("fail to add the index: err %d\n", + err); + return err; + } + } + + if (cur_height == (hierarchy->desc.height - 1)) { +check_necessity_increase_tree_height: + if (cur_level->nodes.old_node.ptr) + node = cur_level->nodes.old_node.ptr; + else if (cur_level->nodes.new_node.ptr) + node = cur_level->nodes.new_node.ptr; + else + goto finish_process_level_for_add; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + if (hierarchy->desc.increment_height) + atomic_inc(&node->height); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node height %u, " + "cur_height %u, increment_height %#x\n", + node->node_id, atomic_read(&node->height), + cur_height, hierarchy->desc.increment_height); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + /* do nothing */ + break; + } + } + +finish_process_level_for_add: + return 0; +} + +/* + * ssdfs_btree_delete_index() - delete index from the node + * @desc: btree state descriptor + * @level: level descriptor + * + * This method tries to delete an index from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+static
+int ssdfs_btree_delete_index(struct ssdfs_btree_state_descriptor *desc,
+			     struct ssdfs_btree_level *level)
+{
+	struct ssdfs_btree_node *node;
+	struct ssdfs_btree_node_delete *delete;
+	u64 hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc || !level);
+
+	SSDFS_DBG("desc %p, level %p\n",
+		  desc, level);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(level->flags & SSDFS_BTREE_LEVEL_DELETE_INDEX)) {
+		SSDFS_WARN("delete index hasn't been requested\n");
+		return 0;
+	}
+
+	node = level->nodes.old_node.ptr;
+	if (!node) {
+		SSDFS_ERR("invalid pointer: node %p\n",
+			  node);
+		return -ERANGE;
+	}
+
+	delete = &level->index_area.delete;
+
+	if (delete->op_state != SSDFS_BTREE_AREA_OP_REQUESTED) {
+		SSDFS_ERR("invalid operation state %#x\n",
+			  delete->op_state);
+		return -ERANGE;
+	}
+
+	delete->op_state = SSDFS_BTREE_AREA_OP_FAILED;
+
+	hash = le64_to_cpu(delete->node_index.index.hash);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("tree type %#x, node_id %u, hash %llx\n",
+		  node->tree->type, node->node_id, hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_btree_node_delete_index(node, hash);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete index: "
+			  "hash %llx, err %d\n",
+			  hash, err);
+		return err;
+	}
+
+	delete->op_state = SSDFS_BTREE_AREA_OP_DONE;
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_process_level_for_delete() - process a level of btree's hierarchy
+ * @ptr: btree's hierarchy
+ * @cur_height: current height
+ * @search: search object
+ *
+ * This method tries to process the level of btree's hierarchy.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+int ssdfs_btree_process_level_for_delete(struct ssdfs_btree_hierarchy *ptr,
+					 int cur_height,
+					 struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_state_descriptor *desc;
+	struct ssdfs_btree_level *cur_level;
+	struct ssdfs_btree_level *parent;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr || !search);
+
+	SSDFS_DBG("hierarchy %p, cur_height %d\n",
+		  ptr, cur_height);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (cur_height >= ptr->desc.height) {
+		SSDFS_ERR("invalid hierarchy: "
+			  "cur_height %d, tree_height %d\n",
+			  cur_height, ptr->desc.height);
+		return -ERANGE;
+	}
+
+	desc = &ptr->desc;
+	cur_level = ptr->array_ptr[cur_height];
+	parent = ptr->array_ptr[cur_height + 1];
+
+	if (!cur_level->flags) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing to do: cur_height %d\n",
+			  cur_height);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	if (cur_level->flags & ~SSDFS_BTREE_DELETE_NODE_MASK) {
+		SSDFS_ERR("invalid flags %#x\n",
+			  cur_level->flags);
+		return -ERANGE;
+	}
+
+	if (cur_level->flags & SSDFS_BTREE_LEVEL_DELETE_INDEX) {
+		err = ssdfs_btree_delete_index(desc, cur_level);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete the index: err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+	if (parent->flags & SSDFS_BTREE_LEVEL_UPDATE_INDEX) {
+		err = ssdfs_btree_update_index(desc, parent, cur_level);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to update the index: err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_btree_process_level_for_update() - process a level of btree's hierarchy
+ * @ptr: btree's hierarchy
+ * @cur_height: current height
+ * @search: search object
+ *
+ * This method tries to process the level of btree's hierarchy.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+int ssdfs_btree_process_level_for_update(struct ssdfs_btree_hierarchy *ptr,
+					 int cur_height,
+					 struct ssdfs_btree_search *search)
+{
+	struct ssdfs_btree_state_descriptor *desc;
+	struct ssdfs_btree_level *cur_level;
+	struct ssdfs_btree_level *parent;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ptr || !search);
+
+	SSDFS_DBG("hierarchy %p, cur_height %d\n",
+		  ptr, cur_height);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_debug_btree_hierarchy_object(ptr);
+
+	if (cur_height >= ptr->desc.height) {
+		SSDFS_ERR("invalid hierarchy: "
+			  "cur_height %d, tree_height %d\n",
+			  cur_height, ptr->desc.height);
+		return -ERANGE;
+	}
+
+	desc = &ptr->desc;
+	cur_level = ptr->array_ptr[cur_height];
+	parent = ptr->array_ptr[cur_height + 1];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("parent->flags %#x\n", parent->flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!parent->flags) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("nothing to do: cur_height %d\n",
+			  cur_height);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	if (parent->flags & ~SSDFS_BTREE_LEVEL_FLAGS_MASK) {
+		SSDFS_ERR("invalid flags %#x\n",
+			  parent->flags);
+		return -ERANGE;
+	}
+
+	if (parent->flags & SSDFS_BTREE_LEVEL_UPDATE_INDEX) {
+		err = ssdfs_btree_update_index(desc, parent, cur_level);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to update the index: err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+void ssdfs_show_btree_hierarchy_object(struct ssdfs_btree_hierarchy *ptr)
+{
+	struct ssdfs_btree_index_key *index_key;
+	int i;
+
+	BUG_ON(!ptr);
+
+	SSDFS_ERR("DESCRIPTOR: "
+		  "height %d, increment_height %d, "
+		  "node_size %u, index_size %u, "
+		  "min_item_size %u, max_item_size %u, "
+		  "index_area_min_size %u\n",
+		  ptr->desc.height, ptr->desc.increment_height,
+		  ptr->desc.node_size, ptr->desc.index_size,
+		  ptr->desc.min_item_size,
+		  ptr->desc.max_item_size,
+		  ptr->desc.index_area_min_size);
+
+	for (i = 0; i < ptr->desc.height; i++) {
+		struct ssdfs_btree_level *level = ptr->array_ptr[i];
+
+		SSDFS_ERR("LEVEL: height %d, flags %#x, "
+			  "OLD_NODE: type %#x, ptr %p, "
+			  "index_area (start %llx, end %llx), "
+			  "items_area (start %llx, end %llx), "
+			  "NEW_NODE: type %#x, ptr %p, "
+			  "index_area (start %llx, end %llx), "
+			  "items_area (start %llx, end %llx)\n",
+			  i, level->flags,
+			  level->nodes.old_node.type,
+			  level->nodes.old_node.ptr,
+			  level->nodes.old_node.index_hash.start,
+			  level->nodes.old_node.index_hash.end,
+			  level->nodes.old_node.items_hash.start,
+			  level->nodes.old_node.items_hash.end,
+			  level->nodes.new_node.type,
+			  level->nodes.new_node.ptr,
+			  level->nodes.new_node.index_hash.start,
+			  level->nodes.new_node.index_hash.end,
+			  level->nodes.new_node.items_hash.start,
+			  level->nodes.new_node.items_hash.end);
+
+		SSDFS_ERR("INDEX_AREA: area_size %u, free_space %u, "
+			  "start_hash %llx, end_hash %llx\n",
+			  level->index_area.area_size,
+			  level->index_area.free_space,
+			  level->index_area.hash.start,
+			  level->index_area.hash.end);
+
+		SSDFS_ERR("ADD: op_state %#x, start_hash %llx, "
+			  "end_hash %llx, "
+			  "POSITION(state %#x, start %u, count %u)\n",
+			  level->index_area.add.op_state,
+			  level->index_area.add.hash.start,
+			  level->index_area.add.hash.end,
+			  level->index_area.add.pos.state,
+			  level->index_area.add.pos.start,
+			  level->index_area.add.pos.count);
+
+		SSDFS_ERR("INSERT: op_state %#x, start_hash %llx, "
+			  "end_hash %llx, "
+			  "POSITION(state %#x, start %u, count %u)\n",
+			  level->index_area.insert.op_state,
+			  level->index_area.insert.hash.start,
+			  level->index_area.insert.hash.end,
+
level->index_area.insert.pos.state, + level->index_area.insert.pos.start, + level->index_area.insert.pos.count); + + SSDFS_ERR("MOVE: op_state %#x, direction %#x, " + "POSITION(state %#x, start %u, count %u)\n", + level->index_area.move.op_state, + level->index_area.move.direction, + level->index_area.move.pos.state, + level->index_area.move.pos.start, + level->index_area.move.pos.count); + + index_key = &level->index_area.delete.node_index; + SSDFS_ERR("DELETE: op_state %#x, " + "INDEX_KEY: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + level->index_area.delete.op_state, + le32_to_cpu(index_key->node_id), + index_key->node_type, + index_key->height, + le16_to_cpu(index_key->flags), + le64_to_cpu(index_key->index.hash), + le64_to_cpu(index_key->index.extent.seg_id), + le32_to_cpu(index_key->index.extent.logical_blk), + le32_to_cpu(index_key->index.extent.len)); + + SSDFS_ERR("ITEMS_AREA: area_size %u, free_space %u, " + "start_hash %llx, end_hash %llx\n", + level->items_area.area_size, + level->items_area.free_space, + level->items_area.hash.start, + level->items_area.hash.end); + + SSDFS_ERR("ADD: op_state %#x, start_hash %llx, " + "end_hash %llx, " + "POSITION(state %#x, start %u, count %u)\n", + level->items_area.add.op_state, + level->items_area.add.hash.start, + level->items_area.add.hash.end, + level->items_area.add.pos.state, + level->items_area.add.pos.start, + level->items_area.add.pos.count); + + SSDFS_ERR("INSERT: op_state %#x, start_hash %llx, " + "end_hash %llx, " + "POSITION(state %#x, start %u, count %u)\n", + level->items_area.insert.op_state, + level->items_area.insert.hash.start, + level->items_area.insert.hash.end, + level->items_area.insert.pos.state, + level->items_area.insert.pos.start, + level->items_area.insert.pos.count); + + SSDFS_ERR("MOVE: op_state %#x, direction %#x, " + "POSITION(state %#x, start %u, count %u)\n", + level->items_area.move.op_state, + level->items_area.move.direction, + level->items_area.move.pos.state, + level->items_area.move.pos.start, + level->items_area.move.pos.count); + } +} + +void ssdfs_debug_btree_hierarchy_object(struct ssdfs_btree_hierarchy *ptr) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct ssdfs_btree_index_key *index_key; + int i; + + BUG_ON(!ptr); + + SSDFS_DBG("DESCRIPTOR: " + "height %d, increment_height %d, " + "node_size %u, index_size %u, " + "min_item_size %u, max_item_size %u, " + "index_area_min_size %u\n", + ptr->desc.height, ptr->desc.increment_height, + ptr->desc.node_size, ptr->desc.index_size, + ptr->desc.min_item_size, + ptr->desc.max_item_size, + ptr->desc.index_area_min_size); + + for (i = 0; i < ptr->desc.height; i++) { + struct ssdfs_btree_level *level = ptr->array_ptr[i]; + + SSDFS_DBG("LEVEL: height %d, flags %#x, " + "OLD_NODE: type %#x, ptr %p, " + "index_area (start %llx, end %llx), " + "items_area (start %llx, end %llx), " + "NEW_NODE: type %#x, ptr %p, " + "index_area (start %llx, end %llx), " + "items_area (start %llx, end %llx)\n", + i, level->flags, + level->nodes.old_node.type, + level->nodes.old_node.ptr, + level->nodes.old_node.index_hash.start, + level->nodes.old_node.index_hash.end, + level->nodes.old_node.items_hash.start, + level->nodes.old_node.items_hash.end, + level->nodes.new_node.type, + level->nodes.new_node.ptr, + level->nodes.new_node.index_hash.start, + level->nodes.new_node.index_hash.end, + level->nodes.new_node.items_hash.start, + level->nodes.new_node.items_hash.end); + + SSDFS_DBG("INDEX_AREA: area_size %u, free_space %u, " 
+ "start_hash %llx, end_hash %llx\n", + level->index_area.area_size, + level->index_area.free_space, + level->index_area.hash.start, + level->index_area.hash.end); + + SSDFS_DBG("ADD: op_state %#x, start_hash %llx, " + "end_hash %llx, " + "POSITION(state %#x, start %u, count %u)\n", + level->index_area.add.op_state, + level->index_area.add.hash.start, + level->index_area.add.hash.end, + level->index_area.add.pos.state, + level->index_area.add.pos.start, + level->index_area.add.pos.count); + + SSDFS_DBG("INSERT: op_state %#x, start_hash %llx, " + "end_hash %llx, " + "POSITION(state %#x, start %u, count %u)\n", + level->index_area.insert.op_state, + level->index_area.insert.hash.start, + level->index_area.insert.hash.end, + level->index_area.insert.pos.state, + level->index_area.insert.pos.start, + level->index_area.insert.pos.count); + + SSDFS_DBG("MOVE: op_state %#x, direction %#x, " + "POSITION(state %#x, start %u, count %u)\n", + level->index_area.move.op_state, + level->index_area.move.direction, + level->index_area.move.pos.state, + level->index_area.move.pos.start, + level->index_area.move.pos.count); + + index_key = &level->index_area.delete.node_index; + SSDFS_DBG("DELETE: op_state %#x, " + "INDEX_KEY: node_id %u, node_type %#x, " + "height %u, flags %#x, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + level->index_area.delete.op_state, + le32_to_cpu(index_key->node_id), + index_key->node_type, + index_key->height, + le16_to_cpu(index_key->flags), + le64_to_cpu(index_key->index.hash), + le64_to_cpu(index_key->index.extent.seg_id), + le32_to_cpu(index_key->index.extent.logical_blk), + le32_to_cpu(index_key->index.extent.len)); + + SSDFS_DBG("ITEMS_AREA: area_size %u, free_space %u, " + "start_hash %llx, end_hash %llx\n", + level->items_area.area_size, + level->items_area.free_space, + level->items_area.hash.start, + level->items_area.hash.end); + + SSDFS_DBG("ADD: op_state %#x, start_hash %llx, " + "end_hash %llx, " + "POSITION(state %#x, start %u, count %u)\n", + level->items_area.add.op_state, + level->items_area.add.hash.start, + level->items_area.add.hash.end, + level->items_area.add.pos.state, + level->items_area.add.pos.start, + level->items_area.add.pos.count); + + SSDFS_DBG("INSERT: op_state %#x, start_hash %llx, " + "end_hash %llx, " + "POSITION(state %#x, start %u, count %u)\n", + level->items_area.insert.op_state, + level->items_area.insert.hash.start, + level->items_area.insert.hash.end, + level->items_area.insert.pos.state, + level->items_area.insert.pos.start, + level->items_area.insert.pos.count); + + SSDFS_DBG("MOVE: op_state %#x, direction %#x, " + "POSITION(state %#x, start %u, count %u)\n", + level->items_area.move.op_state, + level->items_area.move.direction, + level->items_area.move.pos.state, + level->items_area.move.pos.start, + level->items_area.move.pos.count); + } +#endif /* CONFIG_SSDFS_DEBUG */ +} From patchwork Sat Feb 25 01:09:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151965 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 718EAC7EE36 for ; Sat, 25 Feb 2023 01:19:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229786AbjBYBTy (ORCPT ); Fri, 24 Feb 2023 20:19:54 -0500 Received: from lindbergh.monkeyblade.net 
From patchwork Sat Feb 25 01:09:11 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151965
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 60/76] ssdfs: introduce inodes b-tree
Date: Fri, 24 Feb 2023 17:09:11 -0800
Message-Id: <20230225010927.813929-61-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

The SSDFS raw inode is a fixed-size metadata structure. Its size is defined
at file system volume creation time and can vary from 256 bytes to several
KBs. The most distinctive part of the SSDFS raw inode is the private area,
which is used for storing either: (1) a small file inline, or (2) the root
node of the extents, dentries, and/or xattr b-tree.
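The two uses of the private area can be pictured as a union over the tail of
the raw inode. The sketch below is purely illustrative: every constant and
field name in it is hypothetical and does not reproduce the on-disk
definitions from the SSDFS layout headers.

/* Illustrative sketch only; names and sizes are hypothetical. */
#define SKETCH_RAW_INODE_SIZE	256	/* chosen at volume creation */
#define SKETCH_INODE_HDR_SIZE	128	/* fixed part of the raw inode */
#define SKETCH_PRIVATE_AREA_SIZE \
	(SKETCH_RAW_INODE_SIZE - SKETCH_INODE_HDR_SIZE)

union sketch_inode_private_area {
	/* (1) a small file stored inline, no extents b-tree needed */
	u8 inline_file[SKETCH_PRIVATE_AREA_SIZE];
	/* (2) the root node of an extents/dentries/xattr b-tree */
	struct {
		u8 header[32];
		u8 index_keys[SKETCH_PRIVATE_AREA_SIZE - 32];
	} btree_root;
};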
The SSDFS inodes b-tree is a hybrid b-tree: it employs hybrid nodes that
combine index and data records inside the same node in order to use the
node's space more efficiently. The root node of the inodes b-tree is stored
in the log footer or partial log header of every log; in other words, SSDFS
massively replicates the inodes b-tree's root node. A node's space consists
of a header, an index area (in the case of a hybrid node), and an array of
inodes ordered by ID. For example, if the node size is 8 KB and the inode
structure is 256 bytes, then one inodes b-tree node can hold at most
32 inodes.

Generally speaking, the inodes table can be imagined as an array that grows
by adding new inodes at its tail. However, an inode can also be allocated or
deleted in the middle of the array, for example by file create and delete
operations. As a result, every b-tree node has an allocation bitmap that
tracks the state (used or free) of every inode in the node. The allocation
bitmap provides a fast lookup mechanism for a free inode, so that the inode
IDs of deleted files can be reused.

Additionally, every b-tree node has a dirty bitmap that tracks which inodes
have been modified. The dirty bitmap makes it possible to flush only the
modified inodes instead of the whole node; such a bitmap can therefore play
a cornerstone role in delta-encoding or in a Diff-On-Write approach.

Moreover, a b-tree node has a lock bitmap that implements exclusive locking
of a particular inode without taking an exclusive lock on the whole node.
The lock bitmap was introduced to improve the granularity of lock
operations: different inodes in the same b-tree node can be modified
concurrently without an exclusive lock on the whole node. However, an
exclusive lock on the whole tree still has to be taken when a b-tree node
is added or deleted.
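As a back-of-the-envelope check of the node capacity and per-node bitmap
sizes described above, here is a sketch with illustrative constant names
that ignores the space spent on the node header and, in hybrid nodes, the
index area (both reduce the real capacity below this upper bound):

/* Sketch only; constants are illustrative, not on-disk definitions. */
#define SKETCH_NODE_SIZE	(8 * 1024)	/* 8 KB node */
#define SKETCH_INODE_SIZE	256		/* raw inode size */

/* upper bound on inodes per node: 8192 / 256 = 32 */
#define SKETCH_INODES_PER_NODE	(SKETCH_NODE_SIZE / SKETCH_INODE_SIZE)

/* one bit per inode in each allocation/dirty/lock bitmap: 32 / 8 = 4 bytes */
#define SKETCH_BMAP_BYTES	(SKETCH_INODES_PER_NODE / 8)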
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/inodes_tree.c | 3168 ++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/inodes_tree.h |  177 +++
 2 files changed, 3345 insertions(+)
 create mode 100644 fs/ssdfs/inodes_tree.c
 create mode 100644 fs/ssdfs/inodes_tree.h

diff --git a/fs/ssdfs/inodes_tree.c b/fs/ssdfs/inodes_tree.c
new file mode 100644
index 000000000000..1cc42cc84513
--- /dev/null
+++ b/fs/ssdfs/inodes_tree.c
@@ -0,0 +1,3168 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/inodes_tree.c - inodes btree implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "inodes_tree.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_ino_tree_page_leaks;
+atomic64_t ssdfs_ino_tree_memory_leaks;
+atomic64_t ssdfs_ino_tree_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_ino_tree_cache_leaks_increment(void *kaddr)
+ * void ssdfs_ino_tree_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_ino_tree_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_ino_tree_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_ino_tree_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_ino_tree_kfree(void *kaddr)
+ * struct page *ssdfs_ino_tree_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_ino_tree_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_ino_tree_free_page(struct page *page)
+ * void ssdfs_ino_tree_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(ino_tree)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(ino_tree)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_ino_tree_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_ino_tree_page_leaks, 0);
+	atomic64_set(&ssdfs_ino_tree_memory_leaks, 0);
+	atomic64_set(&ssdfs_ino_tree_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_ino_tree_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_ino_tree_page_leaks) != 0) {
+		SSDFS_ERR("INODES TREE: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_ino_tree_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_ino_tree_memory_leaks) != 0) {
+		SSDFS_ERR("INODES TREE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_ino_tree_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_ino_tree_cache_leaks) != 0) {
+		SSDFS_ERR("INODES TREE: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_ino_tree_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static struct kmem_cache *ssdfs_free_ino_desc_cachep;
+
+void ssdfs_zero_free_ino_desc_cache_ptr(void)
+{
+	ssdfs_free_ino_desc_cachep = NULL;
+}
+
+static
+void ssdfs_init_free_ino_desc_once(void *obj)
+{
+	struct ssdfs_inodes_btree_range *range_desc = obj;
+
+	memset(range_desc, 0, sizeof(struct ssdfs_inodes_btree_range));
+}
+
+void ssdfs_shrink_free_ino_desc_cache(void)
+{
+	if (ssdfs_free_ino_desc_cachep)
+		kmem_cache_shrink(ssdfs_free_ino_desc_cachep);
+}
+
+void ssdfs_destroy_free_ino_desc_cache(void)
+{
+	if (ssdfs_free_ino_desc_cachep)
+		kmem_cache_destroy(ssdfs_free_ino_desc_cachep);
+}
+
+int ssdfs_init_free_ino_desc_cache(void)
+{
+	ssdfs_free_ino_desc_cachep =
+		kmem_cache_create("ssdfs_free_ino_desc_cache",
+				  sizeof(struct ssdfs_inodes_btree_range), 0,
+				  SLAB_RECLAIM_ACCOUNT |
+				  SLAB_MEM_SPREAD |
+				  SLAB_ACCOUNT,
+				  ssdfs_init_free_ino_desc_once);
+	if (!ssdfs_free_ino_desc_cachep) {
+		SSDFS_ERR("unable to create free inode descriptors cache\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/******************************************************************************
+ *                    FREE INODES RANGE FUNCTIONALITY                         *
+
******************************************************************************/ + +/* + * ssdfs_free_inodes_range_alloc() - allocate memory for free inodes range + */ +struct ssdfs_inodes_btree_range *ssdfs_free_inodes_range_alloc(void) +{ + struct ssdfs_inodes_btree_range *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_free_ino_desc_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = kmem_cache_alloc(ssdfs_free_ino_desc_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for free inodes range\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_ino_tree_cache_leaks_increment(ptr); + + return ptr; +} + +/* + * ssdfs_free_inodes_range_free() - free memory for free inodes range + */ +void ssdfs_free_inodes_range_free(struct ssdfs_inodes_btree_range *range) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_free_ino_desc_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!range) + return; + + ssdfs_ino_tree_cache_leaks_decrement(range); + kmem_cache_free(ssdfs_free_ino_desc_cachep, range); +} + +/* + * ssdfs_free_inodes_range_init() - init free inodes range + * @range: free inodes range object [out] + */ +void ssdfs_free_inodes_range_init(struct ssdfs_inodes_btree_range *range) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!range); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(range, 0, sizeof(struct ssdfs_inodes_btree_range)); + + INIT_LIST_HEAD(&range->list); + range->node_id = SSDFS_BTREE_NODE_INVALID_ID; + range->area.start_hash = SSDFS_INODES_RANGE_INVALID_START; + range->area.start_index = SSDFS_INODES_RANGE_INVALID_INDEX; +} + +/****************************************************************************** + * FREE INODES QUEUE FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_free_inodes_queue_init() - initialize free inodes queue + * @q: free inodes queue [out] + */ +static +void ssdfs_free_inodes_queue_init(struct ssdfs_free_inode_range_queue *q) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!q); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock_init(&q->lock); + INIT_LIST_HEAD(&q->list); +} + +/* + * is_ssdfs_free_inodes_queue_empty() - check that free inodes queue is empty + * @q: free inodes queue + */ +static +bool is_ssdfs_free_inodes_queue_empty(struct ssdfs_free_inode_range_queue *q) +{ + bool is_empty; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!q); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&q->lock); + is_empty = list_empty_careful(&q->list); + spin_unlock(&q->lock); + + return is_empty; +} + +/* + * ssdfs_free_inodes_queue_add_head() - add range at the head of queue + * @q: free inodes queue + * @range: free inodes range + */ +static void +ssdfs_free_inodes_queue_add_head(struct ssdfs_free_inode_range_queue *q, + struct ssdfs_inodes_btree_range *range) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!q || !range); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&q->lock); + list_add(&range->list, &q->list); + spin_unlock(&q->lock); +} + +/* + * ssdfs_free_inodes_queue_add_tail() - add range at the tail of queue + * @q: free inodes queue + * @range: free inodes range + */ +static void +ssdfs_free_inodes_queue_add_tail(struct ssdfs_free_inode_range_queue *q, + struct ssdfs_inodes_btree_range *range) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!q || !range); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&q->lock); + list_add_tail(&range->list, &q->list); + spin_unlock(&q->lock); +} + +/* + * ssdfs_free_inodes_queue_get_first() - get first free inodes range + * @q: free inodes queue + * @range: pointer on value that stores range pointer 
[out]
+ *
+ * This method tries to retrieve the first free inode's index from
+ * the queue of free inode ranges.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA - queue is empty.
+ * %-ENOMEM - unable to allocate memory.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_free_inodes_queue_get_first(struct ssdfs_free_inode_range_queue *q,
+				      struct ssdfs_inodes_btree_range **range)
+{
+	struct ssdfs_inodes_btree_range *first = NULL, *tmp = NULL;
+	bool is_empty = true;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!q || !range);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tmp = ssdfs_free_inodes_range_alloc();
+	if (IS_ERR_OR_NULL(tmp)) {
+		SSDFS_ERR("fail to allocate free inodes range\n");
+		return -ENOMEM;
+	}
+
+	ssdfs_free_inodes_range_init(tmp);
+
+	spin_lock(&q->lock);
+
+	is_empty = list_empty_careful(&q->list);
+	if (!is_empty) {
+		first = list_first_entry_or_null(&q->list,
+					struct ssdfs_inodes_btree_range,
+					list);
+		if (!first) {
+			err = -ENOENT;
+			SSDFS_WARN("first entry is NULL\n");
+			goto finish_get_first;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			if (first->node_id == SSDFS_BTREE_NODE_INVALID_ID) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid node ID\n");
+				goto finish_get_first;
+			}
+
+			if (first->area.start_hash ==
+					SSDFS_INODES_RANGE_INVALID_START) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid start index\n");
+				goto finish_get_first;
+			}
+
+			if (first->area.count == 0) {
+				err = -ERANGE;
+				SSDFS_ERR("empty range\n");
+				list_del(&first->list);
+				goto finish_get_first;
+			}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			tmp->node_id = first->node_id;
+			tmp->area.start_hash = first->area.start_hash;
+			tmp->area.start_index = first->area.start_index;
+			tmp->area.count = 1;
+
+			first->area.start_hash += 1;
+			first->area.start_index += 1;
+			first->area.count -= 1;
+
+			if (first->area.count == 0)
+				list_del(&first->list);
+		}
+	}
+
+finish_get_first:
+	spin_unlock(&q->lock);
+
+	if (first && first->area.count == 0) {
+		ssdfs_free_inodes_range_free(first);
+		first = NULL;
+	}
+
+	if (unlikely(err)) {
+		ssdfs_free_inodes_range_free(tmp);
+		return err;
+	} else if (is_empty) {
+		ssdfs_free_inodes_range_free(tmp);
+		SSDFS_DBG("free inodes queue is empty\n");
+		return -ENODATA;
+	}
+
+	*range = tmp;
+
+	return 0;
+}
+
+/*
+ * ssdfs_free_inodes_queue_remove_first() - remove first free inodes range
+ * @q: free inodes queue
+ * @range: pointer on value that stores range pointer [out]
+ *
+ * This method tries to remove the first free inodes' range from
+ * the queue.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA - queue is empty.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_free_inodes_queue_remove_first(struct ssdfs_free_inode_range_queue *q,
+					 struct ssdfs_inodes_btree_range **range)
+{
+	bool is_empty;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!q || !range);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&q->lock);
+	is_empty = list_empty_careful(&q->list);
+	if (!is_empty) {
+		*range = list_first_entry_or_null(&q->list,
+					struct ssdfs_inodes_btree_range,
+					list);
+		if (!*range) {
+			SSDFS_WARN("first entry is NULL\n");
+			err = -ENOENT;
+		} else {
+			list_del(&(*range)->list);
+		}
+	}
+	spin_unlock(&q->lock);
+
+	if (is_empty) {
+		SSDFS_WARN("free inodes queue is empty\n");
+		return -ENODATA;
+	} else if (err) {
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_free_inodes_queue_remove_all() - remove all ranges from the queue
+ * @q: free inodes queue
+ */
+static
+void ssdfs_free_inodes_queue_remove_all(struct ssdfs_free_inode_range_queue *q)
+{
+	bool is_empty;
+	LIST_HEAD(tmp_list);
+	struct list_head *this, *next;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!q);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&q->lock);
+	is_empty = list_empty_careful(&q->list);
+	if (!is_empty)
+		list_replace_init(&q->list, &tmp_list);
+	spin_unlock(&q->lock);
+
+	if (is_empty)
+		return;
+
+	list_for_each_safe(this, next, &tmp_list) {
+		struct ssdfs_inodes_btree_range *range;
+
+		range = list_entry(this, struct ssdfs_inodes_btree_range, list);
+
+		if (range) {
+			list_del(&range->list);
+			ssdfs_free_inodes_range_free(range);
+		}
+	}
+}
+
+/******************************************************************************
+ *                     INODES TREE OBJECT FUNCTIONALITY                       *
+ ******************************************************************************/
+
+/*
+ * ssdfs_inodes_btree_create() - create inodes btree
+ * @fsi: pointer on shared file system object
+ *
+ * This method tries to create inodes btree object.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ENOMEM - unable to allocate memory.
+ * %-ERANGE - internal error.
+ */ +int ssdfs_inodes_btree_create(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_inodes_btree_info *ptr; + struct ssdfs_inodes_btree *raw_btree; + struct ssdfs_btree_search *search; + size_t raw_inode_size = sizeof(struct ssdfs_inode); + u32 vs_flags; + bool is_tree_inline = true; + ino_t ino; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p\n", fsi); +#else + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ptr = ssdfs_ino_tree_kzalloc(sizeof(struct ssdfs_inodes_btree_info), + GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate inodes tree\n"); + return -ENOMEM; + } + + fsi->inodes_tree = ptr; + + err = ssdfs_btree_create(fsi, + SSDFS_INODES_BTREE_INO, + &ssdfs_inodes_btree_desc_ops, + &ssdfs_inodes_btree_ops, + &ptr->generic_tree); + if (unlikely(err)) { + SSDFS_ERR("fail to create inodes tree: err %d\n", + err); + goto fail_create_inodes_tree; + } + + spin_lock(&fsi->volume_state_lock); + vs_flags = fsi->fs_flags; + spin_unlock(&fsi->volume_state_lock); + + is_tree_inline = vs_flags & SSDFS_HAS_INLINE_INODES_TREE; + + spin_lock_init(&ptr->lock); + raw_btree = &fsi->vs->inodes_btree; + ptr->upper_allocated_ino = le64_to_cpu(raw_btree->upper_allocated_ino); + ptr->last_free_ino = 0; + ptr->allocated_inodes = le64_to_cpu(raw_btree->allocated_inodes); + ptr->free_inodes = le64_to_cpu(raw_btree->free_inodes); + ptr->inodes_capacity = le64_to_cpu(raw_btree->inodes_capacity); + ptr->leaf_nodes = le32_to_cpu(raw_btree->leaf_nodes); + ptr->nodes_count = le32_to_cpu(raw_btree->nodes_count); + ptr->raw_inode_size = le16_to_cpu(raw_btree->desc.item_size); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("upper_allocated_ino %llu, allocated_inodes %llu, " + "free_inodes %llu, inodes_capacity %llu\n", + ptr->upper_allocated_ino, + ptr->allocated_inodes, + ptr->free_inodes, + ptr->inodes_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(&ptr->root_folder, 0, raw_inode_size, + &fsi->vs->root_folder, 0, raw_inode_size, + raw_inode_size); + + if (!is_raw_inode_checksum_correct(fsi, + &ptr->root_folder, + raw_inode_size)) { + err = -EIO; + SSDFS_ERR("root folder inode is corrupted\n"); + goto fail_create_inodes_tree; + } + + ssdfs_free_inodes_queue_init(&ptr->free_inodes_queue); + + if (is_tree_inline) { + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto fail_create_inodes_tree; + } + + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_ALLOCATE_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = 0; + search->request.end.hash = 0; + search->request.count = 1; + + ptr->allocated_inodes = 0; + ptr->free_inodes = 0; + ptr->inodes_capacity = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("upper_allocated_ino %llu, allocated_inodes %llu, " + "free_inodes %llu, inodes_capacity %llu\n", + ptr->upper_allocated_ino, + ptr->allocated_inodes, + ptr->free_inodes, + ptr->inodes_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_add_node(&ptr->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add the node: err %d\n", + err); + goto free_search_object; + } + + /* allocate all reserved inodes */ + ino = 0; + do { + search->request.start.hash = ino; + search->request.end.hash = ino; + search->request.count = 1; + + 
err = ssdfs_inodes_btree_allocate(ptr, &ino, search); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate an inode: err %d\n", + err); + goto free_search_object; + } else if (search->request.start.hash != ino) { + err = -ERANGE; + SSDFS_ERR("invalid ino %lu\n", + ino); + goto free_search_object; + } + + ino++; + } while (ino <= SSDFS_ROOT_INO); + + if (ino > SSDFS_ROOT_INO) + ino = SSDFS_ROOT_INO; + else { + err = -ERANGE; + SSDFS_ERR("unexpected ino %lu\n", ino); + goto free_search_object; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid result's buffer state: " + "%#x\n", + search->result.buf_state); + goto free_search_object; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("invalid buffer\n"); + goto free_search_object; + } + + if (search->result.buf_size < raw_inode_size) { + err = -ERANGE; + SSDFS_ERR("buf_size %zu < raw_inode_size %zu\n", + search->result.buf_size, + raw_inode_size); + goto free_search_object; + } + + if (search->result.items_in_buffer != 1) { + SSDFS_WARN("unexpected value: " + "items_in_buffer %u\n", + search->result.items_in_buffer); + } + + ssdfs_memcpy(search->result.buf, 0, search->result.buf_size, + &ptr->root_folder, 0, raw_inode_size, + raw_inode_size); + + err = ssdfs_inodes_btree_change(ptr, ino, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change inode: " + "ino %lu, err %d\n", + ino, err); + goto free_search_object; + } + +free_search_object: + ssdfs_btree_search_free(search); + + if (unlikely(err)) + goto fail_create_inodes_tree; + + spin_lock(&fsi->volume_state_lock); + vs_flags = fsi->fs_flags; + vs_flags &= ~SSDFS_HAS_INLINE_INODES_TREE; + fsi->fs_flags = vs_flags; + spin_unlock(&fsi->volume_state_lock); + } else { + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto fail_create_inodes_tree; + } + + ssdfs_btree_search_init(search); + err = ssdfs_inodes_btree_find(ptr, ptr->upper_allocated_ino, + search); + ssdfs_btree_search_free(search); + + if (err == -ENODATA) { + err = 0; + /* + * It doesn't need to find the inode. + * The goal is to pass through the tree. + * Simply ignores the no data error. 
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to prepare free inodes queue: " + "upper_allocated_ino %llu, err %d\n", + ptr->upper_allocated_ino, err); + goto fail_create_inodes_tree; + } + + spin_lock(&ptr->lock); + if (ptr->last_free_ino > 0 && + ptr->last_free_ino < ptr->upper_allocated_ino) { + ptr->upper_allocated_ino = ptr->last_free_ino - 1; + } + spin_unlock(&ptr->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("last_free_ino %llu, upper_allocated_ino %llu\n", + ptr->last_free_ino, + ptr->upper_allocated_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("DONE: create inodes btree\n"); +#else + SSDFS_DBG("DONE: create inodes btree\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +fail_create_inodes_tree: + fsi->inodes_tree = NULL; + ssdfs_ino_tree_kfree(ptr); + return err; +} + +/* + * ssdfs_inodes_btree_destroy - destroy inodes btree + * @fsi: pointer on shared file system object + */ +void ssdfs_inodes_btree_destroy(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_inodes_btree_info *tree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p\n", fsi->inodes_tree); +#else + SSDFS_DBG("tree %p\n", fsi->inodes_tree); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!fsi->inodes_tree) + return; + + ssdfs_debug_inodes_btree_object(fsi->inodes_tree); + + tree = fsi->inodes_tree; + ssdfs_btree_destroy(&tree->generic_tree); + ssdfs_free_inodes_queue_remove_all(&tree->free_inodes_queue); + + ssdfs_ino_tree_kfree(fsi->inodes_tree); + fsi->inodes_tree = NULL; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_inodes_btree_flush() - flush dirty inodes btree + * @tree: pointer on inodes btree object + * + * This method tries to flush the dirty inodes btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int ssdfs_inodes_btree_flush(struct ssdfs_inodes_btree_info *tree) +{ + struct ssdfs_fs_info *fsi; + u64 upper_allocated_ino; + u64 allocated_inodes; + u64 free_inodes; + u64 inodes_capacity; + u32 leaf_nodes; + u32 nodes_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p\n", tree); +#else + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi = tree->generic_tree.fsi; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_flush(&tree->generic_tree); + if (unlikely(err)) { + SSDFS_ERR("fail to flush inodes btree: err %d\n", + err); + return err; + } + + spin_lock(&tree->lock); + ssdfs_memcpy(&fsi->vs->root_folder, + 0, sizeof(struct ssdfs_inode), + &tree->root_folder, + 0, sizeof(struct ssdfs_inode), + sizeof(struct ssdfs_inode)); + upper_allocated_ino = tree->upper_allocated_ino; + allocated_inodes = tree->allocated_inodes; + free_inodes = tree->free_inodes; + inodes_capacity = tree->inodes_capacity; + leaf_nodes = tree->leaf_nodes; + nodes_count = tree->nodes_count; + spin_unlock(&tree->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("allocated_inodes %llu, free_inodes %llu, " + "inodes_capacity %llu\n", + allocated_inodes, free_inodes, inodes_capacity); + WARN_ON((allocated_inodes + free_inodes) != inodes_capacity); + + SSDFS_DBG("leaf_nodes %u, nodes_count %u\n", + leaf_nodes, nodes_count); + WARN_ON(leaf_nodes >= nodes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi->vs->inodes_btree.allocated_inodes = cpu_to_le64(allocated_inodes); + fsi->vs->inodes_btree.free_inodes = cpu_to_le64(free_inodes); + fsi->vs->inodes_btree.inodes_capacity = cpu_to_le64(inodes_capacity); + fsi->vs->inodes_btree.leaf_nodes = cpu_to_le32(leaf_nodes); + fsi->vs->inodes_btree.nodes_count = cpu_to_le32(nodes_count); + fsi->vs->inodes_btree.upper_allocated_ino = + cpu_to_le64(upper_allocated_ino); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_inodes_btree_object(fsi->inodes_tree); + + return 0; +} + +static inline +bool need_initialize_inodes_btree_search(ino_t ino, + struct ssdfs_btree_search *search) +{ + return need_initialize_btree_search(search) || + search->request.start.hash != ino; +} + +/* + * ssdfs_inodes_btree_find() - find raw inode + * @tree: pointer on inodes btree object + * @ino: inode ID value + * @search: pointer on search request object + * + * This method tries to find the raw inode for @ino. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
+ */ +int ssdfs_inodes_btree_find(struct ssdfs_inodes_btree_info *tree, + ino_t ino, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, ino %lu, search %p\n", + tree, ino, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_inodes_btree_search(ino, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = ino; + search->request.end.hash = ino; + search->request.count = 1; + } + + return ssdfs_btree_find_item(&tree->generic_tree, search); +} + +/* + * ssdfs_inodes_btree_allocate() - allocate a new raw inode + * @tree: pointer on inodes btree object + * @ino: pointer on inode ID value [out] + * @search: pointer on search request object + * + * This method tries to allocate a new raw inode into + * the inodes btree. The @ino contains inode ID number. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +int ssdfs_inodes_btree_allocate(struct ssdfs_inodes_btree_info *tree, + ino_t *ino, + struct ssdfs_btree_search *search) +{ + struct ssdfs_inodes_btree_range *range = NULL; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !ino || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, ino %p, search %p\n", + tree, ino, search); +#else + SSDFS_DBG("tree %p, ino %p, search %p\n", + tree, ino, search); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + *ino = ULONG_MAX; + + err = ssdfs_free_inodes_queue_get_first(&tree->free_inodes_queue, + &range); + if (err == -ENODATA) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_ALLOCATE_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + spin_lock(&tree->lock); + search->request.start.hash = tree->upper_allocated_ino + 1; + search->request.end.hash = tree->upper_allocated_ino + 1; + spin_unlock(&tree->lock); + search->request.count = 1; + + err = ssdfs_btree_add_node(&tree->generic_tree, search); + if (err == -EEXIST) + err = 0; + else if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add the node: err %d\n", + err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add the node: err %d\n", + err); + return err; + } + + err = + ssdfs_free_inodes_queue_get_first(&tree->free_inodes_queue, + &range); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to get first free inode hash from the queue: " + "err %d\n", + err); + return err; + } + + if (is_free_inodes_range_invalid(range)) { + err = -ERANGE; + SSDFS_WARN("invalid free inodes range\n"); + goto finish_inode_allocation; + } + + if (range->area.start_hash >= ULONG_MAX) { + err = -EOPNOTSUPP; + SSDFS_WARN("start_hash %llx is too huge\n", + range->area.start_hash); + goto finish_inode_allocation; + } + + if (range->area.count != 1) + SSDFS_WARN("invalid free inodes range\n"); + + *ino = (ino_t)range->area.start_hash; + search->request.type = SSDFS_BTREE_SEARCH_ALLOCATE_ITEM; + + if (need_initialize_inodes_btree_search(*ino, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_ALLOCATE_ITEM; + search->request.flags = + 
SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = *ino; + search->request.end.hash = *ino; + search->request.count = 1; + } + + search->result.state = SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.start_index = range->area.start_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %llu, start_index %u\n", + (u64)*ino, (u32)search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_allocate_item(&tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate item: ino %llu, err %d\n", + search->request.start.hash, err); + goto finish_inode_allocation; + } + +finish_inode_allocation: + ssdfs_free_inodes_range_free(range); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return err; +} + +/* + * ssdfs_inodes_btree_change() - change raw inode + * @tree: pointer on inodes btree object + * @ino: inode ID value + * @search: pointer on search request object + * + * This method tries to change the raw inode for @ino. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_inodes_btree_change(struct ssdfs_inodes_btree_info *tree, + ino_t ino, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, ino %lu, search %p\n", + tree, ino, search); +#else + SSDFS_DBG("tree %p, ino %lu, search %p\n", + tree, ino, search); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + + if (need_initialize_inodes_btree_search(ino, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = ino; + search->request.end.hash = ino; + search->request.count = 1; + } + + err = ssdfs_btree_change_item(&tree->generic_tree, search); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to change inode: ino %lu, err %d\n", + ino, err); + return err; + } + + if (ino == SSDFS_ROOT_INO) { + spin_lock(&tree->lock); + ssdfs_memcpy(&tree->root_folder, + 0, sizeof(struct ssdfs_inode), + search->result.buf, + 0, search->result.buf_size, + sizeof(struct ssdfs_inode)); + spin_unlock(&tree->lock); + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_inodes_btree_delete_range() - delete a range of raw inodes + * @tree: pointer on inodes btree object + * @ino: starting inode ID value + * @count: count of raw inodes in the range + * + * This method tries to delete the @count of raw inodes + * that are starting from @ino. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. 
+ */
+int ssdfs_inodes_btree_delete_range(struct ssdfs_inodes_btree_info *tree,
+				    ino_t ino, u16 count)
+{
+	struct ssdfs_btree_search *search;
+	struct ssdfs_inodes_btree_range *range;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("tree %p, ino %lu, count %u\n",
+		  tree, ino, count);
+#else
+	SSDFS_DBG("tree %p, ino %lu, count %u\n",
+		  tree, ino, count);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (count == 0) {
+		SSDFS_WARN("count == 0\n");
+		return 0;
+	}
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ENOMEM;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	if (count == 1) {
+		err = ssdfs_inodes_btree_find(tree, ino, search);
+	} else {
+		search->request.type = SSDFS_BTREE_SEARCH_FIND_RANGE;
+		search->request.flags =
+			SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE |
+			SSDFS_BTREE_SEARCH_HAS_VALID_COUNT;
+		search->request.start.hash = ino;
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(ino >= U64_MAX - count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		search->request.end.hash = (u64)ino + count;
+		search->request.count = count;
+
+		err = ssdfs_btree_find_range(&tree->generic_tree, search);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find inodes range: "
+			  "ino %lu, count %u, err %d\n",
+			  ino, count, err);
+		goto finish_delete_inodes_range;
+	}
+
+	if (count == 1) {
+		search->request.type = SSDFS_BTREE_SEARCH_DELETE_ITEM;
+		err = ssdfs_btree_delete_item(&tree->generic_tree, search);
+	} else {
+		search->request.type = SSDFS_BTREE_SEARCH_DELETE_RANGE;
+		err = ssdfs_btree_delete_range(&tree->generic_tree, search);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete raw inodes range: "
+			  "ino %lu, count %u, err %d\n",
+			  ino, count, err);
+		goto finish_delete_inodes_range;
+	}
+
+	range = ssdfs_free_inodes_range_alloc();
+	if (IS_ERR_OR_NULL(range)) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate free inodes range object\n");
+		goto finish_delete_inodes_range;
+	}
+
+	ssdfs_free_inodes_range_init(range);
+
+	range->node_id = search->node.id;
+	range->area.start_hash = search->request.start.hash;
+	range->area.start_index = search->result.start_index;
+	range->area.count = count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("add free range: node_id %u, "
+		  "start_hash %llx, start_index %u, "
+		  "count %u\n",
+		  range->node_id,
+		  range->area.start_hash,
+		  range->area.start_index,
+		  range->area.count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_free_inodes_queue_add_head(&tree->free_inodes_queue, range);
+
+	spin_lock(&tree->lock);
+	if (range->area.start_hash > tree->last_free_ino) {
+		tree->last_free_ino =
+			range->area.start_hash + range->area.count;
+	}
+	spin_unlock(&tree->lock);
+
+finish_delete_inodes_range:
+	ssdfs_btree_search_free(search);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/*
+ * ssdfs_inodes_btree_delete() - delete raw inode
+ * @tree: pointer on inodes btree object
+ * @ino: inode ID value
+ *
+ * This method tries to delete the raw inode for @ino.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ENOMEM - unable to allocate memory.
+ * %-ERANGE - internal error.
+ */ +int ssdfs_inodes_btree_delete(struct ssdfs_inodes_btree_info *tree, + ino_t ino) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + + SSDFS_DBG("tree %p, ino %lu\n", + tree, ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_inodes_btree_delete_range(tree, ino, 1); +} + +/****************************************************************************** + * SPECIALIZED INODES BTREE DESCRIPTOR OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_inodes_btree_desc_init() - specialized btree descriptor init + * @fsi: pointer on shared file system object + * @tree: pointer on inodes btree object + */ +static +int ssdfs_inodes_btree_desc_init(struct ssdfs_fs_info *fsi, + struct ssdfs_btree *tree) +{ + struct ssdfs_btree_descriptor *desc; + u32 erasesize; + u32 node_size; + size_t inode_size = sizeof(struct ssdfs_inode); + u16 item_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !tree); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p, tree %p\n", + fsi, tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + erasesize = fsi->erasesize; + + desc = &fsi->vs->inodes_btree.desc; + + if (le32_to_cpu(desc->magic) != SSDFS_INODES_BTREE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid magic %#x\n", + le32_to_cpu(desc->magic)); + goto finish_btree_desc_init; + } + + /* TODO: check flags */ + + if (desc->type != SSDFS_INODES_BTREE) { + err = -EIO; + SSDFS_ERR("invalid btree type %#x\n", + desc->type); + goto finish_btree_desc_init; + } + + node_size = 1 << desc->log_node_size; + if (node_size < SSDFS_4KB || node_size > erasesize) { + err = -EIO; + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc->log_node_size, + node_size, erasesize); + goto finish_btree_desc_init; + } + + item_size = le16_to_cpu(desc->item_size); + + if (item_size != inode_size) { + err = -EIO; + SSDFS_ERR("invalid item size %u\n", + item_size); + goto finish_btree_desc_init; + } + + if (le16_to_cpu(desc->index_area_min_size) != inode_size) { + err = -EIO; + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc->index_area_min_size)); + goto finish_btree_desc_init; + } + + err = ssdfs_btree_desc_init(fsi, tree, desc, 0, item_size); + +finish_btree_desc_init: + if (unlikely(err)) { + SSDFS_ERR("fail to init btree descriptor: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_inodes_btree_desc_flush() - specialized btree's descriptor flush + * @tree: pointer on inodes btree object + */ +static +int ssdfs_inodes_btree_desc_flush(struct ssdfs_btree *tree) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_descriptor desc; + size_t inode_size = sizeof(struct ssdfs_inode); + u32 erasesize; + u32 node_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi); + BUG_ON(!rwsem_is_locked(&tree->fsi->volume_sem)); + + SSDFS_DBG("owner_ino %llu, type %#x, state %#x\n", + tree->owner_ino, tree->type, + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + + memset(&desc, 0xFF, sizeof(struct ssdfs_btree_descriptor)); + + desc.magic = cpu_to_le32(SSDFS_INODES_BTREE_MAGIC); + desc.item_size = cpu_to_le16(inode_size); + + err = ssdfs_btree_desc_flush(tree, &desc); + if (unlikely(err)) { + SSDFS_ERR("invalid btree descriptor: err %d\n", + err); + return err; + } + + if (desc.type != SSDFS_INODES_BTREE) { + SSDFS_ERR("invalid btree type %#x\n", + desc.type); + return -ERANGE; + } + + erasesize = fsi->erasesize; + node_size = 1 << desc.log_node_size; + + if 
(node_size < SSDFS_4KB || node_size > erasesize) { + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc.log_node_size, + node_size, erasesize); + return -ERANGE; + } + + if (le16_to_cpu(desc.index_area_min_size) != inode_size) { + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc.index_area_min_size)); + return -ERANGE; + } + + ssdfs_memcpy(&fsi->vs->inodes_btree.desc, + 0, sizeof(struct ssdfs_btree_descriptor), + &desc, + 0, sizeof(struct ssdfs_btree_descriptor), + sizeof(struct ssdfs_btree_descriptor)); + + return 0; +} + +/****************************************************************************** + * SPECIALIZED INODES BTREE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_inodes_btree_create_root_node() - specialized root node creation + * @fsi: pointer on shared file system object + * @node: pointer on node object [out] + */ +static +int ssdfs_inodes_btree_create_root_node(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_node *node) +{ + struct ssdfs_btree_inline_root_node *root_node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->vs || !node); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p, node %p\n", + fsi, node); +#endif /* CONFIG_SSDFS_DEBUG */ + + root_node = &fsi->vs->inodes_btree.root_node; + err = ssdfs_btree_create_root_node(node, root_node); + if (unlikely(err)) { + SSDFS_ERR("fail to create root node: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_inodes_btree_pre_flush_root_node() - specialized root node pre-flush + * @node: pointer on node object + */ +static +int ssdfs_inodes_btree_pre_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_state_bitmap *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is clean\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_INODES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_btree_pre_flush_root_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-flush root node: " + "node_id %u, err %d\n", + node->node_id, err); + } + + up_write(&node->header_lock); + up_write(&node->full_lock); + + return err; +} + +/* + * ssdfs_inodes_btree_flush_root_node() - specialized root node flush + * @node: pointer on node object + */ +static +int ssdfs_inodes_btree_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree_inline_root_node 
*root_node;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree || !node->tree->fsi);
+	BUG_ON(!rwsem_is_locked(&node->tree->fsi->volume_sem));
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_ssdfs_btree_node_dirty(node)) {
+		SSDFS_WARN("node %u is not dirty\n",
+			   node->node_id);
+		return 0;
+	}
+
+	root_node = &node->tree->fsi->vs->inodes_btree.root_node;
+	ssdfs_btree_flush_root_node(node, root_node);
+
+	return 0;
+}
+
+/*
+ * ssdfs_inodes_btree_create_node() - specialized node creation
+ * @node: pointer on node object
+ */
+static
+int ssdfs_inodes_btree_create_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	void *addr[SSDFS_BTREE_NODE_BMAP_COUNT];
+	struct ssdfs_inodes_btree_node_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_inodes_btree_node_header);
+	u32 node_size;
+	u32 items_area_size = 0;
+	u16 item_size = 0;
+	u16 index_size = 0;
+	u16 index_area_min_size;
+	u16 items_capacity = 0;
+	u16 index_capacity = 0;
+	u32 index_area_size = 0;
+	size_t bmap_bytes;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree);
+	WARN_ON(atomic_read(&node->state) != SSDFS_BTREE_NODE_CREATED);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	node_size = tree->node_size;
+	index_area_min_size = tree->index_area_min_size;
+
+	node->node_ops = &ssdfs_inodes_btree_node_ops;
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_AREA_ABSENT:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_AREA_ABSENT:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n",
+			   atomic_read(&node->type));
+		return -ERANGE;
+	}
+
+	down_write(&node->header_lock);
+	down_write(&node->bmap_array.lock);
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+		node->index_area.offset = (u32)hdr_size;
+		node->index_area.area_size = node_size - hdr_size;
+
+		index_area_size =
node->index_area.area_size; + index_size = node->index_area.index_size; + + node->index_area.index_capacity = index_area_size / index_size; + index_capacity = node->index_area.index_capacity; + + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + break; + + case SSDFS_BTREE_HYBRID_NODE: + node->index_area.offset = (u32)hdr_size; + + if (index_area_min_size == 0 || + index_area_min_size >= (node_size - hdr_size)) { + err = -ERANGE; + SSDFS_ERR("invalid index area desc: " + "index_area_min_size %u, " + "node_size %u, hdr_size %zu\n", + index_area_min_size, + node_size, hdr_size); + goto finish_create_node; + } + + node->index_area.area_size = index_area_min_size; + + index_area_size = node->index_area.area_size; + index_size = node->index_area.index_size; + node->index_area.index_capacity = index_area_size / index_size; + index_capacity = node->index_area.index_capacity; + + node->items_area.offset = node->index_area.offset + + node->index_area.area_size; + + if (node->items_area.offset >= node_size) { + err = -ERANGE; + SSDFS_ERR("invalid items area desc: " + "area_offset %u, node_size %u\n", + node->items_area.offset, + node_size); + goto finish_create_node; + } + + node->items_area.area_size = node_size - + node->items_area.offset; + node->items_area.free_space = node->items_area.area_size; + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + + items_area_size = node->items_area.area_size; + item_size = node->items_area.item_size; + + node->items_area.items_count = 0; + node->items_area.items_capacity = items_area_size / item_size; + items_capacity = node->items_area.items_capacity; + + if (node->items_area.items_capacity == 0) { + err = -ERANGE; + SSDFS_ERR("items area's capacity %u\n", + node->items_area.items_capacity); + goto finish_create_node; + } + + node->items_area.end_hash = node->items_area.start_hash + + node->items_area.items_capacity - 1; + + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + break; + + case SSDFS_BTREE_LEAF_NODE: + node->items_area.offset = (u32)hdr_size; + node->items_area.area_size = node_size - hdr_size; + node->items_area.free_space = node->items_area.area_size; + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + + items_area_size = node->items_area.area_size; + item_size = node->items_area.item_size; + + node->items_area.items_count = 0; + node->items_area.items_capacity = items_area_size / item_size; + items_capacity = node->items_area.items_capacity; + + node->items_area.end_hash = node->items_area.start_hash + + node->items_area.items_capacity - 1; + + node->bmap_array.item_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + goto finish_create_node; + } + + node->bmap_array.bits_count = index_capacity + items_capacity + 1; + + if (item_size > 0) + items_capacity = node_size / item_size; + else + items_capacity = 0; + + if (index_size > 0) + index_capacity = node_size / index_size; + else + index_capacity = 0; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= 
BITS_PER_BYTE; + + node->bmap_array.bmap_bytes = bmap_bytes; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_INODE_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_create_node; + } + + hdr = &node->raw.inodes_header; + hdr->inodes_count = cpu_to_le16(0); + hdr->valid_inodes = cpu_to_le16(0); + +finish_create_node: + up_write(&node->bmap_array.lock); + up_write(&node->header_lock); + + if (unlikely(err)) + return err; + + err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate node's bitmaps: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + return err; + } + + down_write(&node->bmap_array.lock); + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + spin_lock(&node->bmap_array.bmap[i].lock); + node->bmap_array.bmap[i].ptr = addr[i]; + addr[i] = NULL; + spin_unlock(&node->bmap_array.bmap[i].lock); + } + up_write(&node->bmap_array.lock); + + err = ssdfs_btree_node_allocate_content_space(node, node_size); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate content space: " + "node_size %u, err %d\n", + node_size, err); + return err; + } + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * ssdfs_process_deleted_nodes() - process deleted nodes + * @node: pointer on node object + * @q: pointer on temporary ranges queue + * @start_hash: starting hash of the range + * @end_hash: ending hash of the range + * @inodes_per_node: number of inodes per leaf node + * + * This method tries to process the deleted nodes. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +static +int ssdfs_process_deleted_nodes(struct ssdfs_btree_node *node, + struct ssdfs_free_inode_range_queue *q, + u64 start_hash, u64 end_hash, + u32 inodes_per_node) +{ + struct ssdfs_inodes_btree_info *tree; + struct ssdfs_inodes_btree_range *range; + u64 inodes_range; + u64 deleted_nodes; + u32 remainder; + s64 i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !q); + + SSDFS_DBG("node_id %u, state %#x, " + "start_hash %llx, end_hash %llx, " + "inodes_per_node %u\n", + node->node_id, atomic_read(&node->state), + start_hash, end_hash, inodes_per_node); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->tree->type == SSDFS_INODES_BTREE) + tree = (struct ssdfs_inodes_btree_info *)node->tree; + else { + SSDFS_ERR("invalid tree type %#x\n", + node->tree->type); + return -ERANGE; + } + + if (start_hash == U64_MAX || end_hash == U64_MAX) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("invalid range: " + "start_hash %llx, end_hash %llx\n", + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } else if (start_hash > end_hash) { + SSDFS_ERR("invalid range: " + "start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + return -ERANGE; + } + + inodes_range = end_hash - start_hash; + deleted_nodes = div_u64_rem(inodes_range, inodes_per_node, &remainder); + + if (remainder != 0) { + SSDFS_ERR("invalid range: " + "inodes_range %llu, inodes_per_node %u, " + "remainder %u\n", + inodes_range, inodes_per_node, remainder); + return -ERANGE; + } + + for (i = 0; i < deleted_nodes; i++) { + range = ssdfs_free_inodes_range_alloc(); + if (unlikely(!range)) { + SSDFS_ERR("fail to allocate inodes range\n"); + return -ENOMEM; + } + + ssdfs_free_inodes_range_init(range); + range->node_id = node->node_id; + range->area.start_hash = start_hash + (i * inodes_per_node); + range->area.start_index = 0; + range->area.count = 
(u16)inodes_per_node;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("add free range: node_id %u, "
+			  "start_hash %llx, start_index %u, "
+			  "count %u\n",
+			  range->node_id,
+			  range->area.start_hash,
+			  range->area.start_index,
+			  range->area.count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_free_inodes_queue_add_tail(q, range);
+
+		spin_lock(&tree->lock);
+		if (range->area.start_hash > tree->last_free_ino) {
+			tree->last_free_ino =
+				range->area.start_hash + range->area.count;
+		}
+		spin_unlock(&tree->lock);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_inodes_btree_detect_deleted_nodes() - detect deleted nodes
+ * @node: pointer on node object
+ * @q: pointer on temporary ranges queue
+ *
+ * This method tries to detect deleted nodes.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - unable to allocate memory.
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_inodes_btree_detect_deleted_nodes(struct ssdfs_btree_node *node,
+					struct ssdfs_free_inode_range_queue *q)
+{
+	struct ssdfs_btree_node_index_area index_area;
+	struct ssdfs_btree_index_key index;
+	size_t hdr_size = sizeof(struct ssdfs_inodes_btree_node_header);
+	u16 item_size;
+	u32 inodes_per_node;
+	u64 prev_hash, start_hash;
+	s64 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree || !q);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->header_lock);
+	ssdfs_memcpy(&index_area,
+		     0, sizeof(struct ssdfs_btree_node_index_area),
+		     &node->index_area,
+		     0, sizeof(struct ssdfs_btree_node_index_area),
+		     sizeof(struct ssdfs_btree_node_index_area));
+	up_read(&node->header_lock);
+
+	item_size = node->tree->item_size;
+	inodes_per_node = node->node_size;
+	inodes_per_node -= hdr_size;
+	inodes_per_node /= item_size;
+
+	if (inodes_per_node == 0) {
+		SSDFS_ERR("invalid inodes_per_node %u\n",
+			  inodes_per_node);
+		return -ERANGE;
+	}
+
+	if (index_area.start_hash == U64_MAX ||
+	    index_area.end_hash == U64_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to detect deleted nodes: "
+			  "start_hash %llx, end_hash %llx\n",
+			  index_area.start_hash,
+			  index_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_process_index_area;
+	} else if (index_area.start_hash > index_area.end_hash) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid range: "
+			  "start_hash %llx, end_hash %llx\n",
+			  index_area.start_hash,
+			  index_area.end_hash);
+		goto finish_process_index_area;
+	} else if (index_area.start_hash == index_area.end_hash) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("empty range: "
+			  "start_hash %llx, end_hash %llx\n",
+			  index_area.start_hash,
+			  index_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_process_index_area;
+	}
+
+	prev_hash = index_area.start_hash;
+
+	for (i = 0; i < index_area.index_count; i++) {
+		err = ssdfs_btree_node_get_index(&node->content.pvec,
+						 index_area.offset,
+						 index_area.area_size,
+						 node->node_size,
+						 (u16)i, &index);
+		if (unlikely(err)) {
+			atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+			SSDFS_ERR("fail to extract index: "
+				  "node_id %u, index %lld, err %d\n",
+				  node->node_id, i, err);
+			goto finish_process_index_area;
+		}
+
+		start_hash = le64_to_cpu(index.index.hash);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("prev_hash %llx, start_hash %llx, "
+			  "index_area.start_hash %llx\n",
+			  prev_hash, start_hash,
+			  index_area.start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (prev_hash != start_hash) {
+			err = ssdfs_process_deleted_nodes(node, q,
+							  prev_hash,
+							  start_hash,
+ inodes_per_node); + if (unlikely(err)) { + SSDFS_ERR("fail to process deleted nodes: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + prev_hash, start_hash, err); + goto finish_process_index_area; + } + } + + prev_hash = start_hash + inodes_per_node; + } + + if (prev_hash < index_area.end_hash) { + start_hash = index_area.end_hash + inodes_per_node; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("prev_hash %llx, start_hash %llx, " + "index_area.end_hash %llx\n", + prev_hash, start_hash, + index_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_process_deleted_nodes(node, q, + prev_hash, + start_hash, + inodes_per_node); + if (unlikely(err)) { + SSDFS_ERR("fail to process deleted nodes: " + "start_hash %llx, end_hash %llx, " + "err %d\n", + prev_hash, start_hash, err); + goto finish_process_index_area; + } + } + +finish_process_index_area: + return err; +} + +/* + * ssdfs_inodes_btree_init_node() - init inodes tree's node + * @node: pointer on node object + * + * This method tries to init the node of inodes btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + * %-EIO - invalid node's header content + */ +static +int ssdfs_inodes_btree_init_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_inodes_btree_info *tree; + struct ssdfs_inodes_btree_node_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_inodes_btree_node_header); + struct ssdfs_free_inode_range_queue q; + struct ssdfs_inodes_btree_range *range; + void *addr[SSDFS_BTREE_NODE_BMAP_COUNT]; + struct page *page; + void *kaddr; + u32 node_size; + u16 flags; + u16 item_size; + u32 items_count = 0; + u8 index_size; + u16 items_capacity; + u32 index_area_size = 0; + u16 index_capacity = 0; + u16 inodes_count; + u16 valid_inodes; + size_t bmap_bytes; + u64 start_hash, end_hash; + unsigned long start, end; + unsigned long size, upper_bound; + signed long count; + unsigned long free_inodes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->tree->type == SSDFS_INODES_BTREE) + tree = (struct ssdfs_inodes_btree_info *)node->tree; + else { + SSDFS_ERR("invalid tree type %#x\n", + node->tree->type); + return -ERANGE; + } + + if (atomic_read(&node->state) != SSDFS_BTREE_NODE_CONTENT_PREPARED) { + SSDFS_WARN("fail to init node: id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); + return -ERANGE; + } + + down_read(&node->full_lock); + + if (pagevec_count(&node->content.pvec) == 0) { + err = -ERANGE; + SSDFS_ERR("empty node's content: id %u\n", + node->node_id); + goto finish_init_node; + } + + page = node->content.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PAGE DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + kaddr, + PAGE_SIZE); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + hdr = (struct ssdfs_inodes_btree_node_header *)kaddr; + + if (!is_csum_valid(&hdr->node.check, hdr, hdr_size)) { + err = -EIO; + SSDFS_ERR("invalid checksum: node_id %u\n", + node->node_id); + goto finish_init_operation; + } + + if (le32_to_cpu(hdr->node.magic.common) != SSDFS_SUPER_MAGIC || + le16_to_cpu(hdr->node.magic.key) != SSDFS_INODES_BNODE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid magic: common %#x, key %#x\n", + 
le32_to_cpu(hdr->node.magic.common), + le16_to_cpu(hdr->node.magic.key)); + goto finish_init_operation; + } + + down_write(&node->header_lock); + + ssdfs_memcpy(&node->raw.inodes_header, 0, hdr_size, + hdr, 0, hdr_size, + hdr_size); + + err = ssdfs_btree_init_node(node, &hdr->node, + hdr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to init node: id %u, err %d\n", + node->node_id, err); + goto finish_header_init; + } + + start_hash = le64_to_cpu(hdr->node.start_hash); + end_hash = le64_to_cpu(hdr->node.end_hash); + node_size = 1 << hdr->node.log_node_size; + index_size = hdr->node.index_size; + item_size = node->tree->item_size; + items_capacity = le16_to_cpu(hdr->node.items_capacity); + inodes_count = le16_to_cpu(hdr->inodes_count); + valid_inodes = le16_to_cpu(hdr->valid_inodes); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "items_capacity %u, valid_inodes %u, " + "inodes_count %u\n", + start_hash, end_hash, items_capacity, + valid_inodes, inodes_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (item_size == 0 || node_size % item_size) { + err = -EIO; + SSDFS_ERR("invalid size: item_size %u, node_size %u\n", + item_size, node_size); + goto finish_header_init; + } + + if (item_size != sizeof(struct ssdfs_inode)) { + err = -EIO; + SSDFS_ERR("invalid item_size: " + "size %u, expected size %zu\n", + item_size, + sizeof(struct ssdfs_inode)); + goto finish_header_init; + } + + switch (hdr->node.type) { + case SSDFS_BTREE_LEAF_NODE: + if (items_capacity == 0 || + items_capacity > (node_size / item_size)) { + err = -EIO; + SSDFS_ERR("invalid items_capacity %u\n", + items_capacity); + goto finish_header_init; + } + + if (items_capacity != inodes_count) { + err = -EIO; + SSDFS_ERR("items_capacity %u != inodes_count %u\n", + items_capacity, + inodes_count); + goto finish_header_init; + } + + if (valid_inodes > inodes_count) { + err = -EIO; + SSDFS_ERR("valid_inodes %u > inodes_count %u\n", + valid_inodes, inodes_count); + goto finish_header_init; + } + + node->items_area.items_count = valid_inodes; + node->items_area.items_capacity = inodes_count; + free_inodes = inodes_count - valid_inodes; + + node->items_area.free_space = (u32)free_inodes * item_size; + if (node->items_area.free_space > node->items_area.area_size) { + err = -EIO; + SSDFS_ERR("free_space %u > area_size %u\n", + node->items_area.free_space, + node->items_area.area_size); + goto finish_header_init; + } + + items_count = node_size / item_size; + items_capacity = node_size / item_size; + + index_capacity = 0; + break; + + case SSDFS_BTREE_HYBRID_NODE: + if (items_capacity == 0 || + items_capacity > (node_size / item_size)) { + err = -EIO; + SSDFS_ERR("invalid items_capacity %u\n", + items_capacity); + goto finish_header_init; + } + + if (items_capacity != inodes_count) { + err = -EIO; + SSDFS_ERR("items_capacity %u != inodes_count %u\n", + items_capacity, + inodes_count); + goto finish_header_init; + } + + if (valid_inodes > inodes_count) { + err = -EIO; + SSDFS_ERR("valid_inodes %u > inodes_count %u\n", + valid_inodes, inodes_count); + goto finish_header_init; + } + + node->items_area.items_count = valid_inodes; + node->items_area.items_capacity = inodes_count; + free_inodes = inodes_count - valid_inodes; + + node->items_area.free_space = (u32)free_inodes * item_size; + if (node->items_area.free_space > node->items_area.area_size) { + err = -EIO; + SSDFS_ERR("free_space %u > area_size %u\n", + node->items_area.free_space, + node->items_area.area_size); + goto finish_header_init; + } + + 
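+		/*
+		 * A hybrid node keeps an index area alongside its
+		 * items area, so the index area hash range from the
+		 * on-disk header has to be restored and sanity-checked
+		 * here as well.
+		 */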
node->index_area.start_hash = + le64_to_cpu(hdr->index_area.start_hash); + node->index_area.end_hash = + le64_to_cpu(hdr->index_area.end_hash); + + if (node->index_area.start_hash >= U64_MAX || + node->index_area.end_hash >= U64_MAX) { + err = -EIO; + SSDFS_ERR("corrupted node: " + "index_area (start_hash %llx, end_hash %llx)\n", + node->index_area.start_hash, + node->index_area.end_hash); + goto finish_header_init; + } + + items_count = node_size / item_size; + items_capacity = node_size / item_size; + + index_capacity = node_size / index_size; + break; + + case SSDFS_BTREE_INDEX_NODE: + node->items_area.items_count = 0; + node->items_area.items_capacity = 0; + node->items_area.free_space = 0; + + items_count = 0; + items_capacity = 0; + + if (start_hash != le64_to_cpu(hdr->index_area.start_hash) || + end_hash != le64_to_cpu(hdr->index_area.end_hash)) { + err = -EIO; + SSDFS_ERR("corrupted node: " + "node index_area " + "(start_hash %llx, end_hash %llx), " + "header index_area " + "(start_hash %llx, end_hash %llx)\n", + node->index_area.start_hash, + node->index_area.end_hash, + le64_to_cpu(hdr->index_area.start_hash), + le64_to_cpu(hdr->index_area.end_hash)); + goto finish_header_init; + } + + index_capacity = node_size / index_size; + break; + + default: + SSDFS_ERR("unexpected node type %#x\n", + hdr->node.type); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %u, area_size %u, free_space %u\n", + node->items_area.items_count, + node->items_area.area_size, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_header_init: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_init_operation; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_INODE_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_init_operation; + } + + err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate node's bitmaps: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + goto finish_init_operation; + } + + down_write(&node->bmap_array.lock); + + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + index_area_size = 1 << hdr->node.log_index_area_size; + index_area_size += index_size - 1; + index_capacity = index_area_size / index_size; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + } else if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) { + node->bmap_array.item_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + } else + BUG(); + + node->bmap_array.bits_count = index_capacity + items_capacity + 1; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_capacity %u, index_area_size %u, " + "index_size %u\n", + index_capacity, index_area_size, index_size); + SSDFS_DBG("index_start_bit %lu, item_start_bit %lu, " + "bits_count %lu\n", + node->bmap_array.index_start_bit, + node->bmap_array.item_start_bit, + node->bmap_array.bits_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_node_init_bmaps(node, addr); + + spin_lock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + ssdfs_memcpy(node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].ptr, + 0, bmap_bytes, + hdr->bmap, + 0, bmap_bytes, + bmap_bytes); + spin_unlock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + + start = 
node->bmap_array.item_start_bit; + + up_write(&node->bmap_array.lock); +finish_init_operation: + kunmap_local(kaddr); + + if (unlikely(err)) + goto finish_init_node; + + if (hdr->node.type == SSDFS_BTREE_INDEX_NODE) + goto finish_init_node; + + ssdfs_free_inodes_queue_init(&q); + + switch (hdr->node.type) { + case SSDFS_BTREE_HYBRID_NODE: + err = ssdfs_inodes_btree_detect_deleted_nodes(node, &q); + if (unlikely(err)) { + SSDFS_ERR("fail to detect deleted nodes: " + "err %d\n", err); + ssdfs_free_inodes_queue_remove_all(&q); + goto finish_init_node; + } + break; + + default: + /* do nothing */ + break; + } + + size = inodes_count; + upper_bound = node->bmap_array.item_start_bit + size; + free_inodes = 0; + + do { + start = find_next_zero_bit((unsigned long *)hdr->bmap, + upper_bound, start); + if (start >= upper_bound) + break; + + end = find_next_bit((unsigned long *)hdr->bmap, + upper_bound, start); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start >= U16_MAX); + BUG_ON((end - start) >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + count = end - start; + start -= node->bmap_array.item_start_bit; + + if (count <= 0) { + err = -ERANGE; + SSDFS_WARN("invalid count %ld\n", count); + break; + } + + range = ssdfs_free_inodes_range_alloc(); + if (unlikely(!range)) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate inodes range\n"); + break; + } + + ssdfs_free_inodes_range_init(range); + range->node_id = node->node_id; + range->area.start_hash = start_hash + start; + range->area.start_index = (u16)start; + range->area.count = (u16)count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "range->area.start_hash %llx\n", + start_hash, end_hash, + range->area.start_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range->area.start_hash > end_hash) { + err = -EIO; + SSDFS_ERR("start_hash %llx > end_hash %llx\n", + range->area.start_hash, end_hash); + ssdfs_free_inodes_range_free(range); + break; + } + + free_inodes += count; + if ((valid_inodes + free_inodes) > inodes_count) { + err = -EIO; + SSDFS_ERR("invalid free_inodes: " + "valid_inodes %u, free_inodes %lu, " + "inodes_count %u\n", + valid_inodes, free_inodes, + inodes_count); + ssdfs_free_inodes_range_free(range); + break; + } + + ssdfs_free_inodes_queue_add_tail(&q, range); + + spin_lock(&tree->lock); + if (range->area.start_hash > tree->last_free_ino) { + tree->last_free_ino = + range->area.start_hash + range->area.count; + } + spin_unlock(&tree->lock); + + start = end; + } while (start < size); + + if (unlikely(err)) { + ssdfs_free_inodes_queue_remove_all(&q); + goto finish_init_node; + } + + while (!is_ssdfs_free_inodes_queue_empty(&q)) { + err = ssdfs_free_inodes_queue_remove_first(&q, &range); + if (unlikely(err)) { + SSDFS_ERR("fail to get range: err %d\n", err); + goto finish_init_node; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("add free range: node_id %u, " + "start_hash %llx, start_index %u, " + "count %u\n", + range->node_id, + range->area.start_hash, + range->area.start_index, + range->area.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_free_inodes_queue_add_tail(&tree->free_inodes_queue, + range); + }; + +finish_init_node: + up_read(&node->full_lock); + + ssdfs_debug_btree_node_object(node); + + return err; +} + +static +void ssdfs_inodes_btree_destroy_node(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_inodes_btree_node_correct_hash_range() - correct node's hash range + * @node: pointer on node 
object + * + * This method tries to correct node's hash range. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +static +int ssdfs_inodes_btree_node_correct_hash_range(struct ssdfs_btree_node *node, + u64 start_hash) +{ + struct ssdfs_inodes_btree_info *itree; + u16 items_count; + u16 items_capacity; + u16 free_items; + struct ssdfs_inodes_btree_range *range = NULL; + struct ssdfs_btree_index_key new_key; + int type; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(start_hash >= U64_MAX); + + SSDFS_DBG("node_id %u, state %#x, " + "node_type %#x, start_hash %llx\n", + node->node_id, atomic_read(&node->state), + atomic_read(&node->type), start_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + itree = (struct ssdfs_inodes_btree_info *)node->tree; + type = atomic_read(&node->type); + + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + /* do nothing */ + return 0; + } + + down_write(&node->header_lock); + + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + + switch (type) { + case SSDFS_BTREE_LEAF_NODE: + case SSDFS_BTREE_HYBRID_NODE: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(items_capacity == 0); +#endif /* CONFIG_SSDFS_DEBUG */ + node->items_area.start_hash = start_hash; + node->items_area.end_hash = start_hash + items_capacity - 1; + break; + + default: + /* do nothing */ + break; + } + + up_write(&node->header_lock); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_add_index(node, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", + err); + return err; + } + break; + + default: + /* do nothing */ + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(items_count > items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + free_items = items_capacity - items_count; + + if (items_capacity == 0) { + if (type == SSDFS_BTREE_LEAF_NODE || + type == SSDFS_BTREE_HYBRID_NODE) { + SSDFS_ERR("invalid node state: " + "type %#x, items_capacity %u\n", + type, items_capacity); + return -ERANGE; + } + } else { + range = ssdfs_free_inodes_range_alloc(); + if (unlikely(!range)) { + SSDFS_ERR("fail to allocate inodes range\n"); + return -ENOMEM; + } + + ssdfs_free_inodes_range_init(range); + range->node_id = node->node_id; + range->area.start_hash = start_hash + items_count; + range->area.start_index = items_count; + range->area.count = free_items; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("add free range: node_id %u, " + "start_hash %llx, start_index %u, " + "count %u\n", + range->node_id, + range->area.start_hash, + range->area.start_index, + range->area.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_free_inodes_queue_add_tail(&itree->free_inodes_queue, + range); + + spin_lock(&itree->lock); + if (range->area.start_hash > itree->last_free_ino) { + itree->last_free_ino = + range->area.start_hash + range->area.count; + } + spin_unlock(&itree->lock); + } + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_inodes_btree_add_node() - add node into inodes btree + * @node: pointer on node object + * + * This method 
tries to finish addition of node into inodes btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + */ +static +int ssdfs_inodes_btree_add_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_inodes_btree_info *itree; + struct ssdfs_btree_node *parent_node = NULL; + int type; + u64 start_hash = U64_MAX; + u16 items_capacity; + spinlock_t *lock; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected states */ + break; + + default: + SSDFS_WARN("invalid node: id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); + return -ERANGE; + } + + itree = (struct ssdfs_inodes_btree_info *)node->tree; + type = atomic_read(&node->type); + + down_read(&node->header_lock); + start_hash = node->items_area.start_hash; + items_capacity = node->items_area.items_capacity; + up_read(&node->header_lock); + + switch (type) { + case SSDFS_BTREE_INDEX_NODE: + ssdfs_debug_btree_node_object(node); + break; + + case SSDFS_BTREE_HYBRID_NODE: + err = ssdfs_inodes_btree_node_correct_hash_range(node, + start_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to correct hash range: " + "err %d\n", err); + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + break; + + case SSDFS_BTREE_LEAF_NODE: + err = ssdfs_inodes_btree_node_correct_hash_range(node, + start_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to correct hash range: " + "err %d\n", err); + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + + lock = &node->descriptor_lock; + spin_lock(lock); + parent_node = node->parent_node; + spin_unlock(lock); + lock = NULL; + + start_hash += items_capacity; + + err = ssdfs_inodes_btree_node_correct_hash_range(parent_node, + start_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to correct hash range: " + "err %d\n", err); + atomic_set(&parent_node->state, + SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + break; + + default: + SSDFS_WARN("invalid node type %#x\n", type); + return -ERANGE; + }; + + spin_lock(&itree->lock); + itree->nodes_count++; + if (type == SSDFS_BTREE_LEAF_NODE) + itree->leaf_nodes++; + itree->inodes_capacity += items_capacity; + itree->free_inodes += items_capacity; + spin_unlock(&itree->lock); + + err = ssdfs_btree_update_parent_node_pointer(node->tree, node); + if (unlikely(err)) { + SSDFS_ERR("fail to update parent pointer: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +static +int ssdfs_inodes_btree_delete_node(struct ssdfs_btree_node *node) +{ + /* TODO: implement */ + SSDFS_DBG("TODO: implement %s\n", __func__); + return 0; + +/* + * TODO: it needs to add special free space descriptor in the + * index area for the case of deleted nodes. Code of + * allocation of new items should create empty node + * with completely free items during passing through + * index level. + */ + + + +/* + * TODO: node can be really deleted/invalidated. But index + * area should contain index for deleted node with + * special flag. In this case it will be clear that + * we have some capacity without real node allocation. + * If some item will be added in the node then node + * has to be allocated. 
It means that if you delete
+ * a node then the index hierarchy stays the same, without any
+ * necessity to delete or modify it.
+ */
+
+
+
+	/* TODO: decrement nodes_count and/or leaf_nodes counters */
+	/* TODO: decrease inodes_capacity and/or free_inodes */
+}
+
+/*
+ * ssdfs_inodes_btree_pre_flush_node() - pre-flush node's header
+ * @node: pointer on node object
+ *
+ * This method tries to flush node's header.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
+ */
+static
+int ssdfs_inodes_btree_pre_flush_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_inodes_btree_node_header inodes_header;
+	struct ssdfs_state_bitmap *bmap;
+	size_t hdr_size = sizeof(struct ssdfs_inodes_btree_node_header);
+	u32 bmap_bytes;
+	struct page *page;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_debug_btree_node_object(node);
+
+	switch (atomic_read(&node->state)) {
+	case SSDFS_BTREE_NODE_DIRTY:
+		/* expected state */
+		break;
+
+	case SSDFS_BTREE_NODE_INITIALIZED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u is clean\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+
+	case SSDFS_BTREE_NODE_PRE_DELETED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u is pre-deleted\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+
+	case SSDFS_BTREE_NODE_CORRUPTED:
+		SSDFS_WARN("node %u is corrupted\n",
+			   node->node_id);
+		down_read(&node->bmap_array.lock);
+		bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP];
+		spin_lock(&bmap->lock);
+		bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count);
+		spin_unlock(&bmap->lock);
+		up_read(&node->bmap_array.lock);
+		clear_ssdfs_btree_node_dirty(node);
+		return -EFAULT;
+
+	default:
+		SSDFS_ERR("invalid node state %#x\n",
+			  atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	down_write(&node->full_lock);
+	down_write(&node->header_lock);
+
+	ssdfs_memcpy(&inodes_header, 0, hdr_size,
+		     &node->raw.inodes_header, 0, hdr_size,
+		     hdr_size);
+
+	inodes_header.node.magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	inodes_header.node.magic.key = cpu_to_le16(SSDFS_INODES_BNODE_MAGIC);
+	inodes_header.node.magic.version.major = SSDFS_MAJOR_REVISION;
+	inodes_header.node.magic.version.minor = SSDFS_MINOR_REVISION;
+
+	err = ssdfs_btree_node_pre_flush_header(node, &inodes_header.node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to flush generic header: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		goto finish_inodes_header_preparation;
+	}
+
+	inodes_header.valid_inodes =
+		cpu_to_le16(node->items_area.items_count);
+	inodes_header.inodes_count =
+		cpu_to_le16(node->items_area.items_capacity);
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+	case SSDFS_BTREE_HYBRID_NODE:
+		inodes_header.index_area.start_hash =
+			cpu_to_le64(node->index_area.start_hash);
+		inodes_header.index_area.end_hash =
+			cpu_to_le64(node->index_area.end_hash);
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		/* do nothing */
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n",
+			   atomic_read(&node->type));
+		break;
+	}
+
+	down_read(&node->bmap_array.lock);
+	bmap_bytes = node->bmap_array.bmap_bytes;
+	spin_lock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock);
+	ssdfs_memcpy(inodes_header.bmap,
+		     0, bmap_bytes,
+		     node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].ptr,
+		     0, bmap_bytes,
+		     bmap_bytes);
+	
spin_unlock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + up_read(&node->bmap_array.lock); + + inodes_header.node.check.bytes = cpu_to_le16((u16)hdr_size); + inodes_header.node.check.flags = cpu_to_le16(SSDFS_CRC32); + + err = ssdfs_calculate_csum(&inodes_header.node.check, + &inodes_header, hdr_size); + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", err); + goto finish_inodes_header_preparation; + } + + ssdfs_memcpy(&node->raw.inodes_header, 0, hdr_size, + &inodes_header, 0, hdr_size, + hdr_size); + +finish_inodes_header_preparation: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_node_pre_flush; + + if (pagevec_count(&node->content.pvec) < 1) { + err = -ERANGE; + SSDFS_ERR("pagevec is empty\n"); + goto finish_node_pre_flush; + } + + page = node->content.pvec.pages[0]; + ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + &inodes_header, 0, hdr_size, + hdr_size); + +finish_node_pre_flush: + up_write(&node->full_lock); + + return err; +} + +/* + * ssdfs_inodes_btree_flush_node() - flush node + * @node: pointer on node object + * + * This method tries to flush node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_inodes_btree_flush_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree *tree; + u64 fs_feature_compat; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node %p, node_id %u\n", + node, node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_INODES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } + + fsi = node->tree->fsi; + + spin_lock(&fsi->volume_state_lock); + fs_feature_compat = fsi->fs_feature_compat; + spin_unlock(&fsi->volume_state_lock); + + if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) { + err = ssdfs_btree_common_node_flush(node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + } else { + err = -EFAULT; + SSDFS_CRIT("inodes tree is absent\n"); + } + + ssdfs_debug_btree_node_object(node); + + return err; +} diff --git a/fs/ssdfs/inodes_tree.h b/fs/ssdfs/inodes_tree.h new file mode 100644 index 000000000000..e0e8efca7b86 --- /dev/null +++ b/fs/ssdfs/inodes_tree.h @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/inodes_tree.h - inodes btree declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_INODES_TREE_H
+#define _SSDFS_INODES_TREE_H
+
+/*
+ * struct ssdfs_inodes_range - items range
+ * @start_hash: starting hash
+ * @start_index: starting index in the node
+ * @count: count of items in the range
+ */
+struct ssdfs_inodes_range {
+#define SSDFS_INODES_RANGE_INVALID_START	(U64_MAX)
+	u64 start_hash;
+#define SSDFS_INODES_RANGE_INVALID_INDEX	(U16_MAX)
+	u16 start_index;
+	u16 count;
+};
+
+/*
+ * struct ssdfs_inodes_btree_range - node's items range descriptor
+ * @list: free inode ranges queue
+ * @node_id: node identification number
+ * @area: items range
+ */
+struct ssdfs_inodes_btree_range {
+	struct list_head list;
+	u32 node_id;
+	struct ssdfs_inodes_range area;
+};
+
+/*
+ * struct ssdfs_free_inode_range_queue - free inode ranges queue
+ * @lock: queue's lock
+ * @list: queue's list
+ */
+struct ssdfs_free_inode_range_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_inodes_btree_info - inodes btree info
+ * @generic_tree: generic btree description
+ * @lock: inodes btree lock
+ * @root_folder: copy of root folder's inode
+ * @upper_allocated_ino: maximal allocated inode ID number
+ * @last_free_ino: latest free inode ID number
+ * @allocated_inodes: allocated inodes count in the whole tree
+ * @free_inodes: free inodes count in the whole tree
+ * @inodes_capacity: inodes capacity in the whole tree
+ * @leaf_nodes: count of leaf nodes in the whole tree
+ * @nodes_count: count of all nodes in the whole tree
+ * @raw_inode_size: size in bytes of raw inode
+ * @free_inodes_queue: queue of free inode descriptors
+ */
+struct ssdfs_inodes_btree_info {
+	struct ssdfs_btree generic_tree;
+
+	spinlock_t lock;
+	struct ssdfs_inode root_folder;
+	u64 upper_allocated_ino;
+	u64 last_free_ino;
+	u64 allocated_inodes;
+	u64 free_inodes;
+	u64 inodes_capacity;
+	u32 leaf_nodes;
+	u32 nodes_count;
+	u16 raw_inode_size;
+
+/*
+ * The inodes btree should have a special allocation queue.
+ * If a btree node has free (not allocated) inode items then
+ * the information about such a btree node should be added into
+ * the queue. Moreover, the queue should contain as many
+ * descriptors of a node as there are free items in that node.
+ *
+ * If some btree node has deleted inodes (free items) then all
+ * descriptors of that node should be added into the head of the
+ * allocation queue. Descriptors of the last btree node should be
+ * added into the tail of the queue. Information about a node's
+ * descriptors should be added into the allocation queue during
+ * btree node creation or reading from the volume. Otherwise,
+ * allocation of new items should be done from the last leaf
+ * node of the btree.
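+ *
+ * A sketch of how a node's free items feed this queue (illustration
+ * only; locking and error handling are omitted, and the local
+ * variables are assumed):
+ *
+ *	range = ssdfs_free_inodes_range_alloc();
+ *	ssdfs_free_inodes_range_init(range);
+ *	range->node_id = node->node_id;
+ *	range->area.start_hash = start_hash;
+ *	range->area.start_index = start_index;
+ *	range->area.count = count;
+ *	ssdfs_free_inodes_queue_add_tail(&tree->free_inodes_queue, range);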
+ */
+	struct ssdfs_free_inode_range_queue free_inodes_queue;
+};
+
+/*
+ * Inline methods
+ */
+static inline
+bool is_free_inodes_range_invalid(struct ssdfs_inodes_btree_range *range)
+{
+	bool is_invalid;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!range);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	is_invalid = range->node_id == SSDFS_BTREE_NODE_INVALID_ID ||
+	    range->area.start_hash == SSDFS_INODES_RANGE_INVALID_START ||
+	    range->area.start_index == SSDFS_INODES_RANGE_INVALID_INDEX ||
+	    range->area.count == 0;
+
+	if (is_invalid) {
+		SSDFS_ERR("node_id %u, start_hash %llx, "
+			  "start_index %u, count %u\n",
+			  range->node_id,
+			  range->area.start_hash,
+			  range->area.start_index,
+			  range->area.count);
+	}
+
+	return is_invalid;
+}
+
+/*
+ * Free inodes range API
+ */
+struct ssdfs_inodes_btree_range *ssdfs_free_inodes_range_alloc(void);
+void ssdfs_free_inodes_range_free(struct ssdfs_inodes_btree_range *range);
+void ssdfs_free_inodes_range_init(struct ssdfs_inodes_btree_range *range);
+
+/*
+ * Inodes btree API
+ */
+int ssdfs_inodes_btree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_inodes_btree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_inodes_btree_flush(struct ssdfs_inodes_btree_info *tree);
+
+int ssdfs_inodes_btree_allocate(struct ssdfs_inodes_btree_info *tree,
+				ino_t *ino,
+				struct ssdfs_btree_search *search);
+int ssdfs_inodes_btree_find(struct ssdfs_inodes_btree_info *tree,
+			    ino_t ino,
+			    struct ssdfs_btree_search *search);
+int ssdfs_inodes_btree_change(struct ssdfs_inodes_btree_info *tree,
+			      ino_t ino,
+			      struct ssdfs_btree_search *search);
+int ssdfs_inodes_btree_delete(struct ssdfs_inodes_btree_info *tree,
+			      ino_t ino);
+int ssdfs_inodes_btree_delete_range(struct ssdfs_inodes_btree_info *tree,
+				    ino_t ino, u16 count);
+
+void ssdfs_debug_inodes_btree_object(struct ssdfs_inodes_btree_info *tree);
+
+/*
+ * Inodes btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+				ssdfs_inodes_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_inodes_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_inodes_btree_node_ops;
+
+#endif /* _SSDFS_INODES_TREE_H */

From patchwork Sat Feb 25 01:09:12 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151966
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 61/76] ssdfs: inodes b-tree node operations
Date: Fri, 24 Feb 2023 17:09:12 -0800
Message-Id: <20230225010927.813929-62-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

Inodes b-tree supports operations:
(1) find_item - find item in the b-tree
(2) find_range - find range of items in the b-tree
(3) extract_range - extract range of items from the node of b-tree
(4) allocate_item - allocate item in b-tree
(5) allocate_range - allocate range of items in b-tree
(6) insert_item - insert item into node of the b-tree
(7) insert_range - insert range of items into node of the b-tree
(8) change_item - change item in the b-tree
(9) delete_item - delete item from the b-tree
(10) delete_range - delete range of items from a node of b-tree

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/inodes_tree.c | 2366 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2366 insertions(+)

diff --git a/fs/ssdfs/inodes_tree.c b/fs/ssdfs/inodes_tree.c
index 1cc42cc84513..f17142d9c6db 100644
--- a/fs/ssdfs/inodes_tree.c
+++ b/fs/ssdfs/inodes_tree.c
@@ -3166,3 +3166,2369 @@ int ssdfs_inodes_btree_flush_node(struct ssdfs_btree_node *node)
 	return err;
 }
+
+/******************************************************************************
+ *              SPECIALIZED INODES BTREE NODE OPERATIONS                      *
+ ******************************************************************************/
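+
+/*
+ * A minimal lookup sketch before diving into the node-level
+ * operations (illustration only; acquiring @tree and the calling
+ * context are assumed, not part of this patch):
+ *
+ *	struct ssdfs_btree_search *search;
+ *
+ *	search = ssdfs_btree_search_alloc();
+ *	if (!search)
+ *		return -ENOMEM;
+ *	ssdfs_btree_search_init(search);
+ *	err = ssdfs_inodes_btree_find(tree, ino, search);
+ *	ssdfs_btree_search_free(search);
+ *
+ * On success the raw inode is available via search->result.buf.
+ */
+
+/*
+ *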
ssdfs_inodes_btree_node_find_range() - find a range of items into the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to find a range of items into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_inodes_btree_node_find_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + size_t item_size = sizeof(struct ssdfs_inode); + int state; + u16 items_count; + u16 items_capacity; + u64 start_hash; + u64 end_hash; + u64 found_index, start_index = U64_MAX; + u64 found_bit = U64_MAX; + struct ssdfs_state_bitmap *bmap; + unsigned long item_start_bit; + bool is_allocated = false; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + + ssdfs_debug_btree_search_object(search); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (items_capacity == 0 || items_count > items_capacity) { + SSDFS_ERR("corrupted node description: " + "items_count %u, items_capacity %u\n", + items_count, + items_capacity); + return -ERANGE; + } + + if (search->request.count == 0 || + search->request.count > items_capacity) { + SSDFS_ERR("invalid request: " + "count %u, items_capacity %u\n", + search->request.count, + items_capacity); + return -ERANGE; + } + + err = ssdfs_btree_node_check_hash_range(node, + items_count, + items_capacity, + start_hash, + end_hash, + search); + if (err) + return err; + + found_index = search->request.start.hash - start_hash; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((found_index + search->request.count) > items_capacity) { + SSDFS_ERR("invalid request: " + "found_index %llu, count %u, " + "items_capacity %u\n", + found_index, search->request.count, + items_capacity); + return -ERANGE; + } + + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP]; + item_start_bit = node->bmap_array.item_start_bit; + if (item_start_bit == ULONG_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid items_area_start\n"); + goto finish_bmap_operation; + } + start_index = found_index + item_start_bit; + + spin_lock(&bmap->lock); + + found_bit = bitmap_find_next_zero_area(bmap->ptr, + items_capacity + item_start_bit, + start_index, + search->request.count, + 0); + + if (start_index == found_bit) { + /* item isn't allocated yet */ + is_allocated = false; + } else { + /* item has been allocated already */ + is_allocated = true; + } + spin_unlock(&bmap->lock); +finish_bmap_operation: + up_read(&node->bmap_array.lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %u, items_capacity 
%u, " + "item_start_bit %lu, found_index %llu, " + "start_index %llu, found_bit %llu\n", + items_count, items_capacity, + item_start_bit, found_index, + start_index, found_bit); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_allocated) { + if (search->request.count == 1) { + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.inode; + search->result.buf_size = item_size; + search->result.items_in_buffer = 0; + } else { + err = ssdfs_btree_search_alloc_result_buf(search, + item_size * search->request.count); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate buffer\n"); + return err; + } + } + + for (i = 0; i < search->request.count; i++) { + err = ssdfs_copy_item_in_buffer(node, + (u16)found_index + i, + item_size, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to copy item in buffer: " + "index %d, err %d\n", + i, err); + return err; + } + } + + err = 0; + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + search->result.err = 0; + search->result.start_index = (u16)found_index; + search->result.count = search->request.count; + search->result.search_cno = + ssdfs_current_cno(node->tree->fsi->sb); + } else { + err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = (u16)found_index; + search->result.count = search->request.count; + search->result.search_cno = + ssdfs_current_cno(node->tree->fsi->sb); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search result: " + "state %#x, err %d, " + "start_index %u, count %u, " + "search_cno %llu, " + "buf_state %#x, buf %p\n", + search->result.state, + search->result.err, + search->result.start_index, + search->result.count, + search->result.search_cno, + search->result.buf_state, + search->result.buf); + + ssdfs_debug_btree_node_object(node); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_inodes_btree_node_find_item() - find item into node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to find an item into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+static
+int ssdfs_inodes_btree_node_find_item(struct ssdfs_btree_node *node,
+				      struct ssdfs_btree_search *search)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (search->request.count != 1 ||
+	    search->request.start.hash != search->request.end.hash) {
+		SSDFS_ERR("invalid request state: "
+			  "count %d, start_hash %llx, end_hash %llx\n",
+			  search->request.count,
+			  search->request.start.hash,
+			  search->request.end.hash);
+		return -ERANGE;
+	}
+
+	return ssdfs_inodes_btree_node_find_range(node, search);
+}
+
+/*
+ * ssdfs_define_allocated_range() - define range for allocation
+ * @search: pointer on search request object
+ * @start_hash: requested starting hash
+ * @end_hash: requested ending hash
+ * @start: pointer on start index value [out]
+ * @count: pointer on count items in the range [out]
+ *
+ * This method checks the request in the search object and
+ * defines the range's start index and the count of items
+ * in the range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static inline
+int ssdfs_define_allocated_range(struct ssdfs_btree_search *search,
+				 u64 start_hash, u64 end_hash,
+				 unsigned long *start, unsigned int *count)
+{
+	unsigned int calculated_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search || !start || !count);
+
+	SSDFS_DBG("node (id %u, start_hash %llx, "
+		  "end_hash %llx), "
+		  "request (start_hash %llx, "
+		  "end_hash %llx, flags %#x)\n",
+		  search->node.id, start_hash, end_hash,
+		  search->request.start.hash,
+		  search->request.end.hash,
+		  search->request.flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*start = ULONG_MAX;
+	*count = 0;
+
+	if (search->request.flags & SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE) {
+		if (search->request.start.hash < start_hash ||
+		    search->request.start.hash > end_hash) {
+			SSDFS_ERR("invalid hash range: "
+				  "node (id %u, start_hash %llx, "
+				  "end_hash %llx), "
+				  "request (start_hash %llx, "
+				  "end_hash %llx)\n",
+				  search->node.id, start_hash, end_hash,
+				  search->request.start.hash,
+				  search->request.end.hash);
+			return -ERANGE;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON((search->request.start.hash - start_hash) >= ULONG_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		*start = (unsigned long)(search->request.start.hash -
+					 start_hash);
+		calculated_count = search->request.end.hash -
+					search->request.start.hash + 1;
+	} else {
+		*start = 0;
+		calculated_count = search->request.count;
+	}
+
+	if (search->request.flags & SSDFS_BTREE_SEARCH_HAS_VALID_COUNT) {
+		*count = search->request.count;
+
+		/* *count is unsigned, so only the upper bound needs a check */
+		if (*count >= UINT_MAX) {
+			SSDFS_WARN("invalid count %u\n", *count);
+			return -ERANGE;
+		}
+
+		if (*count != calculated_count) {
+			SSDFS_ERR("invalid count: count %u, "
+				  "calculated_count %u\n",
+				  *count, calculated_count);
+			return -ERANGE;
+		}
+	}
+
+	if (*start >= ULONG_MAX || *count >= UINT_MAX) {
+		SSDFS_WARN("invalid range (start %lu, count %u)\n",
+			   *start, *count);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_copy_item_into_node_unlocked() - copy item from buffer into the node
+ * @node: pointer on node object
+ * @search: pointer on search request object
+ *
@item_index: index of item in the node + * @buf_index: index of item into the buffer + * + * This method tries to copy an item from the buffer into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_copy_item_into_node_unlocked(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 item_index, u16 buf_index) +{ + size_t item_size = sizeof(struct ssdfs_inode); + u32 area_offset; + u32 area_size; + u32 item_offset; + u32 buf_offset; + int page_index; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u, buf_index %u\n", + node->node_id, item_index, buf_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + area_offset = node->items_area.offset; + area_size = node->items_area.area_size; + up_read(&node->header_lock); + + item_offset = (u32)item_index * item_size; + if (item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + return -ERANGE; + } + + item_offset += area_offset; + if (item_offset >= node->node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node->node_size); + return -ERANGE; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(&node->content.pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(&node->content.pvec)); + return -ERANGE; + } + + page = node->content.pvec.pages[page_index]; + + if (!search->result.buf) { + SSDFS_ERR("buffer is not created\n"); + return -ERANGE; + } + + if (buf_index >= search->result.items_in_buffer) { + SSDFS_ERR("buf_index %u >= items_in_buffer %u\n", + buf_index, search->result.items_in_buffer); + return -ERANGE; + } + + buf_offset = buf_index * item_size; + + err = ssdfs_memcpy_to_page(page, + item_offset, PAGE_SIZE, + search->result.buf, + buf_offset, search->result.buf_size, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy item: " + "buf_offset %u, item_offset %u, " + "item_size %zu, buf_size %zu\n", + buf_offset, item_offset, + item_size, search->result.buf_size); + return err; + } + + return 0; +} + +/* + * __ssdfs_btree_node_allocate_range() - allocate range of items in the node + * @node: pointer on node object + * @search: pointer on search request object + * @start_index: start index of the range + * @count: count of items in the range + * + * This method tries to allocate range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. 
+ */ +static +int __ssdfs_btree_node_allocate_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 start, u16 count) +{ + struct ssdfs_inodes_btree_info *itree; + struct ssdfs_inodes_btree_node_header *hdr; + size_t inode_size = sizeof(struct ssdfs_inode); + struct ssdfs_state_bitmap *bmap; + struct timespec64 cur_time; + u16 item_size; + u16 max_item_size; + u16 item_index; + u16 items_count; + u16 items_capacity; + int free_items; + u64 start_hash; + u64 end_hash; + u32 bmap_bytes; + u64 free_inodes; + u64 allocated_inodes; + u64 upper_allocated_ino; + u64 inodes_capacity; + u32 used_space; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + item_size = node->items_area.item_size; + max_item_size = node->items_area.max_item_size; + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, start %u, count %u, " + "items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->node_id, start, count, + items_count, items_capacity, + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (items_capacity == 0 || items_capacity < items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, items_capacity, items_count); + return -ERANGE; + } + + if (item_size != inode_size || max_item_size != item_size) { + SSDFS_ERR("item_size %u, max_item_size %u, " + "inode_size %zu\n", + item_size, max_item_size, inode_size); + return -ERANGE; + } + + free_items = items_capacity - items_count; + if (unlikely(free_items < 0)) { + SSDFS_WARN("invalid free_items %d\n", + free_items); + return -ERANGE; + } else if (free_items == 0) { + SSDFS_DBG("node hasn't free items\n"); + return -ENOSPC; + } + + item_index = search->result.start_index; + if ((item_index + search->request.count) > items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u, " + "items_capacity %u\n", + item_index, search->request.count, + items_capacity); + return -ERANGE; + } + + if ((start_hash + item_index) != search->request.start.hash) { + SSDFS_WARN("node (start_hash %llx, index %u), " + "request (start_hash %llx, end_hash %llx)\n", + start_hash, item_index, + search->request.start.hash, + search->request.end.hash); + return -ERANGE; + } + + if (start != item_index) { + SSDFS_WARN("start %u != item_index %u\n", + start, item_index); + return -ERANGE; + } + + down_write(&node->full_lock); + + err = ssdfs_lock_items_range(node, start, count); + if (err == -ENOENT) { + up_write(&node->full_lock); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + + downgrade_write(&node->full_lock); + + err = ssdfs_allocate_items_range(node, search, + items_capacity, + start, count); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate: " + "start %u, count %u, err %d\n", + start, count, err); + goto finish_allocate_item; + } + + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + search->result.start_index = start; + search->result.count = count; + search->result.buf_size = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + (u32)search->result.start_index); +#endif 
/* CONFIG_SSDFS_DEBUG */
+
+	if (count > 1) {
+		size_t allocated_bytes = item_size * count;
+
+		err = ssdfs_btree_search_alloc_result_buf(search,
+							  allocated_bytes);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to allocate memory for buffer\n");
+			goto finish_allocate_item;
+		}
+		search->result.items_in_buffer = count;
+		search->result.buf_size = allocated_bytes;
+	} else if (count == 1) {
+		search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER;
+		search->result.buf = &search->raw.inode;
+		search->result.buf_size = item_size;
+		search->result.items_in_buffer = 1;
+	} else
+		BUG();
+
+	memset(search->result.buf, 0, search->result.buf_size);
+
+	for (i = 0; i < count; i++) {
+		struct ssdfs_inode *inode;
+		u32 item_offset = i * item_size;
+
+		inode = (struct ssdfs_inode *)(search->result.buf +
+						item_offset);
+
+		ktime_get_coarse_real_ts64(&cur_time);
+
+		inode->magic = cpu_to_le16(SSDFS_INODE_MAGIC);
+		inode->birthtime = cpu_to_le64(cur_time.tv_sec);
+		inode->birthtime_nsec = cpu_to_le32(cur_time.tv_nsec);
+		inode->ino = cpu_to_le64(search->request.start.hash);
+
+		err = ssdfs_copy_item_into_node_unlocked(node, search,
+							 start + i, i);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to initialize allocated item: "
+				  "index %d, err %d\n",
+				  start + i, err);
+			goto finish_allocate_item;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.count == 0 || search->result.count >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&node->header_lock);
+	hdr = &node->raw.inodes_header;
+	le16_add_cpu(&hdr->valid_inodes, (u16)count);
+	down_read(&node->bmap_array.lock);
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP];
+	bmap_bytes = node->bmap_array.bmap_bytes;
+	spin_lock(&bmap->lock);
+	ssdfs_memcpy(hdr->bmap, 0, bmap_bytes,
+		     bmap->ptr, 0, bmap_bytes,
+		     bmap_bytes);
+	spin_unlock(&bmap->lock);
+	up_read(&node->bmap_array.lock);
+	node->items_area.items_count += count;
+	used_space = (u32)node->items_area.item_size * count;
+	if (used_space > node->items_area.free_space) {
+		err = -ERANGE;
+		SSDFS_ERR("used_space %u > free_space %u\n",
+			  used_space,
+			  node->items_area.free_space);
+		goto finish_change_node_header;
+	} else
+		node->items_area.free_space -= used_space;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %u, area_size %u, "
+		  "free_space %u, valid_inodes %u\n",
+		  node->items_area.items_count,
+		  node->items_area.area_size,
+		  node->items_area.free_space,
+		  le16_to_cpu(hdr->valid_inodes));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_change_node_header:
+	/* the error path jumps here with header_lock held */
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		goto finish_allocate_item;
+
+	err = ssdfs_set_node_header_dirty(node, items_capacity);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set header dirty: err %d\n",
+			  err);
+		goto finish_allocate_item;
+	}
+
+	err = ssdfs_set_dirty_items_range(node, items_capacity,
+					  start, count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set items range as dirty: "
+			  "start %u, count %u, err %d\n",
+			  start, count, err);
+		goto finish_allocate_item;
+	}
+
+finish_allocate_item:
+	ssdfs_unlock_items_range(node, (u16)start, (u16)count);
+	up_read(&node->full_lock);
+
+	if (unlikely(err))
+		return err;
+
+	itree = (struct ssdfs_inodes_btree_info *)node->tree;
+
+	spin_lock(&itree->lock);
+	free_inodes = itree->free_inodes;
+	if (free_inodes < count)
+		err = -ERANGE;
+	else {
+		u64 upper_bound = start_hash + start + count - 1;
+
+		itree->allocated_inodes += count;
+		itree->free_inodes -= count;
+		if (itree->upper_allocated_ino < upper_bound)
+			itree->upper_allocated_ino =
upper_bound; + } + + upper_allocated_ino = itree->upper_allocated_ino; + allocated_inodes = itree->allocated_inodes; + free_inodes = itree->free_inodes; + inodes_capacity = itree->inodes_capacity; + spin_unlock(&itree->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("upper_allocated_ino %llu, allocated_inodes %llu, " + "free_inodes %llu, inodes_capacity %llu\n", + itree->upper_allocated_ino, + itree->allocated_inodes, + itree->free_inodes, + itree->inodes_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("fail to correct free_inodes count: " + "free_inodes %llu, count %u, err %d\n", + free_inodes, count, err); + return err; + } + + return 0; +} + +/* + * ssdfs_inodes_btree_node_allocate_item() - allocate item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to allocate an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_inodes_btree_node_allocate_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + u64 start_hash; + u64 end_hash; + unsigned long start = ULONG_MAX; + unsigned int count = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + + ssdfs_debug_btree_search_object(search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA) { + search->result.err = 0; + /* + * Node doesn't contain an item. 
+ */ + } else if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->request.count != 1); + BUG_ON(search->result.buf); + BUG_ON(search->result.buf_state != + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + err = ssdfs_define_allocated_range(search, + start_hash, end_hash, + &start, &count); + if (unlikely(err)) { + SSDFS_ERR("fail to define allocated range: err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start >= U16_MAX); + BUG_ON(count >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (count != 1) { + SSDFS_ERR("invalid count %u\n", + count); + return -ERANGE; + } + + err = __ssdfs_btree_node_allocate_range(node, search, + start, count); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate range " + "(start %lu, count %u), err %d\n", + start, count, err); + return err; + } + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_inodes_btree_node_allocate_range() - allocate range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to allocate a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_inodes_btree_node_allocate_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + u64 start_hash; + u64 end_hash; + unsigned long start = ULONG_MAX; + unsigned int count = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA) { + search->result.err = 0; + /* + * Node doesn't contain an item. 
+ */ + } else if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); + BUG_ON(search->result.buf_state != + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + err = ssdfs_define_allocated_range(search, + start_hash, end_hash, + &start, &count); + if (unlikely(err)) { + SSDFS_ERR("fail to define allocated range: err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(start >= U16_MAX); + BUG_ON(count >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_btree_node_allocate_range(node, search, + start, count); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate range " + "(start %lu, count %u), err %d\n", + start, count, err); + return err; + } + + return 0; +} + +static +int ssdfs_inodes_btree_node_insert_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EOPNOTSUPP; +} + +/* + * __ssdfs_inodes_btree_node_insert_range() - insert range into node + * @node: pointer on node object + * @search: search object + * + * This method tries to insert the range of inodes into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int __ssdfs_inodes_btree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree *tree; + struct ssdfs_inodes_btree_info *itree; + struct ssdfs_inodes_btree_node_header *hdr; + struct ssdfs_btree_node_items_area items_area; + size_t item_size = sizeof(struct ssdfs_inode); + struct ssdfs_btree_index_key key; + u16 item_index; + int free_items; + u16 inodes_count = 0; + u32 used_space; + u16 items_count = 0; + u16 valid_inodes = 0; + u64 free_inodes; + u64 allocated_inodes; + u64 inodes_capacity; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + + switch (tree->type) { + case SSDFS_INODES_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + itree = (struct ssdfs_inodes_btree_info *)node->tree; + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + 
up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if (items_area.min_item_size != 0 || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + if (items_area.free_space > items_area.area_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("free_space %u > area_size %u\n", + items_area.free_space, items_area.area_size); + return -EFAULT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_capacity %u, items_count %u\n", + items_area.items_capacity, + items_area.items_count); + SSDFS_DBG("area_size %u, free_space %u\n", + items_area.area_size, + items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + free_items = items_area.items_capacity - items_area.items_count; + if (unlikely(free_items < 0)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_WARN("invalid free_items %d\n", + free_items); + return -EFAULT; + } else if (free_items == 0) { + SSDFS_DBG("node hasn't free items\n"); + return -ENOSPC; + } + + if (free_items != items_area.items_capacity) { + SSDFS_WARN("free_items %d != items_capacity %u\n", + free_items, items_area.items_capacity); + return -ERANGE; + } + + if (((u64)free_items * item_size) > items_area.free_space) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid free_items: " + "free_items %d, item_size %zu, free_space %u\n", + free_items, item_size, items_area.free_space); + return -EFAULT; + } + + item_index = search->result.start_index; + if (item_index != 0) { + SSDFS_ERR("start_index != 0\n"); + return -ERANGE; + } else if ((item_index + search->request.count) >= items_area.items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + down_write(&node->full_lock); + + inodes_count = search->request.count; + + if ((item_index + inodes_count) > items_area.items_capacity) { + err = -ERANGE; + SSDFS_ERR("invalid inodes_count: " + "item_index %u, inodes_count %u, " + "items_capacity %u\n", + item_index, inodes_count, + items_area.items_capacity); + goto finish_detect_affected_items; + } + + err = ssdfs_lock_items_range(node, item_index, inodes_count); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_detect_affected_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_insert_item; + + err = ssdfs_generic_insert_range(node, &items_area, + item_size, search); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to insert item: err %d\n", + err); + goto unlock_items_range; + } + + 
down_write(&node->header_lock); + + node->items_area.items_count += search->request.count; + if (node->items_area.items_count > node->items_area.items_capacity) { + err = -ERANGE; + SSDFS_ERR("items_count %u > items_capacity %u\n", + node->items_area.items_count, + node->items_area.items_capacity); + goto finish_items_area_correction; + } + items_count = node->items_area.items_count; + + hdr = &node->raw.inodes_header; + le16_add_cpu(&hdr->valid_inodes, (u16)search->request.count); + valid_inodes = le16_to_cpu(hdr->valid_inodes); + + used_space = (u32)search->request.count * item_size; + if (used_space > node->items_area.free_space) { + err = -ERANGE; + SSDFS_ERR("used_space %u > free_space %u\n", + used_space, + node->items_area.free_space); + goto finish_items_area_correction; + } + node->items_area.free_space -= used_space; + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + goto unlock_items_range; + } + + err = ssdfs_allocate_items_range(node, search, + items_area.items_capacity, + item_index, inodes_count); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate range: " + "start %u, len %u, err %d\n", + item_index, inodes_count, err); + goto unlock_items_range; + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, inodes_count); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, inodes_count, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, inodes_count); + +finish_insert_item: + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + key.index.hash = cpu_to_le64(search->request.start.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "node_height %u, hash %llx\n", + le32_to_cpu(key.node_id), + key.node_type, + key.height, + le64_to_cpu(key.index.hash)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_add_index(node, &key); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", err); + return err; + } + break; + + default: + /* do nothing */ + break; + } + + spin_lock(&itree->lock); + free_inodes = itree->free_inodes; + if (free_inodes < search->request.count) + err = -ERANGE; + else { + itree->allocated_inodes += search->request.count; + itree->free_inodes -= search->request.count; + } + allocated_inodes = itree->allocated_inodes; + free_inodes = itree->free_inodes; + inodes_capacity = itree->inodes_capacity; + spin_unlock(&itree->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("valid_inodes %u, items_count %u, " + "allocated_inodes %llu, " + "free_inodes %llu, inodes_capacity %llu, " + "search->request.count %u\n", + valid_inodes, items_count, + allocated_inodes, + free_inodes, inodes_capacity, + search->request.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("fail to correct allocated_inodes count: " + "err %d\n", + err); + return err; + } + + 
ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_inodes_btree_node_insert_range() - insert range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to insert a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_inodes_btree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA) { + /* + * Node doesn't contain inserting items. + */ + } else if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count <= 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + err = __ssdfs_inodes_btree_node_insert_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert range: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_inodes_btree_node_change_item() - change an item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to change an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_inodes_btree_node_change_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + u16 item_index; + u16 items_count; + u16 items_capacity; + u64 start_hash; + u64 end_hash; + u64 found_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + + ssdfs_debug_btree_search_object(search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (items_capacity == 0 || items_capacity < items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, items_capacity, items_count); + return -ERANGE; + } + + err = ssdfs_btree_node_check_hash_range(node, + items_count, + items_capacity, + start_hash, + end_hash, + search); + if (err) + return err; + + found_index = search->request.start.hash - start_hash; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(found_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((found_index + search->request.count) > items_capacity) { + SSDFS_ERR("invalid request: " + "found_index %llu, count %u, " + "items_capacity %u\n", + found_index, search->request.count, + items_capacity); + return -ERANGE; + } + + item_index = (u16)found_index; + + down_write(&node->full_lock); + + err = ssdfs_lock_items_range(node, item_index, search->result.count); + if (err == -ENOENT) { + up_write(&node->full_lock); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + + downgrade_write(&node->full_lock); + + if (!is_ssdfs_node_items_range_allocated(node, items_capacity, + item_index, + search->result.count)) { + err = -ERANGE; + SSDFS_WARN("range wasn't be allocated: " + "start %u, count %u\n", + item_index, search->result.count); + goto finish_change_item; + } + + err = ssdfs_copy_item_into_node_unlocked(node, search, item_index, 0); + if (unlikely(err)) { + SSDFS_ERR("fail to copy item into the node: " + "item_index %u, err %d\n", + item_index, err); + goto finish_change_item; + } + + err = ssdfs_set_dirty_items_range(node, items_capacity, + item_index, + search->result.count); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + 
"start %u, count %u, err %d\n", + item_index, search->result.count, err); + goto finish_change_item; + } + + ssdfs_unlock_items_range(node, item_index, search->result.count); + +finish_change_item: + up_read(&node->full_lock); + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * ssdfs_correct_hybrid_node_items_area_hashes() - correct items area hashes + * @node: pointer on node object + */ +static +int ssdfs_correct_hybrid_node_hashes(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree_index_key key; + size_t hdr_size = sizeof(struct ssdfs_inodes_btree_node_header); + u64 start_hash; + u64 end_hash; + u16 items_count; + u16 index_count; + u32 items_area_size; + u32 items_capacity; + u16 index_id; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + /* expected node type */ + break; + + default: + return -ERANGE; + } + + down_write(&node->header_lock); + + items_count = node->items_area.items_count; + + if (items_count != 0) { + err = -ERANGE; + SSDFS_ERR("invalid request: items_count %u\n", + items_count); + goto unlock_header; + } + + index_count = node->index_area.index_count; + + if (index_count == 0) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("do nothing: node %u is empty\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto unlock_header; + } + + index_id = index_count - 1; + err = ssdfs_btree_node_get_index(&node->content.pvec, + node->index_area.offset, + node->index_area.area_size, + node->node_size, + index_id, &key); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to extract index: " + "node_id %u, index %d, err %d\n", + node->node_id, index_id, err); + goto unlock_header; + } + + items_area_size = node->node_size - hdr_size; + items_capacity = items_area_size / node->tree->item_size; + + start_hash = le64_to_cpu(key.index.hash); + start_hash += items_capacity; + end_hash = start_hash + node->items_area.items_capacity - 1; + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + +unlock_header: + up_write(&node->header_lock); + + if (err == -ENODATA) { + err = 0; + /* do nothing */ + goto finish_correct_hybrid_node_hashes; + } else if (unlikely(err)) { + /* finish logic */ + goto finish_correct_hybrid_node_hashes; + } + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_add_index(node, &key); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", + err); + return err; + } + +finish_correct_hybrid_node_hashes: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, " + "items_area (start_hash %llx, end_hash %llx), " + "index_area (start_hash %llx, end_hash %llx)\n", + node->node_id, + node->items_area.start_hash, + node->items_area.end_hash, + node->index_area.start_hash, + node->index_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * __ssdfs_inodes_btree_node_delete_range() - delete range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int __ssdfs_inodes_btree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_inodes_btree_info *itree; + struct ssdfs_inodes_btree_node_header *hdr; + struct ssdfs_state_bitmap *bmap; + int state; + u16 item_index; + u16 item_size; + u16 items_count; + u16 items_capacity; + u16 index_count = 0; + int free_items; + u64 start_hash; + u64 end_hash; + u64 old_hash; + u64 index_start_hash; + u64 index_end_hash; + u32 bmap_bytes; + u16 valid_inodes; + u64 allocated_inodes; + u64 free_inodes; + u64 inodes_capacity; + u32 area_size; + u32 freed_space; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + item_size = node->items_area.item_size; + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + old_hash = start_hash; + up_read(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %u, items_capacity %u, " + "node (start_hash %llx, end_hash %llx)\n", + items_count, items_capacity, + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (items_capacity == 0 || items_capacity < items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, items_capacity, items_count); + return -ERANGE; + } + + free_items = items_capacity - items_count; + if (unlikely(free_items < 0 || free_items > items_capacity)) { + SSDFS_WARN("invalid free_items %d\n", + free_items); + return -ERANGE; + } else if (free_items == items_capacity) { + SSDFS_DBG("node hasn't any items\n"); + return 0; + } + + item_index = search->result.start_index; + if ((item_index + search->request.count) > items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u, " + "items_capacity %u\n", + item_index, search->request.count, + items_capacity); + return -ERANGE; + } + + if ((start_hash + item_index) != search->request.start.hash) { + SSDFS_WARN("node (start_hash %llx, index %u), " + "request (start_hash %llx, end_hash %llx)\n", + start_hash, item_index, + search->request.start.hash, + search->request.end.hash); + return -ERANGE; + } + + down_write(&node->full_lock); + + err = ssdfs_lock_items_range(node, item_index, search->request.count); + if (err == -ENOENT) { + up_write(&node->full_lock); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + + downgrade_write(&node->full_lock); + + if 
(!is_ssdfs_node_items_range_allocated(node, items_capacity,
+						 item_index,
+						 search->result.count)) {
+		err = -ERANGE;
+		SSDFS_WARN("range wasn't allocated: "
+			   "start %u, count %u\n",
+			   item_index, search->result.count);
+		goto finish_delete_range;
+	}
+
+	err = ssdfs_free_items_range(node, item_index, search->result.count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to free range: "
+			  "start %u, count %u, err %d\n",
+			  item_index, search->result.count, err);
+		goto finish_delete_range;
+	}
+
+	err = ssdfs_btree_node_clear_range(node, &node->items_area,
+					   item_size, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to clear items range: err %d\n",
+			  err);
+		goto finish_delete_range;
+	}
+
+	err = ssdfs_set_dirty_items_range(node, items_capacity,
+					  item_index,
+					  search->result.count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set items range as dirty: "
+			  "start %u, count %u, err %d\n",
+			  item_index, search->result.count, err);
+		goto finish_delete_range;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.count == 0 || search->result.count >= U16_MAX);
+	BUG_ON(search->request.count != search->result.count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&node->header_lock);
+
+	hdr = &node->raw.inodes_header;
+	valid_inodes = le16_to_cpu(hdr->valid_inodes);
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(valid_inodes < search->result.count);
+#endif /* CONFIG_SSDFS_DEBUG */
+	hdr->valid_inodes = cpu_to_le16(valid_inodes - search->result.count);
+	valid_inodes = le16_to_cpu(hdr->valid_inodes);
+	down_read(&node->bmap_array.lock);
+	bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP];
+	bmap_bytes = node->bmap_array.bmap_bytes;
+	spin_lock(&bmap->lock);
+	ssdfs_memcpy(hdr->bmap, 0, bmap_bytes,
+		     bmap->ptr, 0, bmap_bytes,
+		     bmap_bytes);
+	spin_unlock(&bmap->lock);
+	up_read(&node->bmap_array.lock);
+	node->items_area.items_count -= search->result.count;
+	area_size = node->items_area.area_size;
+	freed_space = (u32)node->items_area.item_size * search->result.count;
+	if ((node->items_area.free_space + freed_space) > area_size) {
+		err = -ERANGE;
+		SSDFS_ERR("freed_space %u, free_space %u, area_size %u\n",
+			  freed_space,
+			  node->items_area.free_space,
+			  area_size);
+		goto finish_change_node_header;
+	} else
+		node->items_area.free_space += freed_space;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %u, valid_inodes %u, "
+		  "area_size %u, free_space %u, "
+		  "node (start_hash %llx, end_hash %llx)\n",
+		  node->items_area.items_count,
+		  valid_inodes,
+		  node->items_area.area_size,
+		  node->items_area.free_space,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_change_node_header:
+	/* the error path jumps here with header_lock held */
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		goto finish_delete_range;
+
+	err = ssdfs_set_node_header_dirty(node, items_capacity);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set header dirty: err %d\n",
+			  err);
+		goto finish_delete_range;
+	}
+
+finish_delete_range:
+	ssdfs_unlock_items_range(node, item_index, search->request.count);
+	up_read(&node->full_lock);
+
+	if (unlikely(err))
+		return err;
+
+	down_read(&node->header_lock);
+	items_count = node->items_area.items_count;
+	start_hash = node->items_area.start_hash;
+	end_hash = node->items_area.end_hash;
+	index_count = node->index_area.index_count;
+	index_start_hash = node->index_area.start_hash;
+	index_end_hash = node->index_area.end_hash;
+	up_read(&node->header_lock);
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_HYBRID_NODE:
+		state =
atomic_read(&node->index_area.state); + + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + /* + * Moving all items into a leaf node + */ + if (items_count == 0) { + err = ssdfs_btree_node_delete_index(node, + old_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "old_hash %llx, err %d\n", + old_hash, err); + return err; + } + + if (index_count > 0) + index_count--; + } else { + SSDFS_WARN("unexpected items_count %u\n", + items_count); + return -ERANGE; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + if (items_count == 0) { + err = ssdfs_btree_node_delete_index(node, + old_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "old_hash %llx, err %d\n", + old_hash, err); + return err; + } + + if (index_count > 0) + index_count--; + + err = ssdfs_correct_hybrid_node_hashes(node); + if (unlikely(err)) { + SSDFS_ERR("fail to correct hybrid nodes: " + "err %d\n", err); + return err; + } + + down_read(&node->header_lock); + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + } + break; + + default: + BUG(); + } + break; + + default: + /* do nothing */ + break; + } + + itree = (struct ssdfs_inodes_btree_info *)node->tree; + + spin_lock(&itree->lock); + free_inodes = itree->free_inodes; + inodes_capacity = itree->inodes_capacity; + if (itree->allocated_inodes < search->request.count) + err = -ERANGE; + else if ((free_inodes + search->request.count) > inodes_capacity) + err = -ERANGE; + else { + itree->allocated_inodes -= search->request.count; + itree->free_inodes += search->request.count; + } + free_inodes = itree->free_inodes; + allocated_inodes = itree->allocated_inodes; + spin_unlock(&itree->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("valid_inodes %u, allocated_inodes %llu, " + "free_inodes %llu, inodes_capacity %llu, " + "search->request.count %u\n", + valid_inodes, allocated_inodes, + free_inodes, inodes_capacity, + search->request.count); + SSDFS_DBG("items_area (start_hash %llx, end_hash %llx), " + "index_area (start_hash %llx, end_hash %llx), " + "valid_inodes %u, index_count %u\n", + start_hash, end_hash, + index_start_hash, index_end_hash, + valid_inodes, index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (unlikely(err)) { + SSDFS_ERR("fail to correct allocated_inodes count: " + "err %d\n", + err); + return err; + } + + if (valid_inodes == 0 && index_count == 0) { + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("PLEASE, DELETE node_id %u\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else + search->result.state = SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_inodes_btree_node_delete_item() - delete an item from the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete an item from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_inodes_btree_node_delete_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + + BUG_ON(search->result.count != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_inodes_btree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete inode: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_inodes_btree_node_delete_range() - delete a range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_inodes_btree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_inodes_btree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete inodes range: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_inodes_btree_node_extract_range() - extract range of items from node + * @node: pointer on node object + * @start_index: starting index of the range + * @count: count of items in the range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no such range in the node. 
+ */ +static +int ssdfs_inodes_btree_node_extract_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count, + struct ssdfs_btree_search *search) +{ + struct ssdfs_inode *inode; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_index %u, count %u, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + start_index, count, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->full_lock); + err = __ssdfs_btree_node_extract_range(node, start_index, count, + sizeof(struct ssdfs_inode), + search); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract a range: " + "start %u, count %u, err %d\n", + start_index, count, err); + return err; + } + + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + inode = (struct ssdfs_inode *)search->result.buf; + search->request.start.hash = le64_to_cpu(inode->ino); + inode += search->result.count - 1; + search->request.end.hash = le64_to_cpu(inode->ino); + search->request.count = count; + + return 0; +} + +static +int ssdfs_inodes_btree_resize_items_area(struct ssdfs_btree_node *node, + u32 new_size) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EOPNOTSUPP; +} + +void ssdfs_debug_inodes_btree_object(struct ssdfs_inodes_btree_info *tree) +{ +#ifdef CONFIG_SSDFS_DEBUG + struct list_head *this, *next; + + BUG_ON(!tree); + + SSDFS_DBG("INODES TREE: is_locked %d, upper_allocated_ino %llu, " + "allocated_inodes %llu, free_inodes %llu, " + "inodes_capacity %llu, leaf_nodes %u, " + "nodes_count %u\n", + spin_is_locked(&tree->lock), + tree->upper_allocated_ino, + tree->allocated_inodes, + tree->free_inodes, + tree->inodes_capacity, + tree->leaf_nodes, + tree->nodes_count); + + ssdfs_debug_btree_object(&tree->generic_tree); + + SSDFS_DBG("ROOT FOLDER: magic %#x, mode %#x, flags %#x, " + "uid %u, gid %u, atime %llu, ctime %llu, " + "mtime %llu, birthtime %llu, " + "atime_nsec %u, ctime_nsec %u, mtime_nsec %u, " + "birthtime_nsec %u, generation %llu, " + "size %llu, blocks %llu, parent_ino %llu, " + "refcount %u, checksum %#x, ino %llu, " + "hash_code %llu, name_len %u, " + "private_flags %#x, dentries %u\n", + le16_to_cpu(tree->root_folder.magic), + le16_to_cpu(tree->root_folder.mode), + le32_to_cpu(tree->root_folder.flags), + le32_to_cpu(tree->root_folder.uid), + le32_to_cpu(tree->root_folder.gid), + le64_to_cpu(tree->root_folder.atime), + le64_to_cpu(tree->root_folder.ctime), + le64_to_cpu(tree->root_folder.mtime), + le64_to_cpu(tree->root_folder.birthtime), + le32_to_cpu(tree->root_folder.atime_nsec), + le32_to_cpu(tree->root_folder.ctime_nsec), + le32_to_cpu(tree->root_folder.mtime_nsec), + le32_to_cpu(tree->root_folder.birthtime_nsec), + le64_to_cpu(tree->root_folder.generation), + le64_to_cpu(tree->root_folder.size), + le64_to_cpu(tree->root_folder.blocks), + le64_to_cpu(tree->root_folder.parent_ino), + le32_to_cpu(tree->root_folder.refcount), + le32_to_cpu(tree->root_folder.checksum), + le64_to_cpu(tree->root_folder.ino), + le64_to_cpu(tree->root_folder.hash_code), + le16_to_cpu(tree->root_folder.name_len), + le16_to_cpu(tree->root_folder.private_flags), + le32_to_cpu(tree->root_folder.count_of.dentries)); + + SSDFS_DBG("PRIVATE AREA DUMP:\n"); + 
print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     &tree->root_folder.internal[0],
+			     sizeof(struct ssdfs_inode_private_area));
+	SSDFS_DBG("\n");
+
+	if (!list_empty_careful(&tree->free_inodes_queue.list)) {
+		SSDFS_DBG("FREE INODES RANGES:\n");
+
+		list_for_each_safe(this, next, &tree->free_inodes_queue.list) {
+			struct ssdfs_inodes_btree_range *range;
+
+			range = list_entry(this,
+					   struct ssdfs_inodes_btree_range,
+					   list);
+
+			if (range) {
+				SSDFS_DBG("[node_id %u, start_hash %llx, "
+					  "start_index %u, count %u], ",
+					  range->node_id,
+					  range->area.start_hash,
+					  range->area.start_index,
+					  range->area.count);
+			}
+		}
+
+		SSDFS_DBG("\n");
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+const struct ssdfs_btree_descriptor_operations ssdfs_inodes_btree_desc_ops = {
+	.init		= ssdfs_inodes_btree_desc_init,
+	.flush		= ssdfs_inodes_btree_desc_flush,
+};
+
+const struct ssdfs_btree_operations ssdfs_inodes_btree_ops = {
+	.create_root_node	= ssdfs_inodes_btree_create_root_node,
+	.create_node		= ssdfs_inodes_btree_create_node,
+	.init_node		= ssdfs_inodes_btree_init_node,
+	.destroy_node		= ssdfs_inodes_btree_destroy_node,
+	.add_node		= ssdfs_inodes_btree_add_node,
+	.delete_node		= ssdfs_inodes_btree_delete_node,
+	.pre_flush_root_node	= ssdfs_inodes_btree_pre_flush_root_node,
+	.flush_root_node	= ssdfs_inodes_btree_flush_root_node,
+	.pre_flush_node		= ssdfs_inodes_btree_pre_flush_node,
+	.flush_node		= ssdfs_inodes_btree_flush_node,
+};
+
+const struct ssdfs_btree_node_operations ssdfs_inodes_btree_node_ops = {
+	.find_item		= ssdfs_inodes_btree_node_find_item,
+	.find_range		= ssdfs_inodes_btree_node_find_range,
+	.extract_range		= ssdfs_inodes_btree_node_extract_range,
+	.allocate_item		= ssdfs_inodes_btree_node_allocate_item,
+	.allocate_range		= ssdfs_inodes_btree_node_allocate_range,
+	.insert_item		= ssdfs_inodes_btree_node_insert_item,
+	.insert_range		= ssdfs_inodes_btree_node_insert_range,
+	.change_item		= ssdfs_inodes_btree_node_change_item,
+	.delete_item		= ssdfs_inodes_btree_node_delete_item,
+	.delete_range		= ssdfs_inodes_btree_node_delete_range,
+	.resize_items_area	= ssdfs_inodes_btree_resize_items_area,
+};
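The three operation tables above are the whole interface that the generic
b-tree layer sees: every node operation is dispatched through these function
pointers, which is also why an operation the inodes tree does not support,
such as resize_items_area, can simply be wired to a stub that returns
-EOPNOTSUPP (inode items have a fixed size). The following is a minimal
user-space sketch of that dispatch pattern; the stub types and the
do_delete_item() helper are hypothetical stand-ins for illustration, not
SSDFS code:

	#include <stdio.h>

	struct btree_node;	/* stand-ins for the kernel structures */
	struct btree_search;

	/*
	 * Same shape as ssdfs_btree_node_operations: the generic layer
	 * sees only function pointers, never the tree-specific
	 * implementations behind them.
	 */
	struct btree_node_operations {
		int (*delete_item)(struct btree_node *node,
				   struct btree_search *search);
	};

	static int inodes_delete_item(struct btree_node *node,
				      struct btree_search *search)
	{
		(void)node;
		(void)search;
		printf("inodes-specific delete_item called\n");
		return 0;
	}

	static const struct btree_node_operations inodes_node_ops = {
		.delete_item = inodes_delete_item,
	};

	/* generic-side dispatch helper (hypothetical) */
	static int do_delete_item(const struct btree_node_operations *ops,
				  struct btree_node *node,
				  struct btree_search *search)
	{
		if (!ops->delete_item)
			return -1;	/* operation not supported */
		return ops->delete_item(node, search);
	}

	int main(void)
	{
		return do_delete_item(&inodes_node_ops, NULL, NULL);
	}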
From patchwork Sat Feb 25 01:09:13 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151967
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 62/76] ssdfs: introduce dentries b-tree
Date: Fri, 24 Feb 2023 17:09:13 -0800
Message-Id: <20230225010927.813929-63-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

The SSDFS dentry is a fixed-size (32 bytes) metadata structure. It contains
an inode ID, a name hash, a name length, and an inline string for 12 symbols,
so a classic 8.3 file name fits inline in its entirety. If a file or folder
name is longer than 12 symbols, the dentry keeps only the beginning of the
name, while the complete name is stored in a shared dictionary. The goal of
this approach is to represent every dentry by a compact structure of fixed
size that allows for fast and efficient dentry operations. In a lot of
use-cases file and folder names are short, so the inline string is often the
only storage the name needs; for long names, the shared dictionary provides
efficient storage by means of deduplication.

The dentries b-tree is a hybrid b-tree whose root node is stored in the
inode's private area. By default, the inode's private area is 128 bytes in
size, and an SSDFS dentry is 32 bytes in size. As a result, the inode's
private area provides enough space for 4 inline dentries.
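To make the size arithmetic concrete, here is a small user-space sketch of a
32-byte dentry carrying the fields the description mentions. The field names
mirror the ones referenced by the code in this patch (ino, hash_code,
name_len, flags, inline_string), but the struct itself is an illustration;
the authoritative declaration of struct ssdfs_dir_entry lives in the on-disk
layout patch:

	#include <stdint.h>
	#include <assert.h>

	#define DEMO_INLINE_NAME_LEN	12	/* inline symbols */

	/* illustrative 32-byte dentry mirroring the description above */
	struct demo_dir_entry {
		uint64_t ino;		/* inode ID */
		uint64_t hash_code;	/* name hash */
		uint8_t  name_len;	/* full name length */
		uint8_t  dentry_type;
		uint8_t  file_type;
		uint8_t  flags;		/* e.g. name continues in dictionary */
		char	 inline_string[DEMO_INLINE_NAME_LEN];
	};

	int main(void)
	{
		/* the fixed 32-byte size is the point of the format */
		static_assert(sizeof(struct demo_dir_entry) == 32,
			      "dentry must stay 32 bytes");

		/* a 128-byte inode private area holds 4 inline dentries */
		assert(128 / sizeof(struct demo_dir_entry) == 4);
		return 0;
	}

Note that 8192 / 32 = 256 is exactly the per-node maximum quoted below for
an 8 KB node; the "about 400 dentries in two nodes" estimate is lower than
2 * 256 because a hybrid node also spends space on its header and index area.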
Generally speaking, if a folder contains 4 or fewer files, the dentries can
be stored in the inode's private area without creating a dentries b-tree.
Otherwise, if a folder includes more than 4 files or folders, a regular
dentries b-tree has to be created, with its root node stored in the inode's
private area. Every node of the dentries b-tree contains a header, an index
area (in the case of a hybrid node), and an array of dentries ordered by the
hash of the file name. A b-tree node of 8 KB in size is capable of holding at
most 256 dentries. The hybrid b-tree was chosen for the dentries metadata
structure by virtue of its compact representation and efficient lookup
mechanism. Every node of the dentries b-tree has:
(1) dirty bitmap - tracks modified dentries,
(2) lock bitmap - allows locking particular dentries exclusively without
    locking the whole b-tree node.
It is expected that a dentries b-tree contains few nodes on average, because
two 8K nodes are already capable of storing about 400 dentries.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/dentries_tree.c | 3013 ++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/dentries_tree.h |  156 ++
 2 files changed, 3169 insertions(+)
 create mode 100644 fs/ssdfs/dentries_tree.c
 create mode 100644 fs/ssdfs/dentries_tree.h

diff --git a/fs/ssdfs/dentries_tree.c b/fs/ssdfs/dentries_tree.c
new file mode 100644
index 000000000000..8c2ce87d1077
--- /dev/null
+++ b/fs/ssdfs/dentries_tree.c
@@ -0,0 +1,3013 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dentries_tree.c - dentries btree implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "shared_dictionary.h"
+#include "segment_tree.h"
+#include "dentries_tree.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dentries_page_leaks;
+atomic64_t ssdfs_dentries_memory_leaks;
+atomic64_t ssdfs_dentries_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dentries_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dentries_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dentries_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dentries_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dentries_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dentries_kfree(void *kaddr)
+ * struct page *ssdfs_dentries_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_dentries_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_dentries_free_page(struct page *page)
+ * void ssdfs_dentries_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dentries)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dentries)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dentries_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dentries_page_leaks, 0);
+	atomic64_set(&ssdfs_dentries_memory_leaks, 0);
+	atomic64_set(&ssdfs_dentries_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dentries_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dentries_page_leaks) != 0) {
+		SSDFS_ERR("DENTRIES TREE: "
+			  "memory leaks include %lld pages\n",
+			  atomic64_read(&ssdfs_dentries_page_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dentries_memory_leaks) != 0) {
+		SSDFS_ERR("DENTRIES TREE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dentries_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dentries_cache_leaks) != 0) {
+		SSDFS_ERR("DENTRIES TREE: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dentries_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+#define S_SHIFT 12
+static unsigned char
+ssdfs_type_by_mode[S_IFMT >> S_SHIFT] = {
+	[S_IFREG >> S_SHIFT]	= SSDFS_FT_REG_FILE,
+	[S_IFDIR >> S_SHIFT]	= SSDFS_FT_DIR,
+	[S_IFCHR >> S_SHIFT]	= SSDFS_FT_CHRDEV,
+	[S_IFBLK >> S_SHIFT]	= SSDFS_FT_BLKDEV,
+	[S_IFIFO >> S_SHIFT]	= SSDFS_FT_FIFO,
+	[S_IFSOCK >> S_SHIFT]	= SSDFS_FT_SOCK,
+	[S_IFLNK >> S_SHIFT]	= SSDFS_FT_SYMLINK,
+};
+
+static inline
+void ssdfs_set_file_type(struct ssdfs_dir_entry *de, struct inode *inode)
+{
+	umode_t mode = inode->i_mode;
+
+	de->file_type = ssdfs_type_by_mode[(mode & S_IFMT) >> S_SHIFT];
+}
+
+/*
+ * ssdfs_dentries_tree_create() - create dentries tree of a new inode
+ * @fsi: pointer on shared file system object
+ * @ii: pointer on in-core SSDFS inode
+ *
+ * This method tries to create dentries btree for a new inode.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - unable to allocate memory.
+ */ +int ssdfs_dentries_tree_create(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii) +{ + struct ssdfs_dentries_btree_info *ptr; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !ii); + BUG_ON(!rwsem_is_locked(&ii->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("ii %p, ino %lu\n", + ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("ii %p, ino %lu\n", + ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (S_ISDIR(ii->vfs_inode.i_mode)) + ii->dentries_tree = NULL; + else { + SSDFS_WARN("regular file cannot have dentries tree\n"); + return -ERANGE; + } + + ptr = ssdfs_dentries_kzalloc(sizeof(struct ssdfs_dentries_btree_info), + GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate dentries tree\n"); + return -ENOMEM; + } + + atomic_set(&ptr->state, SSDFS_DENTRIES_BTREE_UNKNOWN_STATE); + atomic_set(&ptr->type, SSDFS_INLINE_DENTRIES_ARRAY); + atomic64_set(&ptr->dentries_count, 0); + init_rwsem(&ptr->lock); + ptr->generic_tree = NULL; + memset(ptr->buffer.dentries, 0xFF, + dentry_size * SSDFS_INLINE_DENTRIES_COUNT); + ptr->inline_dentries = ptr->buffer.dentries; + memset(&ptr->root_buffer, 0xFF, + sizeof(struct ssdfs_btree_inline_root_node)); + ptr->root = NULL; + ssdfs_memcpy(&ptr->desc, + 0, sizeof(struct ssdfs_dentries_btree_descriptor), + &fsi->segs_tree->dentries_btree, + 0, sizeof(struct ssdfs_dentries_btree_descriptor), + sizeof(struct ssdfs_dentries_btree_descriptor)); + ptr->owner = ii; + ptr->fsi = fsi; + atomic_set(&ptr->state, SSDFS_DENTRIES_BTREE_CREATED); + + ssdfs_debug_dentries_btree_object(ptr); + + ii->dentries_tree = ptr; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; +} + +/* + * ssdfs_dentries_tree_destroy() - destroy dentries tree + * @ii: pointer on in-core SSDFS inode + */ +void ssdfs_dentries_tree_destroy(struct ssdfs_inode_info *ii) +{ + struct ssdfs_dentries_btree_info *tree; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ii); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("ii %p, ino %lu\n", + ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("ii %p, ino %lu\n", + ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tree = SSDFS_DTREE(ii); + + if (!tree) { + SSDFS_DBG("dentries tree is absent: ino %lu\n", + ii->vfs_inode.i_ino); + return; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + /* expected state*/ + break; + + case SSDFS_DENTRIES_BTREE_CORRUPTED: + SSDFS_WARN("dentries tree is corrupted: " + "ino %lu\n", + ii->vfs_inode.i_ino); + break; + + case SSDFS_DENTRIES_BTREE_DIRTY: + if (atomic64_read(&tree->dentries_count) > 0) { + SSDFS_WARN("dentries tree is dirty: " + "ino %lu\n", + ii->vfs_inode.i_ino); + } else { + /* regular destroy */ + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_UNKNOWN_STATE); + } + break; + + default: + SSDFS_WARN("invalid state of dentries tree: " + "ino %lu, state %#x\n", + ii->vfs_inode.i_ino, + atomic_read(&tree->state)); + return; + } + + if (rwsem_is_locked(&tree->lock)) { + /* inform about possible trouble */ + SSDFS_WARN("tree is locked under destruction\n"); + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + if (!tree->inline_dentries) { + SSDFS_WARN("empty inline_dentries pointer\n"); + memset(tree->buffer.dentries, 0xFF, + 
dentry_size * SSDFS_INLINE_DENTRIES_COUNT); + } else { + memset(tree->inline_dentries, 0xFF, + dentry_size * SSDFS_INLINE_DENTRIES_COUNT); + } + tree->inline_dentries = NULL; + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + if (!tree->generic_tree) { + SSDFS_WARN("empty generic_tree pointer\n"); + ssdfs_btree_destroy(&tree->buffer.tree); + } else { + /* destroy tree via pointer */ + ssdfs_btree_destroy(tree->generic_tree); + } + tree->generic_tree = NULL; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid dentries btree state %#x\n", + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + + memset(&tree->root_buffer, 0xFF, + sizeof(struct ssdfs_btree_inline_root_node)); + tree->root = NULL; + + tree->owner = NULL; + tree->fsi = NULL; + + atomic_set(&tree->type, SSDFS_DENTRIES_BTREE_UNKNOWN_TYPE); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_UNKNOWN_STATE); + + ssdfs_dentries_kfree(ii->dentries_tree); + ii->dentries_tree = NULL; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_dentries_tree_init() - init dentries tree for existing inode + * @fsi: pointer on shared file system object + * @ii: pointer on in-core SSDFS inode + * + * This method tries to create the dentries tree and to initialize + * the root node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + * %-EIO - corrupted raw on-disk inode. + */ +int ssdfs_dentries_tree_init(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii) +{ + struct ssdfs_inode raw_inode; + struct ssdfs_btree_node *node; + struct ssdfs_dentries_btree_info *tree; + struct ssdfs_btree_inline_root_node *root_node; + u16 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !ii); + BUG_ON(!rwsem_is_locked(&ii->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, ii %p, ino %lu\n", + fsi, ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("fsi %p, ii %p, ino %lu\n", + fsi, ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tree = SSDFS_DTREE(ii); + if (!tree) { + SSDFS_DBG("dentries tree is absent: ino %lu\n", + ii->vfs_inode.i_ino); + return -ERANGE; + } + + ssdfs_memcpy(&raw_inode, + 0, sizeof(struct ssdfs_inode), + &ii->raw_inode, + 0, sizeof(struct ssdfs_inode), + sizeof(struct ssdfs_inode)); + + flags = le16_to_cpu(raw_inode.private_flags); + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + /* expected tree state */ + break; + + default: + SSDFS_WARN("unexpected state of tree %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + /* expected tree type */ + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + SSDFS_WARN("unexpected type of tree %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + + default: + SSDFS_WARN("invalid type of tree %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + down_write(&tree->lock); + + if (flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + atomic64_set(&tree->dentries_count, + le32_to_cpu(raw_inode.count_of.dentries)); + + if (tree->generic_tree) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("generic tree exists\n"); + goto finish_tree_init; + } + + tree->generic_tree = &tree->buffer.tree; + tree->inline_dentries = NULL; + atomic_set(&tree->type, 
SSDFS_PRIVATE_DENTRIES_BTREE); + + err = ssdfs_btree_create(fsi, + ii->vfs_inode.i_ino, + &ssdfs_dentries_btree_desc_ops, + &ssdfs_dentries_btree_ops, + tree->generic_tree); + if (unlikely(err)) { + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_ERR("fail to create dentries tree: err %d\n", + err); + goto finish_tree_init; + } + + err = ssdfs_btree_radix_tree_find(tree->generic_tree, + SSDFS_BTREE_ROOT_NODE_ID, + &node); + if (unlikely(err)) { + SSDFS_ERR("fail to get the root node: err %d\n", + err); + goto fail_create_generic_tree; + } else if (unlikely(!node)) { + err = -ERANGE; + SSDFS_WARN("empty node pointer\n"); + goto fail_create_generic_tree; + } + + root_node = &raw_inode.internal[0].area1.dentries_root; + err = ssdfs_btree_create_root_node(node, root_node); + if (unlikely(err)) { + SSDFS_ERR("fail to init the root node: err %d\n", + err); + goto fail_create_generic_tree; + } + + tree->root = &tree->root_buffer; + ssdfs_memcpy(tree->root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + root_node, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + + atomic_set(&tree->type, SSDFS_PRIVATE_DENTRIES_BTREE); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_INITIALIZED); + +fail_create_generic_tree: + if (unlikely(err)) { + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + ssdfs_btree_destroy(tree->generic_tree); + tree->generic_tree = NULL; + goto finish_tree_init; + } + } else if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + atomic64_set(&tree->dentries_count, + le32_to_cpu(raw_inode.count_of.dentries)); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(atomic64_read(&tree->dentries_count) > + SSDFS_INLINE_DENTRIES_PER_AREA); +#else + if (atomic64_read(&tree->dentries_count) > + SSDFS_INLINE_DENTRIES_PER_AREA) { + err = -EIO; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_ERR("corrupted on-disk raw inode: " + "dentries_count %llu\n", + (u64)atomic64_read(&tree->dentries_count)); + goto finish_tree_init; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!tree->inline_dentries) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("undefined inline dentries pointer\n"); + goto finish_tree_init; + } else { + ssdfs_memcpy(tree->inline_dentries, + 0, ssdfs_inline_dentries_size(), + &raw_inode.internal[0].area1, + 0, ssdfs_area_dentries_size(), + ssdfs_area_dentries_size()); + } + + atomic_set(&tree->type, SSDFS_INLINE_DENTRIES_ARRAY); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_INITIALIZED); + } else if (flags & SSDFS_INODE_HAS_INLINE_DENTRIES) { + u32 dentries_count = le32_to_cpu(raw_inode.count_of.dentries); + u32 i; + + atomic64_set(&tree->dentries_count, dentries_count); + + if (!tree->inline_dentries) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("undefined inline dentries pointer\n"); + goto finish_tree_init; + } else { + ssdfs_memcpy(tree->inline_dentries, + 0, ssdfs_inline_dentries_size(), + &raw_inode.internal, + 0, ssdfs_inline_dentries_size(), + ssdfs_inline_dentries_size()); + } + + for (i = 0; i < dentries_count; i++) { + u64 hash; + struct ssdfs_dir_entry *dentry = + &tree->inline_dentries[i]; + + hash = le64_to_cpu(dentry->hash_code); + + if (hash == 0) { + size_t len = dentry->name_len; + const char *name = + (const char *)dentry->inline_string; + + if (len > SSDFS_DENTRY_INLINE_NAME_MAX_LEN) { + err = -ERANGE; + SSDFS_ERR("dentry hasn't hash code: " + "len %zu\n", len); + goto finish_tree_init; + 
} + + hash = __ssdfs_generate_name_hash(name, len, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + if (hash == U64_MAX) { + err = -ERANGE; + SSDFS_ERR("fail to generate hash\n"); + goto finish_tree_init; + } + + dentry->hash_code = cpu_to_le64(hash); + } + } + + atomic_set(&tree->type, SSDFS_INLINE_DENTRIES_ARRAY); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_INITIALIZED); + } else + BUG(); + +finish_tree_init: + up_write(&tree->lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %llu\n", + atomic64_read(&tree->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_dentries_btree_object(tree); + + return err; +} + +/* + * ssdfs_migrate_inline2generic_tree() - convert inline tree into generic + * @tree: dentries tree + * + * This method tries to convert the inline tree into generic one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - the tree is empty. + */ +static +int ssdfs_migrate_inline2generic_tree(struct ssdfs_dentries_btree_info *tree) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_dir_entry dentries[SSDFS_INLINE_DENTRIES_COUNT]; + struct ssdfs_dir_entry *cur; + struct ssdfs_btree_search *search; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + size_t dentries_bytes; + s64 dentries_count, dentries_capacity; + int private_flags; + s64 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + dentries_count = atomic64_read(&tree->dentries_count); + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + private_flags = atomic_read(&tree->owner->private_flags); + + dentries_capacity = SSDFS_INLINE_DENTRIES_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + dentries_capacity -= SSDFS_INLINE_DENTRIES_PER_AREA; + if (private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + SSDFS_ERR("the dentries tree is generic\n"); + return -ERANGE; + } + + if (dentries_count > dentries_capacity) { + SSDFS_WARN("dentries tree is corrupted: " + "dentries_count %lld, dentries_capacity %lld\n", + dentries_count, dentries_capacity); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_CORRUPTED); + return -ERANGE; + } else if (dentries_count == 0) { + SSDFS_DBG("empty tree\n"); + return -EFAULT; + } else if (dentries_count < dentries_capacity) { + SSDFS_WARN("dentries_count %lld, dentries_capacity %lld\n", + dentries_count, dentries_capacity); + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree->inline_dentries || tree->generic_tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(dentries, 0xFF, ssdfs_inline_dentries_size()); + + dentries_bytes = dentry_size * dentries_capacity; + ssdfs_memcpy(dentries, 0, ssdfs_inline_dentries_size(), + tree->inline_dentries, 0, ssdfs_inline_dentries_size(), + 
dentries_bytes); + + atomic64_sub(dentries_count, &tree->dentries_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %llu\n", + atomic64_read(&tree->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < dentries_count; i++) { + cur = &dentries[i]; + + cur->dentry_type = SSDFS_REGULAR_DENTRY; + } + + tree->generic_tree = &tree->buffer.tree; + tree->inline_dentries = NULL; + + err = ssdfs_btree_create(fsi, + tree->owner->vfs_inode.i_ino, + &ssdfs_dentries_btree_desc_ops, + &ssdfs_dentries_btree_ops, + &tree->buffer.tree); + if (unlikely(err)) { + SSDFS_ERR("fail to create generic tree: err %d\n", + err); + goto recover_inline_tree; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto destroy_generic_tree; + } + + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_HAS_VALID_INO; + cur = &dentries[0]; + search->request.start.hash = le64_to_cpu(cur->hash_code); + search->request.start.ino = le64_to_cpu(cur->ino); + if (dentries_count > 1) { + cur = &dentries[dentries_count - 1]; + search->request.end.hash = le64_to_cpu(cur->hash_code); + search->request.end.ino = le64_to_cpu(cur->ino); + } else { + search->request.end.hash = search->request.start.hash; + search->request.end.ino = search->request.start.ino; + } + search->request.count = (u16)dentries_count; + + err = ssdfs_btree_find_item(&tree->buffer.tree, search); + if (err == -ENODATA) { + /* expected error */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find item: " + "start (hash %llx, ino %llu), " + "end (hash %llx, ino %llu), err %d\n", + search->request.start.hash, + search->request.start.ino, + search->request.end.hash, + search->request.end.ino, + err); + goto finish_add_range; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + goto finish_add_range; + } + + if (search->result.buf) { + err = -ERANGE; + SSDFS_ERR("search->result.buf %p\n", + search->result.buf); + goto finish_add_range; + } + + if (dentries_count == 1) { + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf_size = sizeof(struct ssdfs_dir_entry); + search->result.items_in_buffer = dentries_count; + search->result.buf = &search->raw.dentry; + ssdfs_memcpy(&search->raw.dentry, 0, dentry_size, + dentries, 0, ssdfs_inline_dentries_size(), + search->result.buf_size); + } else { + err = ssdfs_btree_search_alloc_result_buf(search, + dentries_count * sizeof(struct ssdfs_dir_entry)); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory for buffer\n"); + goto finish_add_range; + } + + ssdfs_memcpy(search->result.buf, 0, search->result.buf_size, + dentries, 0, ssdfs_inline_dentries_size(), + search->result.buf_size); + search->result.items_in_buffer = (u16)dentries_count; + } + + search->request.type = SSDFS_BTREE_SEARCH_ADD_RANGE; + + err = ssdfs_btree_add_range(&tree->buffer.tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add the range into tree: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + goto 
finish_add_range; + } + +finish_add_range: + ssdfs_btree_search_free(search); + + if (unlikely(err)) + goto destroy_generic_tree; + + err = ssdfs_btree_synchronize_root_node(tree->generic_tree, + tree->root); + if (unlikely(err)) { + SSDFS_ERR("fail to synchronize the root node: " + "err %d\n", err); + goto destroy_generic_tree; + } + + atomic_set(&tree->type, SSDFS_PRIVATE_DENTRIES_BTREE); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY); + + atomic_or(SSDFS_INODE_HAS_DENTRIES_BTREE, + &tree->owner->private_flags); + atomic_and(~SSDFS_INODE_HAS_INLINE_DENTRIES, + &tree->owner->private_flags); + + return 0; + +destroy_generic_tree: + ssdfs_btree_destroy(&tree->buffer.tree); + +recover_inline_tree: + for (i = 0; i < dentries_count; i++) { + cur = &dentries[i]; + + cur->dentry_type = SSDFS_INLINE_DENTRY; + } + + ssdfs_memcpy(tree->buffer.dentries, 0, ssdfs_inline_dentries_size(), + dentries, 0, ssdfs_inline_dentries_size(), + ssdfs_inline_dentries_size()); + + tree->inline_dentries = tree->buffer.dentries; + tree->generic_tree = NULL; + + atomic64_set(&tree->dentries_count, dentries_count); + + return err; +} + +/* + * ssdfs_dentries_tree_flush() - save modified dentries tree + * @fsi: pointer on shared file system object + * @ii: pointer on in-core SSDFS inode + * + * This method tries to flush inode's dentries btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +int ssdfs_dentries_tree_flush(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii) +{ + struct ssdfs_dentries_btree_info *tree; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + int flags; + u64 dentries_count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !ii); + BUG_ON(!rwsem_is_locked(&ii->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, ii %p, ino %lu\n", + fsi, ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("fsi %p, ii %p, ino %lu\n", + fsi, ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tree = SSDFS_DTREE(ii); + if (!tree) { + SSDFS_DBG("dentries tree is absent: ino %lu\n", + ii->vfs_inode.i_ino); + return -ERANGE; + } + + flags = atomic_read(&ii->private_flags); + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_DIRTY: + /* need to flush */ + break; + + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + /* do nothing */ + return 0; + + case SSDFS_DENTRIES_BTREE_CORRUPTED: + SSDFS_DBG("dentries btree corrupted: ino %lu\n", + ii->vfs_inode.i_ino); + return -EOPNOTSUPP; + + default: + SSDFS_WARN("unexpected state of tree %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + } + + down_write(&tree->lock); + + dentries_count = atomic64_read(&tree->dentries_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %llu\n", dentries_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentries_count >= U32_MAX) { + err = -EOPNOTSUPP; + SSDFS_ERR("fail to store dentries_count %llu\n", + dentries_count); + goto finish_dentries_tree_flush; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + if (!tree->inline_dentries) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("undefined inline dentries pointer\n"); + goto finish_dentries_tree_flush; + } + + if (dentries_count == 0) { + flags = atomic_read(&ii->private_flags); + + if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + memset(&ii->raw_inode.internal[0].area1, 0xFF, + ssdfs_area_dentries_size()); + } else { + 
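+				/*
+				 * No xattr b-tree resides in the inode's
+				 * private area, so the whole area belongs
+				 * to the inline dentries and can be wiped.
+				 */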
memset(&ii->raw_inode.internal, 0xFF, + ssdfs_inline_dentries_size()); + } + } else if (dentries_count <= SSDFS_INLINE_DENTRIES_PER_AREA) { + flags = atomic_read(&ii->private_flags); + + if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + memset(&ii->raw_inode.internal[0].area1, 0xFF, + ssdfs_area_dentries_size()); + ssdfs_memcpy(&ii->raw_inode.internal[0].area1, + 0, ssdfs_area_dentries_size(), + tree->inline_dentries, + 0, ssdfs_inline_dentries_size(), + dentries_count * dentry_size); + } else { + memset(&ii->raw_inode.internal, 0xFF, + ssdfs_inline_dentries_size()); + ssdfs_memcpy(&ii->raw_inode.internal, + 0, ssdfs_inline_dentries_size(), + tree->inline_dentries, + 0, ssdfs_inline_dentries_size(), + dentries_count * dentry_size); + } + } else if (dentries_count <= SSDFS_INLINE_DENTRIES_COUNT) { + flags = atomic_read(&ii->private_flags); + + if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree should be converted: " + "ino %lu\n", + ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + memset(&ii->raw_inode.internal, 0xFF, + ssdfs_inline_dentries_size()); + ssdfs_memcpy(&ii->raw_inode.internal, + 0, ssdfs_inline_dentries_size(), + tree->inline_dentries, + 0, ssdfs_inline_dentries_size(), + dentries_count * dentry_size); + } + + if (err == -EAGAIN) { + err = ssdfs_migrate_inline2generic_tree(tree); + if (unlikely(err)) { + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_ERR("fail to convert tree: " + "err %d\n", err); + goto finish_dentries_tree_flush; + } else + goto try_generic_tree_flush; + } + } else { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("invalid dentries_count %llu\n", + (u64)atomic64_read(&tree->dentries_count)); + goto finish_dentries_tree_flush; + } + + atomic_or(SSDFS_INODE_HAS_INLINE_DENTRIES, + &ii->private_flags); + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: +try_generic_tree_flush: + if (!tree->generic_tree) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("undefined generic tree pointer\n"); + goto finish_dentries_tree_flush; + } + + err = ssdfs_btree_flush(tree->generic_tree); + if (unlikely(err)) { + SSDFS_ERR("fail to flush dentries btree: " + "ino %lu, err %d\n", + ii->vfs_inode.i_ino, err); + goto finish_dentries_tree_flush; + } + + if (!tree->root) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + SSDFS_WARN("undefined root node pointer\n"); + goto finish_dentries_tree_flush; + } + + ssdfs_memcpy(&ii->raw_inode.internal[0].area1.dentries_root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + tree->root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + + atomic_or(SSDFS_INODE_HAS_DENTRIES_BTREE, + &ii->private_flags); + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid type of tree %#x\n", + atomic_read(&tree->type)); + goto finish_dentries_tree_flush; + } + + ii->raw_inode.count_of.dentries = cpu_to_le32((u32)dentries_count); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_INITIALIZED); + +finish_dentries_tree_flush: + up_write(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("RAW INODE DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + &ii->raw_inode, + sizeof(struct ssdfs_inode)); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + 
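+	/*
+	 * On success, the raw on-disk inode now carries either the inline
+	 * dentries array or the dentries b-tree root node, and its
+	 * count_of.dentries field has been updated accordingly.
+	 */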
return err; +} + +/****************************************************************************** + * DENTRIES TREE OBJECT FUNCTIONALITY * + ******************************************************************************/ + +/* + * need_initialize_dentries_btree_search() - check necessity to init the search + * @name_hash: name hash + * @search: search object + */ +static inline +bool need_initialize_dentries_btree_search(u64 name_hash, + struct ssdfs_btree_search *search) +{ + return need_initialize_btree_search(search) || + search->request.start.hash != name_hash; +} + +/* + * ssdfs_generate_name_hash() - generate a name's hash + * @str: string descriptor + */ +u64 ssdfs_generate_name_hash(const struct qstr *str) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!str); + + SSDFS_DBG("name %s, len %u\n", + str->name, str->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_generate_name_hash(str->name, str->len, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); +} + +/* + * ssdfs_check_dentry_for_request() - check dentry + * @fsi: pointer on shared file system object + * @dentry: pointer on dentry object + * @search: search object + * + * This method tries to check @dentry for the @search request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - continue the search. + * %-ENODATA - possible place was found. + */ +static +int ssdfs_check_dentry_for_request(struct ssdfs_fs_info *fsi, + struct ssdfs_dir_entry *dentry, + struct ssdfs_btree_search *search) +{ + struct ssdfs_shared_dict_btree_info *dict; + u32 req_flags; + u64 search_hash; + u64 req_ino; + const char *req_name; + size_t req_name_len; + u64 hash_code; + u64 ino; + u8 dentry_type; + u8 file_type; + u8 flags; + u8 name_len; + int res, err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !dentry || !search); + + SSDFS_DBG("fsi %p, dentry %p, search %p\n", + fsi, dentry, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + dict = fsi->shdictree; + if (!dict) { + SSDFS_ERR("shared dictionary is absent\n"); + return -ERANGE; + } + + req_flags = search->request.flags; + search_hash = search->request.start.hash; + req_ino = search->request.start.ino; + req_name = search->request.start.name; + req_name_len = search->request.start.name_len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search_hash %llx, req_ino %llu\n", + search_hash, req_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + hash_code = le64_to_cpu(dentry->hash_code); + ino = le64_to_cpu(dentry->ino); + dentry_type = dentry->dentry_type; + file_type = dentry->file_type; + flags = dentry->flags; + name_len = dentry->name_len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash_code %llx, ino %llu, " + "type %#x, file_type %#x, flags %#x, name_len %u\n", + hash_code, ino, dentry_type, + file_type, flags, name_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentry_type <= SSDFS_DENTRY_UNKNOWN_TYPE || + dentry_type >= SSDFS_DENTRY_TYPE_MAX) { + SSDFS_ERR("corrupted dentry: dentry_type %#x\n", + dentry_type); + return -EIO; + } + + if (file_type <= SSDFS_FT_UNKNOWN || + file_type >= SSDFS_FT_MAX) { + SSDFS_ERR("corrupted dentry: file_type %#x\n", + file_type); + return -EIO; + } + + if (hash_code != 0 && search_hash < hash_code) { + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + goto finish_check_dentry; + } else if (hash_code != 0 && search_hash > hash_code) { + /* continue the search */ + err = -EAGAIN; + goto finish_check_dentry; + } else { + /* search_hash == 
hash_code */ + + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_INO) { + if (req_ino < ino) { + /* hash collision case */ + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + goto finish_check_dentry; + } else if (req_ino == ino) { + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + goto extract_full_name; + } else { + /* hash collision case */ + /* continue the search */ + err = -EAGAIN; + goto finish_check_dentry; + } + } + + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_NAME) { + int res; + + if (!req_name) { + SSDFS_ERR("empty name pointer\n"); + return -ERANGE; + } + + name_len = min_t(u8, name_len, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + res = strncmp(req_name, dentry->inline_string, + name_len); + if (res < 0) { + /* hash collision case */ + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + goto finish_check_dentry; + } else if (res == 0) { + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + goto extract_full_name; + } else { + /* hash collision case */ + /* continue the search */ + err = -EAGAIN; + goto finish_check_dentry; + } + } + +extract_full_name: + if (flags & SSDFS_DENTRY_HAS_EXTERNAL_STRING) { + err = ssdfs_shared_dict_get_name(dict, search_hash, + &search->name); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the name: " + "hash %llx, err %d\n", + search_hash, err); + goto finish_check_dentry; + } + } else + goto finish_check_dentry; + + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_NAME) { + name_len = dentry->name_len; + + res = strncmp(req_name, search->name.str, + name_len); + if (res < 0) { + /* hash collision case */ + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + goto finish_check_dentry; + } else if (res == 0) { + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + goto finish_check_dentry; + } else { + /* hash collision case */ + /* continue the search */ + err = -EAGAIN; + goto finish_check_dentry; + } + } + } + +finish_check_dentry: + return err; +} + +/* + * ssdfs_dentries_tree_find_inline_dentry() - find inline dentry + * @tree: btree object + * @search: search object + * + * This method tries to find an inline dentry. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - possible place was found. 
+ */ +static int +ssdfs_dentries_tree_find_inline_dentry(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search) +{ + s64 dentries_count; + u32 req_flags; + s64 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (atomic_read(&tree->type) != SSDFS_INLINE_DENTRIES_ARRAY) { + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + ssdfs_btree_search_free_result_buf(search); + + dentries_count = atomic64_read(&tree->dentries_count); + + if (dentries_count < 0) { + SSDFS_ERR("invalid dentries_count %lld\n", + dentries_count); + return -ERANGE; + } else if (dentries_count == 0) { + SSDFS_DBG("empty tree\n"); + search->result.state = SSDFS_BTREE_SEARCH_OUT_OF_RANGE; + search->result.err = -ENODATA; + search->result.start_index = 0; + search->result.count = 0; + search->result.search_cno = ssdfs_current_cno(tree->fsi->sb); + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + return -ENODATA; + } else if (dentries_count > SSDFS_INLINE_DENTRIES_COUNT) { + SSDFS_ERR("invalid dentries_count %lld\n", + dentries_count); + return -ERANGE; + } + + if (!tree->inline_dentries) { + SSDFS_ERR("inline dentries haven't been initialized\n"); + return -ERANGE; + } + + req_flags = search->request.flags; + + for (i = 0; i < dentries_count; i++) { + struct ssdfs_dir_entry *dentry; + u64 hash_code; + u64 ino; + u8 type; + u8 flags; + u8 name_len; + + search->result.buf = NULL; + search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT; + + dentry = &tree->inline_dentries[i]; + hash_code = le64_to_cpu(dentry->hash_code); + ino = le64_to_cpu(dentry->ino); + type = dentry->dentry_type; + flags = dentry->flags; + name_len = dentry->name_len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("i %llu, hash_code %llx, ino %llu, " + "type %#x, flags %#x, name_len %u\n", + (u64)i, hash_code, ino, type, flags, name_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (type != SSDFS_INLINE_DENTRY) { + SSDFS_ERR("corrupted dentry: " + "hash_code %llx, ino %llu, " + "type %#x, flags %#x\n", + hash_code, ino, + type, flags); + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + return -ERANGE; + } + + if (flags & ~SSDFS_DENTRY_FLAGS_MASK) { + SSDFS_ERR("corrupted dentry: " + "hash_code %llx, ino %llu, " + "type %#x, flags %#x\n", + hash_code, ino, + type, flags); + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + return -ERANGE; + } + + if (hash_code >= U64_MAX || ino >= U64_MAX) { + SSDFS_ERR("corrupted dentry: " + "hash_code %llx, ino %llu, " + "type %#x, flags %#x\n", + hash_code, ino, + type, flags); + atomic_set(&tree->state, + SSDFS_DENTRIES_BTREE_CORRUPTED); + return -ERANGE; + } + + if (!(req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE)) { + SSDFS_ERR("invalid request: hash is absent\n"); + return -ERANGE; + } + + ssdfs_memcpy(&search->raw.dentry.header, + 0, sizeof(struct ssdfs_dir_entry), + dentry, + 0, sizeof(struct ssdfs_dir_entry), + sizeof(struct ssdfs_dir_entry)); + + search->result.err = 0; + search->result.start_index = (u16)i; + search->result.count = 1; + search->result.search_cno = ssdfs_current_cno(tree->fsi->sb); + search->result.buf_state = 
SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.dentry; + search->result.buf_size = sizeof(struct ssdfs_dir_entry); + search->result.items_in_buffer = 1; + + err = ssdfs_check_dentry_for_request(tree->fsi, dentry, search); + if (err == -ENODATA) + goto finish_search_inline_dentry; + else if (err == -EAGAIN) + continue; + else if (unlikely(err)) { + SSDFS_ERR("fail to check dentry: err %d\n", err); + goto finish_search_inline_dentry; + } else { + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + goto finish_search_inline_dentry; + } + } + + err = -ENODATA; + search->result.err = -ENODATA; + search->result.start_index = dentries_count; + search->result.state = SSDFS_BTREE_SEARCH_OUT_OF_RANGE; + +finish_search_inline_dentry: + return err; +} + +/* + * __ssdfs_dentries_tree_find() - find a dentry in the tree + * @tree: dentries tree + * @search: search object + * + * This method tries to find a dentry in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - item hasn't been found + */ +static +int __ssdfs_dentries_tree_find(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + down_read(&tree->lock); + err = ssdfs_dentries_tree_find_inline_dentry(tree, search); + up_read(&tree->lock); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find the inline dentry: " + "hash %llx\n", + search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the inline dentry: " + "hash %llx, err %d\n", + search->request.start.hash, err); + } + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + down_read(&tree->lock); + err = ssdfs_btree_find_item(tree->generic_tree, search); + up_read(&tree->lock); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find the dentry: " + "hash %llx\n", + search->request.start.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the dentry: " + "hash %llx, err %d\n", + search->request.start.hash, err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid dentries tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + ssdfs_debug_dentries_btree_object(tree); + + return err; +} + +/* + * ssdfs_dentries_tree_find() - find a dentry in the tree + * @tree: dentries tree + * @name: name string + * @len: length of the string + * @search: search object + * + * This method tries to find a dentry for the requested @name. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
+ * %-ENODATA - item hasn't been found + */ +int ssdfs_dentries_tree_find(struct ssdfs_dentries_btree_info *tree, + const char *name, size_t len, + struct ssdfs_btree_search *search) +{ + u64 name_hash; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !name || !search); + + SSDFS_DBG("tree %p, name %s, len %zu, search %p\n", + tree, name, len, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + name_hash = __ssdfs_generate_name_hash(name, len, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + if (name_hash == U64_MAX) { + SSDFS_ERR("fail to generate name hash\n"); + return -ERANGE; + } + + if (need_initialize_dentries_btree_search(name_hash, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_HAS_VALID_NAME; + search->request.start.hash = name_hash; + search->request.start.name = name; + search->request.start.name_len = len; + search->request.end.hash = name_hash; + search->request.end.name = name; + search->request.end.name_len = len; + search->request.count = 1; + } + + return __ssdfs_dentries_tree_find(tree, search); +} + +/* + * ssdfs_dentries_tree_find_leaf_node() - find a leaf node in the tree + * @tree: dentries tree + * @name_hash: name hash + * @search: search object + * + * This method tries to find a leaf node for the requested @name_hash. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_dentries_tree_find_leaf_node(struct ssdfs_dentries_btree_info *tree, + u64 name_hash, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, name_hash %llx, search %p\n", + tree, name_hash, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_dentries_btree_search(name_hash, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = name_hash; + search->request.start.name = NULL; + search->request.start.name_len = 0; + search->request.end.hash = name_hash; + search->request.end.name = NULL; + search->request.end.name_len = 0; + search->request.count = 1; + } + + err = __ssdfs_dentries_tree_find(tree, search); + if (err == -ENODATA) { + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected result's state %#x\n", + search->result.state); + goto finish_find_leaf_node; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + /* do nothing */ + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + switch (search->node.state) { + case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC: + /* expected state */ + err = 0; + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected node state %#x\n", + search->node.state); + break; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid dentries tree type %#x\n", + atomic_read(&tree->type)); + break; + } + } + +finish_find_leaf_node: + return err; +} + +/* + * 
can_name_be_inline() - check that name can be inline + * @str: string descriptor + */ +static inline +bool can_name_be_inline(const struct qstr *str) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!str || !str->name); + + SSDFS_DBG("name %s, len %u\n", + str->name, str->len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return str->len <= SSDFS_DENTRY_INLINE_NAME_MAX_LEN; +} + +/* + * ssdfs_prepare_dentry() - prepare dentry object + * @str: string descriptor + * @ii: inode descriptor + * @dentry_type: dentry type + * @search: search object + * + * This method tries to prepare a dentry for adding into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_prepare_dentry(const struct qstr *str, + struct ssdfs_inode_info *ii, + int dentry_type, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_dentry *dentry; + u64 name_hash; + u32 copy_len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!str || !str->name || !ii || !search); + + SSDFS_DBG("name %s, len %u, ino %lu\n", + str->name, str->len, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentry_type <= SSDFS_DENTRIES_BTREE_UNKNOWN_TYPE || + dentry_type >= SSDFS_DENTRIES_BTREE_TYPE_MAX) { + SSDFS_ERR("invalid dentry type %#x\n", + dentry_type); + return -EINVAL; + } + + name_hash = ssdfs_generate_name_hash(str); + if (name_hash == U64_MAX) { + SSDFS_ERR("fail to generate name hash\n"); + return -ERANGE; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.dentry; + search->result.buf_size = sizeof(struct ssdfs_raw_dentry); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_size != + sizeof(struct ssdfs_raw_dentry)); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + SSDFS_ERR("unexpected buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + dentry = &search->raw.dentry; + + dentry->header.ino = cpu_to_le64(ii->vfs_inode.i_ino); + dentry->header.hash_code = cpu_to_le64(name_hash); + dentry->header.flags = 0; + + if (str->len > SSDFS_MAX_NAME_LEN) { + SSDFS_ERR("invalid name_len %u\n", + str->len); + return -ERANGE; + } + + dentry->header.dentry_type = (u8)dentry_type; + ssdfs_set_file_type(&dentry->header, &ii->vfs_inode); + + if (str->len > SSDFS_DENTRY_INLINE_NAME_MAX_LEN) + dentry->header.flags |= SSDFS_DENTRY_HAS_EXTERNAL_STRING; + + dentry->header.name_len = (u8)str->len; + + memset(dentry->header.inline_string, 0, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + copy_len = min_t(u32, (u32)str->len, SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + ssdfs_memcpy(dentry->header.inline_string, + 0, SSDFS_DENTRY_INLINE_NAME_MAX_LEN, + str->name, 0, str->len, + copy_len); + + memset(search->name.str, 0, SSDFS_MAX_NAME_LEN); + search->name.len = (u8)str->len; + ssdfs_memcpy(search->name.str, 0, SSDFS_MAX_NAME_LEN, + str->name, 0, str->len, + str->len); + + search->request.flags |= SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM; + + return 0; +} + +/* + * ssdfs_dentries_tree_add_inline_dentry() - add inline dentry into the tree + * @tree: dentries tree + * @search: search object + * + * This method tries to add the inline dentry into the tree. 
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - inline tree hasn't room for the new dentry. + * %-EEXIST - dentry exists in the tree. + */ +static int +ssdfs_dentries_tree_add_inline_dentry(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_dir_entry *cur; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + s64 dentries_count, dentries_capacity; + int private_flags; + u64 hash1, hash2; + u64 ino1, ino2; + u16 start_index; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->inline_dentries) { + SSDFS_ERR("empty inline tree %p\n", + tree->inline_dentries); + return -ERANGE; + } + + dentries_count = atomic64_read(&tree->dentries_count); + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + private_flags = atomic_read(&tree->owner->private_flags); + + dentries_capacity = SSDFS_INLINE_DENTRIES_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + dentries_capacity -= SSDFS_INLINE_DENTRIES_PER_AREA; + if (private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + SSDFS_ERR("the dentries tree is generic\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %lld, dentries_capacity %lld\n", + dentries_count, dentries_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentries_count > dentries_capacity) { + SSDFS_WARN("dentries tree is corrupted: " + "dentries_count %lld, dentries_capacity %lld\n", + dentries_count, dentries_capacity); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_CORRUPTED); + return -ERANGE; + } else if (dentries_count == dentries_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inline tree hasn't room for the new dentry: " + "dentries_count %lld, dentries_capacity %lld\n", + dentries_count, dentries_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search result's state %#x, " + "start_index %u\n", + search->result.state, + search->result.start_index); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + hash1 = search->request.start.hash; + ino1 = search->request.start.ino; + hash2 = le64_to_cpu(search->raw.dentry.header.hash_code); + ino2 = le64_to_cpu(search->raw.dentry.header.ino); + + if (hash1 != hash2 || ino1 != ino2) { + SSDFS_ERR("corrupted dentry: " + "request (hash %llx, ino %llu), " + "dentry (hash %llx, ino %llu)\n", + hash1, ino1, hash2, ino2); + return -ERANGE; + } + + start_index = search->result.start_index; + + if (dentries_count == 0) { + if 
(start_index != 0) {
+			SSDFS_ERR("invalid start_index %u\n",
+				  start_index);
+			return -ERANGE;
+		}
+
+		cur = &tree->inline_dentries[start_index];
+		ssdfs_memcpy(cur, 0, dentry_size,
+			     &search->raw.dentry.header, 0, dentry_size,
+			     dentry_size);
+	} else {
+		if (start_index >= dentries_capacity) {
+			SSDFS_ERR("start_index %u >= dentries_capacity %lld\n",
+				  start_index, dentries_capacity);
+			return -ERANGE;
+		}
+
+		cur = &tree->inline_dentries[start_index];
+
+		if ((start_index + 1) <= dentries_count) {
+			err = ssdfs_memmove(tree->inline_dentries,
+					    (start_index + 1) * dentry_size,
+					    ssdfs_inline_dentries_size(),
+					    tree->inline_dentries,
+					    start_index * dentry_size,
+					    ssdfs_inline_dentries_size(),
+					    (dentries_count - start_index) *
+						dentry_size);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to move: err %d\n", err);
+				return err;
+			}
+
+			ssdfs_memcpy(cur, 0, dentry_size,
+				     &search->raw.dentry.header, 0, dentry_size,
+				     dentry_size);
+
+			hash1 = le64_to_cpu(cur->hash_code);
+			ino1 = le64_to_cpu(cur->ino);
+
+			cur = &tree->inline_dentries[start_index + 1];
+
+			hash2 = le64_to_cpu(cur->hash_code);
+			ino2 = le64_to_cpu(cur->ino);
+		} else {
+			ssdfs_memcpy(cur, 0, dentry_size,
+				     &search->raw.dentry.header, 0, dentry_size,
+				     dentry_size);
+
+			if (start_index > 0) {
+				hash2 = le64_to_cpu(cur->hash_code);
+				ino2 = le64_to_cpu(cur->ino);
+
+				cur =
+					&tree->inline_dentries[start_index - 1];
+
+				hash1 = le64_to_cpu(cur->hash_code);
+				ino1 = le64_to_cpu(cur->ino);
+			}
+		}
+
+		if (hash1 < hash2) {
+			/*
+			 * Correct order. Do nothing.
+			 */
+		} else if (hash1 == hash2) {
+			if (ino1 < ino2) {
+				/*
+				 * Correct order. Do nothing.
+				 */
+			} else if (ino1 == ino2) {
+				SSDFS_ERR("duplicated dentry: "
+					  "hash1 %llx, ino1 %llu, "
+					  "hash2 %llx, ino2 %llu\n",
+					  hash1, ino1, hash2, ino2);
+				atomic_set(&tree->state,
+					   SSDFS_DENTRIES_BTREE_CORRUPTED);
+				return -ERANGE;
+			} else {
+				SSDFS_ERR("invalid dentries ordering: "
+					  "hash1 %llx, ino1 %llu, "
+					  "hash2 %llx, ino2 %llu\n",
+					  hash1, ino1, hash2, ino2);
+				atomic_set(&tree->state,
+					   SSDFS_DENTRIES_BTREE_CORRUPTED);
+				return -ERANGE;
+			}
+		} else {
+			SSDFS_ERR("invalid hash order: "
+				  "hash1 %llx > hash2 %llx\n",
+				  hash1, hash2);
+			atomic_set(&tree->state,
+				   SSDFS_DENTRIES_BTREE_CORRUPTED);
+			return -ERANGE;
+		}
+	}
+
+	dentries_count = atomic64_inc_return(&tree->dentries_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dentries_count %llu\n",
+		  atomic64_read(&tree->dentries_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (dentries_count > dentries_capacity) {
+		SSDFS_WARN("dentries_count is too large: "
+			   "count %lld, capacity %lld\n",
+			   dentries_count, dentries_capacity);
+		atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_CORRUPTED);
+		return -ERANGE;
+	}
+
+	atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY);
+	return 0;
+}
+
+/*
+ * ssdfs_dentries_tree_add_dentry() - add the dentry into the tree
+ * @tree: dentries tree
+ * @search: search object
+ *
+ * This method tries to add the generic dentry into the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-EEXIST - dentry exists in the tree.
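+ *
+ * A minimal caller sketch (assuming tree->lock is already taken and
+ * the raw dentry has been prepared by ssdfs_prepare_dentry()):
+ *
+ *	search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM;
+ *	err = ssdfs_dentries_tree_add_dentry(tree, search);
+ *	ssdfs_btree_search_forget_parent_node(search);
+ *	ssdfs_btree_search_forget_child_node(search);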
+ */ +static +int ssdfs_dentries_tree_add_dentry(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search) +{ + u64 hash1, hash2; + u64 ino1, ino2; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_PRIVATE_DENTRIES_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->generic_tree) { + SSDFS_ERR("empty generic tree %p\n", + tree->generic_tree); + return -ERANGE; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + case SSDFS_BTREE_SEARCH_OBSOLETE_RESULT: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + hash1 = search->request.start.hash; + ino1 = search->request.start.ino; + hash2 = le64_to_cpu(search->raw.dentry.header.hash_code); + ino2 = le64_to_cpu(search->raw.dentry.header.ino); + + if (hash1 != hash2 || ino1 != ino2) { + SSDFS_ERR("corrupted dentry: " + "request (hash %llx, ino %llu), " + "dentry (hash %llx, ino %llu)\n", + hash1, ino1, hash2, ino2); + return -ERANGE; + } + + err = ssdfs_btree_add_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add the dentry into the tree: " + "err %d\n", err); + return err; + } + + err = ssdfs_btree_synchronize_root_node(tree->generic_tree, + tree->root); + if (unlikely(err)) { + SSDFS_ERR("fail to synchronize the root node: " + "err %d\n", err); + return err; + } + + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY); + return 0; +} + +/* + * ssdfs_dentries_tree_add() - add dentry into the tree + * @tree: dentries tree + * @str: name of the file/folder + * @ii: inode info + * @search: search object + * + * This method tries to add dentry into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - dentry exists in the tree. 
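+ *
+ * Typical usage from directory code (illustrative sketch with
+ * error handling trimmed; dir_dentry is a hypothetical VFS dentry):
+ *
+ *	search = ssdfs_btree_search_alloc();
+ *	if (!search)
+ *		return -ENOMEM;
+ *	ssdfs_btree_search_init(search);
+ *	err = ssdfs_dentries_tree_add(tree, &dir_dentry->d_name,
+ *				      ii, search);
+ *	ssdfs_btree_search_free(search);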
+ */ +int ssdfs_dentries_tree_add(struct ssdfs_dentries_btree_info *tree, + const struct qstr *str, + struct ssdfs_inode_info *ii, + struct ssdfs_btree_search *search) +{ + struct ssdfs_shared_dict_btree_info *dict; + u64 name_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !str || !ii || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, ii %p, ino %lu\n", + tree, ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("tree %p, ii %p, ino %lu\n", + tree, ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + dict = tree->fsi->shdictree; + if (!dict) { + SSDFS_ERR("shared dictionary is absent\n"); + return -ERANGE; + } + + search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM; + + name_hash = ssdfs_generate_name_hash(str); + if (name_hash == U64_MAX) { + SSDFS_ERR("fail to generate name hash\n"); + return -ERANGE; + } + + if (need_initialize_dentries_btree_search(name_hash, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_HAS_VALID_NAME | + SSDFS_BTREE_SEARCH_HAS_VALID_INO; + search->request.start.hash = name_hash; + search->request.start.name = str->name; + search->request.start.name_len = str->len; + search->request.start.ino = ii->vfs_inode.i_ino; + search->request.end.hash = name_hash; + search->request.end.name = str->name; + search->request.end.name_len = str->len; + search->request.end.ino = ii->vfs_inode.i_ino; + search->request.count = 1; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + down_write(&tree->lock); + + err = ssdfs_dentries_tree_find_inline_dentry(tree, search); + if (err == -ENODATA) { + /* + * Dentry doesn't exist for requested name hash. + * It needs to create a new dentry. 
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the inline dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_add_inline_dentry; + } + + if (err == -ENODATA) { + err = ssdfs_prepare_dentry(str, ii, + SSDFS_INLINE_DENTRY, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the dentry: " + "name_hash %llx, ino %lu, " + "err %d\n", + name_hash, + ii->vfs_inode.i_ino, + err); + goto finish_add_inline_dentry; + } + + search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM; + err = ssdfs_dentries_tree_add_inline_dentry(tree, + search); + if (err == -ENOSPC) { + err = ssdfs_migrate_inline2generic_tree(tree); + if (unlikely(err)) { + SSDFS_ERR("fail to migrate the tree: " + "err %d\n", + err); + goto finish_add_inline_dentry; + } else { + search->request.type = + SSDFS_BTREE_SEARCH_ADD_ITEM; + downgrade_write(&tree->lock); + goto try_to_add_into_generic_tree; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to add the dentry: " + "name_hash %llx, ino %lu, " + "err %d\n", + name_hash, + ii->vfs_inode.i_ino, + err); + goto finish_add_inline_dentry; + } + + if (!can_name_be_inline(str)) { + err = ssdfs_shared_dict_save_name(dict, + name_hash, + str); + if (unlikely(err)) { + SSDFS_ERR("fail to store name: " + "hash %llx, err %d\n", + name_hash, err); + goto finish_add_inline_dentry; + } + } + } else { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentry exists in the tree: " + "name_hash %llx, ino %lu\n", + name_hash, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_add_inline_dentry; + } + +finish_add_inline_dentry: + up_write(&tree->lock); + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + down_read(&tree->lock); +try_to_add_into_generic_tree: + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (err == -ENODATA) { + /* + * Dentry doesn't exist for requested name. + * It needs to create a new dentry. 
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the dentry: " + "name_hash %llx, ino %lu, " + "err %d\n", + name_hash, + ii->vfs_inode.i_ino, + err); + goto finish_add_generic_dentry; + } + + if (err == -ENODATA) { + err = ssdfs_prepare_dentry(str, ii, + SSDFS_REGULAR_DENTRY, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare the dentry: " + "name_hash %llx, ino %lu, " + "err %d\n", + name_hash, + ii->vfs_inode.i_ino, + err); + goto finish_add_generic_dentry; + } + + search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM; + err = ssdfs_dentries_tree_add_dentry(tree, search); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to add the dentry: " + "name_hash %llx, ino %lu, " + "err %d\n", + name_hash, + ii->vfs_inode.i_ino, + err); + goto finish_add_generic_dentry; + } + + if (!can_name_be_inline(str)) { + err = ssdfs_shared_dict_save_name(dict, + name_hash, + str); + if (unlikely(err)) { + SSDFS_ERR("fail to store name: " + "hash %llx, err %d\n", + name_hash, err); + goto finish_add_generic_dentry; + } + } + } else { + err = -EEXIST; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentry exists in the tree: " + "name_hash %llx, ino %lu\n", + name_hash, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_add_generic_dentry; + } + +finish_add_generic_dentry: + up_read(&tree->lock); + break; + + default: + SSDFS_ERR("invalid dentries tree type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_dentries_btree_object(tree); + + return err; +} + +/* + * ssdfs_change_dentry() - change a dentry + * @str: string descriptor + * @new_ii: new inode info + * @dentry_type: dentry type + * @search: search object + * + * This method tries to prepare a new state of the dentry object. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
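+ *
+ * Note: this helper only rewrites the in-memory copy in
+ * search->raw.dentry (type, flags, file type, inline name);
+ * the modified item reaches the tree when
+ * ssdfs_dentries_tree_change_inline_dentry() or
+ * ssdfs_dentries_tree_change_dentry() is called afterwards.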
+ */ +static +int ssdfs_change_dentry(const struct qstr *str, + struct ssdfs_inode_info *new_ii, + int dentry_type, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_dentry *dentry; + ino_t ino; + u64 name_hash; + u32 copy_len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!str || !str->name || !new_ii || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + ino = new_ii->vfs_inode.i_ino; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("name %s, len %u, ino %lu\n", + str->name, str->len, ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentry_type <= SSDFS_DENTRIES_BTREE_UNKNOWN_TYPE || + dentry_type >= SSDFS_DENTRIES_BTREE_TYPE_MAX) { + SSDFS_ERR("invalid dentry type %#x\n", + dentry_type); + return -EINVAL; + } + + name_hash = ssdfs_generate_name_hash(str); + if (name_hash == U64_MAX) { + SSDFS_ERR("fail to generate name hash\n"); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER || + !search->result.buf || + search->result.buf_size != sizeof(struct ssdfs_raw_dentry)) { + SSDFS_ERR("invalid buffer state: " + "state %#x, buf %p\n", + search->result.buf_state, + search->result.buf); + return -ERANGE; + } + + dentry = &search->raw.dentry; + + if (ino != le64_to_cpu(dentry->header.ino)) { + SSDFS_ERR("invalid ino: " + "ino1 %lu != ino2 %llu\n", + ino, + le64_to_cpu(dentry->header.ino)); + return -ERANGE; + } + + dentry->header.hash_code = cpu_to_le64(name_hash); + dentry->header.flags = 0; + + dentry->header.dentry_type = (u8)dentry_type; + ssdfs_set_file_type(&dentry->header, &new_ii->vfs_inode); + + if (str->len > SSDFS_MAX_NAME_LEN) { + SSDFS_ERR("invalid name_len %u\n", + str->len); + return -ERANGE; + } + + if (str->len > SSDFS_DENTRY_INLINE_NAME_MAX_LEN) + dentry->header.flags |= SSDFS_DENTRY_HAS_EXTERNAL_STRING; + + dentry->header.name_len = (u8)str->len; + + memset(dentry->header.inline_string, 0, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + copy_len = min_t(u32, (u32)str->len, SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + ssdfs_memcpy(dentry->header.inline_string, + 0, SSDFS_DENTRY_INLINE_NAME_MAX_LEN, + str->name, 0, str->len, + copy_len); + + memset(search->name.str, 0, SSDFS_MAX_NAME_LEN); + search->name.len = (u8)str->len; + ssdfs_memcpy(search->name.str, 0, SSDFS_MAX_NAME_LEN, + str->name, 0, str->len, + str->len); + + return 0; +} + +/* + * ssdfs_dentries_tree_change_inline_dentry() - change inline dentry + * @tree: dentries tree + * @search: search object + * + * This method tries to change the existing inline dentry. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - dentry doesn't exist in the tree. 
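+ *
+ * After validating (hash, ino) of the found item, the method
+ * overwrites the dentry in place (simplified extract of the
+ * actual copy below):
+ *
+ *	ssdfs_memcpy(tree->inline_dentries,
+ *		     start_index * dentry_size,
+ *		     ssdfs_inline_dentries_size(),
+ *		     &search->raw.dentry.header, 0, dentry_size,
+ *		     dentry_size);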
+ */ +static int +ssdfs_dentries_tree_change_inline_dentry(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_dir_entry *cur; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + u64 hash1, hash2; + u64 ino1, ino2; + int private_flags; + s64 dentries_count, dentries_capacity; + u16 start_index; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->inline_dentries) { + SSDFS_ERR("empty inline tree %p\n", + tree->inline_dentries); + return -ERANGE; + } + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + hash1 = search->request.start.hash; + ino1 = search->request.start.ino; + + cur = &search->raw.dentry.header; + hash2 = le64_to_cpu(cur->hash_code); + ino2 = le64_to_cpu(cur->ino); + + if (hash1 != hash2 || ino1 != ino2) { + SSDFS_ERR("hash1 %llx, hash2 %llx, " + "ino1 %llu, ino2 %llu\n", + hash1, hash2, ino1, ino2); + return -ERANGE; + } + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + dentries_count = atomic64_read(&tree->dentries_count); + private_flags = atomic_read(&tree->owner->private_flags); + + dentries_capacity = SSDFS_INLINE_DENTRIES_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + dentries_capacity -= SSDFS_INLINE_DENTRIES_PER_AREA; + if (private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + SSDFS_ERR("the dentries tree is generic\n"); + return -ERANGE; + } + + if (dentries_count > dentries_capacity) { + SSDFS_WARN("dentries tree is corrupted: " + "dentries_count %lld, dentries_capacity %lld\n", + dentries_count, dentries_capacity); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_CORRUPTED); + return -ERANGE; + } else if (dentries_count == 0) { + SSDFS_DBG("empty tree\n"); + return -EFAULT; + } + + start_index = search->result.start_index; + + if (start_index >= dentries_count) { + SSDFS_ERR("start_index %u >= dentries_count %lld\n", + start_index, dentries_count); + return -ENODATA; + } + + ssdfs_memcpy(tree->inline_dentries, + start_index * dentry_size, ssdfs_inline_dentries_size(), + &search->raw.dentry.header, 0, dentry_size, + dentry_size); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY); + + return 0; +} + +/* + * ssdfs_dentries_tree_change_dentry() - change the generic dentry + * @tree: dentries tree + * @search: search object + * + * This method tries to change the existing generic dentry. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - dentry doesn't exist in the tree. 
+ */
+static
+int ssdfs_dentries_tree_change_dentry(struct ssdfs_dentries_btree_info *tree,
+				      struct ssdfs_btree_search *search)
+{
+	struct ssdfs_raw_dentry *cur;
+	u64 hash1, hash2;
+	u64 ino1, ino2;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, search %p\n",
+		  tree, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_PRIVATE_DENTRIES_BTREE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	if (!tree->generic_tree) {
+		SSDFS_ERR("empty generic tree %p\n",
+			  tree->generic_tree);
+		return -ERANGE;
+	}
+
+	if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) {
+		SSDFS_ERR("invalid search result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	hash1 = search->request.start.hash;
+	ino1 = search->request.start.ino;
+
+	cur = &search->raw.dentry;
+	hash2 = le64_to_cpu(cur->header.hash_code);
+	ino2 = le64_to_cpu(cur->header.ino);
+
+	if (hash1 != hash2 || ino1 != ino2) {
+		SSDFS_ERR("hash1 %llx, hash2 %llx, "
+			  "ino1 %llu, ino2 %llu\n",
+			  hash1, hash2, ino1, ino2);
+		return -ERANGE;
+	}
+
+	err = ssdfs_btree_change_item(tree->generic_tree, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the dentry in the tree: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	err = ssdfs_btree_synchronize_root_node(tree->generic_tree,
+						tree->root);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to synchronize the root node: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY);
+	return 0;
+}
+
+/*
+ * ssdfs_dentries_tree_change() - change dentry in the tree
+ * @tree: dentries tree
+ * @name_hash: hash of the name
+ * @old_ino: old inode ID
+ * @str: new name of the file/folder
+ * @new_ii: new inode info
+ * @search: search object
+ *
+ * This method tries to change dentry in the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - dentry doesn't exist in the tree.
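+ *
+ * A rename-style caller sketch (illustrative; old_dentry, new_dentry
+ * and old_inode are hypothetical names):
+ *
+ *	name_hash = ssdfs_generate_name_hash(&old_dentry->d_name);
+ *	err = ssdfs_dentries_tree_change(tree, name_hash,
+ *					 old_inode->i_ino,
+ *					 &new_dentry->d_name,
+ *					 new_ii, search);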
+ */ +int ssdfs_dentries_tree_change(struct ssdfs_dentries_btree_info *tree, + u64 name_hash, ino_t old_ino, + const struct qstr *str, + struct ssdfs_inode_info *new_ii, + struct ssdfs_btree_search *search) +{ + struct ssdfs_shared_dict_btree_info *dict; + u64 new_name_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, search %p, name_hash %llx\n", + tree, search, name_hash); +#else + SSDFS_DBG("tree %p, search %p, name_hash %llx\n", + tree, search, name_hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + dict = tree->fsi->shdictree; + if (!dict) { + SSDFS_ERR("shared dictionary is absent\n"); + return -ERANGE; + } + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_dentries_btree_search(name_hash, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_HAS_VALID_INO; + search->request.start.hash = name_hash; + search->request.start.name = NULL; + search->request.start.name_len = U32_MAX; + search->request.start.ino = old_ino; + search->request.end.hash = name_hash; + search->request.end.name = NULL; + search->request.end.name_len = U32_MAX; + search->request.end.ino = old_ino; + search->request.count = 1; + } + + new_name_hash = ssdfs_generate_name_hash(str); + if (new_name_hash == U64_MAX) { + SSDFS_ERR("fail to generate name hash\n"); + return -ERANGE; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + down_write(&tree->lock); + + err = ssdfs_dentries_tree_find_inline_dentry(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the inline dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_change_inline_dentry; + } + + err = ssdfs_change_dentry(str, new_ii, + SSDFS_INLINE_DENTRY, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change dentry: err %d\n", + err); + goto finish_change_inline_dentry; + } + + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + + err = ssdfs_dentries_tree_change_inline_dentry(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change inline dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_change_inline_dentry; + } + + if (!can_name_be_inline(str)) { + err = ssdfs_shared_dict_save_name(dict, + new_name_hash, + str); + if (unlikely(err)) { + SSDFS_ERR("fail to store name: " + "hash %llx, err %d\n", + new_name_hash, err); + goto finish_change_inline_dentry; + } + } + +finish_change_inline_dentry: + up_write(&tree->lock); + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + down_read(&tree->lock); + + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_change_generic_dentry; + } + + err = ssdfs_change_dentry(str, new_ii, + SSDFS_REGULAR_DENTRY, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change dentry: err %d\n", + err); + goto finish_change_generic_dentry; + } + + search->request.type = 
SSDFS_BTREE_SEARCH_CHANGE_ITEM;
+
+		err = ssdfs_dentries_tree_change_dentry(tree, search);
+
+		ssdfs_btree_search_forget_parent_node(search);
+		ssdfs_btree_search_forget_child_node(search);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change dentry: "
+				  "name_hash %llx, err %d\n",
+				  name_hash, err);
+			goto finish_change_generic_dentry;
+		}
+
+		if (!can_name_be_inline(str)) {
+			err = ssdfs_shared_dict_save_name(dict,
+							  new_name_hash,
+							  str);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to store name: "
+					  "hash %llx, err %d\n",
+					  new_name_hash, err);
+				goto finish_change_generic_dentry;
+			}
+		}
+
+finish_change_generic_dentry:
+		up_read(&tree->lock);
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid dentries tree type %#x\n",
+			  atomic_read(&tree->type));
+		break;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	ssdfs_debug_dentries_btree_object(tree);
+
+	return err;
+}
diff --git a/fs/ssdfs/dentries_tree.h b/fs/ssdfs/dentries_tree.h
new file mode 100644
index 000000000000..fb2168d511f8
--- /dev/null
+++ b/fs/ssdfs/dentries_tree.h
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dentries_tree.h - dentries btree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_DENTRIES_TREE_H
+#define _SSDFS_DENTRIES_TREE_H
+
+#define SSDFS_INLINE_DENTRIES_COUNT	(2 * SSDFS_INLINE_DENTRIES_PER_AREA)
+
+/*
+ * struct ssdfs_dentries_btree_info - dentries btree info
+ * @type: dentries btree type
+ * @state: dentries btree state
+ * @dentries_count: count of the dentries in the whole dentries tree
+ * @lock: dentries btree lock
+ * @generic_tree: pointer on generic btree object
+ * @inline_dentries: pointer on inline dentries array
+ * @buffer.tree: piece of memory for generic btree object
+ * @buffer.dentries: piece of memory for the inline dentries
+ * @root: pointer on root node
+ * @root_buffer: buffer for root node
+ * @desc: b-tree descriptor
+ * @owner: pointer on owner inode object
+ * @fsi: pointer on shared file system object
+ *
+ * A newly created inode tries to store dentries into inline
+ * dentries. The raw on-disk inode has an internal private area
+ * that is able to contain the four inline dentries or
+ * the root node of the extents btree and the extended attributes btree.
+ * If the inode has no extended attributes and the number of dentries
+ * is less than four, then everything can be stored inside the
+ * inline dentries. Otherwise, the real dentries btree should
+ * be created.
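+ *
+ * The effective inline capacity is derived from the owner's private
+ * flags; a sketch of the check used throughout dentries_tree.c:
+ *
+ *	capacity = SSDFS_INLINE_DENTRIES_COUNT;
+ *	if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE)
+ *		capacity -= SSDFS_INLINE_DENTRIES_PER_AREA;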
+ */ +struct ssdfs_dentries_btree_info { + atomic_t type; + atomic_t state; + atomic64_t dentries_count; + + struct rw_semaphore lock; + struct ssdfs_btree *generic_tree; + struct ssdfs_dir_entry *inline_dentries; + union { + struct ssdfs_btree tree; + struct ssdfs_dir_entry dentries[SSDFS_INLINE_DENTRIES_COUNT]; + } buffer; + struct ssdfs_btree_inline_root_node *root; + struct ssdfs_btree_inline_root_node root_buffer; + + struct ssdfs_dentries_btree_descriptor desc; + struct ssdfs_inode_info *owner; + struct ssdfs_fs_info *fsi; +}; + +/* Dentries tree types */ +enum { + SSDFS_DENTRIES_BTREE_UNKNOWN_TYPE, + SSDFS_INLINE_DENTRIES_ARRAY, + SSDFS_PRIVATE_DENTRIES_BTREE, + SSDFS_DENTRIES_BTREE_TYPE_MAX +}; + +/* Dentries tree states */ +enum { + SSDFS_DENTRIES_BTREE_UNKNOWN_STATE, + SSDFS_DENTRIES_BTREE_CREATED, + SSDFS_DENTRIES_BTREE_INITIALIZED, + SSDFS_DENTRIES_BTREE_DIRTY, + SSDFS_DENTRIES_BTREE_CORRUPTED, + SSDFS_DENTRIES_BTREE_STATE_MAX +}; + +/* + * Inline methods + */ +static inline +size_t ssdfs_inline_dentries_size(void) +{ + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + return dentry_size * SSDFS_INLINE_DENTRIES_COUNT; +} + +static inline +size_t ssdfs_area_dentries_size(void) +{ + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + return dentry_size * SSDFS_INLINE_DENTRIES_PER_AREA; +} + +/* + * Dentries tree API + */ +int ssdfs_dentries_tree_create(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii); +int ssdfs_dentries_tree_init(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii); +void ssdfs_dentries_tree_destroy(struct ssdfs_inode_info *ii); +int ssdfs_dentries_tree_flush(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii); + +int ssdfs_dentries_tree_find(struct ssdfs_dentries_btree_info *tree, + const char *name, size_t len, + struct ssdfs_btree_search *search); +int ssdfs_dentries_tree_add(struct ssdfs_dentries_btree_info *tree, + const struct qstr *str, + struct ssdfs_inode_info *ii, + struct ssdfs_btree_search *search); +int ssdfs_dentries_tree_change(struct ssdfs_dentries_btree_info *tree, + u64 name_hash, ino_t old_ino, + const struct qstr *str, + struct ssdfs_inode_info *new_ii, + struct ssdfs_btree_search *search); +int ssdfs_dentries_tree_delete(struct ssdfs_dentries_btree_info *tree, + u64 name_hash, ino_t ino, + struct ssdfs_btree_search *search); +int ssdfs_dentries_tree_delete_all(struct ssdfs_dentries_btree_info *tree); + +/* + * Internal dentries tree API + */ +u64 ssdfs_generate_name_hash(const struct qstr *str); +int ssdfs_dentries_tree_find_leaf_node(struct ssdfs_dentries_btree_info *tree, + u64 name_hash, + struct ssdfs_btree_search *search); +int ssdfs_dentries_tree_extract_range(struct ssdfs_dentries_btree_info *tree, + u16 start_index, u16 count, + struct ssdfs_btree_search *search); + +void ssdfs_debug_dentries_btree_object(struct ssdfs_dentries_btree_info *tree); + +/* + * Dentries btree specialized operations + */ +extern const struct ssdfs_btree_descriptor_operations + ssdfs_dentries_btree_desc_ops; +extern const struct ssdfs_btree_operations ssdfs_dentries_btree_ops; +extern const struct ssdfs_btree_node_operations ssdfs_dentries_btree_node_ops; + +#endif /* _SSDFS_DENTRIES_TREE_H */ From patchwork Sat Feb 25 01:09:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151968 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
[172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.17.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:17:47 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 63/76] ssdfs: dentries b-tree specialized operations Date: Fri, 24 Feb 2023 17:09:14 -0800 Message-Id: <20230225010927.813929-64-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Dentries b-tree implements API: (1) create - create dentries b-tree (2) destroy - destroy dentries b-tree (3) flush - flush dirty dentries b-tree (4) find - find dentry for a name in b-tree (5) add - add dentry object into b-tree (6) change - change/update dentry object in b-tree (7) delete - delete dentry object from b-tree (8) delete_all - delete all dentries from b-tree Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/dentries_tree.c | 3369 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 3369 insertions(+) diff --git a/fs/ssdfs/dentries_tree.c b/fs/ssdfs/dentries_tree.c index 8c2ce87d1077..9b4115b6bffa 100644 --- a/fs/ssdfs/dentries_tree.c +++ b/fs/ssdfs/dentries_tree.c @@ -3011,3 +3011,3372 @@ int ssdfs_dentries_tree_change(struct ssdfs_dentries_btree_info *tree, return err; } + +/* + * ssdfs_dentries_tree_delete_inline_dentry() - delete inline dentry + * @tree: dentries tree + * @search: search object + * + * This method tries to delete the inline dentry from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - dentry doesn't exist in the tree. + * %-ENOENT - no more dentries in the tree. 
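+ *
+ * Deletion compacts the inline array: the dentries after the victim
+ * are shifted one slot to the left and the freed tail slot is
+ * poisoned with 0xFF bytes. Roughly (illustrative sketch):
+ *
+ *	memmove(&dentries[index], &dentries[index + 1],
+ *		(dentries_count - index - 1) * dentry_size);
+ *	memset(&dentries[dentries_count - 1], 0xFF, dentry_size);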
+ */
+static int
+ssdfs_dentries_tree_delete_inline_dentry(struct ssdfs_dentries_btree_info *tree,
+					 struct ssdfs_btree_search *search)
+{
+	struct ssdfs_raw_dentry *cur;
+	struct ssdfs_dir_entry *dentry1;
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	u64 hash1, hash2;
+	u64 ino1, ino2;
+	s64 dentries_count;
+	u16 index;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, search %p\n",
+		  tree, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_DENTRIES_ARRAY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	if (!tree->inline_dentries) {
+		SSDFS_ERR("empty inline tree %p\n",
+			  tree->inline_dentries);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	if (!search->result.buf) {
+		SSDFS_ERR("empty buffer pointer\n");
+		return -ERANGE;
+	}
+
+	hash1 = search->request.start.hash;
+	ino1 = search->request.start.ino;
+
+	cur = &search->raw.dentry;
+	hash2 = le64_to_cpu(cur->header.hash_code);
+	ino2 = le64_to_cpu(cur->header.ino);
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		if (hash1 != hash2 || ino1 != ino2) {
+			SSDFS_ERR("hash1 %llx, hash2 %llx, "
+				  "ino1 %llu, ino2 %llu\n",
+				  hash1, hash2, ino1, ino2);
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_WARN("unexpected result state %#x\n",
+			   search->result.state);
+		return -ERANGE;
+	}
+
+	dentries_count = atomic64_read(&tree->dentries_count);
+	if (dentries_count == 0) {
+		SSDFS_DBG("empty tree\n");
+		return -ENOENT;
+	} else if (dentries_count > SSDFS_INLINE_DENTRIES_COUNT) {
+		SSDFS_ERR("invalid dentries count %lld\n",
+			  dentries_count);
+		return -ERANGE;
+	}
+
+	if (search->result.start_index >= dentries_count) {
+		SSDFS_ERR("invalid search result: "
+			  "start_index %u, dentries_count %lld\n",
+			  search->result.start_index,
+			  dentries_count);
+		return -ENODATA;
+	}
+
+	index = search->result.start_index;
+
+	if ((index + 1) < dentries_count) {
+		err = ssdfs_memmove(tree->inline_dentries,
+				    index * dentry_size,
+				    ssdfs_inline_dentries_size(),
+				    tree->inline_dentries,
+				    (index + 1) * dentry_size,
+				    ssdfs_inline_dentries_size(),
+				    (dentries_count - index - 1) * dentry_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to move: err %d\n", err);
+			return err;
+		}
+	}
+
+	index = (u16)(dentries_count - 1);
+	dentry1 = &tree->inline_dentries[index];
+	memset(dentry1, 0xFF, sizeof(struct ssdfs_dir_entry));
+
+	atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY);
+
+	dentries_count = atomic64_dec_return(&tree->dentries_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dentries_count %llu\n",
+		  atomic64_read(&tree->dentries_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (dentries_count == 0) {
+		SSDFS_DBG("tree is empty now\n");
+	} else if (dentries_count < 0) {
+		SSDFS_WARN("invalid dentries_count %lld\n",
+			   dentries_count);
+		atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_CORRUPTED);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * 
ssdfs_dentries_tree_delete_dentry() - delete generic dentry
+ * @tree: dentries tree
+ * @search: search object
+ *
+ * This method tries to delete the generic dentry from the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - dentry doesn't exist in the tree.
+ * %-ENOENT - no more dentries in the tree.
+ */
+static
+int ssdfs_dentries_tree_delete_dentry(struct ssdfs_dentries_btree_info *tree,
+				      struct ssdfs_btree_search *search)
+{
+	struct ssdfs_raw_dentry *cur;
+	u64 hash1, hash2;
+	u64 ino1, ino2;
+	s64 dentries_count;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, search %p\n",
+		  tree, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_PRIVATE_DENTRIES_BTREE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	if (!tree->generic_tree) {
+		SSDFS_ERR("empty generic tree %p\n",
+			  tree->generic_tree);
+		return -ERANGE;
+	}
+
+	if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) {
+		SSDFS_ERR("invalid search result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	hash1 = search->request.start.hash;
+	ino1 = search->request.start.ino;
+
+	cur = &search->raw.dentry;
+	hash2 = le64_to_cpu(cur->header.hash_code);
+	ino2 = le64_to_cpu(cur->header.ino);
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		if (hash1 != hash2 || ino1 != ino2) {
+			SSDFS_ERR("hash1 %llx, hash2 %llx, "
+				  "ino1 %llu, ino2 %llu\n",
+				  hash1, hash2, ino1, ino2);
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_WARN("unexpected result state %#x\n",
+			   search->result.state);
+		return -ERANGE;
+	}
+
+	dentries_count = atomic64_read(&tree->dentries_count);
+	if (dentries_count == 0) {
+		SSDFS_DBG("empty tree\n");
+		return -ENOENT;
+	}
+
+	if (search->result.start_index >= dentries_count) {
+		SSDFS_ERR("invalid search result: "
+			  "start_index %u, dentries_count %lld\n",
+			  search->result.start_index,
+			  dentries_count);
+		return -ENODATA;
+	}
+
+	err = ssdfs_btree_delete_item(tree->generic_tree,
+				      search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete the dentry from the tree: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY);
+
+	err = ssdfs_btree_synchronize_root_node(tree->generic_tree,
+						tree->root);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to synchronize the root node: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	dentries_count = atomic64_read(&tree->dentries_count);
+	if (dentries_count == 0) {
+		SSDFS_DBG("tree is empty now\n");
+		return -ENOENT;
+	} else if (dentries_count < 0) {
+		SSDFS_WARN("invalid dentries_count %lld\n",
+			   dentries_count);
+		atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_CORRUPTED);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_dentries_tree_get_head_range() - extract the head range
+ * @tree: dentries tree
+ * 
@count: requested number of items + * @search: search object + * + * This method tries to extract the head range. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_dentries_tree_get_head_range(struct ssdfs_dentries_btree_info *tree, + s64 count, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, count %lld\n", + tree, search, count); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_RANGE; + search->request.flags = 0; + search->request.start.hash = U64_MAX; + search->request.start.ino = U64_MAX; + search->request.end.hash = U64_MAX; + search->request.end.ino = U64_MAX; + search->request.count = 0; + + err = ssdfs_btree_get_head_range(&tree->buffer.tree, + count, search); + if (err == -EAGAIN) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to repeat extraction: " + "count %lld, search->result.count %u\n", + count, search->result.count); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract dentries: " + "count %lld, err %d\n", + count, err); + return err; + } + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_dentries_tree_remove_generic_items() - delete generic items + * @tree: dentries tree + * @count: requested number of items + * @start_ino: starting inode ID + * @start_hash: starting hash + * @end_ino: ending inode ID + * @end_hash: ending hash + * @search: search object + * + * This method tries to delete generic items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - tree is not empty. 
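+ *
+ * Note: %-EAGAIN is not treated as a failure by the caller; the
+ * migration loop in ssdfs_migrate_generic2inline_tree() takes it
+ * as "extract the next head range and repeat" until the generic
+ * tree becomes empty.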
+ */ +static int +ssdfs_dentries_tree_remove_generic_items(struct ssdfs_dentries_btree_info *tree, + s64 count, + u64 start_ino, u64 start_hash, + u64 end_ino, u64 end_hash, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, count %lld, " + "start_ino %llu, start_hash %llx, " + "end_ino %llu, end_hash %llx\n", + tree, search, count, + start_ino, start_hash, + end_ino, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_DELETE_RANGE; + search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_HAS_VALID_INO; + search->request.start.hash = start_hash; + search->request.start.ino = start_ino; + search->request.end.hash = end_hash; + search->request.end.ino = end_ino; + search->request.count = count; + + err = ssdfs_btree_delete_range(&tree->buffer.tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete range: " + "start (hash %llx, ino %llu), " + "end (hash %llx, ino %llu), " + "count %u, err %d\n", + search->request.start.hash, + search->request.start.ino, + search->request.end.hash, + search->request.end.ino, + search->request.count, + err); + return err; + } + + if (!is_ssdfs_btree_empty(&tree->buffer.tree)) { + SSDFS_DBG("dentries tree is not empty\n"); + return -EAGAIN; + } + + return 0; +} + +/* + * ssdfs_migrate_generic2inline_tree() - convert generic tree into inline + * @tree: dentries tree + * + * This method tries to convert the generic tree into inline one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - the tree cannot be converted into inline again. 
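+ *
+ * The migration loop in outline (illustrative; see the function
+ * body for the complete version):
+ *
+ *	do {
+ *		err = ssdfs_dentries_tree_get_head_range(tree,
+ *							 count, search);
+ *		(copy the extracted dentries into the local array)
+ *		err = ssdfs_dentries_tree_remove_generic_items(...);
+ *	} while (err == -EAGAIN);
+ *	ssdfs_btree_destroy(&tree->buffer.tree);
+ *	tree->inline_dentries = tree->buffer.dentries;
+ *	tree->generic_tree = NULL;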
+ */ +static +int ssdfs_migrate_generic2inline_tree(struct ssdfs_dentries_btree_info *tree) +{ + struct ssdfs_dir_entry dentries[SSDFS_INLINE_DENTRIES_COUNT]; + struct ssdfs_dir_entry *cur; + struct ssdfs_btree_search *search; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + s64 dentries_count, dentries_capacity; + s64 count; + u64 start_ino; + u64 start_hash; + u64 end_ino; + u64 end_hash; + s64 copied = 0; + s64 start_index, end_index; + int private_flags; + s64 i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_PRIVATE_DENTRIES_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + dentries_count = atomic64_read(&tree->dentries_count); + private_flags = atomic_read(&tree->owner->private_flags); + + dentries_capacity = SSDFS_INLINE_DENTRIES_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + dentries_capacity -= SSDFS_INLINE_DENTRIES_PER_AREA; + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES) { + SSDFS_ERR("the dentries tree is not generic\n"); + return -ERANGE; + } + + if (dentries_count > dentries_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %lld > dentries_capacity %lld\n", + dentries_count, dentries_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(tree->inline_dentries || !tree->generic_tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(dentries, 0xFF, ssdfs_inline_dentries_size()); + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + +try_extract_range: + if (copied >= dentries_count) { + err = -ERANGE; + SSDFS_ERR("copied %lld >= dentries_count %lld\n", + copied, dentries_count); + goto finish_process_range; + } + + count = dentries_count - copied; + + err = ssdfs_dentries_tree_get_head_range(tree, count, search); + if (unlikely(err)) { + SSDFS_ERR("fail to extract dentries: " + "dentries_count %lld, err %d\n", + dentries_count, err); + goto finish_process_range; + } + + if (search->result.count == 0) { + err = -ERANGE; + SSDFS_ERR("invalid search->result.count %u\n", + search->result.count); + goto finish_process_range; + } + + if ((copied + search->result.count) > SSDFS_INLINE_DENTRIES_COUNT) { + err = -ERANGE; + SSDFS_ERR("invalid items count: " + "copied %lld, count %u, capacity %u\n", + copied, search->result.count, + SSDFS_INLINE_DENTRIES_COUNT); + goto finish_process_range; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + err = ssdfs_memcpy(dentries, + copied * dentry_size, + ssdfs_inline_dentries_size(), + &search->raw.dentry.header, + 0, dentry_size, + dentry_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_process_range; + } + break; + + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("empty buffer\n"); + goto 
finish_process_range; + } + + err = ssdfs_memcpy(dentries, + copied * dentry_size, + ssdfs_inline_dentries_size(), + search->result.buf, + 0, search->result.buf_size, + (u64)dentry_size * search->result.count); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_process_range; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid buffer's state %#x\n", + search->result.buf_state); + goto finish_process_range; + } + + start_index = copied; + end_index = copied + search->result.count - 1; + + start_hash = le64_to_cpu(dentries[start_index].hash_code); + start_ino = le64_to_cpu(dentries[start_index].ino); + end_hash = le64_to_cpu(dentries[end_index].hash_code); + end_ino = le64_to_cpu(dentries[end_index].ino); + + count = search->result.count; + copied += count; + + err = ssdfs_dentries_tree_remove_generic_items(tree, count, + start_ino, + start_hash, + end_ino, + end_hash, + search); + if (err == -EAGAIN) { + err = 0; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to extract more: " + "copied %lld, dentries_count %lld\n", + copied, dentries_count); +#endif /* CONFIG_SSDFS_DEBUG */ + goto try_extract_range; + } else if (unlikely(err)) { + SSDFS_ERR("fail to remove generic items: " + "start (ino %llu, hash %llx), " + "end (ino %llu, hash %llx), " + "count %lld, err %d\n", + start_ino, start_hash, + end_ino, end_hash, + count, err); + goto finish_process_range; + } + + err = ssdfs_btree_destroy_node_range(&tree->buffer.tree, + 0); + if (unlikely(err)) { + SSDFS_ERR("fail to destroy nodes' range: err %d\n", + err); + goto finish_process_range; + } + +finish_process_range: + ssdfs_btree_search_free(search); + + if (unlikely(err)) + return err; + + ssdfs_btree_destroy(&tree->buffer.tree); + + for (i = 0; i < dentries_count; i++) { + cur = &dentries[i]; + + cur->dentry_type = SSDFS_INLINE_DENTRY; + } + + ssdfs_memcpy(tree->buffer.dentries, + 0, ssdfs_inline_dentries_size(), + dentries, + 0, ssdfs_inline_dentries_size(), + dentry_size * dentries_count); + + atomic_set(&tree->type, SSDFS_INLINE_DENTRIES_ARRAY); + atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY); + tree->inline_dentries = tree->buffer.dentries; + tree->generic_tree = NULL; + + atomic64_set(&tree->dentries_count, dentries_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %llu\n", + atomic64_read(&tree->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic_and(~SSDFS_INODE_HAS_DENTRIES_BTREE, + &tree->owner->private_flags); + atomic_or(SSDFS_INODE_HAS_INLINE_DENTRIES, + &tree->owner->private_flags); + + return 0; +} + +/* + * ssdfs_dentries_tree_delete() - delete dentry from the tree + * @tree: dentries tree + * @name_hash: hash of the name + * @ino: inode ID + * @search: search object + * + * This method tries to delete dentry from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - dentry doesn't exist in the tree. 
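+ *
+ * A caller-side sketch for an unlink-style operation (illustrative;
+ * dir_dentry and inode are hypothetical names):
+ *
+ *	name_hash = ssdfs_generate_name_hash(&dir_dentry->d_name);
+ *	err = ssdfs_dentries_tree_delete(tree, name_hash,
+ *					 inode->i_ino, search);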
+ */ +int ssdfs_dentries_tree_delete(struct ssdfs_dentries_btree_info *tree, + u64 name_hash, ino_t ino, + struct ssdfs_btree_search *search) +{ + int threshold = SSDFS_INLINE_DENTRIES_PER_AREA; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p, search %p, name_hash %llx\n", + tree, search, name_hash); +#else + SSDFS_DBG("tree %p, search %p, name_hash %llx\n", + tree, search, name_hash); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_dentries_btree_search(name_hash, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_HAS_VALID_INO; + search->request.start.hash = name_hash; + search->request.start.name = NULL; + search->request.start.name_len = U32_MAX; + search->request.start.ino = ino; + search->request.end.hash = name_hash; + search->request.end.name = NULL; + search->request.end.name_len = U32_MAX; + search->request.end.ino = ino; + search->request.count = 1; + } + + ssdfs_debug_dentries_btree_object(tree); + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + down_write(&tree->lock); + + err = ssdfs_dentries_tree_find_inline_dentry(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the inline dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_delete_inline_dentry; + } + + search->request.type = SSDFS_BTREE_SEARCH_DELETE_ITEM; + + err = ssdfs_dentries_tree_delete_inline_dentry(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_delete_inline_dentry; + } + +finish_delete_inline_dentry: + up_write(&tree->lock); + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + down_read(&tree->lock); + + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_delete_generic_dentry; + } + + search->request.type = SSDFS_BTREE_SEARCH_DELETE_ITEM; + + err = ssdfs_dentries_tree_delete_dentry(tree, search); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete dentry: " + "name_hash %llx, err %d\n", + name_hash, err); + goto finish_delete_generic_dentry; + } + +finish_delete_generic_dentry: + up_read(&tree->lock); + + if (!err && + need_migrate_generic2inline_btree(tree->generic_tree, + threshold)) { + down_write(&tree->lock); + err = ssdfs_migrate_generic2inline_tree(tree); + up_write(&tree->lock); + + if (err == -ENOSPC) { + /* continue to use the generic tree */ + err = 0; + SSDFS_DBG("unable to re-create inline tree\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to re-create inline tree: " + "err %d\n", + err); + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid dentries tree type %#x\n", + atomic_read(&tree->type)); + break; + } + +#ifdef 
CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	ssdfs_debug_dentries_btree_object(tree);
+
+	return err;
+}
+
+/*
+ * ssdfs_delete_all_inline_dentries() - delete all inline dentries
+ * @tree: dentries tree
+ *
+ * This method tries to delete all inline dentries in the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOENT - empty tree.
+ */
+static
+int ssdfs_delete_all_inline_dentries(struct ssdfs_dentries_btree_info *tree)
+{
+	s64 dentries_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_DENTRIES_ARRAY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	if (!tree->inline_dentries) {
+		SSDFS_ERR("empty inline dentries %p\n",
+			  tree->inline_dentries);
+		return -ERANGE;
+	}
+
+	dentries_count = atomic64_read(&tree->dentries_count);
+	if (dentries_count == 0) {
+		SSDFS_DBG("empty tree\n");
+		return -ENOENT;
+	} else if (dentries_count > SSDFS_INLINE_DENTRIES_COUNT) {
+		atomic_set(&tree->state,
+			   SSDFS_DENTRIES_BTREE_CORRUPTED);
+		SSDFS_ERR("dentries tree is corrupted: "
+			  "dentries_count %lld\n",
+			  dentries_count);
+		return -ERANGE;
+	}
+
+	memset(tree->inline_dentries, 0xFF,
+		sizeof(struct ssdfs_dir_entry) * SSDFS_INLINE_DENTRIES_COUNT);
+
+	atomic_set(&tree->state, SSDFS_DENTRIES_BTREE_DIRTY);
+	return 0;
+}
+
+/*
+ * ssdfs_dentries_tree_delete_all() - delete all dentries in the tree
+ * @tree: dentries tree
+ *
+ * This method tries to delete all dentries in the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
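+ *
+ * Hedged usage sketch (illustrative only):
+ *
+ *	err = ssdfs_dentries_tree_delete_all(tree);
+ *	if (unlikely(err))
+ *		SSDFS_ERR("fail to delete dentries: err %d\n", err);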
+ */
+int ssdfs_dentries_tree_delete_all(struct ssdfs_dentries_btree_info *tree)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("tree %p\n", tree);
+#else
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_DENTRIES_ARRAY:
+		down_write(&tree->lock);
+		err = ssdfs_delete_all_inline_dentries(tree);
+		if (!err)
+			atomic64_set(&tree->dentries_count, 0);
+		up_write(&tree->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete all inline dentries: "
+				  "err %d\n",
+				  err);
+		}
+		break;
+
+	case SSDFS_PRIVATE_DENTRIES_BTREE:
+		down_write(&tree->lock);
+		err = ssdfs_btree_delete_all(tree->generic_tree);
+		if (!err) {
+			atomic64_set(&tree->dentries_count, 0);
+		}
+		up_write(&tree->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete all dentries: "
+				  "err %d\n",
+				  err);
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid dentries tree type %#x\n",
+			  atomic_read(&tree->type));
+		break;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dentries_count %llu\n",
+		  atomic64_read(&tree->dentries_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/*
+ * ssdfs_dentries_tree_extract_inline_range() - extract inline range
+ * @tree: dentries tree
+ * @start_index: start item index
+ * @count: requested count of items
+ * @search: search object
+ *
+ * This method tries to extract a range of items from the inline tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ * %-ENOENT - unable to extract any items.
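+ *
+ * Worked example (assuming a 32-byte struct ssdfs_dir_entry): a request
+ * for count == 4 starting at start_index == 2 in a tree that keeps
+ * five inline dentries is clamped to count == 3 and needs a 96-byte
+ * result buffer.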
+ */
+static int
+ssdfs_dentries_tree_extract_inline_range(struct ssdfs_dentries_btree_info *tree,
+					 u16 start_index, u16 count,
+					 struct ssdfs_btree_search *search)
+{
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	u64 dentries_count;
+	size_t buf_size;
+	void *tmp;
+	u16 i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+	BUG_ON(atomic_read(&tree->type) != SSDFS_INLINE_DENTRIES_ARRAY);
+	BUG_ON(!tree->inline_dentries);
+
+	SSDFS_DBG("tree %p, start_index %u, count %u, search %p\n",
+		  tree, start_index, count, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search->result.count = 0;
+
+	dentries_count = atomic64_read(&tree->dentries_count);
+	if (dentries_count == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dentries_count %llu\n",
+			  dentries_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOENT;
+	} else if (dentries_count > SSDFS_INLINE_DENTRIES_COUNT) {
+		SSDFS_ERR("unexpected dentries_count %llu\n",
+			  dentries_count);
+		return -ERANGE;
+	}
+
+	if (start_index >= dentries_count) {
+		SSDFS_ERR("start_index %u >= dentries_count %llu\n",
+			  start_index, dentries_count);
+		return -ERANGE;
+	}
+
+	count = min_t(u16, count, (u16)(dentries_count - start_index));
+	buf_size = dentry_size * count;
+
+	switch (search->result.buf_state) {
+	case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE:
+	case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+		if (count == 1) {
+			search->result.buf = &search->raw.dentry;
+			search->result.buf_state =
+					SSDFS_BTREE_SEARCH_INLINE_BUFFER;
+			search->result.buf_size = buf_size;
+			search->result.items_in_buffer = 0;
+		} else {
+			err = ssdfs_btree_search_alloc_result_buf(search,
+								  buf_size);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to allocate buffer\n");
+				return err;
+			}
+		}
+		break;
+
+	case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+		if (count == 1) {
+			ssdfs_btree_search_free_result_buf(search);
+
+			search->result.buf = &search->raw.dentry;
+			search->result.buf_state =
+					SSDFS_BTREE_SEARCH_INLINE_BUFFER;
+			search->result.buf_size = buf_size;
+			search->result.items_in_buffer = 0;
+		} else {
+			/* don't lose the old buffer if krealloc() fails */
+			tmp = krealloc(search->result.buf,
+					buf_size, GFP_KERNEL);
+			if (!tmp) {
+				SSDFS_ERR("fail to allocate buffer\n");
+				return -ENOMEM;
+			}
+			search->result.buf = tmp;
+			search->result.buf_state =
+					SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER;
+			search->result.buf_size = buf_size;
+			search->result.items_in_buffer = 0;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	for (i = start_index; i < (start_index + count); i++) {
+		/* the destination offset is relative to start_index */
+		err = ssdfs_memcpy(search->result.buf,
+				   (i - start_index) * dentry_size,
+				   search->result.buf_size,
+				   tree->inline_dentries,
+				   i * dentry_size,
+				   ssdfs_inline_dentries_size(),
+				   dentry_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			return err;
+		}
+
+		search->result.items_in_buffer++;
+		search->result.count++;
+	}
+
+	search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM;
+	return 0;
+}
+
+/*
+ * ssdfs_dentries_tree_extract_range() - extract range of items
+ * @tree: dentries tree
+ * @start_index: start item index in the node
+ * @count: requested count of items
+ * @search: search object
+ *
+ * This method tries to extract a range of items from the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ * %-ENOENT - unable to extract any items.
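+ *
+ * Hedged caller sketch (process_dentries() is a hypothetical consumer,
+ * not part of this patch):
+ *
+ *	err = ssdfs_dentries_tree_extract_range(tree, 0, count, search);
+ *	if (!err)
+ *		process_dentries(search->result.buf, search->result.count);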
+ */ +int ssdfs_dentries_tree_extract_range(struct ssdfs_dentries_btree_info *tree, + u16 start_index, u16 count, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, start_index %u, count %u, search %p\n", + tree, start_index, count, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + down_read(&tree->lock); + err = ssdfs_dentries_tree_extract_inline_range(tree, + start_index, + count, + search); + up_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract the inline range: " + "start_index %u, count %u, err %d\n", + start_index, count, err); + } + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + down_read(&tree->lock); + err = ssdfs_btree_extract_range(tree->generic_tree, + start_index, count, + search); + up_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract the range: " + "start_index %u, count %u, err %d\n", + start_index, count, err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid dentries tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + return err; +} + +/****************************************************************************** + * SPECIALIZED DENTRIES BTREE DESCRIPTOR OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_dentries_btree_desc_init() - specialized btree descriptor init + * @fsi: pointer on shared file system object + * @tree: pointer on dentries btree object + */ +static +int ssdfs_dentries_btree_desc_init(struct ssdfs_fs_info *fsi, + struct ssdfs_btree *tree) +{ + struct ssdfs_dentries_btree_info *tree_info = NULL; + struct ssdfs_btree_descriptor *desc; + u32 erasesize; + u32 node_size; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + u16 item_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !tree); + + SSDFS_DBG("fsi %p, tree %p\n", + fsi, tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_info = container_of(tree, + struct ssdfs_dentries_btree_info, + buffer.tree); + desc = &tree_info->desc.desc; + erasesize = fsi->erasesize; + + if (le32_to_cpu(desc->magic) != SSDFS_DENTRIES_BTREE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid magic %#x\n", + le32_to_cpu(desc->magic)); + goto finish_btree_desc_init; + } + + /* TODO: check flags */ + + if (desc->type != SSDFS_DENTRIES_BTREE) { + err = -EIO; + SSDFS_ERR("invalid btree type %#x\n", + desc->type); + goto finish_btree_desc_init; + } + + node_size = 1 << desc->log_node_size; + if (node_size < SSDFS_4KB || node_size > erasesize) { + err = -EIO; + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc->log_node_size, + node_size, erasesize); + goto finish_btree_desc_init; + } + + item_size = le16_to_cpu(desc->item_size); + + if (item_size != dentry_size) { + err = -EIO; + SSDFS_ERR("invalid item size %u\n", + item_size); + goto finish_btree_desc_init; + } + + if (le16_to_cpu(desc->index_area_min_size) < (2 * dentry_size)) { + err = -EIO; + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc->index_area_min_size)); + goto finish_btree_desc_init; + } + + err = 
ssdfs_btree_desc_init(fsi, tree, desc, (u8)item_size, item_size); + +finish_btree_desc_init: + if (unlikely(err)) { + SSDFS_ERR("fail to init btree descriptor: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_dentries_btree_desc_flush() - specialized btree's descriptor flush + * @tree: pointer on btree object + */ +static +int ssdfs_dentries_btree_desc_flush(struct ssdfs_btree *tree) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_dentries_btree_info *tree_info = NULL; + struct ssdfs_btree_descriptor desc; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + u32 erasesize; + u32 node_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi); + + SSDFS_DBG("owner_ino %llu, type %#x, state %#x\n", + tree->owner_ino, tree->type, + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + + if (tree->type != SSDFS_DENTRIES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_dentries_btree_info, + buffer.tree); + } + + memset(&desc, 0xFF, sizeof(struct ssdfs_btree_descriptor)); + + desc.magic = cpu_to_le32(SSDFS_DENTRIES_BTREE_MAGIC); + desc.item_size = cpu_to_le16(dentry_size); + + err = ssdfs_btree_desc_flush(tree, &desc); + if (unlikely(err)) { + SSDFS_ERR("invalid btree descriptor: err %d\n", + err); + return err; + } + + if (desc.type != SSDFS_DENTRIES_BTREE) { + SSDFS_ERR("invalid btree type %#x\n", + desc.type); + return -ERANGE; + } + + erasesize = fsi->erasesize; + node_size = 1 << desc.log_node_size; + + if (node_size < SSDFS_4KB || node_size > erasesize) { + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc.log_node_size, + node_size, erasesize); + return -ERANGE; + } + + if (le16_to_cpu(desc.index_area_min_size) < (2 * dentry_size)) { + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc.index_area_min_size)); + return -ERANGE; + } + + ssdfs_memcpy(&tree_info->desc.desc, + 0, sizeof(struct ssdfs_btree_descriptor), + &desc, + 0, sizeof(struct ssdfs_btree_descriptor), + sizeof(struct ssdfs_btree_descriptor)); + + return 0; +} + +/****************************************************************************** + * SPECIALIZED DENTRIES BTREE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_dentries_btree_create_root_node() - specialized root node creation + * @fsi: pointer on shared file system object + * @node: pointer on node object [out] + */ +static +int ssdfs_dentries_btree_create_root_node(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_dentries_btree_info *tree_info = NULL; + struct ssdfs_btree_inline_root_node tmp_buffer; + struct ssdfs_inode *raw_inode = NULL; + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !node); + + SSDFS_DBG("fsi %p, node %p\n", + fsi, node); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (atomic_read(&tree->state) != SSDFS_BTREE_UNKNOWN_STATE) { + SSDFS_ERR("unexpected tree state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + } + + if (tree->type != SSDFS_DENTRIES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_dentries_btree_info, + buffer.tree); + } + + if (!tree_info->owner) { + SSDFS_ERR("empty inode 
pointer\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rwsem_is_locked(&tree_info->owner->lock)); + BUG_ON(!rwsem_is_locked(&tree_info->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + private_flags = atomic_read(&tree_info->owner->private_flags); + + if (private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + switch (atomic_read(&tree_info->type)) { + case SSDFS_PRIVATE_DENTRIES_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + raw_inode = &tree_info->owner->raw_inode; + ssdfs_memcpy(&tmp_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + &raw_inode->internal[0].area1.dentries_root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + } else { + switch (atomic_read(&tree_info->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + memset(&tmp_buffer, 0xFF, + sizeof(struct ssdfs_btree_inline_root_node)); + + tmp_buffer.header.height = SSDFS_BTREE_LEAF_NODE_HEIGHT + 1; + tmp_buffer.header.items_count = 0; + tmp_buffer.header.flags = 0; + tmp_buffer.header.type = SSDFS_BTREE_ROOT_NODE; + tmp_buffer.header.upper_node_id = + cpu_to_le32(SSDFS_BTREE_ROOT_NODE_ID); + } + + ssdfs_memcpy(&tree_info->root_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + &tmp_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + tree_info->root = &tree_info->root_buffer; + + err = ssdfs_btree_create_root_node(node, tree_info->root); + if (unlikely(err)) { + SSDFS_ERR("fail to create root node: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_dentries_btree_pre_flush_root_node() - specialized root node pre-flush + * @node: pointer on node object + */ +static +int ssdfs_dentries_btree_pre_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_state_bitmap *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: + SSDFS_DBG("node %u is clean\n", + node->node_id); + return 0; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_DENTRIES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_btree_pre_flush_root_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-flush root node: " + "node_id %u, err %d\n", + node->node_id, err); + } + + up_write(&node->header_lock); + up_write(&node->full_lock); + + return err; +} + +/* + * 
ssdfs_dentries_btree_flush_root_node() - specialized root node flush
+ * @node: pointer on node object
+ */
+static
+int ssdfs_dentries_btree_flush_root_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	struct ssdfs_dentries_btree_info *tree_info = NULL;
+	struct ssdfs_btree_inline_root_node tmp_buffer;
+	struct ssdfs_inode *raw_inode = NULL;
+	int private_flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node %p, node_id %u\n",
+		  node, node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	if (tree->type != SSDFS_DENTRIES_BTREE) {
+		SSDFS_WARN("invalid tree type %#x\n",
+			   tree->type);
+		return -ERANGE;
+	} else {
+		tree_info = container_of(tree,
+					 struct ssdfs_dentries_btree_info,
+					 buffer.tree);
+	}
+
+	if (!tree_info->owner) {
+		SSDFS_ERR("empty inode pointer\n");
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!rwsem_is_locked(&tree_info->owner->lock));
+	BUG_ON(!rwsem_is_locked(&tree_info->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	private_flags = atomic_read(&tree_info->owner->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		switch (atomic_read(&tree_info->type)) {
+		case SSDFS_PRIVATE_DENTRIES_BTREE:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid tree type %#x\n",
+				  atomic_read(&tree_info->type));
+			return -ERANGE;
+		}
+
+		if (!tree_info->root) {
+			SSDFS_ERR("root node pointer is NULL\n");
+			return -ERANGE;
+		}
+
+		ssdfs_btree_flush_root_node(node, tree_info->root);
+
+		ssdfs_memcpy(&tmp_buffer,
+			     0, sizeof(struct ssdfs_btree_inline_root_node),
+			     tree_info->root,
+			     0, sizeof(struct ssdfs_btree_inline_root_node),
+			     sizeof(struct ssdfs_btree_inline_root_node));
+
+		raw_inode = &tree_info->owner->raw_inode;
+		ssdfs_memcpy(&raw_inode->internal[0].area1.dentries_root,
+			     0, sizeof(struct ssdfs_btree_inline_root_node),
+			     &tmp_buffer,
+			     0, sizeof(struct ssdfs_btree_inline_root_node),
+			     sizeof(struct ssdfs_btree_inline_root_node));
+	} else {
+		err = -ERANGE;
+		SSDFS_ERR("dentries tree is inline dentries array\n");
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_dentries_btree_create_node() - specialized node creation
+ * @node: pointer on node object
+ */
+static
+int ssdfs_dentries_btree_create_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	void *addr[SSDFS_BTREE_NODE_BMAP_COUNT];
+	size_t hdr_size = sizeof(struct ssdfs_dentries_btree_node_header);
+	u32 node_size;
+	u32 items_area_size = 0;
+	u16 item_size = 0;
+	u16 index_size = 0;
+	u16 index_area_min_size;
+	u16 items_capacity = 0;
+	u16 index_capacity = 0;
+	u32 index_area_size = 0;
+	size_t bmap_bytes;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree);
+	WARN_ON(atomic_read(&node->state) != SSDFS_BTREE_NODE_CREATED);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	node_size = tree->node_size;
+	index_area_min_size = tree->index_area_min_size;
+
+	node->node_ops = &ssdfs_dentries_btree_node_ops;
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_AREA_ABSENT:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_AREA_ABSENT:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n",
+			   atomic_read(&node->type));
+		return -ERANGE;
+	}
+
+	down_write(&node->header_lock);
+	down_write(&node->bmap_array.lock);
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+		node->index_area.offset = (u32)hdr_size;
+		node->index_area.area_size = node_size - hdr_size;
+
+		index_area_size = node->index_area.area_size;
+		index_size = node->index_area.index_size;
+
+		node->index_area.index_capacity = index_area_size / index_size;
+		index_capacity = node->index_area.index_capacity;
+
+		node->bmap_array.index_start_bit =
+			SSDFS_BTREE_NODE_HEADER_INDEX + 1;
+		node->bmap_array.item_start_bit =
+			node->bmap_array.index_start_bit + index_capacity;
+
+		node->raw.dentries_header.dentries_count = cpu_to_le16(0);
+		node->raw.dentries_header.inline_names = cpu_to_le16(0);
+		node->raw.dentries_header.free_space = cpu_to_le16(0);
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		node->index_area.offset = (u32)hdr_size;
+
+		if (index_area_min_size == 0 ||
+		    index_area_min_size >= (node_size - hdr_size)) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid index area desc: "
+				  "index_area_min_size %u, "
+				  "node_size %u, hdr_size %zu\n",
+				  index_area_min_size,
+				  node_size, hdr_size);
+			goto finish_create_node;
+		}
+
+		node->index_area.area_size = index_area_min_size;
+
+		index_area_size = node->index_area.area_size;
+		index_size = node->index_area.index_size;
+		node->index_area.index_capacity = index_area_size / index_size;
+		index_capacity = node->index_area.index_capacity;
+
+		node->items_area.offset = node->index_area.offset +
+						node->index_area.area_size;
+
+		if (node->items_area.offset >= node_size) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid items area desc: "
+				  "area_offset %u, node_size %u\n",
+				  node->items_area.offset,
+				  node_size);
+			goto finish_create_node;
+		}
+
+		node->items_area.area_size = node_size -
+						node->items_area.offset;
+		node->items_area.free_space = node->items_area.area_size;
+		node->items_area.item_size = tree->item_size;
+		node->items_area.min_item_size = tree->min_item_size;
+		node->items_area.max_item_size = tree->max_item_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node_size %u, hdr_size %zu, free_space %u\n",
+			  node_size, hdr_size,
+			  node->items_area.free_space);
+#endif /* 
CONFIG_SSDFS_DEBUG */ + + items_area_size = node->items_area.area_size; + item_size = node->items_area.item_size; + + node->items_area.items_count = 0; + node->items_area.items_capacity = items_area_size / item_size; + items_capacity = node->items_area.items_capacity; + + if (node->items_area.items_capacity == 0) { + err = -ERANGE; + SSDFS_ERR("items area's capacity %u\n", + node->items_area.items_capacity); + goto finish_create_node; + } + + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + + node->raw.dentries_header.dentries_count = cpu_to_le16(0); + node->raw.dentries_header.inline_names = cpu_to_le16(0); + node->raw.dentries_header.free_space = + cpu_to_le16((u16)node->items_area.free_space); + break; + + case SSDFS_BTREE_LEAF_NODE: + node->items_area.offset = (u32)hdr_size; + node->items_area.area_size = node_size - hdr_size; + node->items_area.free_space = node->items_area.area_size; + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, hdr_size %zu, free_space %u\n", + node_size, hdr_size, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_area_size = node->items_area.area_size; + item_size = node->items_area.item_size; + + node->items_area.items_count = 0; + node->items_area.items_capacity = items_area_size / item_size; + items_capacity = node->items_area.items_capacity; + + node->bmap_array.item_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + + node->raw.dentries_header.dentries_count = cpu_to_le16(0); + node->raw.dentries_header.inline_names = cpu_to_le16(0); + node->raw.dentries_header.free_space = + cpu_to_le16((u16)node->items_area.free_space); + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + goto finish_create_node; + } + + node->bmap_array.bits_count = index_capacity + items_capacity + 1; + + if (item_size > 0) + items_capacity = node_size / item_size; + else + items_capacity = 0; + + if (index_size > 0) + index_capacity = node_size / index_size; + else + index_capacity = 0; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + node->bmap_array.bmap_bytes = bmap_bytes; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_DENTRIES_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_create_node; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, dentries_count %u, " + "inline_names %u, free_space %u\n", + node->node_id, + le16_to_cpu(node->raw.dentries_header.dentries_count), + le16_to_cpu(node->raw.dentries_header.inline_names), + le16_to_cpu(node->raw.dentries_header.free_space)); + SSDFS_DBG("items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->items_area.items_count, + node->items_area.items_capacity, + node->items_area.start_hash, + node->items_area.end_hash); + SSDFS_DBG("index_count %u, index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->index_area.index_count, + node->index_area.index_capacity, + node->index_area.start_hash, + node->index_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_create_node: + up_write(&node->bmap_array.lock); + up_write(&node->header_lock); + + if (unlikely(err)) + return err; + + err = ssdfs_btree_node_allocate_bmaps(addr, 
bmap_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate node's bitmaps: "
+			  "bmap_bytes %zu, err %d\n",
+			  bmap_bytes, err);
+		return err;
+	}
+
+	down_write(&node->bmap_array.lock);
+	for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) {
+		spin_lock(&node->bmap_array.bmap[i].lock);
+		node->bmap_array.bmap[i].ptr = addr[i];
+		addr[i] = NULL;
+		spin_unlock(&node->bmap_array.bmap[i].lock);
+	}
+	up_write(&node->bmap_array.lock);
+
+	err = ssdfs_btree_node_allocate_content_space(node, node_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate content space: "
+			  "node_size %u, err %d\n",
+			  node_size, err);
+		return err;
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	return err;
+}
+
+/*
+ * ssdfs_dentries_btree_init_node() - init dentries tree's node
+ * @node: pointer on node object
+ *
+ * This method tries to init the node of dentries btree.
+ *
+ * It makes sense to allocate the bitmap taking into account
+ * that the node can be resized. So the bitmap reserves an
+ * index area equal to the whole node and an items area equal
+ * to the whole node. This technique provides the opportunity
+ * not to resize or to shift the content of the bitmap.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - unable to allocate memory.
+ * %-ERANGE - internal error.
+ * %-EIO - invalid node's header content.
+ */
+static
+int ssdfs_dentries_btree_init_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	struct ssdfs_dentries_btree_info *tree_info = NULL;
+	struct ssdfs_dentries_btree_node_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_dentries_btree_node_header);
+	void *addr[SSDFS_BTREE_NODE_BMAP_COUNT];
+	struct page *page;
+	void *kaddr;
+	u64 start_hash, end_hash;
+	u32 node_size;
+	u16 item_size;
+	u64 parent_ino;
+	u32 dentries_count;
+	u16 items_capacity;
+	u16 inline_names;
+	u16 free_space;
+	u32 calculated_used_space;
+	u32 items_count;
+	u16 flags;
+	u8 index_size;
+	u32 index_area_size = 0;
+	u16 index_capacity = 0;
+	size_t bmap_bytes;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	if (tree->type != SSDFS_DENTRIES_BTREE) {
+		SSDFS_WARN("invalid tree type %#x\n",
+			   tree->type);
+		return -ERANGE;
+	} else {
+		tree_info = container_of(tree,
+					 struct ssdfs_dentries_btree_info,
+					 buffer.tree);
+	}
+
+	if (atomic_read(&node->state) != SSDFS_BTREE_NODE_CONTENT_PREPARED) {
+		SSDFS_WARN("fail to init node: id %u, state %#x\n",
+			   node->node_id, atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	down_read(&node->full_lock);
+
+	if (pagevec_count(&node->content.pvec) == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("empty node's content: id %u\n",
+			  node->node_id);
+		goto finish_init_node;
+	}
+
+	page = node->content.pvec.pages[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	kaddr = kmap_local_page(page);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("PAGE DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     kaddr,
+			     PAGE_SIZE);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr = (struct ssdfs_dentries_btree_node_header *)kaddr;
+
+	if (!is_csum_valid(&hdr->node.check, hdr, hdr_size)) {
+		err = -EIO;
+		SSDFS_ERR("invalid checksum: node_id %u\n",
+			  node->node_id);
+		goto finish_init_operation;
+	}
+
+	if (le32_to_cpu(hdr->node.magic.common) != 
SSDFS_SUPER_MAGIC ||
+	    le16_to_cpu(hdr->node.magic.key) != SSDFS_DENTRIES_BNODE_MAGIC) {
+		err = -EIO;
+		SSDFS_ERR("invalid magic: common %#x, key %#x\n",
+			  le32_to_cpu(hdr->node.magic.common),
+			  le16_to_cpu(hdr->node.magic.key));
+		goto finish_init_operation;
+	}
+
+	down_write(&node->header_lock);
+
+	ssdfs_memcpy(&node->raw.dentries_header, 0, hdr_size,
+		     hdr, 0, hdr_size,
+		     hdr_size);
+
+	err = ssdfs_btree_init_node(node, &hdr->node,
+				    hdr_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to init node: id %u, err %d\n",
+			  node->node_id, err);
+		goto finish_header_init;
+	}
+
+	flags = atomic_read(&node->flags);
+
+	start_hash = le64_to_cpu(hdr->node.start_hash);
+	end_hash = le64_to_cpu(hdr->node.end_hash);
+	node_size = 1 << hdr->node.log_node_size;
+	index_size = hdr->node.index_size;
+	item_size = hdr->node.min_item_size;
+	items_capacity = le16_to_cpu(hdr->node.items_capacity);
+	parent_ino = le64_to_cpu(hdr->parent_ino);
+	dentries_count = le16_to_cpu(hdr->dentries_count);
+	inline_names = le16_to_cpu(hdr->inline_names);
+	free_space = le16_to_cpu(hdr->free_space);
+
+	if (parent_ino != tree_info->owner->vfs_inode.i_ino) {
+		err = -EIO;
+		SSDFS_ERR("parent_ino %llu != ino %lu\n",
+			  parent_ino,
+			  tree_info->owner->vfs_inode.i_ino);
+		goto finish_header_init;
+	}
+
+	calculated_used_space = hdr_size;
+	calculated_used_space += dentries_count * item_size;
+
+	if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) {
+		index_area_size = 1 << hdr->node.log_index_area_size;
+		calculated_used_space += index_area_size;
+	}
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_ROOT_NODE:
+		/* do nothing */
+		break;
+
+	case SSDFS_BTREE_INDEX_NODE:
+		if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) {
+			if (index_area_size != node->node_size) {
+				err = -EIO;
+				SSDFS_ERR("invalid index area's size: "
+					  "node_id %u, index_area_size %u, "
+					  "node_size %u\n",
+					  node->node_id,
+					  index_area_size,
+					  node->node_size);
+				goto finish_header_init;
+			}
+
+			calculated_used_space -= hdr_size;
+		} else {
+			err = -EIO;
+			SSDFS_ERR("invalid set of flags: "
+				  "node_id %u, flags %#x\n",
+				  node->node_id, flags);
+			goto finish_header_init;
+		}
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) {
+			/*
+			 * expected state
+			 */
+		} else {
+			err = -EIO;
+			SSDFS_ERR("invalid set of flags: "
+				  "node_id %u, flags %#x\n",
+				  node->node_id, flags);
+			goto finish_header_init;
+		}
+		fallthrough;
+	case SSDFS_BTREE_LEAF_NODE:
+		if (dentries_count > 0 &&
+		    (start_hash >= U64_MAX || end_hash >= U64_MAX)) {
+			err = -EIO;
+			SSDFS_ERR("invalid hash range: "
+				  "start_hash %llx, end_hash %llx\n",
+				  start_hash, end_hash);
+			goto finish_header_init;
+		}
+
+		if (item_size == 0 || node_size % item_size) {
+			err = -EIO;
+			SSDFS_ERR("invalid size: item_size %u, node_size %u\n",
+				  item_size, node_size);
+			goto finish_header_init;
+		}
+
+		if (item_size != sizeof(struct ssdfs_dir_entry)) {
+			err = -EIO;
+			SSDFS_ERR("invalid item_size: "
+				  "size %u, expected size %zu\n",
+				  item_size,
+				  sizeof(struct ssdfs_dir_entry));
+			goto finish_header_init;
+		}
+
+		if (items_capacity == 0 ||
+		    items_capacity > (node_size / item_size)) {
+			err = -EIO;
+			SSDFS_ERR("invalid items_capacity %u\n",
+				  items_capacity);
+			goto finish_header_init;
+		}
+
+		if (dentries_count > items_capacity) {
+			err = -EIO;
+			SSDFS_ERR("dentries_count %u > items_capacity %u\n",
+				  dentries_count,
+				  items_capacity);
+			goto finish_header_init;
+		}
+
+		if (inline_names > dentries_count) {
+			err = -EIO;
+			SSDFS_ERR("inline_names %u > 
dentries_count %u\n", + inline_names, dentries_count); + goto finish_header_init; + } + break; + + default: + BUG(); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_space %u, index_area_size %u, " + "hdr_size %zu, dentries_count %u, " + "item_size %u\n", + free_space, index_area_size, hdr_size, + dentries_count, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_space != (node_size - calculated_used_space)) { + err = -EIO; + SSDFS_ERR("free_space %u, node_size %u, " + "calculated_used_space %u\n", + free_space, node_size, + calculated_used_space); + goto finish_header_init; + } + + node->items_area.free_space = free_space; + node->items_area.items_count = (u16)dentries_count; + node->items_area.items_capacity = items_capacity; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->items_area.items_count, + node->items_area.items_capacity, + node->items_area.start_hash, + node->items_area.end_hash); + SSDFS_DBG("index_count %u, index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->index_area.index_count, + node->index_area.index_capacity, + node->index_area.start_hash, + node->index_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_header_init: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_init_operation; + + items_count = node_size / item_size; + + if (item_size > 0) + items_capacity = node_size / item_size; + else + items_capacity = 0; + + if (index_size > 0) + index_capacity = node_size / index_size; + else + index_capacity = 0; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_DENTRIES_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_init_operation; + } + + err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate node's bitmaps: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + goto finish_init_operation; + } + + down_write(&node->bmap_array.lock); + + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + /* + * Reserve the whole node space as + * potential space for indexes. 
+		 */
+		index_capacity = node_size / index_size;
+		node->bmap_array.item_start_bit =
+			node->bmap_array.index_start_bit + index_capacity;
+	} else if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) {
+		node->bmap_array.item_start_bit =
+				SSDFS_BTREE_NODE_HEADER_INDEX + 1;
+	} else
+		BUG();
+
+	node->bmap_array.bits_count = index_capacity + items_capacity + 1;
+	node->bmap_array.bmap_bytes = bmap_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("index_capacity %u, index_area_size %u, "
+		  "index_size %u\n",
+		  index_capacity, index_area_size, index_size);
+	SSDFS_DBG("index_start_bit %lu, item_start_bit %lu, "
+		  "bits_count %lu\n",
+		  node->bmap_array.index_start_bit,
+		  node->bmap_array.item_start_bit,
+		  node->bmap_array.bits_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_btree_node_init_bmaps(node, addr);
+
+	spin_lock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock);
+	bitmap_set(node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].ptr,
+		   0, dentries_count);
+	spin_unlock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock);
+
+	up_write(&node->bmap_array.lock);
+finish_init_operation:
+	kunmap_local(kaddr);
+
+finish_init_node:
+	up_read(&node->full_lock);
+
+	ssdfs_debug_btree_node_object(node);
+
+	return err;
+}
+
+static
+void ssdfs_dentries_btree_destroy_node(struct ssdfs_btree_node *node)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("operation is unavailable\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_dentries_btree_add_node() - add node into dentries btree
+ * @node: pointer on node object
+ *
+ * This method tries to finish addition of node into dentries btree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+static
+int ssdfs_dentries_btree_add_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	int type;
+	u16 items_capacity = 0;
+	u64 start_hash = U64_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&node->state)) {
+	case SSDFS_BTREE_NODE_CREATED:
+	case SSDFS_BTREE_NODE_INITIALIZED:
+	case SSDFS_BTREE_NODE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_WARN("invalid node: id %u, state %#x\n",
+			   node->node_id, atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	type = atomic_read(&node->type);
+
+	switch (type) {
+	case SSDFS_BTREE_INDEX_NODE:
+	case SSDFS_BTREE_HYBRID_NODE:
+	case SSDFS_BTREE_LEAF_NODE:
+		/* expected states */
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n", type);
+		return -ERANGE;
+	};
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+
+	switch (atomic_read(&node->items_area.state)) {
+	case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+		items_capacity = node->items_area.items_capacity;
+		start_hash = node->items_area.start_hash;
+		break;
+	default:
+		items_capacity = 0;
+		break;
+	};
+
+	if (items_capacity == 0) {
+		if (type == SSDFS_BTREE_LEAF_NODE ||
+		    type == SSDFS_BTREE_HYBRID_NODE) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid node state: "
+				  "type %#x, items_capacity %u\n",
+				  type, items_capacity);
+			goto finish_add_node;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, dentries_count %u, "
+		  "inline_names %u, free_space %u\n",
+		  node->node_id,
+		  le16_to_cpu(node->raw.dentries_header.dentries_count),
+		  le16_to_cpu(node->raw.dentries_header.inline_names),
+		  le16_to_cpu(node->raw.dentries_header.free_space));
+	SSDFS_DBG("items_count %u, items_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->items_area.items_count,
+		  node->items_area.items_capacity,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+	SSDFS_DBG("index_count %u, index_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->index_area.index_count,
+		  node->index_area.index_capacity,
+		  node->index_area.start_hash,
+		  node->index_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_add_node:
+	up_read(&node->header_lock);
+
+	ssdfs_debug_btree_node_object(node);
+
+	if (err)
+		return err;
+
+	err = ssdfs_btree_update_parent_node_pointer(tree, node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to update parent pointer: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_dentries_btree_delete_node(struct ssdfs_btree_node *node)
+{
+	/* TODO: implement */
+	SSDFS_DBG("TODO: implement\n");
+	return 0;
+
+/*
+ * TODO: we need to add a special free space descriptor into the
+ *       index area for the case of deleted nodes. The allocation
+ *       code for new items should then create an empty node with
+ *       completely free items while passing through the index
+ *       level.
+ */
+
+/*
+ * TODO: the node can be really deleted/invalidated, but the index
+ *       area should keep an index for the deleted node with a
+ *       special flag. Then it is clear that some capacity exists
+ *       without a real node allocation. If an item is added into
+ *       such a node, the node has to be allocated. As a result,
+ *       deleting a node keeps the index hierarchy the same, without
+ *       any need to delete or modify it.
+ */
+
+	/* TODO: decrement nodes_count and/or leaf_nodes counters */
+	/* TODO: decrease inodes_capacity and/or free_inodes */
+}
+
+/*
+ * ssdfs_dentries_btree_pre_flush_node() - pre-flush node's header
+ * @node: pointer on node object
+ *
+ * This method tries to flush node's header.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
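+ *
+ * The flushed header must satisfy the invariant checked below:
+ * free_space == items_area_size - dentries_count * item size,
+ * where item size is sizeof(struct ssdfs_dir_entry); any mismatch
+ * aborts the pre-flush with %-ERANGE.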
+ */ +static +int ssdfs_dentries_btree_pre_flush_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_dentries_btree_node_header dentries_header; + size_t hdr_size = sizeof(struct ssdfs_dentries_btree_node_header); + struct ssdfs_btree *tree; + struct ssdfs_dentries_btree_info *tree_info = NULL; + struct ssdfs_state_bitmap *bmap; + struct page *page; + u16 items_count; + u32 items_area_size; + u16 dentries_count; + u16 inline_names; + u16 free_space; + u32 used_space; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: + SSDFS_DBG("node %u is clean\n", + node->node_id); + return 0; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_DENTRIES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_dentries_btree_info, + buffer.tree); + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + ssdfs_memcpy(&dentries_header, 0, hdr_size, + &node->raw.dentries_header, 0, hdr_size, + hdr_size); + + dentries_header.node.magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + dentries_header.node.magic.key = + cpu_to_le16(SSDFS_DENTRIES_BNODE_MAGIC); + dentries_header.node.magic.version.major = SSDFS_MAJOR_REVISION; + dentries_header.node.magic.version.minor = SSDFS_MINOR_REVISION; + + err = ssdfs_btree_node_pre_flush_header(node, &dentries_header.node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush generic header: " + "node_id %u, err %d\n", + node->node_id, err); + goto finish_dentries_header_preparation; + } + + if (!tree_info->owner) { + err = -ERANGE; + SSDFS_WARN("fail to extract parent_ino\n"); + goto finish_dentries_header_preparation; + } + + dentries_header.parent_ino = + cpu_to_le64(tree_info->owner->vfs_inode.i_ino); + + items_count = node->items_area.items_count; + items_area_size = node->items_area.area_size; + dentries_count = le16_to_cpu(dentries_header.dentries_count); + inline_names = le16_to_cpu(dentries_header.inline_names); + free_space = le16_to_cpu(dentries_header.free_space); + + if (dentries_count != items_count) { + err = -ERANGE; + SSDFS_ERR("dentries_count %u != items_count %u\n", + dentries_count, items_count); + goto finish_dentries_header_preparation; + } + + if (inline_names > dentries_count) { + err = -ERANGE; + SSDFS_ERR("inline_names %u > dentries_count %u\n", + inline_names, dentries_count); + goto finish_dentries_header_preparation; + } + + used_space = (u32)items_count * sizeof(struct ssdfs_dir_entry); + + if (used_space > items_area_size) { + err = -ERANGE; + SSDFS_ERR("used_space %u > items_area_size %u\n", + used_space, items_area_size); + goto finish_dentries_header_preparation; + } + +#ifdef CONFIG_SSDFS_DEBUG + 
SSDFS_DBG("free_space %u, dentries_count %u, " + "items_area_size %u, item_size %zu\n", + free_space, dentries_count, + items_area_size, + sizeof(struct ssdfs_dir_entry)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (free_space != (items_area_size - used_space)) { + err = -ERANGE; + SSDFS_ERR("free_space %u, items_area_size %u, " + "used_space %u\n", + free_space, items_area_size, + used_space); + goto finish_dentries_header_preparation; + } + + dentries_header.node.check.bytes = cpu_to_le16((u16)hdr_size); + dentries_header.node.check.flags = cpu_to_le16(SSDFS_CRC32); + + err = ssdfs_calculate_csum(&dentries_header.node.check, + &dentries_header, hdr_size); + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", err); + goto finish_dentries_header_preparation; + } + + ssdfs_memcpy(&node->raw.dentries_header, 0, hdr_size, + &dentries_header, 0, hdr_size, + hdr_size); + +finish_dentries_header_preparation: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_node_pre_flush; + + if (pagevec_count(&node->content.pvec) < 1) { + err = -ERANGE; + SSDFS_ERR("pagevec is empty\n"); + goto finish_node_pre_flush; + } + + page = node->content.pvec.pages[0]; + ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + &dentries_header, 0, hdr_size, + hdr_size); + +finish_node_pre_flush: + up_write(&node->full_lock); + + return err; +} + +/* + * ssdfs_dentries_btree_flush_node() - flush node + * @node: pointer on node object + * + * This method tries to flush node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_dentries_btree_flush_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_dentries_btree_info *tree_info = NULL; + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node %p, node_id %u\n", + node, node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_DENTRIES_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_dentries_btree_info, + buffer.tree); + } + + private_flags = atomic_read(&tree_info->owner->private_flags); + + if (private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + switch (atomic_read(&tree_info->type)) { + case SSDFS_PRIVATE_DENTRIES_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + err = ssdfs_btree_common_node_flush(node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + } else { + err = -ERANGE; + SSDFS_ERR("dentries tree is inline dentries array\n"); + } + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/****************************************************************************** + * SPECIALIZED DENTRIES BTREE NODE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_convert_lookup2item_index() - convert lookup into item index + * @node_size: size of the node in bytes + * @lookup_index: lookup index + */ +static inline +u16 ssdfs_convert_lookup2item_index(u32 node_size, u16 lookup_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, lookup_index %u\n", + node_size, lookup_index); +#endif /* 
CONFIG_SSDFS_DEBUG */ + + return __ssdfs_convert_lookup2item_index(lookup_index, node_size, + sizeof(struct ssdfs_dir_entry), + SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE); +} + +/* + * ssdfs_convert_item2lookup_index() - convert item into lookup index + * @node_size: size of the node in bytes + * @item_index: item index + */ +static inline +u16 ssdfs_convert_item2lookup_index(u32 node_size, u16 item_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, item_index %u\n", + node_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_convert_item2lookup_index(item_index, node_size, + sizeof(struct ssdfs_dir_entry), + SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE); +} + +/* + * is_hash_for_lookup_table() - should item's hash be into lookup table? + * @node_size: size of the node in bytes + * @item_index: item index + */ +static inline +bool is_hash_for_lookup_table(u32 node_size, u16 item_index) +{ + u16 lookup_index; + u16 calculated; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, item_index %u\n", + node_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + lookup_index = ssdfs_convert_item2lookup_index(node_size, item_index); + calculated = ssdfs_convert_lookup2item_index(node_size, lookup_index); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u, calculated %u\n", + lookup_index, calculated); +#endif /* CONFIG_SSDFS_DEBUG */ + + return calculated == item_index; +} + +/* + * ssdfs_dentries_btree_node_find_lookup_index() - find lookup index + * @node: node object + * @search: search object + * @lookup_index: lookup index [out] + * + * This method tries to find a lookup index for requested items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - lookup index doesn't exist for requested hash. 
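+ *
+ * Worked example (hypothetical numbers): if the node keeps 128
+ * dentries and the lookup table has 16 slots, every slot caches
+ * the hash of each 8th dentry, so the binary search over the
+ * table narrows the follow-up linear scan to an 8-item window.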
+ */ +static +int ssdfs_dentries_btree_node_find_lookup_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 *lookup_index) +{ + __le64 *lookup_table; + int array_size = SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search || !lookup_index); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + lookup_table = node->raw.dentries_header.lookup_table; + err = ssdfs_btree_node_find_lookup_index_nolock(search, + lookup_table, + array_size, + lookup_index); + up_read(&node->header_lock); + + return err; +} + +/* + * ssdfs_get_dentries_hash_range() - get dentry's hash range + * @kaddr: pointer on dentry object + * @start_hash: pointer on start_hash value [out] + * @end_hash: pointer on end_hash value [out] + */ +static +void ssdfs_get_dentries_hash_range(void *kaddr, + u64 *start_hash, + u64 *end_hash) +{ + struct ssdfs_dir_entry *dentry; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !start_hash || !end_hash); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + dentry = (struct ssdfs_dir_entry *)kaddr; + *start_hash = le64_to_cpu(dentry->hash_code); + *end_hash = *start_hash; +} + +/* + * ssdfs_check_found_dentry() - check found dentry + * @fsi: pointer on shared file system object + * @search: search object + * @kaddr: pointer on dentry object + * @item_index: index of the dentry + * @start_hash: pointer on start_hash value [out] + * @end_hash: pointer on end_hash value [out] + * @found_index: pointer on found index [out] + * + * This method tries to check the found dentry. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - corrupted dentry. + * %-EAGAIN - continue the search. + * %-ENODATA - possible place was found. 
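+ *
+ * Hedged caller-loop sketch (illustrative only; kaddr advances to
+ * the next raw dentry on every iteration):
+ *
+ *	for (i = start; i < items_count; i++) {
+ *		err = ssdfs_check_found_dentry(fsi, search, kaddr, i,
+ *						&start_hash, &end_hash,
+ *						&found_index);
+ *		if (err || found_index != U16_MAX)
+ *			break;
+ *	}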
+ */ +static +int ssdfs_check_found_dentry(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + void *kaddr, + u16 item_index, + u64 *start_hash, + u64 *end_hash, + u16 *found_index) +{ + struct ssdfs_dir_entry *dentry; + u64 hash_code; + u64 ino; + u8 type; + u8 flags; + u16 name_len; + u32 req_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !kaddr || !found_index); + BUG_ON(!start_hash || !end_hash); + + SSDFS_DBG("item_index %u\n", item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + *found_index = U16_MAX; + + dentry = (struct ssdfs_dir_entry *)kaddr; + hash_code = le64_to_cpu(dentry->hash_code); + ino = le64_to_cpu(dentry->ino); + type = dentry->dentry_type; + flags = dentry->flags; + name_len = le16_to_cpu(dentry->name_len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("hash_code %llx, ino %llu, name_len %u\n", + hash_code, ino, name_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + req_flags = search->request.flags; + + if (type != SSDFS_REGULAR_DENTRY) { + SSDFS_ERR("corrupted dentry: " + "hash_code %llx, ino %llu, " + "type %#x, flags %#x\n", + hash_code, ino, + type, flags); + return -ERANGE; + } + + if (flags & ~SSDFS_DENTRY_FLAGS_MASK) { + SSDFS_ERR("corrupted dentry: " + "hash_code %llx, ino %llu, " + "type %#x, flags %#x\n", + hash_code, ino, + type, flags); + return -ERANGE; + } + + if (hash_code >= U64_MAX || ino >= U64_MAX) { + SSDFS_ERR("corrupted dentry: " + "hash_code %llx, ino %llu, " + "type %#x, flags %#x\n", + hash_code, ino, + type, flags); + return -ERANGE; + } + + if (!(req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE)) { + SSDFS_ERR("invalid request: hash is absent\n"); + return -ERANGE; + } + + ssdfs_get_dentries_hash_range(kaddr, start_hash, end_hash); + + err = ssdfs_check_dentry_for_request(fsi, dentry, search); + if (err == -ENODATA) { + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = err; + search->result.start_index = item_index; + search->result.count = 1; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: + ssdfs_btree_search_free_result_buf(search); + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + + *found_index = item_index; + } else if (err == -EAGAIN) { + /* continue to search */ + err = 0; + *found_index = U16_MAX; + } else if (unlikely(err)) { + SSDFS_ERR("fail to check dentry: err %d\n", + err); + } else { + *found_index = item_index; + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "found_index %u\n", + *start_hash, *end_hash, + *found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_prepare_dentries_buffer() - prepare buffer for dentries + * @search: search object + * @found_index: found index of dentry + * @start_hash: starting hash + * @end_hash: ending hash + * @items_count: count of items in the sequence + * @item_size: size of the item + */ +static +int ssdfs_prepare_dentries_buffer(struct ssdfs_btree_search *search, + u16 found_index, + u64 start_hash, + u64 end_hash, + u16 items_count, + size_t item_size) +{ + u16 found_dentries = 0; + size_t buf_size = sizeof(struct ssdfs_raw_dentry); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + 
BUG_ON(!search); + + SSDFS_DBG("found_index %u, start_hash %llx, end_hash %llx, " + "items_count %u, item_size %zu\n", + found_index, start_hash, end_hash, + items_count, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_search_free_result_buf(search); + + if (start_hash == end_hash) { + /* use inline buffer */ + found_dentries = 1; + } else { + /* use external buffer */ + if (found_index >= items_count) { + SSDFS_ERR("found_index %u >= items_count %u\n", + found_index, items_count); + return -ERANGE; + } + found_dentries = items_count - found_index; + } + + if (found_dentries == 1) { + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.dentry; + search->result.buf_size = buf_size; + search->result.items_in_buffer = 0; + + search->result.name_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.name = &search->name; + search->result.name_string_size = + sizeof(struct ssdfs_name_string); + search->result.names_in_buffer = 0; + } else { + if (search->result.buf) { + SSDFS_WARN("search->result.buf %p, " + "search->result.buf_state %#x\n", + search->result.buf, + search->result.buf_state); + } + + err = ssdfs_btree_search_alloc_result_buf(search, + buf_size * found_dentries); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory for buffer\n"); + return err; + } + + err = ssdfs_btree_search_alloc_result_name(search, + (size_t)found_dentries * + sizeof(struct ssdfs_name_string)); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory for buffer\n"); + ssdfs_btree_search_free_result_buf(search); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_dentries %u, " + "search->result.buf (buf_state %#x, " + "buf_size %zu, items_in_buffer %u)\n", + found_dentries, + search->result.buf_state, + search->result.buf_size, + search->result.items_in_buffer); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_extract_found_dentry() - extract found dentry + * @fsi: pointer on shared file system object + * @search: search object + * @item_size: size of the item + * @kaddr: pointer on dentry + * @start_hash: pointer on start_hash value [out] + * @end_hash: pointer on end_hash value [out] + * + * This method tries to extract the found dentry. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_extract_found_dentry(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + size_t item_size, + void *kaddr, + u64 *start_hash, + u64 *end_hash) +{ + struct ssdfs_shared_dict_btree_info *dict; + struct ssdfs_dir_entry *dentry; + size_t buf_size = sizeof(struct ssdfs_raw_dentry); + struct ssdfs_name_string *name; + size_t name_size = sizeof(struct ssdfs_name_string); + u32 calculated; + u8 flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !search || !kaddr); + BUG_ON(!start_hash || !end_hash); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + + dict = fsi->shdictree; + if (!dict) { + SSDFS_ERR("shared dictionary is absent\n"); + return -ERANGE; + } + + calculated = search->result.items_in_buffer * buf_size; + if (calculated >= search->result.buf_size) { + SSDFS_ERR("calculated %u >= buf_size %zu, " + "items_in_buffer %u\n", + calculated, search->result.buf_size, + search->result.items_in_buffer); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + dentry = (struct ssdfs_dir_entry *)kaddr; + ssdfs_get_dentries_hash_range(dentry, start_hash, end_hash); + + err = ssdfs_memcpy(search->result.buf, + calculated, search->result.buf_size, + dentry, 0, item_size, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + search->result.items_in_buffer++; + + flags = dentry->flags; + if (flags & SSDFS_DENTRY_HAS_EXTERNAL_STRING) { + calculated = search->result.names_in_buffer * name_size; + if (calculated >= search->result.name_string_size) { + SSDFS_ERR("calculated %u >= name_string_size %zu\n", + calculated, + search->result.name_string_size); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.name); +#endif /* CONFIG_SSDFS_DEBUG */ + + name = search->result.name + search->result.names_in_buffer; + + err = ssdfs_shared_dict_get_name(dict, *start_hash, name); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the name: " + "hash %llx, err %d\n", + *start_hash, err); + return err; + } + + search->result.names_in_buffer++; + } + + search->result.count++; + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "search->result.count %u\n", + *start_hash, *end_hash, + search->result.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_extract_range_by_lookup_index() - extract a range of items + * @node: pointer on node object + * @lookup_index: lookup index for requested range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. 
+ */
+static
+int ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node,
+					u16 lookup_index,
+					struct ssdfs_btree_search *search)
+{
+	int capacity = SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE;
+	size_t item_size = sizeof(struct ssdfs_dir_entry);
+
+	return __ssdfs_extract_range_by_lookup_index(node, lookup_index,
+						capacity, item_size,
+						search,
+						ssdfs_check_found_dentry,
+						ssdfs_prepare_dentries_buffer,
+						ssdfs_extract_found_dentry);
+}

From patchwork Sat Feb 25 01:09:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151970
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 64/76] ssdfs: dentries b-tree node's specialized operations
Date: Fri, 24 Feb 2023 17:09:15 -0800
Message-Id: <20230225010927.813929-65-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The dentries b-tree node implements a specialized API:

(1) find_item - find an item in the node
(2) find_range - find a range of items in the node
(3) extract_range - extract a range of items (or all items) from the node
(4) insert_item - insert an item into the node
(5) insert_range - insert a range of items into the node
(6) change_item - change an item in the node
(7) delete_item - delete an item from the node
(8) delete_range - delete a range of items from the node

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/dentries_tree.c | 3344 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 3344 insertions(+)

diff --git a/fs/ssdfs/dentries_tree.c b/fs/ssdfs/dentries_tree.c
index 9b4115b6bffa..55abc05d1e99 100644
--- a/fs/ssdfs/dentries_tree.c
+++ b/fs/ssdfs/dentries_tree.c
@@ -6380,3 +6380,3347 @@ int ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node,
 					ssdfs_prepare_dentries_buffer,
 					ssdfs_extract_found_dentry);
 }
+
+/*
+ * ssdfs_btree_search_result_no_data() - prepare result state for no data case
+ * @node: pointer on node object
+ * @lookup_index: lookup index
+ * @search: pointer on search request object [in|out]
+ *
+ * This method prepares the search result state for the case when
+ * the node contains no data for the requested range.
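+ *
+ * Note: the method converts the lookup index into the item index of a
+ * possible insertion place and stores it, together with -ENODATA, in
+ * the search result; the result buffer survives only for ADD_ITEM,
+ * ADD_RANGE, and CHANGE_ITEM requests, or when the search already
+ * carries a new item.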
+ */
+static inline
+void ssdfs_btree_search_result_no_data(struct ssdfs_btree_node *node,
+					u16 lookup_index,
+					struct ssdfs_btree_search *search)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search->result.state = SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND;
+	search->result.err = -ENODATA;
+	search->result.start_index =
+			ssdfs_convert_lookup2item_index(node->node_size,
+							lookup_index);
+	search->result.count = search->request.count;
+	search->result.search_cno = ssdfs_current_cno(node->tree->fsi->sb);
+
+	if (!is_btree_search_contains_new_item(search)) {
+		switch (search->request.type) {
+		case SSDFS_BTREE_SEARCH_ADD_ITEM:
+		case SSDFS_BTREE_SEARCH_ADD_RANGE:
+		case SSDFS_BTREE_SEARCH_CHANGE_ITEM:
+			/* do nothing */
+			break;
+
+		default:
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(search->result.buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			search->result.buf_state =
+				SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE;
+			search->result.buf = NULL;
+			search->result.buf_size = 0;
+			search->result.items_in_buffer = 0;
+			break;
+		}
+	}
+}
+
+/*
+ * ssdfs_dentries_btree_node_find_range() - find a range of items in the node
+ * @node: pointer on node object
+ * @search: pointer on search request object
+ *
+ * This method tries to find a range of items in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENODATA - requested range is out of the node.
+ * %-ENOMEM - unable to allocate memory.
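+ *
+ * Note: the items area state, hash range, and item accounting are
+ * sampled under header_lock before the lookup table is consulted;
+ * on -ENODATA the result is prepared as a possible insertion place.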
+ */ +static +int ssdfs_dentries_btree_node_find_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + u16 items_count; + u16 items_capacity; + u64 start_hash; + u64 end_hash; + u16 lookup_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (items_capacity == 0 || items_count > items_capacity) { + SSDFS_ERR("corrupted node description: " + "items_count %u, items_capacity %u\n", + items_count, + items_capacity); + return -ERANGE; + } + + if (search->request.count == 0 || + search->request.count > items_capacity) { + SSDFS_ERR("invalid request: " + "count %u, items_capacity %u\n", + search->request.count, + items_capacity); + return -ERANGE; + } + + err = ssdfs_btree_node_check_hash_range(node, + items_count, + items_capacity, + start_hash, + end_hash, + search); + if (err) + return err; + + err = ssdfs_dentries_btree_node_find_lookup_index(node, search, + &lookup_index); + if (err == -ENODATA) { + ssdfs_btree_search_result_no_data(node, lookup_index, search); + return -ENODATA; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the index: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(lookup_index >= SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_extract_range_by_lookup_index(node, lookup_index, + search); + search->result.search_cno = ssdfs_current_cno(node->tree->fsi->sb); + + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node contains not all requested dentries: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx)\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to extract range: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + err); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_btree_search_result_no_data(node, lookup_index, search); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract range: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + err); + return err; + } + + search->request.flags &= ~SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM; + + return 0; +} + +/* + * ssdfs_dentries_btree_node_find_item() - find item into node + * @node: pointer 
on node object + * @search: pointer on search request object + * + * This method tries to find an item into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_dentries_btree_node_find_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->request.count != 1 || + search->request.start.hash != search->request.end.hash) { + SSDFS_ERR("invalid request state: " + "count %d, start_hash %llx, end_hash %llx\n", + search->request.count, + search->request.start.hash, + search->request.end.hash); + return -ERANGE; + } + + return ssdfs_dentries_btree_node_find_range(node, search); +} + +static +int ssdfs_dentries_btree_node_allocate_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -EOPNOTSUPP; +} + +static +int ssdfs_dentries_btree_node_allocate_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -EOPNOTSUPP; +} + +/* + * __ssdfs_dentries_btree_node_get_dentry() - extract the dentry from pagevec + * @pvec: pointer on pagevec + * @area_offset: area offset from the node's beginning + * @area_size: area size + * @node_size: size of the node + * @item_index: index of the dentry in the node + * @dentry: pointer on dentry's buffer [out] + * + * This method tries to extract the dentry from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
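+ *
+ * Note: the dentry's byte offset is computed as area_offset +
+ * item_index * item_size and then mapped to a memory page of the
+ * node's content pagevec together with the offset inside that page.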
+ */ +static +int __ssdfs_dentries_btree_node_get_dentry(struct pagevec *pvec, + u32 area_offset, + u32 area_size, + u32 node_size, + u16 item_index, + struct ssdfs_dir_entry *dentry) +{ + size_t item_size = sizeof(struct ssdfs_dir_entry); + u32 item_offset; + int page_index; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec || !dentry); + + SSDFS_DBG("area_offset %u, area_size %u, item_index %u\n", + area_offset, area_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_offset = (u32)item_index * item_size; + if (item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + return -ERANGE; + } + + item_offset += area_offset; + if (item_offset >= node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node_size); + return -ERANGE; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(pvec)); + return -ERANGE; + } + + page = pvec->pages[page_index]; + err = ssdfs_memcpy_from_page(dentry, 0, item_size, + page, item_offset, PAGE_SIZE, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_dentries_btree_node_get_dentry() - extract dentry from the node + * @node: pointer on node object + * @area: items area descriptor + * @item_index: index of the dentry + * @dentry: pointer on extracted dentry [out] + * + * This method tries to extract the dentry from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_dentries_btree_node_get_dentry(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 item_index, + struct ssdfs_dir_entry *dentry) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !dentry); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_dentries_btree_node_get_dentry(&node->content.pvec, + area->offset, + area->area_size, + node->node_size, + item_index, + dentry); +} + +/* + * is_requested_position_correct() - check that requested position is correct + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to check that requested position of a dentry + * into the node is correct. + * + * RETURN: + * [success] + * + * %SSDFS_CORRECT_POSITION - requested position is correct. + * %SSDFS_SEARCH_LEFT_DIRECTION - correct position from the left. + * %SSDFS_SEARCH_RIGHT_DIRECTION - correct position from the right. + * + * [failure] - error code: + * + * %SSDFS_CHECK_POSITION_FAILURE - internal error. 
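+ *
+ * Note: dentries are ordered by hash; when hashes are equal, the inode
+ * number or the inline name prefix (compared up to
+ * SSDFS_DENTRY_INLINE_NAME_MAX_LEN, if the request provides them)
+ * disambiguates the position.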
+ */
+static
+int is_requested_position_correct(struct ssdfs_btree_node *node,
+				  struct ssdfs_btree_node_items_area *area,
+				  struct ssdfs_btree_search *search)
+{
+	struct ssdfs_dir_entry dentry;
+	u16 item_index;
+	u64 ino;
+	u64 hash;
+	u32 req_flags;
+	size_t name_len;
+	int direction = SSDFS_CHECK_POSITION_FAILURE;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area || !search);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("node_id %u, item_index %u\n",
+		  node->node_id, search->result.start_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	item_index = search->result.start_index;
+	if ((item_index + search->request.count) > area->items_capacity) {
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, count %u\n",
+			  item_index, search->request.count);
+		return SSDFS_CHECK_POSITION_FAILURE;
+	}
+
+	if (item_index >= area->items_count) {
+		if (area->items_count == 0)
+			item_index = area->items_count;
+		else
+			item_index = area->items_count - 1;
+
+		search->result.start_index = item_index;
+	}
+
+	err = ssdfs_dentries_btree_node_get_dentry(node, area,
+						   item_index, &dentry);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to extract the dentry: "
+			  "item_index %u, err %d\n",
+			  item_index, err);
+		return SSDFS_CHECK_POSITION_FAILURE;
+	}
+
+	ino = le64_to_cpu(dentry.ino);
+	hash = le64_to_cpu(dentry.hash_code);
+	req_flags = search->request.flags;
+
+	if (search->request.end.hash < hash)
+		direction = SSDFS_SEARCH_LEFT_DIRECTION;
+	else if (hash < search->request.start.hash)
+		direction = SSDFS_SEARCH_RIGHT_DIRECTION;
+	else {
+		/* search->request.start.hash == hash */
+
+		if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_INO) {
+			if (search->request.start.ino < ino)
+				direction = SSDFS_SEARCH_LEFT_DIRECTION;
+			else if (ino < search->request.start.ino)
+				direction = SSDFS_SEARCH_RIGHT_DIRECTION;
+			else
+				direction = SSDFS_CORRECT_POSITION;
+		} else if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_NAME) {
+			int res;
+
+			if (!search->request.start.name) {
+				SSDFS_ERR("empty name pointer\n");
+				return SSDFS_CHECK_POSITION_FAILURE;
+			}
+
+			name_len = min_t(size_t, search->request.start.name_len,
+					 SSDFS_DENTRY_INLINE_NAME_MAX_LEN);
+			res = strncmp(search->request.start.name,
+				      dentry.inline_string,
+				      name_len);
+			if (res < 0)
+				direction = SSDFS_SEARCH_LEFT_DIRECTION;
+			else if (res > 0)
+				direction = SSDFS_SEARCH_RIGHT_DIRECTION;
+			else
+				direction = SSDFS_CORRECT_POSITION;
+		} else
+			direction = SSDFS_CORRECT_POSITION;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %llu, hash %llx, "
+		  "search (start_hash %llx, ino %llu; "
+		  "end_hash %llx, ino %llu), "
+		  "direction %#x\n",
+		  ino, hash,
+		  search->request.start.hash,
+		  search->request.start.ino,
+		  search->request.end.hash,
+		  search->request.end.ino,
+		  direction);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return direction;
+}
+
+/*
+ * ssdfs_find_correct_position_from_left() - find position from the left
+ * @node: pointer on node object
+ * @area: items area descriptor
+ * @search: search object
+ *
+ * This method tries to find a correct position of the dentry
+ * from the left side of dentries' sequence in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
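+ *
+ * Note: the method walks toward smaller indexes while the dentry hash
+ * equals the requested start hash, using the ino or the inline name to
+ * detect the exact slot, and stops at the first dentry whose hash is
+ * smaller than the requested one.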
+ */ +static +int ssdfs_find_correct_position_from_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_dir_entry dentry; + int item_index; + u64 ino; + u64 hash; + u32 req_flags; + size_t name_len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = search->result.start_index; + if ((item_index + search->request.count) >= area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = (u16)item_index; + return 0; + } + + + req_flags = search->request.flags; + + for (; item_index >= 0; item_index--) { + err = ssdfs_dentries_btree_node_get_dentry(node, area, + (u16)item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the dentry: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + ino = le64_to_cpu(dentry.ino); + hash = le64_to_cpu(dentry.hash_code); + + if (search->request.start.hash == hash) { + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_INO) { + if (ino == search->request.start.ino) { + search->result.start_index = + (u16)item_index; + return 0; + } else if (ino < search->request.start.ino) { + search->result.start_index = + (u16)(item_index + 1); + return 0; + } else + continue; + } + + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_NAME) { + int res; + + if (!search->request.start.name) { + SSDFS_ERR("empty name pointer\n"); + return -ERANGE; + } + + name_len = min_t(size_t, + search->request.start.name_len, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + res = strncmp(search->request.start.name, + dentry.inline_string, + name_len); + if (res == 0) { + search->result.start_index = + (u16)item_index; + return 0; + } else if (res < 0) { + search->result.start_index = + (u16)(item_index + 1); + return 0; + } else + continue; + } + + search->result.start_index = (u16)item_index; + return 0; + } else if (hash < search->request.start.hash) { + search->result.start_index = (u16)(item_index + 1); + return 0; + } + } + + search->result.start_index = 0; + return 0; +} + +/* + * ssdfs_find_correct_position_from_right() - find position from the right + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to find a correct position of the dentry + * from the right side of dentries' sequence in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +static +int ssdfs_find_correct_position_from_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_dir_entry dentry; + int item_index; + u64 ino; + u64 hash; + u32 req_flags; + size_t name_len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = search->result.start_index; + if ((item_index + search->request.count) >= area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = (u16)item_index; + + return 0; + } + + req_flags = search->request.flags; + + for (; item_index < area->items_count; item_index++) { + err = ssdfs_dentries_btree_node_get_dentry(node, area, + (u16)item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the dentry: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + ino = le64_to_cpu(dentry.ino); + hash = le64_to_cpu(dentry.hash_code); + + if (search->request.start.hash == hash) { + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_INO) { + if (ino == search->request.start.ino) { + search->result.start_index = + (u16)item_index; + return 0; + } else if (search->request.start.ino < ino) { + if (item_index == 0) { + search->result.start_index = + (u16)item_index; + } else { + search->result.start_index = + (u16)(item_index - 1); + } + return 0; + } else + continue; + } + + if (req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_NAME) { + int res; + + if (!search->request.start.name) { + SSDFS_ERR("empty name pointer\n"); + return -ERANGE; + } + + name_len = min_t(size_t, + search->request.start.name_len, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + res = strncmp(search->request.start.name, + dentry.inline_string, + name_len); + if (res < 0) + continue; + else { + search->result.start_index = + (u16)item_index; + } + } + + search->result.start_index = (u16)item_index; + return 0; + } else if (search->request.end.hash < hash) { + search->result.start_index = (u16)item_index; + return 0; + } + } + + search->result.start_index = area->items_count; + return 0; +} + +/* + * ssdfs_clean_lookup_table() - clean unused space of lookup table + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index + * + * This method tries to clean the unused space of lookup table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
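+ *
+ * Note: the cleaned tail of the lookup table is filled with 0xFF bytes,
+ * i.e. every unused slot reads as U64_MAX.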
+ */ +static +int ssdfs_clean_lookup_table(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index) +{ + __le64 *lookup_table; + u16 lookup_index; + u16 item_index; + u16 items_count; + u16 items_capacity; + u16 cleaning_indexes; + u32 cleaning_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, start_index %u\n", + node->node_id, start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_capacity = node->items_area.items_capacity; + if (start_index >= items_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u >= items_capacity %u\n", + start_index, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + lookup_table = node->raw.dentries_header.lookup_table; + + lookup_index = ssdfs_convert_item2lookup_index(node->node_size, + start_index); + if (unlikely(lookup_index >= SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE)) { + SSDFS_ERR("invalid lookup_index %u\n", + lookup_index); + return -ERANGE; + } + + items_count = node->items_area.items_count; + item_index = ssdfs_convert_lookup2item_index(node->node_size, + lookup_index); + if (unlikely(item_index >= items_capacity)) { + SSDFS_ERR("item_index %u >= items_capacity %u\n", + item_index, items_capacity); + return -ERANGE; + } + + if (item_index != start_index) + lookup_index++; + + cleaning_indexes = + SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE - lookup_index; + cleaning_bytes = cleaning_indexes * sizeof(__le64); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u, cleaning_indexes %u, cleaning_bytes %u\n", + lookup_index, cleaning_indexes, cleaning_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&lookup_table[lookup_index], 0xFF, cleaning_bytes); + + return 0; +} + +/* + * ssdfs_correct_lookup_table() - correct lookup table of the node + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index of the range + * @range_len: number of items in the range + * + * This method tries to correct the lookup table of the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
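+ *
+ * Note: only items that begin a lookup-table stride (as reported by
+ * is_hash_for_lookup_table()) have their hash codes re-cached.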
+ */ +static +int ssdfs_correct_lookup_table(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len) +{ + __le64 *lookup_table; + struct ssdfs_dir_entry dentry; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range_len == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("range_len == 0\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + lookup_table = node->raw.dentries_header.lookup_table; + + for (i = 0; i < range_len; i++) { + int item_index = start_index + i; + u16 lookup_index; + + if (is_hash_for_lookup_table(node->node_size, item_index)) { + lookup_index = + ssdfs_convert_item2lookup_index(node->node_size, + item_index); + + err = ssdfs_dentries_btree_node_get_dentry(node, area, + item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to extract dentry: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + lookup_table[lookup_index] = dentry.hash_code; + } + } + + return 0; +} + +/* + * ssdfs_initialize_lookup_table() - initialize lookup table + * @node: pointer on node object + */ +static +void ssdfs_initialize_lookup_table(struct ssdfs_btree_node *node) +{ + __le64 *lookup_table; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + lookup_table = node->raw.dentries_header.lookup_table; + memset(lookup_table, 0xFF, + sizeof(__le64) * SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE); +} + +/* + * __ssdfs_dentries_btree_node_insert_range() - insert range into node + * @node: pointer on node object + * @search: search object + * + * This method tries to insert the range of dentries into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
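+ *
+ * Note: the insert path validates the items area, finds the correct
+ * position, locks the affected range, shifts the existing dentries to
+ * the right, copies the new ones in, and finally updates the node's
+ * header (item counts, free space, hash range, inline names), corrects
+ * the lookup table, and, for hybrid nodes, adds or changes the parent
+ * index key.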
+ */ +static +int __ssdfs_dentries_btree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree *tree; + struct ssdfs_dentries_btree_info *dtree; + struct ssdfs_dentries_btree_node_header *hdr; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_dir_entry dentry; + size_t item_size = sizeof(struct ssdfs_dir_entry); + u16 item_index; + int free_items; + u16 range_len; + u16 dentries_count = 0; + int direction; + u32 used_space; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + u64 cur_hash; + u64 old_hash; + u16 inline_names = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + + switch (tree->type) { + case SSDFS_DENTRIES_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + dtree = container_of(tree, struct ssdfs_dentries_btree_info, + buffer.tree); + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + old_hash = node->items_area.start_hash; + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + if (items_area.free_space > items_area.area_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("free_space %u > area_size %u\n", + items_area.free_space, items_area.area_size); + return -EFAULT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_capacity %u, items_count %u\n", + items_area.items_capacity, + items_area.items_count); + SSDFS_DBG("items_area: start_hash %llx, end_hash %llx\n", + items_area.start_hash, items_area.end_hash); + SSDFS_DBG("area_size %u, free_space %u\n", + items_area.area_size, + items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + free_items = items_area.items_capacity - items_area.items_count; + if (unlikely(free_items < 0)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + 
SSDFS_WARN("invalid free_items %d\n", + free_items); + return -EFAULT; + } else if (free_items == 0) { + SSDFS_DBG("node hasn't free items\n"); + return -ENOSPC; + } + + if (((u64)free_items * item_size) > items_area.free_space) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid free_items: " + "free_items %d, item_size %zu, free_space %u\n", + free_items, item_size, items_area.free_space); + return -EFAULT; + } + + item_index = search->result.start_index; + if ((item_index + search->request.count) > items_area.items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + down_write(&node->full_lock); + + direction = is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_detect_affected_items; + } + + range_len = items_area.items_count - search->result.start_index; + dentries_count = range_len + search->request.count; + + item_index = search->result.start_index; + if ((item_index + dentries_count) > items_area.items_capacity) { + err = -ERANGE; + SSDFS_ERR("invalid dentries_count: " + "item_index %u, dentries_count %u, " + "items_capacity %u\n", + item_index, dentries_count, + items_area.items_capacity); + goto finish_detect_affected_items; + } + + if (items_area.items_count == 0) + goto lock_items_range; + + start_hash = search->request.start.hash; + end_hash = search->request.end.hash; + + if (item_index > 0) { + err = ssdfs_dentries_btree_node_get_dentry(node, + &items_area, + item_index - 1, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_detect_affected_items; + } + + cur_hash = le64_to_cpu(dentry.hash_code); + + if (cur_hash < start_hash) { + /* + * expected state + */ + } else { + SSDFS_ERR("invalid range: item_index %u, " + "cur_hash %llx, " + "start_hash %llx, end_hash %llx\n", + item_index, cur_hash, + start_hash, end_hash); + + for (i = 0; i < items_area.items_count; i++) { + err = ssdfs_dentries_btree_node_get_dentry(node, + &items_area, + i, &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: " + "err %d\n", err); + goto finish_detect_affected_items; + } + + SSDFS_ERR("index %d, ino %llu, hash %llx\n", + i, + le64_to_cpu(dentry.ino), + le64_to_cpu(dentry.hash_code)); + } + + err = -ERANGE; + goto finish_detect_affected_items; + } + } + + if (item_index < items_area.items_count) { + err = ssdfs_dentries_btree_node_get_dentry(node, + &items_area, + item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_detect_affected_items; + } + + cur_hash = le64_to_cpu(dentry.hash_code); + + if (end_hash < cur_hash) { + /* + * expected state + */ + } else { + SSDFS_ERR("invalid range: item_index %u, " + "cur_hash %llx, " + "start_hash %llx, end_hash %llx\n", + item_index, 
cur_hash, + start_hash, end_hash); + + for (i = 0; i < items_area.items_count; i++) { + err = ssdfs_dentries_btree_node_get_dentry(node, + &items_area, + i, &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: " + "err %d\n", err); + goto finish_detect_affected_items; + } + + SSDFS_ERR("index %d, ino %llu, hash %llx\n", + i, + le64_to_cpu(dentry.ino), + le64_to_cpu(dentry.hash_code)); + } + + err = -ERANGE; + goto finish_detect_affected_items; + } + } + +lock_items_range: + err = ssdfs_lock_items_range(node, item_index, dentries_count); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_detect_affected_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_insert_item; + + err = ssdfs_shift_range_right(node, &items_area, item_size, + item_index, range_len, + search->request.count); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to shift dentries range: " + "start %u, count %u, err %d\n", + item_index, search->request.count, + err); + goto unlock_items_range; + } + + ssdfs_debug_btree_node_object(node); + + err = ssdfs_generic_insert_range(node, &items_area, + item_size, search); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to insert item: err %d\n", + err); + goto unlock_items_range; + } + + down_write(&node->header_lock); + + node->items_area.items_count += search->request.count; + if (node->items_area.items_count > node->items_area.items_capacity) { + err = -ERANGE; + SSDFS_ERR("items_count %u > items_capacity %u\n", + node->items_area.items_count, + node->items_area.items_capacity); + goto finish_items_area_correction; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_capacity %u, items_count %u\n", + items_area.items_capacity, + items_area.items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + used_space = (u32)search->request.count * item_size; + if (used_space > node->items_area.free_space) { + err = -ERANGE; + SSDFS_ERR("used_space %u > free_space %u\n", + used_space, + node->items_area.free_space); + goto finish_items_area_correction; + } + node->items_area.free_space -= used_space; + + err = ssdfs_dentries_btree_node_get_dentry(node, &node->items_area, + 0, &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_items_area_correction; + } + start_hash = le64_to_cpu(dentry.hash_code); + + err = ssdfs_dentries_btree_node_get_dentry(node, + &node->items_area, + node->items_area.items_count - 1, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_items_area_correction; + } + end_hash = le64_to_cpu(dentry.hash_code); + + if (start_hash >= U64_MAX || end_hash >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + goto finish_items_area_correction; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BEFORE: node_id %u, start_hash %llx, end_hash %llx\n", + node->node_id, + node->items_area.start_hash, + node->items_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("AFTER: node_id %u, start_hash %llx, end_hash %llx\n", + node->node_id, start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG 
*/ + + err = ssdfs_correct_lookup_table(node, &node->items_area, + item_index, dentries_count); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + + hdr = &node->raw.dentries_header; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NODE (BEFORE): dentries_count %u\n", + le16_to_cpu(hdr->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + le16_add_cpu(&hdr->dentries_count, search->request.count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NODE (AFTER): dentries_count %u\n", + le16_to_cpu(hdr->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + inline_names = 0; + for (i = 0; i < search->request.count; i++) { + u16 name_len; + + err = ssdfs_dentries_btree_node_get_dentry(node, + &items_area, + (i + item_index), + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_items_area_correction; + } + + name_len = le16_to_cpu(dentry.name_len); + if (name_len <= SSDFS_DENTRY_INLINE_NAME_MAX_LEN) + inline_names++; + } + + le16_add_cpu(&hdr->inline_names, inline_names); + hdr->free_space = cpu_to_le16(node->items_area.free_space); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("TREE (BEFORE): dentries_count %llu\n", + atomic64_read(&dtree->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic64_add(search->request.count, &dtree->dentries_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("TREE (AFTER): dentries_count %llu\n", + atomic64_read(&dtree->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + goto unlock_items_range; + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, dentries_count); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, dentries_count, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, dentries_count); + +finish_insert_item: + up_read(&node->full_lock); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + if (items_area.items_count == 0) { + struct ssdfs_btree_index_key key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + key.index.hash = cpu_to_le64(start_hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "node_height %u, hash %llx\n", + le32_to_cpu(key.node_id), + key.node_type, + key.height, + le64_to_cpu(key.index.hash)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_add_index(node, &key); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", err); + return err; + } + } else if (old_hash != start_hash) { + struct ssdfs_btree_index_key old_key, new_key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + 
sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + old_key.index.hash = cpu_to_le64(old_hash); + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_change_index(node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: err %d\n", + err); + return err; + } + } + break; + + default: + /* do nothing */ + break; + } + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * ssdfs_dentries_btree_node_insert_item() - insert item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to insert an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_dentries_btree_node_insert_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA) { + search->result.err = 0; + /* + * Node doesn't contain requested item. 
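+	 * Clear the error and insert the dentry at the position
+	 * that the search has found.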
+ */ + } else if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + if (is_btree_search_contains_new_item(search)) { + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.dentry; + search->result.buf_size = + sizeof(struct ssdfs_raw_dentry); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_size != + sizeof(struct ssdfs_raw_dentry)); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + SSDFS_ERR("unexpected buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != + SSDFS_BTREE_SEARCH_INLINE_BUFFER); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + state = atomic_read(&node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + err = __ssdfs_dentries_btree_node_insert_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert item: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_dentries_btree_node_insert_range() - insert range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to insert a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_dentries_btree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA) { + /* + * Node doesn't contain inserting items. 
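+	 * The range will be inserted at the position that the
+	 * search has found.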
+ */ + } else if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count < 1); + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + state = atomic_read(&node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + err = __ssdfs_dentries_btree_node_insert_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert range: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("free_space %u\n", node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_change_item_only() - change dentry in the node + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @search: pointer on search request object + * + * This method tries to change an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_change_item_only(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_dentries_btree_node_header *hdr; + struct ssdfs_dir_entry dentry; + size_t item_size = sizeof(struct ssdfs_dir_entry); + u16 range_len; + u16 old_name_len, name_len; + bool name_was_inline, name_become_inline; + u16 item_index; + u64 start_hash, end_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + range_len = search->request.count; + + if (range_len == 0) { + err = -ERANGE; + SSDFS_ERR("empty range\n"); + return err; + } + + item_index = search->result.start_index; + if ((item_index + range_len) > area->items_count) { + err = -ERANGE; + SSDFS_ERR("invalid request: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + area->items_count); + return err; + } + + err = ssdfs_dentries_btree_node_get_dentry(node, area, item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + return err; + } + + old_name_len = le16_to_cpu(dentry.name_len); + + err = ssdfs_generic_insert_range(node, area, + item_size, search); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to insert range: err %d\n", + err); + return err; + } + + down_write(&node->header_lock); + + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + + if (item_index == 0) { + err = ssdfs_dentries_btree_node_get_dentry(node, + &node->items_area, + item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_items_area_correction; + } + start_hash = le64_to_cpu(dentry.hash_code); + } + + if ((item_index + range_len) == node->items_area.items_count) { + err = ssdfs_dentries_btree_node_get_dentry(node, + &node->items_area, + item_index + range_len - 1, + &dentry); + if (unlikely(err)) 
{ + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_items_area_correction; + } + end_hash = le64_to_cpu(dentry.hash_code); + } else if ((item_index + range_len) > node->items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid range_len: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + node->items_area.items_count); + goto finish_items_area_correction; + } + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + + err = ssdfs_correct_lookup_table(node, &node->items_area, + item_index, range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + + err = ssdfs_dentries_btree_node_get_dentry(node, + &node->items_area, + item_index, + &dentry); + if (unlikely(err)) { + SSDFS_ERR("fail to get dentry: err %d\n", err); + goto finish_items_area_correction; + } + + name_len = le16_to_cpu(dentry.name_len); + + name_was_inline = old_name_len <= SSDFS_DENTRY_INLINE_NAME_MAX_LEN; + name_become_inline = name_len <= SSDFS_DENTRY_INLINE_NAME_MAX_LEN; + + hdr = &node->raw.dentries_header; + + if (!name_was_inline && name_become_inline) { + /* increment number of inline names */ + le16_add_cpu(&hdr->inline_names, 1); + } else if (name_was_inline && !name_become_inline) { + /* decrement number of inline names */ + if (le16_to_cpu(hdr->inline_names) == 0) { + err = -ERANGE; + SSDFS_ERR("invalid number of inline names: %u\n", + le16_to_cpu(hdr->inline_names)); + goto finish_items_area_correction; + } else + le16_add_cpu(&hdr->inline_names, -1); + } + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + + return err; +} + +/* + * ssdfs_dentries_btree_node_change_item() - change item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to change an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
+ */ +static +int ssdfs_dentries_btree_node_change_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + size_t item_size = sizeof(struct ssdfs_dir_entry); + struct ssdfs_btree_node_items_area items_area; + u16 item_index; + int direction; + u16 range_len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + if (is_btree_search_contains_new_item(search)) { + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.dentry; + search->result.buf_size = + sizeof(struct ssdfs_raw_dentry); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_size != + sizeof(struct ssdfs_raw_dentry)); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + SSDFS_ERR("unexpected buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != + SSDFS_BTREE_SEARCH_INLINE_BUFFER); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + down_write(&node->full_lock); + + direction = 
is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_define_changing_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_define_changing_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_define_changing_items; + } + + range_len = search->request.count; + + if (range_len == 0) { + err = -ERANGE; + SSDFS_ERR("empty range\n"); + goto finish_define_changing_items; + } + + item_index = search->result.start_index; + if ((item_index + range_len) > items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid request: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + items_area.items_count); + goto finish_define_changing_items; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* range_len doesn't need to be changed */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid request type: %#x\n", + search->request.type); + goto finish_define_changing_items; + } + + err = ssdfs_lock_items_range(node, item_index, range_len); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_define_changing_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_change_item; + + err = ssdfs_change_item_only(node, &items_area, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change item: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, range_len, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, range_len); + +finish_change_item: + up_read(&node->full_lock); + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * __ssdfs_invalidate_items_area() - invalidate the items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index of the item + * @range_len: number of items in the range + * @search: pointer on search request object + * + * The method tries to invalidate the items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
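+ *
+ * Note: the method only classifies the node after deletion: the
+ * result state becomes SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE when
+ * the items area gets empty (and no usable index records are left),
+ * otherwise SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; for the
+ * SSDFS_BTREE_SEARCH_DELETE_ALL request the whole root node
+ * hierarchy is invalidated as well.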
+ */ +static +int __ssdfs_invalidate_items_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *parent = NULL; + bool is_hybrid = false; + bool has_index_area = false; + bool index_area_empty = false; + bool items_area_empty = false; + int parent_type = SSDFS_BTREE_LEAF_NODE; + spinlock_t *lock; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (((u32)start_index + range_len) > area->items_count) { + SSDFS_ERR("start_index %u, range_len %u, items_count %u\n", + start_index, range_len, + area->items_count); + return -ERANGE; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + is_hybrid = true; + break; + + case SSDFS_BTREE_LEAF_NODE: + is_hybrid = false; + break; + + default: + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + return -ERANGE; + } + + down_write(&node->header_lock); + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + if (node->items_area.items_count == range_len) + items_area_empty = true; + else + items_area_empty = false; + break; + + default: + items_area_empty = false; + break; + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + has_index_area = true; + if (node->index_area.index_count == 0) + index_area_empty = true; + else + index_area_empty = false; + break; + + default: + has_index_area = false; + index_area_empty = false; + break; + } + + up_write(&node->header_lock); + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + if (is_hybrid && has_index_area && !index_area_empty) { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + } else if (items_area_empty) { + search->result.state = + SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; + } else { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_ALL: + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + parent = node; + + do { + lock = &parent->descriptor_lock; + spin_lock(lock); + parent = parent->parent_node; + spin_unlock(lock); + lock = NULL; + + if (!parent) { + SSDFS_ERR("node %u hasn't parent\n", + node->node_id); + return -ERANGE; + } + + parent_type = atomic_read(&parent->type); + switch (parent_type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + } while (parent_type != SSDFS_BTREE_ROOT_NODE); + + err = ssdfs_invalidate_root_node_hierarchy(parent); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate root node hierarchy: " + "err %d\n", err); + return -ERANGE; + } + break; + + default: + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_invalidate_whole_items_area() - invalidate the whole items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @search: 
pointer on search request object + * + * The method tries to invalidate the items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_whole_items_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area %p, search %p\n", + node->node_id, area, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invalidate_items_area(node, area, + 0, area->items_count, + search); +} + +/* + * ssdfs_invalidate_items_area_partially() - invalidate the items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index + * @range_len: number of items in the range + * @search: pointer on search request object + * + * The method tries to invalidate the items area partially. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_items_area_partially(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invalidate_items_area(node, area, + start_index, range_len, + search); +} + +/* + * __ssdfs_dentries_btree_node_delete_range() - delete range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-EAGAIN - continue deletion in the next node. 
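+ *
+ * Note: %-EAGAIN is not a fatal error here. When the requested range
+ * spans several nodes, the method deletes the part that resides in
+ * this node, updates search->request.start.hash and
+ * search->request.count, and asks the caller to continue the
+ * deletion in the next node.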
+ */ +static +int __ssdfs_dentries_btree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree *tree; + struct ssdfs_dentries_btree_info *dtree; + struct ssdfs_dentries_btree_node_header *hdr; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_dir_entry dentry; + size_t item_size = sizeof(struct ssdfs_dir_entry); + u16 index_count = 0; + int free_items; + u16 item_index; + int direction; + u16 range_len; + u16 shift_range_len = 0; + u16 locked_len = 0; + u32 deleted_space, free_space; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + u64 old_hash; + u32 old_dentries_count = 0, dentries_count = 0; + u32 dentries_diff; + u16 deleted_inline_names = 0, inline_names = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + + switch (tree->type) { + case SSDFS_DENTRIES_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + dtree = container_of(tree, struct ssdfs_dentries_btree_info, + buffer.tree); + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + old_hash = node->items_area.start_hash; + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, + items_area.items_capacity, + items_area.items_count); + return -ERANGE; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + if (items_area.free_space > items_area.area_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("free_space %u > area_size %u\n", + items_area.free_space, items_area.area_size); + return -EFAULT; + } + + free_items = items_area.items_capacity - 
items_area.items_count; + if (unlikely(free_items < 0)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_WARN("invalid free_items %d\n", + free_items); + return -EFAULT; + } + + if (((u64)free_items * item_size) > items_area.free_space) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid free_items: " + "free_items %d, item_size %zu, free_space %u\n", + free_items, item_size, items_area.free_space); + return -EFAULT; + } + + dentries_count = items_area.items_count; + item_index = search->result.start_index; + + range_len = search->request.count; + if (range_len == 0) { + SSDFS_ERR("range_len == 0\n"); + return -ERANGE; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + if ((item_index + range_len) > items_area.items_count) { + SSDFS_ERR("invalid request: " + "item_index %u, range_len %u, " + "items_count %u\n", + item_index, range_len, + items_area.items_count); + return -ERANGE; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + /* request can be distributed between several nodes */ + break; + + default: + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -ERANGE; + } + + down_write(&node->full_lock); + + direction = is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_detect_affected_items; + } + + item_index = search->result.start_index; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + if ((item_index + range_len) > items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid dentries_count: " + "item_index %u, dentries_count %u, " + "items_count %u\n", + item_index, range_len, + items_area.items_count); + goto finish_detect_affected_items; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + /* request can be distributed between several nodes */ + range_len = min_t(unsigned int, range_len, + items_area.items_count - item_index); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, item_index %u, " + "request.count %u, items_count %u\n", + node->node_id, item_index, + search->request.count, + items_area.items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + BUG(); + } + + locked_len = items_area.items_count - item_index; + + err = ssdfs_lock_items_range(node, item_index, locked_len); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_detect_affected_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_delete_range; + + for (i = 0; i < range_len; i++) { + u16 name_len; + + 
err = ssdfs_dentries_btree_node_get_dentry(node,
+							&items_area,
+							(i + item_index),
+							&dentry);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get dentry: err %d\n", err);
+			goto unlock_items_range;
+		}
+
+		name_len = le16_to_cpu(dentry.name_len);
+		if (name_len <= SSDFS_DENTRY_INLINE_NAME_MAX_LEN)
+			deleted_inline_names++;
+	}
+
+	err = ssdfs_btree_node_clear_range(node, &node->items_area,
+					   item_size, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to clear items range: err %d\n",
+			  err);
+		goto unlock_items_range;
+	}
+
+	if (range_len == items_area.items_count) {
+		/* items area is empty */
+		err = ssdfs_invalidate_whole_items_area(node, &items_area,
+							search);
+	} else {
+		err = ssdfs_invalidate_items_area_partially(node, &items_area,
+							    item_index,
+							    range_len,
+							    search);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to invalidate items area: "
+			  "node_id %u, start_index %u, "
+			  "range_len %u, err %d\n",
+			  node->node_id, item_index,
+			  range_len, err);
+		goto unlock_items_range;
+	}
+
+	shift_range_len = locked_len - range_len;
+	if (shift_range_len != 0) {
+		err = ssdfs_shift_range_left(node, &items_area, item_size,
+					     item_index + range_len,
+					     shift_range_len, range_len);
+		if (unlikely(err)) {
+			atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+			SSDFS_ERR("fail to shift the range: "
+				  "start %u, count %u, err %d\n",
+				  item_index + range_len,
+				  shift_range_len,
+				  err);
+			goto unlock_items_range;
+		}
+
+		err = __ssdfs_btree_node_clear_range(node,
+						&items_area, item_size,
+						item_index + shift_range_len,
+						range_len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to clear range: "
+				  "start %u, count %u, err %d\n",
+				  item_index + shift_range_len,
+				  range_len,
+				  err);
+			goto unlock_items_range;
+		}
+	}
+
+	down_write(&node->header_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("INITIAL STATE: node_id %u, "
+		  "items_count %u, free_space %u\n",
+		  node->node_id,
+		  node->items_area.items_count,
+		  node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (node->items_area.items_count < search->request.count)
+		node->items_area.items_count = 0;
+	else
+		node->items_area.items_count -= search->request.count;
+
+	deleted_space = (u32)search->request.count * item_size;
+	free_space = node->items_area.free_space;
+	if ((free_space + deleted_space) > node->items_area.area_size) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid request: "
+			  "deleted_space %u, free_space %u, area_size %u\n",
+			  deleted_space,
+			  node->items_area.free_space,
+			  node->items_area.area_size);
+		goto finish_items_area_correction;
+	}
+	node->items_area.free_space += deleted_space;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("NEW STATE: node_id %u, "
+		  "items_count %u, free_space %u\n",
+		  node->node_id,
+		  node->items_area.items_count,
+		  node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (node->items_area.items_count == 0) {
+		start_hash = U64_MAX;
+		end_hash = U64_MAX;
+	} else {
+		err = ssdfs_dentries_btree_node_get_dentry(node,
+							&node->items_area,
+							0, &dentry);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get dentry: err %d\n", err);
+			goto finish_items_area_correction;
+		}
+		start_hash = le64_to_cpu(dentry.hash_code);
+
+		err = ssdfs_dentries_btree_node_get_dentry(node,
+					&node->items_area,
+					node->items_area.items_count - 1,
+					&dentry);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get dentry: err %d\n", err);
+			goto finish_items_area_correction;
+		}
+		end_hash = le64_to_cpu(dentry.hash_code);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BEFORE: node_id %u, items_area.start_hash %llx, "
"items_area.end_hash %llx\n", + node->node_id, + node->items_area.start_hash, + node->items_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("AFTER: node_id %u, items_area.start_hash %llx, " + "items_area.end_hash %llx\n", + node->node_id, + node->items_area.start_hash, + node->items_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->items_area.items_count == 0) + ssdfs_initialize_lookup_table(node); + else { + err = ssdfs_clean_lookup_table(node, + &node->items_area, + node->items_area.items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to clean the rest of lookup table: " + "start_index %u, err %d\n", + node->items_area.items_count, err); + goto finish_items_area_correction; + } + + if (shift_range_len != 0) { + int start_index = + node->items_area.items_count - shift_range_len; + + if (start_index < 0) { + err = -ERANGE; + SSDFS_ERR("invalid start_index %d\n", + start_index); + goto finish_items_area_correction; + } + + err = ssdfs_correct_lookup_table(node, + &node->items_area, + start_index, + shift_range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + } + } + + hdr = &node->raw.dentries_header; + + hdr->free_space = cpu_to_le16(node->items_area.free_space); + old_dentries_count = le16_to_cpu(hdr->dentries_count); + + if (node->items_area.items_count == 0) { + hdr->dentries_count = cpu_to_le16(0); + hdr->inline_names = cpu_to_le16(0); + } else { + if (old_dentries_count < search->request.count) { + hdr->dentries_count = cpu_to_le16(0); + hdr->inline_names = cpu_to_le16(0); + } else { + dentries_count = le16_to_cpu(hdr->dentries_count); + dentries_count -= search->request.count; + hdr->dentries_count = cpu_to_le16(dentries_count); + + inline_names = le16_to_cpu(hdr->inline_names); + if (deleted_inline_names > inline_names) { + err = -ERANGE; + SSDFS_ERR("invalid inline names: " + "deleted_inline_names %u, " + "inline_names %u\n", + deleted_inline_names, + inline_names); + goto finish_items_area_correction; + } + inline_names -= deleted_inline_names; + hdr->inline_names = cpu_to_le16(inline_names); + } + } + + dentries_count = le16_to_cpu(hdr->dentries_count); + dentries_diff = old_dentries_count - dentries_count; + atomic64_sub(dentries_diff, &dtree->dentries_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries_count %llu\n", + atomic64_read(&dtree->dentries_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto finish_items_area_correction; + } + + if (dentries_count != 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("set items range as dirty: " + "node_id %u, start %u, count %u\n", + node->node_id, item_index, + old_dentries_count - item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_set_dirty_items_range(node, + items_area.items_capacity, + item_index, + old_dentries_count - item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, + old_dentries_count - item_index, + err); + goto finish_items_area_correction; + } + } + 
+finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, locked_len); + +finish_delete_range: + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + if (dentries_count == 0) { + int state; + + down_read(&node->header_lock); + state = atomic_read(&node->index_area.state); + index_count = node->index_area.index_count; + end_hash = node->index_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_count %u, end_hash %llx, " + "old_hash %llx\n", + index_count, end_hash, old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (index_count <= 1 || end_hash == old_hash) { + err = ssdfs_btree_node_delete_index(node, + old_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "old_hash %llx, err %d\n", + old_hash, err); + return err; + } + + if (index_count > 0) + index_count--; + } + } else if (old_hash != start_hash) { + struct ssdfs_btree_index_key old_key, new_key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + old_key.index.hash = cpu_to_le64(old_hash); + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_change_index(node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: err %d\n", + err); + return err; + } + } + break; + + default: + /* do nothing */ + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_type %#x, dentries_count %u, index_count %u\n", + atomic_read(&node->type), + dentries_count, index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentries_count == 0 && index_count == 0) + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; + else + search->result.state = SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + if (search->request.type == SSDFS_BTREE_SEARCH_DELETE_RANGE) { + if (search->request.count > range_len) { + search->request.start.hash = items_area.end_hash; + search->request.count -= range_len; + return -EAGAIN; + } + } + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_dentries_btree_node_delete_item() - delete an item from node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete an item from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
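+ *
+ * Note: this is a thin wrapper around
+ * __ssdfs_dentries_btree_node_delete_range(); the search result is
+ * expected to describe exactly one dentry (search->result.count == 1).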
+ */ +static +int ssdfs_dentries_btree_node_delete_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p, " + "search->result.count %d\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child, + search->result.count); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_dentries_btree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete dentry: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_dentries_btree_node_delete_range() - delete range of items from node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_dentries_btree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_dentries_btree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete dentries range: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_dentries_btree_node_extract_range() - extract range of items from node + * @node: pointer on node object + * @start_index: starting index of the range + * @count: count of items in the range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no such range in the node. 
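+ *
+ * Note: on success the method also rewrites @search->request: the
+ * hash range is taken from the first and the last extracted dentries
+ * and the count is set to @count.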
+ */
+static
+int ssdfs_dentries_btree_node_extract_range(struct ssdfs_btree_node *node,
+					    u16 start_index, u16 count,
+					    struct ssdfs_btree_search *search)
+{
+	struct ssdfs_dir_entry *dentry;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_index %u, count %u, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  start_index, count,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->full_lock);
+	err = __ssdfs_btree_node_extract_range(node, start_index, count,
+					sizeof(struct ssdfs_dir_entry),
+					search);
+	up_read(&node->full_lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to extract a range: "
+			  "start %u, count %u, err %d\n",
+			  start_index, count, err);
+		return err;
+	}
+
+	search->request.flags =
+			SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE |
+			SSDFS_BTREE_SEARCH_HAS_VALID_COUNT;
+	dentry = (struct ssdfs_dir_entry *)search->result.buf;
+	search->request.start.hash = le64_to_cpu(dentry->hash_code);
+	dentry += search->result.count - 1;
+	search->request.end.hash = le64_to_cpu(dentry->hash_code);
+	search->request.count = count;
+
+	return 0;
+}
+
+/*
+ * ssdfs_dentries_btree_resize_items_area() - resize items area of the node
+ * @node: node object
+ * @new_size: new size of the items area
+ *
+ * This method tries to resize the items area of the node.
+ *
+ * TODO: It makes sense to allocate the bitmap taking into account
+ * that the node can be resized. In other words, the bitmap should be
+ * allocated as if both the index area and the items area were equal
+ * to the whole node. This technique provides the opportunity not to
+ * resize or to shift the content of the bitmap.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
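+ *
+ * Note: the resize itself is performed by the generic helper
+ * __ssdfs_btree_node_resize_items_area(); this method only supplies
+ * the dentry item size and the on-disk index size and then mirrors
+ * the resulting free_space into the raw dentries btree node header.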
+ */ +static +int ssdfs_dentries_btree_resize_items_area(struct ssdfs_btree_node *node, + u32 new_size) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_dentries_btree_node_header *dentries_header; + size_t item_size = sizeof(struct ssdfs_dir_entry); + size_t index_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, new_size %u\n", + node->node_id, new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + index_size = le16_to_cpu(fsi->vh->dentries_btree.desc.index_size); + + err = __ssdfs_btree_node_resize_items_area(node, + item_size, + index_size, + new_size); + if (unlikely(err)) { + SSDFS_ERR("fail to resize items area: " + "node_id %u, new_size %u, err %d\n", + node->node_id, new_size, err); + return err; + } + + dentries_header = &node->raw.dentries_header; + dentries_header->free_space = + cpu_to_le16((u16)node->items_area.free_space); + + return 0; +} + +void ssdfs_debug_dentries_btree_object(struct ssdfs_dentries_btree_info *tree) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i; + + BUG_ON(!tree); + + SSDFS_DBG("DENTRIES TREE: type %#x, state %#x, " + "dentries_count %llu, is_locked %d, " + "generic_tree %p, inline_dentries %p, " + "root %p, owner %p, fsi %p\n", + atomic_read(&tree->type), + atomic_read(&tree->state), + (u64)atomic64_read(&tree->dentries_count), + rwsem_is_locked(&tree->lock), + tree->generic_tree, + tree->inline_dentries, + tree->root, + tree->owner, + tree->fsi); + + if (tree->generic_tree) { + /* debug dump of generic tree */ + ssdfs_debug_btree_object(tree->generic_tree); + } + + if (tree->inline_dentries) { + for (i = 0; i < SSDFS_INLINE_DENTRIES_COUNT; i++) { + struct ssdfs_dir_entry *dentry; + + dentry = &tree->inline_dentries[i]; + + SSDFS_DBG("INLINE DENTRY: index %d, ino %llu, " + "hash_code %llx, name_len %u, " + "dentry_type %#x, file_type %#x, " + "flags %#x\n", + i, + le64_to_cpu(dentry->ino), + le64_to_cpu(dentry->hash_code), + dentry->name_len, + dentry->dentry_type, + dentry->file_type, + dentry->flags); + + SSDFS_DBG("RAW STRING DUMP: index %d\n", + i); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + dentry->inline_string, + SSDFS_DENTRY_INLINE_NAME_MAX_LEN); + SSDFS_DBG("\n"); + } + } + + if (tree->root) { + SSDFS_DBG("ROOT NODE HEADER: height %u, items_count %u, " + "flags %#x, type %#x, upper_node_id %u, " + "node_ids (left %u, right %u)\n", + tree->root->header.height, + tree->root->header.items_count, + tree->root->header.flags, + tree->root->header.type, + le32_to_cpu(tree->root->header.upper_node_id), + le32_to_cpu(tree->root->header.node_ids[0]), + le32_to_cpu(tree->root->header.node_ids[1])); + + for (i = 0; i < SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; i++) { + struct ssdfs_btree_index *index; + + index = &tree->root->indexes[i]; + + SSDFS_DBG("NODE_INDEX: index %d, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + i, + le64_to_cpu(index->hash), + le64_to_cpu(index->extent.seg_id), + le32_to_cpu(index->extent.logical_blk), + le32_to_cpu(index->extent.len)); + } + } +#endif /* CONFIG_SSDFS_DEBUG */ +} + +const struct ssdfs_btree_descriptor_operations ssdfs_dentries_btree_desc_ops = { + .init = ssdfs_dentries_btree_desc_init, + .flush = ssdfs_dentries_btree_desc_flush, +}; + +const struct ssdfs_btree_operations ssdfs_dentries_btree_ops = { + .create_root_node = ssdfs_dentries_btree_create_root_node, + .create_node = ssdfs_dentries_btree_create_node, + .init_node = 
ssdfs_dentries_btree_init_node,
+	.destroy_node = ssdfs_dentries_btree_destroy_node,
+	.add_node = ssdfs_dentries_btree_add_node,
+	.delete_node = ssdfs_dentries_btree_delete_node,
+	.pre_flush_root_node = ssdfs_dentries_btree_pre_flush_root_node,
+	.flush_root_node = ssdfs_dentries_btree_flush_root_node,
+	.pre_flush_node = ssdfs_dentries_btree_pre_flush_node,
+	.flush_node = ssdfs_dentries_btree_flush_node,
+};
+
+const struct ssdfs_btree_node_operations ssdfs_dentries_btree_node_ops = {
+	.find_item = ssdfs_dentries_btree_node_find_item,
+	.find_range = ssdfs_dentries_btree_node_find_range,
+	.extract_range = ssdfs_dentries_btree_node_extract_range,
+	.allocate_item = ssdfs_dentries_btree_node_allocate_item,
+	.allocate_range = ssdfs_dentries_btree_node_allocate_range,
+	.insert_item = ssdfs_dentries_btree_node_insert_item,
+	.insert_range = ssdfs_dentries_btree_node_insert_range,
+	.change_item = ssdfs_dentries_btree_node_change_item,
+	.delete_item = ssdfs_dentries_btree_node_delete_item,
+	.delete_range = ssdfs_dentries_btree_node_delete_range,
+	.resize_items_area = ssdfs_dentries_btree_resize_items_area,
+};

From patchwork Sat Feb 25 01:09:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151969
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 65/76] ssdfs: introduce extents queue object
Date: Fri, 24 Feb 2023 17:09:16 -0800
Message-Id: <20230225010927.813929-66-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The extents queue implements a sequence (queue) of extent descriptors
that need to be invalidated. Every piece of user data (file) or
metadata (b-tree node) is described by an extent, and the extents
b-tree keeps the knowledge about all allocated extents. A file can be
truncated, and a b-tree node (or a whole b-tree) can be deleted; as a
result, all affected extents need to be invalidated. The SSDFS file
system logic simply adds the extent descriptor, b-tree node
descriptor, or root b-tree node descriptor into the invalidation
queue (extents queue), and the real invalidation happens in the
background, executed by a specialized thread. If a descriptor
describes a single extent, then the thread invalidates only that
extent. But if the descriptor describes a b-tree, then the thread has
to traverse the whole b-tree and invalidate all extents found in it.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/extents_queue.c | 1723 ++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/extents_queue.h |  105 +++
 2 files changed, 1828 insertions(+)
 create mode 100644 fs/ssdfs/extents_queue.c
 create mode 100644 fs/ssdfs/extents_queue.h
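For orientation, a minimal usage sketch of the queue API introduced by
this patch (illustrative only: the sketch_* helpers and the calling
context are hypothetical, while every ssdfs_* call below is defined in
this file):

	/* producer side: queue one raw extent for background invalidation */
	static int sketch_queue_raw_extent(struct ssdfs_extents_queue *eq,
					   struct ssdfs_raw_extent *extent,
					   u64 owner_ino)
	{
		struct ssdfs_extent_info *ei;

		ei = ssdfs_extent_info_alloc();
		if (IS_ERR(ei))
			return PTR_ERR(ei);

		/* describe the extent that has to be invalidated */
		ssdfs_extent_info_init(SSDFS_EXTENT_INFO_RAW_EXTENT,
				       extent, owner_ino, ei);

		/* the dedicated thread consumes the queue in FIFO order */
		ssdfs_extents_queue_add_tail(eq, ei);
		return 0;
	}

	/* consumer side: drain loop of the invalidation thread (sketch) */
	static void sketch_drain_queue(struct ssdfs_extents_queue *eq)
	{
		struct ssdfs_extent_info *ei = NULL;

		while (!is_ssdfs_extents_queue_empty(eq)) {
			if (ssdfs_extents_queue_remove_first(eq, &ei))
				break;
			/* ... invalidate the extent described by ei ... */
			ssdfs_extent_info_free(ei);
		}
	}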
diff --git a/fs/ssdfs/extents_queue.c b/fs/ssdfs/extents_queue.c
new file mode 100644
index 000000000000..3edcbcd3b46c
--- /dev/null
+++ b/fs/ssdfs/extents_queue.c
@@ -0,0 +1,1723 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/extents_queue.c - extents queue implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "extents_queue.h" +#include "shared_extents_tree.h" +#include "extents_tree.h" +#include "xattr_tree.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_ext_queue_page_leaks; +atomic64_t ssdfs_ext_queue_memory_leaks; +atomic64_t ssdfs_ext_queue_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_ext_queue_cache_leaks_increment(void *kaddr) + * void ssdfs_ext_queue_cache_leaks_decrement(void *kaddr) + * void *ssdfs_ext_queue_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_ext_queue_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_ext_queue_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_ext_queue_kfree(void *kaddr) + * struct page *ssdfs_ext_queue_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_ext_queue_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_ext_queue_free_page(struct page *page) + * void ssdfs_ext_queue_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(ext_queue) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(ext_queue) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_ext_queue_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_ext_queue_page_leaks, 0); + atomic64_set(&ssdfs_ext_queue_memory_leaks, 0); + atomic64_set(&ssdfs_ext_queue_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_ext_queue_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_ext_queue_page_leaks) != 0) { + SSDFS_ERR("EXTENTS QUEUE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_ext_queue_page_leaks)); + } + + if (atomic64_read(&ssdfs_ext_queue_memory_leaks) != 0) { + SSDFS_ERR("EXTENTS QUEUE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_ext_queue_memory_leaks)); + } + + if (atomic64_read(&ssdfs_ext_queue_cache_leaks) != 0) { + SSDFS_ERR("EXTENTS QUEUE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_ext_queue_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static struct kmem_cache *ssdfs_extent_info_cachep; + +void ssdfs_zero_extent_info_cache_ptr(void) +{ + ssdfs_extent_info_cachep = NULL; +} + +static +void ssdfs_init_extent_info_once(void *obj) +{ + struct ssdfs_extent_info *ei_obj = obj; + + memset(ei_obj, 0, sizeof(struct ssdfs_extent_info)); +} + +void ssdfs_shrink_extent_info_cache(void) +{ + if (ssdfs_extent_info_cachep) + kmem_cache_shrink(ssdfs_extent_info_cachep); +} + +void ssdfs_destroy_extent_info_cache(void) +{ + if (ssdfs_extent_info_cachep) + kmem_cache_destroy(ssdfs_extent_info_cachep); +} + +int ssdfs_init_extent_info_cache(void) +{ + ssdfs_extent_info_cachep = kmem_cache_create("ssdfs_extent_info_cache", + sizeof(struct ssdfs_extent_info), 0, + SLAB_RECLAIM_ACCOUNT | + SLAB_MEM_SPREAD | + SLAB_ACCOUNT, + ssdfs_init_extent_info_once); + if (!ssdfs_extent_info_cachep) { + SSDFS_ERR("unable to create extent 
info objects cache\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_extents_queue_init() - initialize extents queue
+ * @eq: extents queue to initialize
+ */
+void ssdfs_extents_queue_init(struct ssdfs_extents_queue *eq)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!eq);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock_init(&eq->lock);
+	INIT_LIST_HEAD(&eq->list);
+}
+
+/*
+ * is_ssdfs_extents_queue_empty() - check that extents queue is empty
+ * @eq: extents queue
+ */
+bool is_ssdfs_extents_queue_empty(struct ssdfs_extents_queue *eq)
+{
+	bool is_empty;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!eq);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&eq->lock);
+	is_empty = list_empty_careful(&eq->list);
+	spin_unlock(&eq->lock);
+
+	return is_empty;
+}
+
+/*
+ * ssdfs_extents_queue_add_head() - add extent at the head of queue
+ * @eq: extents queue
+ * @ei: extent info
+ */
+void ssdfs_extents_queue_add_head(struct ssdfs_extents_queue *eq,
+				  struct ssdfs_extent_info *ei)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!eq || !ei);
+
+	SSDFS_DBG("type %#x, owner_ino %llu\n",
+		  ei->type, ei->owner_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&eq->lock);
+	list_add(&ei->list, &eq->list);
+	spin_unlock(&eq->lock);
+}
+
+/*
+ * ssdfs_extents_queue_add_tail() - add extent at the tail of queue
+ * @eq: extents queue
+ * @ei: extent info
+ */
+void ssdfs_extents_queue_add_tail(struct ssdfs_extents_queue *eq,
+				  struct ssdfs_extent_info *ei)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!eq || !ei);
+
+	SSDFS_DBG("type %#x, owner_ino %llu\n",
+		  ei->type, ei->owner_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&eq->lock);
+	list_add_tail(&ei->list, &eq->list);
+	spin_unlock(&eq->lock);
+}
+
+/*
+ * ssdfs_extents_queue_remove_first() - get extent and remove from queue
+ * @eq: extents queue
+ * @ei: first extent [out]
+ *
+ * This function gets the first extent in @eq, removes it from the
+ * queue, and returns it as @ei.
+ *
+ * RETURN:
+ * [success] - @ei contains pointer on extent.
+ * [failure] - error code:
+ *
+ * %-ENODATA - queue is empty.
+ * %-ENOENT - first entry is NULL.
+ */
+int ssdfs_extents_queue_remove_first(struct ssdfs_extents_queue *eq,
+				     struct ssdfs_extent_info **ei)
+{
+	bool is_empty;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!eq || !ei);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&eq->lock);
+	is_empty = list_empty_careful(&eq->list);
+	if (!is_empty) {
+		*ei = list_first_entry_or_null(&eq->list,
+						struct ssdfs_extent_info,
+						list);
+		if (!*ei) {
+			SSDFS_WARN("first entry is NULL\n");
+			err = -ENOENT;
+		} else
+			list_del(&(*ei)->list);
+	}
+	spin_unlock(&eq->lock);
+
+	if (is_empty) {
+		SSDFS_WARN("extents queue is empty\n");
+		err = -ENODATA;
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_extents_queue_remove_all() - remove all extents from queue
+ * @eq: extents queue
+ *
+ * This function removes all extents from the queue.
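+ *
+ * Note: this is a teardown helper; every entry found here is
+ * reported via SSDFS_WARN() (a non-empty queue at this point means
+ * that invalidation has not been finished) and then freed.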
+ */ +void ssdfs_extents_queue_remove_all(struct ssdfs_extents_queue *eq) +{ + bool is_empty; + LIST_HEAD(tmp_list); + struct list_head *this, *next; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!eq); +#endif /* CONFIG_SSDFS_DEBUG */ + + spin_lock(&eq->lock); + is_empty = list_empty_careful(&eq->list); + if (!is_empty) + list_replace_init(&eq->list, &tmp_list); + spin_unlock(&eq->lock); + + if (is_empty) + return; + + list_for_each_safe(this, next, &tmp_list) { + struct ssdfs_extent_info *ei; + + ei = list_entry(this, struct ssdfs_extent_info, list); + list_del(&ei->list); + + switch (ei->type) { + case SSDFS_EXTENT_INFO_RAW_EXTENT: + SSDFS_WARN("delete extent: " + "seg_id %llu, logical_blk %u, len %u\n", + le64_to_cpu(ei->raw.extent.seg_id), + le32_to_cpu(ei->raw.extent.logical_blk), + le32_to_cpu(ei->raw.extent.len)); + break; + + case SSDFS_EXTENT_INFO_INDEX_DESCRIPTOR: + case SSDFS_EXTENT_INFO_DENTRY_INDEX_DESCRIPTOR: + case SSDFS_EXTENT_INFO_SHDICT_INDEX_DESCRIPTOR: + case SSDFS_EXTENT_INFO_XATTR_INDEX_DESCRIPTOR: + SSDFS_WARN("delete index: " + "node_id %u, node_type %#x, height %u, " + "seg_id %llu, logical_blk %u, len %u\n", + le32_to_cpu(ei->raw.index.node_id), + ei->raw.index.node_type, + ei->raw.index.height, + le64_to_cpu(ei->raw.index.index.extent.seg_id), + le32_to_cpu(ei->raw.index.index.extent.logical_blk), + le32_to_cpu(ei->raw.index.index.extent.len)); + break; + + default: + SSDFS_WARN("invalid extent info type %#x\n", + ei->type); + break; + } + + ssdfs_extent_info_free(ei); + } +} + +/* + * ssdfs_extent_info_alloc() - allocate memory for extent info object + */ +struct ssdfs_extent_info *ssdfs_extent_info_alloc(void) +{ + struct ssdfs_extent_info *ptr; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_extent_info_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + ptr = kmem_cache_alloc(ssdfs_extent_info_cachep, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate memory for extent\n"); + return ERR_PTR(-ENOMEM); + } + + ssdfs_ext_queue_cache_leaks_increment(ptr); + + return ptr; +} + +/* + * ssdfs_extent_info_free() - free memory for extent info object + */ +void ssdfs_extent_info_free(struct ssdfs_extent_info *ei) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ssdfs_extent_info_cachep); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!ei) + return; + + ssdfs_ext_queue_cache_leaks_decrement(ei); + kmem_cache_free(ssdfs_extent_info_cachep, ei); +} + +/* + * ssdfs_extent_info_init() - extent info initialization + * @type: extent info type + * @ptr: pointer on extent info item + * @owner_ino: btree's owner inode id + * @ei: extent info [out] + */ +void ssdfs_extent_info_init(int type, void *ptr, u64 owner_ino, + struct ssdfs_extent_info *ei) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ptr || !ei); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(ei, 0, sizeof(struct ssdfs_extent_info)); + + INIT_LIST_HEAD(&ei->list); + ei->type = SSDFS_EXTENT_INFO_UNKNOWN_TYPE; + + switch (type) { + case SSDFS_EXTENT_INFO_RAW_EXTENT: + ei->type = type; + ei->owner_ino = owner_ino; + ssdfs_memcpy(&ei->raw.extent, + 0, sizeof(struct ssdfs_raw_extent), + ptr, + 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + break; + + case SSDFS_EXTENT_INFO_INDEX_DESCRIPTOR: + case SSDFS_EXTENT_INFO_DENTRY_INDEX_DESCRIPTOR: + case SSDFS_EXTENT_INFO_SHDICT_INDEX_DESCRIPTOR: + case SSDFS_EXTENT_INFO_XATTR_INDEX_DESCRIPTOR: + ei->type = type; + ei->owner_ino = owner_ino; + ssdfs_memcpy(&ei->raw.index, + 0, sizeof(struct ssdfs_btree_index_key), + ptr, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct 
ssdfs_btree_index_key));
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#else
+		SSDFS_WARN("invalid type %#x\n", type);
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+	}
+}
+
+static inline
+int ssdfs_mark_segment_under_invalidation(struct ssdfs_segment_info *si)
+{
+	int activity_type;
+
+	activity_type = atomic_cmpxchg(&si->activity_type,
+				SSDFS_SEG_OBJECT_REGULAR_ACTIVITY,
+				SSDFS_SEG_UNDER_INVALIDATION);
+	if (activity_type != SSDFS_SEG_OBJECT_REGULAR_ACTIVITY) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("segment %llu is busy under activity %#x\n",
+			  si->seg_id, activity_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EBUSY;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("segment %llu is under invalidation\n",
+		  si->seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+static inline
+int ssdfs_revert_invalidation_to_regular_activity(struct ssdfs_segment_info *si)
+{
+	int activity_type;
+
+	activity_type = atomic_cmpxchg(&si->activity_type,
+				SSDFS_SEG_UNDER_INVALIDATION,
+				SSDFS_SEG_OBJECT_REGULAR_ACTIVITY);
+	if (activity_type != SSDFS_SEG_UNDER_INVALIDATION) {
+		SSDFS_WARN("segment %llu is under activity %#x\n",
+			   si->seg_id, activity_type);
+		return -EFAULT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("segment %llu has been reverted from invalidation\n",
+		  si->seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_invalidate_index_area() - invalidate index area
+ * @shextree: shared extents tree
+ * @owner_ino: inode ID of btree's owner
+ * @hdr: pointer on the node's header
+ * @node_size: node size in bytes
+ * @pvec: pagevec of the node's content
+ *
+ * This method tries to invalidate the index area.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EIO - node is corrupted.
+ */
+static
+int ssdfs_invalidate_index_area(struct ssdfs_shared_extents_tree *shextree,
+				u64 owner_ino,
+				struct ssdfs_btree_node_header *hdr,
+				u32 node_size,
+				struct pagevec *pvec)
+{
+	struct ssdfs_btree_index_key cur_index;
+	u8 index_size;
+	u16 index_count;
+	u32 area_offset, area_size;
+	u16 flags;
+	int index_type = SSDFS_EXTENT_INFO_UNKNOWN_TYPE;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!shextree || !hdr || !pvec);
+
+	SSDFS_DBG("owner_ino %llu, node_size %u\n",
+		  owner_ino, node_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (le16_to_cpu(hdr->magic.key)) {
+	case SSDFS_EXTENTS_BNODE_MAGIC:
+		index_type = SSDFS_EXTENT_INFO_INDEX_DESCRIPTOR;
+		break;
+
+	case SSDFS_DENTRIES_BNODE_MAGIC:
+		index_type = SSDFS_EXTENT_INFO_DENTRY_INDEX_DESCRIPTOR;
+		break;
+
+	case SSDFS_DICTIONARY_BNODE_MAGIC:
+		index_type = SSDFS_EXTENT_INFO_SHDICT_INDEX_DESCRIPTOR;
+		break;
+
+	case SSDFS_XATTR_BNODE_MAGIC:
+		index_type = SSDFS_EXTENT_INFO_XATTR_INDEX_DESCRIPTOR;
+		break;
+
+	default:
+		SSDFS_ERR("unsupported btree: magic %#x\n",
+			  le16_to_cpu(hdr->magic.key));
+		return -ERANGE;
+	}
+
+	index_size = hdr->index_size;
+	index_count = le16_to_cpu(hdr->index_count);
+
+	area_offset = le16_to_cpu(hdr->index_area_offset);
+	area_size = 1 << hdr->log_index_area_size;
+
+	if (area_size < ((u32)index_count * index_size)) {
+		SSDFS_ERR("corrupted node header: "
+			  "index_size %u, index_count %u, "
+			  "area_size %u\n",
+			  index_size, index_count, area_size);
+		return -EIO;
+	}
+
+	for (i = 0; i < index_count; i++) {
+		err = ssdfs_btree_node_get_index(pvec,
+						 area_offset,
+						 area_size,
+						 node_size,
+						 i,
+						 &cur_index);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get index: "
+				  "position %u, err %d\n",
+				  i, err);
+			return err;
+		}
+
+		if
(le32_to_cpu(cur_index.node_id) >= U32_MAX) { + SSDFS_ERR("corrupted index: " + "node_id %u\n", + le32_to_cpu(cur_index.node_id)); + return -EIO; + } + + switch (cur_index.node_type) { + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("corrupted index: " + "invalid node type %#x\n", + cur_index.node_type); + return -EIO; + } + + if (cur_index.height >= U8_MAX) { + SSDFS_ERR("corrupted index: " + "invalid height %u\n", + cur_index.height); + return -EIO; + } + + flags = le16_to_cpu(cur_index.flags); + if (flags & ~SSDFS_BTREE_INDEX_FLAGS_MASK) { + SSDFS_ERR("corrupted index: " + "invalid flags set %#x\n", + flags); + return -EIO; + } + + if (le64_to_cpu(cur_index.index.hash) >= U64_MAX) { + SSDFS_ERR("corrupted index: " + "invalid hash %llx\n", + le64_to_cpu(cur_index.index.hash)); + return -EIO; + } + + err = ssdfs_shextree_add_pre_invalid_index(shextree, + owner_ino, + index_type, + &cur_index); + if (unlikely(err)) { + SSDFS_ERR("fail to add pre-invalid index: " + "position %u, err %d\n", + i, err); + return err; + } + } + + return 0; +} + +/* + * __ssdfs_invalidate_btree_index() - invalidate btree's index + * @fsi: pointer on shared file system object + * @owner_ino: inode ID of btree's owner + * @node_size: node size in bytes + * @hdr: pointer on header's buffer + * @hdr_size: size of the header in bytes + * @extent: extent for invalidation + * + * This method tries to invalidate the index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - node is corrupted. + */ +static +int __ssdfs_invalidate_btree_index(struct ssdfs_fs_info *fsi, + u64 owner_ino, + u32 node_size, + void *hdr, + size_t hdr_size, + struct ssdfs_btree_index_key *index) +{ + struct ssdfs_shared_extents_tree *shextree = NULL; + struct ssdfs_segment_info *si = NULL; + struct ssdfs_btree_node_header *hdr_ptr; + struct pagevec pvec; + struct page *page; + u32 node_id1, node_id2; + int node_type1, node_type2; + u8 height1, height2; + u16 flags; + bool has_index_area, has_items_area; + u32 start_blk; + u32 len; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !hdr || !index); + + SSDFS_DBG("node_id %u, node_type %#x, " + "height %u, owner_ino %llu, " + "node_size %u, hdr_size %zu\n", + le32_to_cpu(index->node_id), + index->node_type, + index->height, + owner_ino, + node_size, + hdr_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + hdr_ptr = (struct ssdfs_btree_node_header *)hdr; + pagevec_init(&pvec); + + err = __ssdfs_btree_node_prepare_content(fsi, index, node_size, + owner_ino, &si, &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare node's content: " + "node_id %u, node_type %#x, " + "owner_ino %llu, err %d\n", + le32_to_cpu(index->node_id), + index->node_type, + owner_ino, err); + goto finish_invalidate_index; + } + + if (pagevec_count(&pvec) == 0) { + err = -ERANGE; + SSDFS_ERR("empty node's content: id %u\n", + le32_to_cpu(index->node_id)); + goto finish_invalidate_index; + } + + page = pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy_from_page(hdr, 0, hdr_size, + page, 0, PAGE_SIZE, + hdr_size); + + if (!is_csum_valid(&hdr_ptr->check, hdr_ptr, hdr_size)) { + err = -EIO; + SSDFS_ERR("invalid checksum: node_id %u\n", + le32_to_cpu(index->node_id)); + } + + if 
(unlikely(err)) + goto finish_invalidate_index; + + if (node_size != (1 << hdr_ptr->log_node_size)) { + err = -EIO; + SSDFS_ERR("node_size1 %u != node_size2 %u\n", + node_size, + 1 << hdr_ptr->log_node_size); + goto finish_invalidate_index; + } + + node_id1 = le32_to_cpu(index->node_id); + node_id2 = le32_to_cpu(hdr_ptr->node_id); + + if (node_id1 != node_id2) { + err = -ERANGE; + SSDFS_ERR("node_id1 %u != node_id2 %u\n", + node_id1, node_id2); + goto finish_invalidate_index; + } + + node_type1 = index->node_type; + node_type2 = hdr_ptr->type; + + if (node_type1 != node_type2) { + err = -ERANGE; + SSDFS_ERR("node_type1 %#x != node_type2 %#x\n", + node_type1, node_type2); + goto finish_invalidate_index; + } + + height1 = index->height; + height2 = hdr_ptr->height; + + if (height1 != height2) { + err = -ERANGE; + SSDFS_ERR("height1 %u != height2 %u\n", + height1, height2); + goto finish_invalidate_index; + } + + flags = le16_to_cpu(hdr_ptr->flags); + + if (flags & ~SSDFS_BTREE_NODE_FLAGS_MASK) { + err = -EIO; + SSDFS_ERR("corrupted node header: flags %#x\n", + flags); + goto finish_invalidate_index; + } + + has_index_area = flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA; + has_items_area = flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA; + + if (!has_index_area && !has_items_area) { + err = -EIO; + SSDFS_ERR("corrupted node header: no areas\n"); + goto finish_invalidate_index; + } + + if (has_index_area) { + err = ssdfs_invalidate_index_area(shextree, owner_ino, + hdr_ptr, + node_size, &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate index area: " + "err %d\n", err); + goto finish_invalidate_index; + } + } + + start_blk = le32_to_cpu(index->index.extent.logical_blk); + len = le32_to_cpu(index->index.extent.len); + + err = ssdfs_mark_segment_under_invalidation(si); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu is busy\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_invalidate_index; + } + + err = ssdfs_segment_invalidate_logical_extent(si, start_blk, len); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate node: " + "node_id %u, seg_id %llu, " + "start_blk %u, len %u\n", + node_id1, si->seg_id, + start_blk, len); + goto revert_invalidation_state; + } + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_segment_request *req; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto revert_invalidation_state; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC, + i, req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "peb_index %d, err %d\n", + i, err); + ssdfs_put_request(req); + ssdfs_request_free(req); + goto revert_invalidation_state; + } + } + +revert_invalidation_state: + err = ssdfs_revert_invalidation_to_regular_activity(si); + if (unlikely(err)) { + SSDFS_ERR("unexpected segment %llu activity\n", + si->seg_id); + } + +finish_invalidate_index: + ssdfs_ext_queue_pagevec_release(&pvec); + ssdfs_segment_put_object(si); + return err; +} + +/* + * ssdfs_invalidate_dentries_btree_index() - invalidate dentries btree index + * @fsi: pointer on shared file system object + * @owner_ino: inode ID of btree's owner + * @extent: extent for invalidation + * + * This method tries to invalidate the index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - node is corrupted. 
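+ * %-ENOMEM - fail to allocate memory.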
+ */ +int ssdfs_invalidate_dentries_btree_index(struct ssdfs_fs_info *fsi, + u64 owner_ino, + struct ssdfs_btree_index_key *index) +{ + struct ssdfs_dentries_btree_descriptor *dentries_btree; + u32 node_size; + struct ssdfs_dentries_btree_node_header hdr; + size_t hdr_size = sizeof(struct ssdfs_dentries_btree_node_header); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !index); + + SSDFS_DBG("node_id %u, node_type %#x, " + "height %u, owner_ino %llu\n", + le32_to_cpu(index->node_id), + index->node_type, + index->height, + owner_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + dentries_btree = &fsi->vh->dentries_btree; + node_size = 1 << dentries_btree->desc.log_node_size; + + return __ssdfs_invalidate_btree_index(fsi, owner_ino, node_size, + &hdr, hdr_size, index); +} + +/* + * ssdfs_invalidate_shared_dict_btree_index() - invalidate shared dict index + * @fsi: pointer on shared file system object + * @owner_ino: inode ID of btree's owner + * @extent: extent for invalidation + * + * This method tries to invalidate the index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - node is corrupted. + */ +int ssdfs_invalidate_shared_dict_btree_index(struct ssdfs_fs_info *fsi, + u64 owner_ino, + struct ssdfs_btree_index_key *index) +{ + struct ssdfs_shared_dictionary_btree *shared_dict; + struct ssdfs_shared_dictionary_node_header hdr; + size_t hdr_size = sizeof(struct ssdfs_shared_dictionary_node_header); + u32 node_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !index); + + SSDFS_DBG("node_id %u, node_type %#x, " + "height %u, owner_ino %llu\n", + le32_to_cpu(index->node_id), + index->node_type, + index->height, + owner_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + shared_dict = &fsi->vs->shared_dict_btree; + node_size = 1 << shared_dict->desc.log_node_size; + + return __ssdfs_invalidate_btree_index(fsi, owner_ino, node_size, + &hdr, hdr_size, index); +} + +/* + * ssdfs_invalidate_extents_btree_index() - invalidate extents btree index + * @fsi: pointer on shared file system object + * @owner_ino: inode ID of btree's owner + * @extent: extent for invalidation + * + * This method tries to invalidate the index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - node is corrupted. 
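+ * %-ENOMEM - fail to allocate memory.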
+ */ +int ssdfs_invalidate_extents_btree_index(struct ssdfs_fs_info *fsi, + u64 owner_ino, + struct ssdfs_btree_index_key *index) +{ + struct ssdfs_shared_extents_tree *shextree = NULL; + struct ssdfs_segment_info *si = NULL; + struct ssdfs_extents_btree_descriptor *extents_btree; + u32 node_size; + struct pagevec pvec; + struct ssdfs_btree_node_header hdr; + struct ssdfs_extents_btree_node_header *hdr_ptr; + size_t hdr_size = sizeof(struct ssdfs_extents_btree_node_header); + u64 parent_ino; + u64 blks_count, calculated_blks = 0; + u32 forks_count; + u32 allocated_extents; + u32 valid_extents; + u32 max_extent_blks; + struct page *page; + void *kaddr; + u32 node_id1, node_id2; + int node_type1, node_type2; + u8 height1, height2; + u16 flags; + u32 area_offset, area_size; + bool has_index_area, has_items_area; + u32 i; + u32 start_blk; + u32 len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !index); + + SSDFS_DBG("node_id %u, node_type %#x, " + "height %u, owner_ino %llu\n", + le32_to_cpu(index->node_id), + index->node_type, + index->height, + owner_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + extents_btree = &fsi->vh->extents_btree; + node_size = 1 << extents_btree->desc.log_node_size; + pagevec_init(&pvec); + + err = __ssdfs_btree_node_prepare_content(fsi, index, node_size, + owner_ino, &si, &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare node's content: " + "node_id %u, node_type %#x, " + "owner_ino %llu, err %d\n", + le32_to_cpu(index->node_id), + index->node_type, + owner_ino, err); + goto finish_invalidate_index; + } + + if (pagevec_count(&pvec) == 0) { + err = -ERANGE; + SSDFS_ERR("empty node's content: id %u\n", + le32_to_cpu(index->node_id)); + goto finish_invalidate_index; + } + + page = pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + + ssdfs_memcpy(&hdr, 0, sizeof(struct ssdfs_btree_node_header), + kaddr, 0, PAGE_SIZE, + sizeof(struct ssdfs_btree_node_header)); + + hdr_ptr = (struct ssdfs_extents_btree_node_header *)kaddr; + parent_ino = le64_to_cpu(hdr_ptr->parent_ino); + blks_count = le64_to_cpu(hdr_ptr->blks_count); + forks_count = le32_to_cpu(hdr_ptr->forks_count); + allocated_extents = le32_to_cpu(hdr_ptr->allocated_extents); + valid_extents = le32_to_cpu(hdr_ptr->valid_extents); + max_extent_blks = le32_to_cpu(hdr_ptr->max_extent_blks); + + if (!is_csum_valid(&hdr_ptr->node.check, hdr_ptr, hdr_size)) { + err = -EIO; + SSDFS_ERR("invalid checksum: node_id %u\n", + le32_to_cpu(index->node_id)); + } + + hdr_ptr = NULL; + kunmap_local(kaddr); + + if (unlikely(err)) + goto finish_invalidate_index; + + if (node_size != (1 << hdr.log_node_size)) { + err = -EIO; + SSDFS_ERR("node_size1 %u != node_size2 %u\n", + node_size, + 1 << hdr.log_node_size); + goto finish_invalidate_index; + } + + node_id1 = le32_to_cpu(index->node_id); + node_id2 = le32_to_cpu(hdr.node_id); + + if (node_id1 != node_id2) { + err = -ERANGE; + SSDFS_ERR("node_id1 %u != node_id2 %u\n", + node_id1, node_id2); + goto finish_invalidate_index; + } + + node_type1 = index->node_type; + node_type2 = hdr.type; + + if (node_type1 != node_type2) { + err = -ERANGE; + SSDFS_ERR("node_type1 %#x != node_type2 %#x\n", + node_type1, node_type2); + goto finish_invalidate_index; + } + + height1 = index->height; + height2 = hdr.height; + + if (height1 != height2) { + err = -ERANGE; + SSDFS_ERR("height1 
%u != height2 %u\n", + height1, height2); + goto finish_invalidate_index; + } + + flags = le16_to_cpu(hdr.flags); + + if (flags & ~SSDFS_BTREE_NODE_FLAGS_MASK) { + err = -EIO; + SSDFS_ERR("corrupted node header: flags %#x\n", + flags); + goto finish_invalidate_index; + } + + has_index_area = flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA; + has_items_area = flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA; + + if (!has_index_area && !has_items_area) { + err = -EIO; + SSDFS_ERR("corrupted node header: no areas\n"); + goto finish_invalidate_index; + } + + if (has_index_area) { + err = ssdfs_invalidate_index_area(shextree, owner_ino, &hdr, + node_size, &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate index area: " + "err %d\n", err); + goto finish_invalidate_index; + } + } + + if (has_items_area) { + struct ssdfs_raw_fork fork; + u64 forks_size; + u64 start_hash, end_hash; + + forks_size = (u64)forks_count * sizeof(struct ssdfs_raw_fork); + area_offset = le32_to_cpu(hdr.item_area_offset); + start_hash = le64_to_cpu(hdr.start_hash); + end_hash = le64_to_cpu(hdr.end_hash); + + if (area_offset >= node_size) { + err = -EIO; + SSDFS_ERR("area_offset %u >= node_size %u\n", + area_offset, node_size); + goto finish_invalidate_index; + } + + area_size = node_size - area_offset; + + if (area_size < forks_size) { + err = -EIO; + SSDFS_ERR("corrupted node header: " + "fork_size %lu, forks_count %u, " + "area_size %u\n", + sizeof(struct ssdfs_raw_fork), + forks_count, area_size); + goto finish_invalidate_index; + } + + for (i = 0; i < forks_count; i++) { + u64 start_offset, fork_blks; + + err = __ssdfs_extents_btree_node_get_fork(&pvec, + area_offset, + area_size, + node_size, + i, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to get fork: " + "fork_index %u\n", + i); + goto finish_invalidate_index; + } + + start_offset = le64_to_cpu(fork.start_offset); + fork_blks = le64_to_cpu(fork.blks_count); + + if (start_offset >= U64_MAX || fork_blks >= U64_MAX) { + err = -EIO; + SSDFS_ERR("corrupted fork: " + "start_offset %llu, " + "blks_count %llu\n", + start_offset, fork_blks); + goto finish_invalidate_index; + } + + if (fork_blks == 0) { + err = -EIO; + SSDFS_ERR("corrupted fork: " + "start_offset %llu, " + "blks_count %llu\n", + start_offset, fork_blks); + goto finish_invalidate_index; + } + + if (start_offset < start_hash || + start_offset > end_hash) { + err = -EIO; + SSDFS_ERR("corrupted fork: " + "start_hash %llx, end_hash %llx, " + "start_offset %llu\n", + start_hash, end_hash, + start_offset); + goto finish_invalidate_index; + } + + calculated_blks += fork_blks; + + if (calculated_blks > blks_count) { + err = -EIO; + SSDFS_ERR("corrupted fork: " + "calculated_blks %llu, " + "blks_count %llu\n", + calculated_blks, + blks_count); + goto finish_invalidate_index; + } + + err = ssdfs_shextree_add_pre_invalid_fork(shextree, + owner_ino, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to add the fork into queue: " + "fork_index %u, err %d\n", + i, err); + goto finish_invalidate_index; + } + } + } + + start_blk = le32_to_cpu(index->index.extent.logical_blk); + len = le32_to_cpu(index->index.extent.len); + + err = ssdfs_mark_segment_under_invalidation(si); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu is busy\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_invalidate_index; + } + + err = ssdfs_segment_invalidate_logical_extent(si, start_blk, len); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate node: " + "node_id %u, seg_id %llu, " + "start_blk %u, len %u\n", + 
node_id1, si->seg_id, + start_blk, len); + goto revert_invalidation_state; + } + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_segment_request *req; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto revert_invalidation_state; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC, + i, req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "peb_index %d, err %d\n", + i, err); + ssdfs_put_request(req); + ssdfs_request_free(req); + goto revert_invalidation_state; + } + } + +revert_invalidation_state: + err = ssdfs_revert_invalidation_to_regular_activity(si); + if (unlikely(err)) { + SSDFS_ERR("unexpected segment %llu activity\n", + si->seg_id); + } + +finish_invalidate_index: + ssdfs_ext_queue_pagevec_release(&pvec); + ssdfs_segment_put_object(si); + return err; +} + +/* + * ssdfs_invalidate_xattrs_btree_index() - invalidate xattrs btree index + * @fsi: pointer on shared file system object + * @owner_ino: inode ID of btree's owner + * @extent: extent for invalidation + * + * This method tries to invalidate the index. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EIO - node is corrupted. + */ +int ssdfs_invalidate_xattrs_btree_index(struct ssdfs_fs_info *fsi, + u64 owner_ino, + struct ssdfs_btree_index_key *index) +{ + struct ssdfs_shared_extents_tree *shextree = NULL; + struct ssdfs_segment_info *si = NULL; + struct ssdfs_xattr_btree_descriptor *xattrs_btree; + u32 node_size; + struct pagevec pvec; + struct ssdfs_btree_node_header hdr; + struct ssdfs_xattrs_btree_node_header *hdr_ptr; + size_t hdr_size = sizeof(struct ssdfs_xattrs_btree_node_header); + u64 parent_ino; + u32 xattrs_count; + struct page *page; + void *kaddr; + u32 node_id1, node_id2; + int node_type1, node_type2; + u8 height1, height2; + u16 flags; + u32 area_offset, area_size; + bool has_index_area, has_items_area; + u32 i; + u32 start_blk; + u32 len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !index); + + SSDFS_DBG("node_id %u, node_type %#x, " + "height %u, owner_ino %llu\n", + le32_to_cpu(index->node_id), + index->node_type, + index->height, + owner_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + xattrs_btree = &fsi->vh->xattr_btree; + node_size = 1 << xattrs_btree->desc.log_node_size; + pagevec_init(&pvec); + + err = __ssdfs_btree_node_prepare_content(fsi, index, node_size, + owner_ino, &si, &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare node's content: " + "node_id %u, node_type %#x, " + "owner_ino %llu, err %d\n", + le32_to_cpu(index->node_id), + index->node_type, + owner_ino, err); + goto finish_invalidate_index; + } + + if (pagevec_count(&pvec) == 0) { + err = -ERANGE; + SSDFS_ERR("empty node's content: id %u\n", + le32_to_cpu(index->node_id)); + goto finish_invalidate_index; + } + + page = pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + + ssdfs_memcpy(&hdr, 0, sizeof(struct ssdfs_btree_node_header), + kaddr, 0, PAGE_SIZE, + sizeof(struct ssdfs_btree_node_header)); + + hdr_ptr = (struct ssdfs_xattrs_btree_node_header *)kaddr; + parent_ino = le64_to_cpu(hdr_ptr->parent_ino); + xattrs_count = 
le16_to_cpu(hdr_ptr->xattrs_count); + + if (!is_csum_valid(&hdr_ptr->node.check, hdr_ptr, hdr_size)) { + err = -EIO; + SSDFS_ERR("invalid checksum: node_id %u\n", + le32_to_cpu(index->node_id)); + } + + hdr_ptr = NULL; + kunmap_local(kaddr); + + if (unlikely(err)) + goto finish_invalidate_index; + + if (node_size != (1 << hdr.log_node_size)) { + err = -EIO; + SSDFS_ERR("node_size1 %u != node_size2 %u\n", + node_size, + 1 << hdr.log_node_size); + goto finish_invalidate_index; + } + + node_id1 = le32_to_cpu(index->node_id); + node_id2 = le32_to_cpu(hdr.node_id); + + if (node_id1 != node_id2) { + err = -ERANGE; + SSDFS_ERR("node_id1 %u != node_id2 %u\n", + node_id1, node_id2); + goto finish_invalidate_index; + } + + node_type1 = index->node_type; + node_type2 = hdr.type; + + if (node_type1 != node_type2) { + err = -ERANGE; + SSDFS_ERR("node_type1 %#x != node_type2 %#x\n", + node_type1, node_type2); + goto finish_invalidate_index; + } + + height1 = index->height; + height2 = hdr.height; + + if (height1 != height2) { + err = -ERANGE; + SSDFS_ERR("height1 %u != height2 %u\n", + height1, height2); + goto finish_invalidate_index; + } + + flags = le16_to_cpu(hdr.flags); + + if (flags & ~SSDFS_BTREE_NODE_FLAGS_MASK) { + err = -EIO; + SSDFS_ERR("corrupted node header: flags %#x\n", + flags); + goto finish_invalidate_index; + } + + has_index_area = flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA; + has_items_area = flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA; + + if (!has_index_area && !has_items_area) { + err = -EIO; + SSDFS_ERR("corrupted node header: no areas\n"); + goto finish_invalidate_index; + } + + if (has_index_area) { + err = ssdfs_invalidate_index_area(shextree, owner_ino, &hdr, + node_size, &pvec); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate index area: " + "err %d\n", err); + goto finish_invalidate_index; + } + } + + if (has_items_area) { + struct ssdfs_xattr_entry xattr; + u64 xattrs_size; + u64 start_hash, end_hash; + + xattrs_size = (u64)xattrs_count * + sizeof(struct ssdfs_xattr_entry); + area_offset = le32_to_cpu(hdr.item_area_offset); + start_hash = le64_to_cpu(hdr.start_hash); + end_hash = le64_to_cpu(hdr.end_hash); + + if (area_offset >= node_size) { + err = -EIO; + SSDFS_ERR("area_offset %u >= node_size %u\n", + area_offset, node_size); + goto finish_invalidate_index; + } + + area_size = node_size - area_offset; + + if (area_size < xattrs_size) { + err = -EIO; + SSDFS_ERR("corrupted node header: " + "xattr_size %lu, xattrs_count %u, " + "area_size %u\n", + sizeof(struct ssdfs_xattr_entry), + xattrs_count, area_size); + goto finish_invalidate_index; + } + + for (i = 0; i < xattrs_count; i++) { + struct ssdfs_blob_extent *desc; + struct ssdfs_raw_extent *extent; + bool is_flag_invalid; + + err = __ssdfs_xattrs_btree_node_get_xattr(&pvec, + area_offset, + area_size, + node_size, + i, + &xattr); + if (unlikely(err)) { + SSDFS_ERR("fail to get xattr: " + "xattr_index %u\n", + i); + goto finish_invalidate_index; + } + + switch (xattr.blob_type) { + case SSDFS_XATTR_INLINE_BLOB: + is_flag_invalid = xattr.blob_flags & + SSDFS_XATTR_HAS_EXTERNAL_BLOB; + + if (is_flag_invalid) { + err = -ERANGE; + SSDFS_ERR("invalid xattr: " + "blob_type %#x, " + "blob_flags %#x\n", + xattr.blob_type, + xattr.blob_flags); + goto finish_invalidate_index; + } else { + /* skip invalidation -> inline blob */ + continue; + } + break; + + case SSDFS_XATTR_REGULAR_BLOB: + is_flag_invalid = xattr.blob_flags & + ~SSDFS_XATTR_HAS_EXTERNAL_BLOB; + + if (is_flag_invalid) { + err = -ERANGE; + SSDFS_ERR("invalid xattr: " 
+ "blob_type %#x, " + "blob_flags %#x\n", + xattr.blob_type, + xattr.blob_flags); + goto finish_invalidate_index; + } + + desc = &xattr.blob.descriptor; + extent = &xattr.blob.descriptor.extent; + err = + ssdfs_shextree_add_pre_invalid_extent(shextree, + owner_ino, + extent); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-invalid: " + "cur_index %u, err %d\n", + i, err); + goto finish_invalidate_index; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid blob_type %#x\n", + xattr.blob_type); + goto finish_invalidate_index; + } + } + } + + start_blk = le32_to_cpu(index->index.extent.logical_blk); + len = le32_to_cpu(index->index.extent.len); + + err = ssdfs_mark_segment_under_invalidation(si); + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("segment %llu is busy\n", + si->seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_invalidate_index; + } + + err = ssdfs_segment_invalidate_logical_extent(si, start_blk, len); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate node: " + "node_id %u, seg_id %llu, " + "start_blk %u, len %u\n", + node_id1, si->seg_id, + start_blk, len); + goto revert_invalidation_state; + } + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_segment_request *req; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto revert_invalidation_state; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC, + i, req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "peb_index %d, err %d\n", + i, err); + ssdfs_put_request(req); + ssdfs_request_free(req); + goto revert_invalidation_state; + } + } + +revert_invalidation_state: + err = ssdfs_revert_invalidation_to_regular_activity(si); + if (unlikely(err)) { + SSDFS_ERR("unexpected segment %llu activity\n", + si->seg_id); + } + +finish_invalidate_index: + ssdfs_ext_queue_pagevec_release(&pvec); + ssdfs_segment_put_object(si); + return err; +} + +/* + * ssdfs_invalidate_extent() - invalidate extent + * @fsi: pointer on shared file system object + * @extent: extent for invalidation + * + * This method tries to invalidate extent in the segment. + * The extent should be deleted from the extents tree + * beforehand. This method has goal to do real invalidation + * the extents from the extents queue. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */
+int ssdfs_invalidate_extent(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_raw_extent *extent)
+{
+	struct ssdfs_segment_info *si;
+	u64 seg_id;
+	u32 start_blk;
+	u32 len;
+	int i;
+	int err = 0;
+	int err2;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !extent);
+	BUG_ON(le64_to_cpu(extent->seg_id) == U64_MAX ||
+		le32_to_cpu(extent->logical_blk) == U32_MAX ||
+		le32_to_cpu(extent->len) == U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	seg_id = le64_to_cpu(extent->seg_id);
+	start_blk = le32_to_cpu(extent->logical_blk);
+	len = le32_to_cpu(extent->len);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, start_blk %u, len %u\n",
+		  seg_id, start_blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE, seg_id, U64_MAX);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		/* PTR_ERR(NULL) would be 0: report -ENOMEM explicitly */
+		err = !si ? -ENOMEM : PTR_ERR(si);
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  seg_id, err);
+		return err;
+	}
+
+	err = ssdfs_mark_segment_under_invalidation(si);
+	if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("segment %llu is busy\n",
+			  si->seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_invalidate_extent;
+	}
+
+	err = ssdfs_segment_invalidate_logical_extent(si, start_blk, len);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to invalidate logical extent: "
+			  "seg %llu, extent (start_blk %u, len %u), err %d\n",
+			  seg_id, start_blk, len, err);
+		goto revert_invalidation_state;
+	}
+
+	for (i = 0; i < si->pebs_count; i++) {
+		struct ssdfs_segment_request *req;
+
+		req = ssdfs_request_alloc();
+		if (IS_ERR_OR_NULL(req)) {
+			err = (req == NULL ? -ENOMEM : PTR_ERR(req));
+			SSDFS_ERR("fail to allocate segment request: err %d\n",
+				  err);
+			goto revert_invalidation_state;
+		}
+
+		ssdfs_request_init(req);
+		ssdfs_get_request(req);
+
+		err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC,
+						      i, req);
+		if (unlikely(err)) {
+			SSDFS_ERR("commit log request failed: "
+				  "peb_index %d, err %d\n",
+				  i, err);
+			ssdfs_put_request(req);
+			ssdfs_request_free(req);
+			goto revert_invalidation_state;
+		}
+	}
+
+revert_invalidation_state:
+	err2 = ssdfs_revert_invalidation_to_regular_activity(si);
+	if (unlikely(err2)) {
+		SSDFS_ERR("unexpected segment %llu activity\n",
+			  si->seg_id);
+		/* don't let the revert result mask an earlier error */
+		if (!err)
+			err = err2;
+	}
+
+finish_invalidate_extent:
+	ssdfs_segment_put_object(si);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
diff --git a/fs/ssdfs/extents_queue.h b/fs/ssdfs/extents_queue.h
new file mode 100644
index 000000000000..1aeb12cff61b
--- /dev/null
+++ b/fs/ssdfs/extents_queue.h
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/extents_queue.h - extents queue declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_EXTENTS_QUEUE_H
+#define _SSDFS_EXTENTS_QUEUE_H
+
+/*
+ * struct ssdfs_extents_queue - extents queue descriptor
+ * @lock: extents queue's lock
+ * @list: extents queue's list
+ */
+struct ssdfs_extents_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_extent_info - extent info
+ * @list: extents queue list
+ * @type: extent info type
+ * @owner_ino: btree's owner inode id
+ * @raw.extent: raw extent
+ * @raw.index: raw index
+ */
+struct ssdfs_extent_info {
+	struct list_head list;
+	int type;
+	u64 owner_ino;
+	union {
+		struct ssdfs_raw_extent extent;
+		struct ssdfs_btree_index_key index;
+	} raw;
+};
+
+/* Extent info existing types */
+enum {
+	SSDFS_EXTENT_INFO_UNKNOWN_TYPE,
+	SSDFS_EXTENT_INFO_RAW_EXTENT,
+	SSDFS_EXTENT_INFO_INDEX_DESCRIPTOR,
+	SSDFS_EXTENT_INFO_DENTRY_INDEX_DESCRIPTOR,
+	SSDFS_EXTENT_INFO_SHDICT_INDEX_DESCRIPTOR,
+	SSDFS_EXTENT_INFO_XATTR_INDEX_DESCRIPTOR,
+	SSDFS_EXTENT_INFO_TYPE_MAX
+};
+
+/*
+ * Extents queue API
+ */
+void ssdfs_extents_queue_init(struct ssdfs_extents_queue *eq);
+bool is_ssdfs_extents_queue_empty(struct ssdfs_extents_queue *eq);
+void ssdfs_extents_queue_add_tail(struct ssdfs_extents_queue *eq,
+				  struct ssdfs_extent_info *ei);
+void ssdfs_extents_queue_add_head(struct ssdfs_extents_queue *eq,
+				  struct ssdfs_extent_info *ei);
+int ssdfs_extents_queue_remove_first(struct ssdfs_extents_queue *eq,
+				     struct ssdfs_extent_info **ei);
+void ssdfs_extents_queue_remove_all(struct ssdfs_extents_queue *eq);
+
+/*
+ * Extent info's API
+ */
+void ssdfs_zero_extent_info_cache_ptr(void);
+int ssdfs_init_extent_info_cache(void);
+void ssdfs_shrink_extent_info_cache(void);
+void ssdfs_destroy_extent_info_cache(void);
+
+struct ssdfs_extent_info *ssdfs_extent_info_alloc(void);
+void ssdfs_extent_info_free(struct ssdfs_extent_info *ei);
+void ssdfs_extent_info_init(int type, void *ptr, u64 owner_ino,
+			    struct ssdfs_extent_info *ei);
+
+int ssdfs_invalidate_extent(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_raw_extent *extent);
+int ssdfs_invalidate_extents_btree_index(struct ssdfs_fs_info *fsi,
+					 u64 owner_ino,
+					 struct ssdfs_btree_index_key *index);
+int ssdfs_invalidate_dentries_btree_index(struct ssdfs_fs_info *fsi,
+					  u64 owner_ino,
+					  struct ssdfs_btree_index_key *index);
+int ssdfs_invalidate_shared_dict_btree_index(struct ssdfs_fs_info *fsi,
+					     u64 owner_ino,
+					     struct ssdfs_btree_index_key *index);
+int ssdfs_invalidate_xattrs_btree_index(struct ssdfs_fs_info *fsi,
+					u64 owner_ino,
+					struct ssdfs_btree_index_key *index);
+
+#endif /* _SSDFS_EXTENTS_QUEUE_H */
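For orientation, here is a minimal usage sketch of the queue API declared
above, written as a hypothetical caller might use it. The identifiers
seg_id, blk, len and owner_ino are placeholders, the extent info cache is
assumed to have been created via ssdfs_init_extent_info_cache(), and error
handling is abbreviated:

	struct ssdfs_extents_queue eq;
	struct ssdfs_extent_info *ei;
	struct ssdfs_raw_extent raw = {
		.seg_id = cpu_to_le64(seg_id),
		.logical_blk = cpu_to_le32(blk),
		.len = cpu_to_le32(len),
	};

	ssdfs_extents_queue_init(&eq);

	ei = ssdfs_extent_info_alloc();
	if (IS_ERR(ei))
		return PTR_ERR(ei);

	/* copy the raw extent into the queue item and enqueue it */
	ssdfs_extent_info_init(SSDFS_EXTENT_INFO_RAW_EXTENT, &raw,
			       owner_ino, ei);
	ssdfs_extents_queue_add_tail(&eq, ei);

	/* ... process the queue ...; on teardown, drop leftovers */
	ssdfs_extents_queue_remove_all(&eq);

Note that ssdfs_extents_queue_remove_all() frees the remaining items itself
through ssdfs_extent_info_free(), matching the teardown logic shown earlier
in this patch.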
From patchwork Sat Feb 25 01:09:17 2023
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 66/76] ssdfs: introduce extents b-tree
Date: Fri, 24 Feb 2023 17:09:17 -0800
Message-Id: <20230225010927.813929-67-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

An SSDFS raw extent describes a contiguous sequence of logical blocks by
means of a segment ID, the logical block number of the starting position,
and a length. By default, an SSDFS inode has a private area of 128 bytes,
and a raw extent is 16 bytes in size. As a result, the inode's private area
is able to store no more than 8 raw extents. The hybrid b-tree was chosen
with the goal of storing larger numbers of raw extents efficiently. First
of all, it was taken into account that file sizes can vary a lot on the
same file system volume; moreover, the size of the same file can vary
significantly during its lifetime. Finally, the b-tree is a really good
mechanism for storing the extents compactly, with a very flexible way of
growing or shrinking the reserved space, and it provides a very efficient
technique of extent lookup. Additionally, SSDFS uses compression, which
guarantees really compact storage of semi-empty b-tree nodes, and the
hybrid b-tree is able to mix both index and data records in hybrid nodes,
achieving a much more compact representation of the b-tree's content.

It needs to be pointed out that the extents b-tree's nodes group the extent
records into forks. A raw extent describes the position on the volume of
some contiguous sequence of logical blocks without any details about the
offset of this extent from the file's beginning. The fork, in turn,
describes the offset of some portion of the file's content from the file's
beginning and the number of logical blocks in this portion. A fork also
contains space for three raw extents that are able to define the positions
of three contiguous sequences of logical blocks on the file system's
volume. Finally, one fork is 64 bytes in size. A b-tree node of 4 KB is
therefore capable of storing 64 forks with 192 extents in total. In other
words, even a small b-tree is able to store a significant number of extents
and to locate the fragments of a fairly big file: a b-tree with only two
4 KB nodes, in which every extent defines the position of an 8 MB portion
of a file, is able to describe a file of 3 GB in total.
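To make the capacity arithmetic explicit, here is a small illustrative
calculation; the constants merely restate the figures quoted above and are
not the actual on-disk definitions:

	#define FORK_SIZE_BYTES		64	/* one fork on disk */
	#define EXTENTS_PER_FORK	3	/* raw extents per fork */
	#define NODE_SIZE_BYTES		4096	/* 4 KB b-tree node */

	/* 4096 / 64 = 64 forks fit into one node */
	#define FORKS_PER_NODE	(NODE_SIZE_BYTES / FORK_SIZE_BYTES)

	/* 64 * 3 = 192 extents per node */
	#define EXTENTS_PER_NODE (FORKS_PER_NODE * EXTENTS_PER_FORK)

	/*
	 * Two such nodes hold 2 * 192 = 384 extents. If every extent
	 * maps an 8 MB portion of a file, the tree covers
	 * 384 * 8 MB = 3072 MB = 3 GB of file content.
	 */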
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/extents_tree.c | 3370 +++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/extents_tree.h |  171 ++
 2 files changed, 3541 insertions(+)
 create mode 100644 fs/ssdfs/extents_tree.c
 create mode 100644 fs/ssdfs/extents_tree.h

diff --git a/fs/ssdfs/extents_tree.c b/fs/ssdfs/extents_tree.c
new file mode 100644
index 000000000000..a13e7d773e7d
--- /dev/null
+++ b/fs/ssdfs/extents_tree.c
@@ -0,0 +1,3370 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/extents_tree.c - extents tree functionality.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "request_queue.h" +#include "segment_bitmap.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment.h" +#include "extents_queue.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "shared_extents_tree.h" +#include "segment_tree.h" +#include "extents_tree.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_ext_tree_page_leaks; +atomic64_t ssdfs_ext_tree_memory_leaks; +atomic64_t ssdfs_ext_tree_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_ext_tree_cache_leaks_increment(void *kaddr) + * void ssdfs_ext_tree_cache_leaks_decrement(void *kaddr) + * void *ssdfs_ext_tree_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_ext_tree_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_ext_tree_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_ext_tree_kfree(void *kaddr) + * struct page *ssdfs_ext_tree_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_ext_tree_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_ext_tree_free_page(struct page *page) + * void ssdfs_ext_tree_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(ext_tree) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(ext_tree) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_ext_tree_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_ext_tree_page_leaks, 0); + atomic64_set(&ssdfs_ext_tree_memory_leaks, 0); + atomic64_set(&ssdfs_ext_tree_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_ext_tree_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_ext_tree_page_leaks) != 0) { + SSDFS_ERR("EXTENTS TREE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_ext_tree_page_leaks)); + } + + if (atomic64_read(&ssdfs_ext_tree_memory_leaks) != 0) { + SSDFS_ERR("EXTENTS TREE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_ext_tree_memory_leaks)); + } + + if (atomic64_read(&ssdfs_ext_tree_cache_leaks) != 0) { + SSDFS_ERR("EXTENTS TREE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_ext_tree_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/* + * ssdfs_commit_queue_create() - create commit queue + * @tree: extents tree + * + * This method tries to create the commit queue. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
+ */ +static inline +int ssdfs_commit_queue_create(struct ssdfs_extents_btree_info *tree) +{ + size_t bytes_count = sizeof(u64) * SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree->updated_segs.ids = ssdfs_ext_tree_kzalloc(bytes_count, + GFP_KERNEL); + if (!tree->updated_segs.ids) { + SSDFS_ERR("fail to allocate commit queue\n"); + return -ENOMEM; + } + + tree->updated_segs.count = 0; + tree->updated_segs.capacity = SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY; + + return 0; +} + +/* + * ssdfs_commit_queue_destroy() - destroy commit queue + * @tree: extents tree + * + * This method tries to destroy the commit queue. + */ +static inline +void ssdfs_commit_queue_destroy(struct ssdfs_extents_btree_info *tree) +{ + u32 i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!tree->updated_segs.ids) + return; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("count %u, capacity %u\n", + tree->updated_segs.count, + tree->updated_segs.capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (tree->updated_segs.count > tree->updated_segs.capacity || + tree->updated_segs.capacity == 0) { + SSDFS_WARN("count %u > capacity %u\n", + tree->updated_segs.count, + tree->updated_segs.capacity); + } + + if (tree->updated_segs.count != 0) { + SSDFS_ERR("NOT processed segments:\n"); + + for (i = 0; i < tree->updated_segs.count; i++) { + SSDFS_ERR("ino %lu --> seg %llu\n", + tree->owner->vfs_inode.i_ino, + tree->updated_segs.ids[i]); + } + + SSDFS_WARN("commit queue contains not processed segments: " + "count %u\n", + tree->updated_segs.count); + } + + ssdfs_ext_tree_kfree(tree->updated_segs.ids); + tree->updated_segs.ids = NULL; + tree->updated_segs.count = 0; + tree->updated_segs.capacity = 0; +} + +/* + * ssdfs_commit_queue_realloc() - realloc commit queue + * @tree: extents tree + * + * This method tries to realloc the commit queue. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
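+ *
+ * The queue grows by SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY IDs per call.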
+ */
+static inline
+int ssdfs_commit_queue_realloc(struct ssdfs_extents_btree_info *tree)
+{
+	size_t old_size, new_size;
+	size_t step_size = sizeof(u64) * SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY;
+	u64 *ids;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!tree->updated_segs.ids) {
+		SSDFS_ERR("commit queue is absent!!!\n");
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("count %u, capacity %u\n",
+		  tree->updated_segs.count,
+		  tree->updated_segs.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (tree->updated_segs.count > tree->updated_segs.capacity ||
+	    tree->updated_segs.capacity == 0) {
+		SSDFS_ERR("count %u > capacity %u\n",
+			  tree->updated_segs.count,
+			  tree->updated_segs.capacity);
+		return -ERANGE;
+	}
+
+	if (tree->updated_segs.count < tree->updated_segs.capacity) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("NO realloc necessary: "
+			  "count %u < capacity %u\n",
+			  tree->updated_segs.count,
+			  tree->updated_segs.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	old_size = sizeof(u64) * tree->updated_segs.capacity;
+	new_size = old_size + step_size;
+
+	/*
+	 * Use a temporary pointer: assigning the result of krealloc()
+	 * directly to tree->updated_segs.ids would leak the old buffer
+	 * if the reallocation fails.
+	 */
+	ids = krealloc(tree->updated_segs.ids, new_size,
+			GFP_KERNEL | __GFP_ZERO);
+	if (!ids) {
+		SSDFS_ERR("fail to re-allocate commit queue\n");
+		return -ENOMEM;
+	}
+
+	tree->updated_segs.ids = ids;
+	tree->updated_segs.capacity += SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY;
+
+	return 0;
+}
+
+/*
+ * ssdfs_commit_queue_add_segment_id() - add updated segment ID in queue
+ * @tree: extents tree
+ * @seg_id: segment ID
+ *
+ * This method tries to add the updated segment ID into
+ * the commit queue.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ */ +static +int ssdfs_commit_queue_add_segment_id(struct ssdfs_extents_btree_info *tree, + u64 seg_id) +{ + u32 i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, seg_id %llu\n", tree, seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!tree->updated_segs.ids) { + SSDFS_ERR("commit queue is absent!!!\n"); + return -ERANGE; + } + + if (seg_id >= U64_MAX) { + SSDFS_ERR("invalid seg_id %llu\n", seg_id); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("count %u, capacity %u\n", + tree->updated_segs.count, + tree->updated_segs.capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (tree->updated_segs.count > tree->updated_segs.capacity || + tree->updated_segs.capacity == 0) { + SSDFS_ERR("count %u > capacity %u\n", + tree->updated_segs.count, + tree->updated_segs.capacity); + return -ERANGE; + } + + for (i = 0; i < tree->updated_segs.count; i++) { + if (tree->updated_segs.ids[i] == seg_id) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu in the queue already\n", + seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + } + + if (tree->updated_segs.count == tree->updated_segs.capacity) { + err = ssdfs_commit_queue_realloc(tree); + if (unlikely(err)) { + SSDFS_ERR("fail to realloc commit queue: " + "seg_id %llu, count %u, " + "capacity %u, err %d\n", + seg_id, + tree->updated_segs.count, + tree->updated_segs.capacity, + err); + return err; + } + } + + + tree->updated_segs.ids[tree->updated_segs.count] = seg_id; + tree->updated_segs.count++; + + return 0; +} + +/* + * __ssdfs_commit_queue_issue_requests_async() - issue commit now requests async + * @fsi: pointer on shared file system object + * @seg_id: segment ID + * + * This method tries to issue commit now requests + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + */ +static int +__ssdfs_commit_queue_issue_requests_async(struct ssdfs_fs_info *fsi, + u64 seg_id) +{ + struct ssdfs_segment_info *si; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("fsi %p, seg_id %llu\n", fsi, seg_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE, + seg_id, U64_MAX); + if (unlikely(IS_ERR_OR_NULL(si))) { + err = !si ? -ENOMEM : PTR_ERR(si); + SSDFS_ERR("fail to grab segment object: " + "seg %llu, err %d\n", + seg_id, err); + return err; + } + + for (i = 0; i < si->pebs_count; i++) { + struct ssdfs_segment_request *req; + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto finish_issue_requests_async; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + err = ssdfs_segment_commit_log_async2(si, SSDFS_REQ_ASYNC, + i, req); + if (unlikely(err)) { + SSDFS_ERR("commit log request failed: " + "peb_index %d, err %d\n", + i, err); + ssdfs_put_request(req); + ssdfs_request_free(req); + goto finish_issue_requests_async; + } + } + +finish_issue_requests_async: + ssdfs_segment_put_object(si); + + return err; +} + +/* + * ssdfs_commit_queue_issue_requests_async() - issue commit now requests async + * @tree: extents tree + * + * This method tries to issue commit now requests + * asynchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
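+ *
+ * On success the commit queue is drained and its counter is reset.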
+ */
+static int
+ssdfs_commit_queue_issue_requests_async(struct ssdfs_extents_btree_info *tree)
+{
+	struct ssdfs_fs_info *fsi;
+	size_t bytes_count;
+	u32 i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tree->fsi;
+
+	if (!tree->updated_segs.ids) {
+		SSDFS_ERR("commit queue is absent!!!\n");
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("count %u, capacity %u\n",
+		  tree->updated_segs.count,
+		  tree->updated_segs.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (tree->updated_segs.count > tree->updated_segs.capacity ||
+	    tree->updated_segs.capacity == 0) {
+		SSDFS_ERR("count %u > capacity %u\n",
+			  tree->updated_segs.count,
+			  tree->updated_segs.capacity);
+		return -ERANGE;
+	}
+
+	if (tree->updated_segs.count == 0) {
+		SSDFS_DBG("commit queue is empty\n");
+		return 0;
+	}
+
+	for (i = 0; i < tree->updated_segs.count; i++) {
+		u64 seg_id = tree->updated_segs.ids[i];
+
+		err = __ssdfs_commit_queue_issue_requests_async(fsi, seg_id);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to issue commit requests: "
+				  "seg_id %llu, err %d\n",
+				  seg_id, err);
+			goto finish_issue_requests_async;
+		}
+	}
+
+	bytes_count = sizeof(u64) * tree->updated_segs.capacity;
+	memset(tree->updated_segs.ids, 0, bytes_count);
+
+	tree->updated_segs.count = 0;
+
+finish_issue_requests_async:
+	return err;
+}
+
+/*
+ * __ssdfs_commit_queue_issue_requests_sync() - issue commit now requests sync
+ * @fsi: pointer on shared file system object
+ * @seg_id: segment ID
+ * @pair: pointer on starting seg2req pair
+ *
+ * This method tries to issue commit now requests
+ * synchronously.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ */
+static int
+__ssdfs_commit_queue_issue_requests_sync(struct ssdfs_fs_info *fsi,
+					 u64 seg_id,
+					 struct ssdfs_seg2req_pair *pair)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_seg2req_pair *cur_pair;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pair);
+
+	SSDFS_DBG("fsi %p, seg_id %llu, pair %p\n",
+		  fsi, seg_id, pair);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE,
+				seg_id, U64_MAX);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = !si ? -ENOMEM : PTR_ERR(si);
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  seg_id, err);
+		return err;
+	}
+
+	for (i = 0; i < si->pebs_count; i++) {
+		cur_pair = pair + i;
+
+		cur_pair->req = ssdfs_request_alloc();
+		if (IS_ERR_OR_NULL(cur_pair->req)) {
+			err = (cur_pair->req == NULL ?
+					-ENOMEM : PTR_ERR(cur_pair->req));
+			SSDFS_ERR("fail to allocate segment request: err %d\n",
+				  err);
+			goto finish_issue_requests_sync;
+		}
+
+		ssdfs_request_init(cur_pair->req);
+		ssdfs_get_request(cur_pair->req);
+
+		err = ssdfs_segment_commit_log_async2(si,
+						SSDFS_REQ_ASYNC_NO_FREE,
+						i, cur_pair->req);
+		if (unlikely(err)) {
+			SSDFS_ERR("commit log request failed: "
+				  "err %d\n", err);
+			/* release the request that failed to be issued */
+			ssdfs_put_request(cur_pair->req);
+			ssdfs_request_free(cur_pair->req);
+			cur_pair->req = NULL;
+			goto finish_issue_requests_sync;
+		}
+
+		ssdfs_segment_get_object(si);
+		cur_pair->si = si;
+	}
+
+finish_issue_requests_sync:
+	ssdfs_segment_put_object(si);
+
+	return err;
+}
+
+/*
+ * ssdfs_commit_queue_check_request() - check request
+ * @req: segment request
+ *
+ * This method tries to check the state of request.
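+ * On wait timeout the state is simply re-checked, so the call blocks
+ * until the request has either finished or failed.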
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_commit_queue_check_request(struct ssdfs_segment_request *req) +{ + wait_queue_head_t *wq = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("req %p\n", req); +#endif /* CONFIG_SSDFS_DEBUG */ + +check_req_state: + switch (atomic_read(&req->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req->result.err; + + if (!err) { + SSDFS_ERR("error code is absent: " + "req %p, err %d\n", + req, err); + err = -ERANGE; + } + + SSDFS_ERR("flush request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req->result.state)); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_commit_queue_wait_commit_logs_end() - wait commit logs end + * @fsi: pointer on shared file system object + * @seg_id: segment ID + * @pair: pointer on starting seg2req pair + * + * This method waits the requests ending and checking + * the requests. + */ +static void +ssdfs_commit_queue_wait_commit_logs_end(struct ssdfs_fs_info *fsi, + u64 seg_id, + struct ssdfs_seg2req_pair *pair) +{ + struct ssdfs_seg2req_pair *cur_pair; + int refs_count; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !pair); + + SSDFS_DBG("fsi %p, seg_id %llu, pair %p\n", + fsi, seg_id, pair); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < fsi->pebs_per_seg; i++) { + cur_pair = pair + i; + + if (cur_pair->req != NULL) { + err = ssdfs_commit_queue_check_request(cur_pair->req); + if (unlikely(err)) { + SSDFS_ERR("flush request failed: " + "err %d\n", err); + } + + refs_count = + atomic_read(&cur_pair->req->private.refs_count); + if (refs_count != 0) { + SSDFS_WARN("unexpected refs_count %d\n", + refs_count); + } + + ssdfs_request_free(cur_pair->req); + cur_pair->req = NULL; + } else { + SSDFS_ERR("request is NULL: " + "item_index %d\n", i); + } + + if (cur_pair->si != NULL) { + ssdfs_segment_put_object(cur_pair->si); + cur_pair->si = NULL; + } else { + SSDFS_ERR("segment is NULL: " + "item_index %d\n", i); + } + } +} + +/* + * ssdfs_commit_queue_issue_requests_sync() - issue commit now requests sync + * @tree: extents tree + * + * This method tries to issue commit now requests + * synchronously. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. 
+ */
+static int
+ssdfs_commit_queue_issue_requests_sync(struct ssdfs_extents_btree_info *tree)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_seg2req_pair *pairs;
+	u32 items_count;
+	size_t bytes_count;
+	u32 i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tree->fsi;
+
+	if (!tree->updated_segs.ids) {
+		SSDFS_ERR("commit queue is absent!!!\n");
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("count %u, capacity %u\n",
+		  tree->updated_segs.count,
+		  tree->updated_segs.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (tree->updated_segs.count > tree->updated_segs.capacity ||
+	    tree->updated_segs.capacity == 0) {
+		SSDFS_ERR("count %u > capacity %u\n",
+			  tree->updated_segs.count,
+			  tree->updated_segs.capacity);
+		return -ERANGE;
+	}
+
+	if (tree->updated_segs.count == 0) {
+		SSDFS_DBG("commit queue is empty\n");
+		return 0;
+	}
+
+	items_count = tree->updated_segs.count * fsi->pebs_per_seg;
+
+	pairs = ssdfs_ext_tree_kcalloc(items_count,
+					sizeof(struct ssdfs_seg2req_pair),
+					GFP_KERNEL);
+	if (!pairs) {
+		SSDFS_ERR("fail to allocate requests array\n");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < tree->updated_segs.count; i++) {
+		u64 seg_id = tree->updated_segs.ids[i];
+		struct ssdfs_seg2req_pair *start_pair;
+
+		start_pair = &pairs[i * fsi->pebs_per_seg];
+
+		err = __ssdfs_commit_queue_issue_requests_sync(fsi, seg_id,
+								start_pair);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to issue commit requests: "
+				  "seg_id %llu, err %d\n",
+				  seg_id, err);
+			i++;
+			break;
+		}
+	}
+
+	for (j = 0; j < i; j++) {
+		u64 seg_id = tree->updated_segs.ids[j];
+		struct ssdfs_seg2req_pair *start_pair;
+
+		start_pair = &pairs[j * fsi->pebs_per_seg];
+
+		ssdfs_commit_queue_wait_commit_logs_end(fsi, seg_id,
+							start_pair);
+	}
+
+	if (!err) {
+		bytes_count = sizeof(u64) * tree->updated_segs.capacity;
+		memset(tree->updated_segs.ids, 0, bytes_count);
+		tree->updated_segs.count = 0;
+	}
+
+	ssdfs_ext_tree_kfree(pairs);
+
+	return err;
+}
+
+/*
+ * ssdfs_extents_tree_add_updated_seg_id() - add updated segment ID in queue
+ * @tree: extents tree
+ * @seg_id: segment ID
+ *
+ * This method tries to add the updated segment ID into
+ * the commit queue.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ */
+int ssdfs_extents_tree_add_updated_seg_id(struct ssdfs_extents_btree_info *tree,
+					  u64 seg_id)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, seg_id %llu\n", tree, seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (tree->updated_segs.count >= SSDFS_COMMIT_QUEUE_THRESHOLD) {
+		err = ssdfs_commit_queue_issue_requests_async(tree);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to issue commit requests: "
+				  "err %d\n", err);
+			return err;
+		}
+	}
+
+	err = ssdfs_commit_queue_add_segment_id(tree, seg_id);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add segment ID: "
+			  "seg_id %llu, err %d\n",
+			  seg_id, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_extents_tree_commit_logs_now() - commit logs now
+ * @tree: extents tree
+ *
+ * This method tries to commit logs in updated segments.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+static inline
+int ssdfs_extents_tree_commit_logs_now(struct ssdfs_extents_btree_info *tree)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return ssdfs_commit_queue_issue_requests_sync(tree);
+}
+
+/*
+ * ssdfs_init_inline_root_node() - initialize inline root node
+ * @fsi: pointer on shared file system object
+ * @root: pointer on inline root node [out]
+ */
+static inline
+void ssdfs_init_inline_root_node(struct ssdfs_fs_info *fsi,
+				 struct ssdfs_btree_inline_root_node *root)
+{
+	size_t index_size = sizeof(struct ssdfs_btree_index);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!root);
+
+	SSDFS_DBG("root %p\n", root);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	root->header.height = SSDFS_BTREE_LEAF_NODE_HEIGHT;
+	root->header.items_count = 0;
+	root->header.flags = 0;
+	root->header.type = 0;
+	root->header.upper_node_id = cpu_to_le32(SSDFS_BTREE_ROOT_NODE_ID);
+	memset(root->header.node_ids, 0xFF,
+		sizeof(__le32) * SSDFS_BTREE_ROOT_NODE_INDEX_COUNT);
+	memset(root->indexes, 0xFF,
+		index_size * SSDFS_BTREE_ROOT_NODE_INDEX_COUNT);
+}
+
+/*
+ * ssdfs_extents_tree_create() - create extents tree of a new inode
+ * @fsi: pointer on shared file system object
+ * @ii: pointer on in-core SSDFS inode
+ *
+ * This method tries to create extents btree for a new inode.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - unable to allocate memory.
+ */
+int ssdfs_extents_tree_create(struct ssdfs_fs_info *fsi,
+			      struct ssdfs_inode_info *ii)
+{
+	struct ssdfs_extents_btree_info *ptr;
+	size_t fork_size = sizeof(struct ssdfs_raw_fork);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !ii);
+	BUG_ON(!rwsem_is_locked(&ii->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("ii %p, ino %lu\n",
+		  ii, ii->vfs_inode.i_ino);
+#else
+	SSDFS_DBG("ii %p, ino %lu\n",
+		  ii, ii->vfs_inode.i_ino);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (S_ISDIR(ii->vfs_inode.i_mode)) {
+		SSDFS_WARN("folder cannot have extents tree\n");
+		return -ERANGE;
+	}
+
+	ii->extents_tree = NULL;
+
+	ptr = ssdfs_ext_tree_kzalloc(sizeof(struct ssdfs_extents_btree_info),
+				     GFP_KERNEL);
+	if (!ptr) {
+		SSDFS_ERR("fail to allocate extents tree\n");
+		return -ENOMEM;
+	}
+
+	err = ssdfs_commit_queue_create(ptr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create commit queue: err %d\n",
+			  err);
+		ssdfs_ext_tree_kfree(ptr);
+		return err;
+	}
+
+	atomic_set(&ptr->state, SSDFS_EXTENTS_BTREE_UNKNOWN_STATE);
+	atomic_set(&ptr->type, SSDFS_INLINE_FORKS_ARRAY);
+	atomic64_set(&ptr->forks_count, 0);
+	init_rwsem(&ptr->lock);
+	ptr->generic_tree = NULL;
+	memset(ptr->buffer.forks, 0xFF, fork_size * SSDFS_INLINE_FORKS_COUNT);
+	ptr->inline_forks = ptr->buffer.forks;
+	memset(&ptr->root_buffer, 0xFF,
+		sizeof(struct ssdfs_btree_inline_root_node));
+	ptr->root = NULL;
+	ssdfs_memcpy(&ptr->desc,
+		     0, sizeof(struct ssdfs_extents_btree_descriptor),
+		     &fsi->segs_tree->extents_btree,
+		     0, sizeof(struct ssdfs_extents_btree_descriptor),
+		     sizeof(struct ssdfs_extents_btree_descriptor));
+	ptr->owner = ii;
+	ptr->fsi = fsi;
+	atomic_set(&ptr->state, SSDFS_EXTENTS_BTREE_CREATED);
+
+	ssdfs_debug_extents_btree_object(ptr);
+
+	ii->extents_tree = ptr;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+}
+
+/*
+ * ssdfs_extents_tree_destroy() - destroy extents tree
+ * @ii: pointer on in-core SSDFS inode
+ */
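Both the creation path above and the destruction path below lean on a single convention: memory that holds no valid fork, extent, or index is filled with 0xFF bytes, so every field reads back as its type's maximum (U64_MAX, U32_MAX) and is treated as unused by the lookup code. A stand-alone demonstration of that convention (the struct layout is simplified; the real on-disk fields are __le64/__le32 types):

	#include <stdio.h>
	#include <string.h>
	#include <stdint.h>

	struct raw_extent { uint64_t seg_id; uint32_t logical_blk; uint32_t len; };
	struct raw_fork {
		uint64_t start_offset;
		uint64_t blks_count;
		struct raw_extent extents[3];
	};

	int main(void)
	{
		struct raw_fork fork;

		/* "empty" per the SSDFS convention */
		memset(&fork, 0xFF, sizeof(fork));

		/* every field now equals its type's maximum */
		printf("unused fork: %d\n",
		       fork.start_offset == UINT64_MAX &&
		       fork.extents[0].len == UINT32_MAX);
		return 0;
	}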
+void ssdfs_extents_tree_destroy(struct ssdfs_inode_info *ii) +{ + size_t fork_size = sizeof(struct ssdfs_raw_fork); + struct ssdfs_extents_btree_info *tree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ii); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("ii %p, ino %lu\n", + ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("ii %p, ino %lu\n", + ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tree = SSDFS_EXTREE(ii); + + if (!tree) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extents tree is absent: ino %lu\n", + ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + return; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + /* expected state*/ + break; + + case SSDFS_EXTENTS_BTREE_CORRUPTED: + SSDFS_WARN("extents tree is corrupted: " + "ino %lu\n", + ii->vfs_inode.i_ino); + break; + + case SSDFS_EXTENTS_BTREE_DIRTY: + if (atomic64_read(&tree->forks_count) > 0) { + SSDFS_WARN("extents tree is dirty: " + "ino %lu\n", + ii->vfs_inode.i_ino); + } else { + /* regular destroy */ + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_UNKNOWN_STATE); + } + break; + + default: + SSDFS_WARN("invalid state of extents tree: " + "ino %lu, state %#x\n", + ii->vfs_inode.i_ino, + atomic_read(&tree->state)); + return; + } + + if (rwsem_is_locked(&tree->lock)) { + /* inform about possible trouble */ + SSDFS_WARN("tree is locked under destruction\n"); + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + if (!tree->inline_forks) { + SSDFS_WARN("empty inline_forks pointer\n"); + memset(tree->buffer.forks, 0xFF, + fork_size * SSDFS_INLINE_FORKS_COUNT); + } else { + memset(tree->inline_forks, 0xFF, + fork_size * SSDFS_INLINE_FORKS_COUNT); + } + tree->inline_forks = NULL; + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: + if (!tree->generic_tree) { + SSDFS_WARN("empty generic_tree pointer\n"); + ssdfs_btree_destroy(&tree->buffer.tree); + } else { + /* destroy tree via pointer */ + ssdfs_btree_destroy(tree->generic_tree); + } + tree->generic_tree = NULL; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG(); +#else + SSDFS_WARN("invalid extents btree state %#x\n", + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + } + + memset(&tree->root_buffer, 0xFF, + sizeof(struct ssdfs_btree_inline_root_node)); + tree->root = NULL; + + tree->owner = NULL; + tree->fsi = NULL; + + ssdfs_commit_queue_destroy(tree); + + atomic_set(&tree->type, SSDFS_EXTENTS_BTREE_UNKNOWN_TYPE); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_UNKNOWN_STATE); + + ssdfs_ext_tree_kfree(ii->extents_tree); + ii->extents_tree = NULL; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_extents_tree_init() - init extents tree for existing inode + * @fsi: pointer on shared file system object + * @ii: pointer on in-core SSDFS inode + * + * This method tries to create the extents tree and to initialize + * the root node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + * %-EIO - corrupted raw on-disk inode. 
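+ */

ssdfs_extents_tree_init() below selects the in-core representation from the raw inode's private flags. A compact sketch of that decision; the flag values are illustrative placeholders (the real constants live in the on-disk layout headers), and the real code additionally validates state, type, and fork counts:

	enum tree_kind {
		TREE_GENERIC, TREE_INLINE_ONE_FORK, TREE_INLINE, TREE_INVALID
	};

	#define INODE_HAS_INLINE_EXTENTS	0x1	/* illustrative values */
	#define INODE_HAS_EXTENTS_BTREE		0x2
	#define INODE_HAS_XATTR_BTREE		0x4

	enum tree_kind pick_representation(unsigned int flags)
	{
		if (flags & INODE_HAS_EXTENTS_BTREE)
			return TREE_GENERIC;	/* root node in the raw inode */
		if (flags & INODE_HAS_XATTR_BTREE)
			return TREE_INLINE_ONE_FORK;	/* inline area shared */
		if (flags & INODE_HAS_INLINE_EXTENTS)
			return TREE_INLINE;	/* forks in the raw inode */
		return TREE_INVALID;		/* the patch BUG()s here */
	}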
+int ssdfs_extents_tree_init(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_inode_info *ii)
+{
+	struct ssdfs_inode raw_inode;
+	struct ssdfs_btree_node *node;
+	struct ssdfs_extents_btree_info *tree;
+	struct ssdfs_btree_inline_root_node *root_node;
+	size_t fork_size = sizeof(struct ssdfs_raw_fork);
+	size_t inline_forks_size = fork_size * SSDFS_INLINE_FORKS_COUNT;
+	u16 flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !ii);
+	BUG_ON(!rwsem_is_locked(&ii->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("fsi %p, ii %p, ino %lu\n",
+		  fsi, ii, ii->vfs_inode.i_ino);
+#else
+	SSDFS_DBG("fsi %p, ii %p, ino %lu\n",
+		  fsi, ii, ii->vfs_inode.i_ino);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	tree = SSDFS_EXTREE(ii);
+	if (!tree) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("extents tree is absent: ino %lu\n",
+			  ii->vfs_inode.i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ERANGE;
+	}
+
+	ssdfs_memcpy(&raw_inode,
+		     0, sizeof(struct ssdfs_inode),
+		     &ii->raw_inode,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	flags = le16_to_cpu(raw_inode.private_flags);
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_EXTENTS_BTREE_CREATED:
+		/* expected tree state */
+		break;
+
+	default:
+		SSDFS_WARN("unexpected state of tree %#x\n",
+			   atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_FORKS_ARRAY:
+		/* expected tree type */
+		break;
+
+	case SSDFS_PRIVATE_EXTENTS_BTREE:
+		SSDFS_WARN("unexpected type of tree %#x\n",
+			   atomic_read(&tree->type));
+		return -ERANGE;
+
+	default:
+		SSDFS_WARN("invalid type of tree %#x\n",
+			   atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	down_write(&tree->lock);
+
+	if (flags & SSDFS_INODE_HAS_EXTENTS_BTREE) {
+		atomic64_set(&tree->forks_count,
+			     le32_to_cpu(raw_inode.count_of.forks));
+
+		if (tree->generic_tree) {
+			err = -ERANGE;
+			atomic_set(&tree->state,
+				   SSDFS_EXTENTS_BTREE_CORRUPTED);
+			SSDFS_WARN("generic tree exists\n");
+			goto finish_tree_init;
+		}
+
+		tree->generic_tree = &tree->buffer.tree;
+		tree->inline_forks = NULL;
+		atomic_set(&tree->type, SSDFS_PRIVATE_EXTENTS_BTREE);
+
+		err = ssdfs_btree_create(fsi,
+					 ii->vfs_inode.i_ino,
+					 &ssdfs_extents_btree_desc_ops,
+					 &ssdfs_extents_btree_ops,
+					 tree->generic_tree);
+		if (unlikely(err)) {
+			atomic_set(&tree->state,
+				   SSDFS_EXTENTS_BTREE_CORRUPTED);
+			SSDFS_ERR("fail to create extents tree: err %d\n",
+				  err);
+			goto finish_tree_init;
+		}
+
+		err = ssdfs_btree_radix_tree_find(tree->generic_tree,
+						  SSDFS_BTREE_ROOT_NODE_ID,
+						  &node);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get the root node: err %d\n",
+				  err);
+			goto fail_create_generic_tree;
+		} else if (unlikely(!node)) {
+			err = -ERANGE;
+			SSDFS_WARN("empty node pointer\n");
+			goto fail_create_generic_tree;
+		}
+
+		root_node = &raw_inode.internal[0].area1.extents_root;
+		err = ssdfs_btree_create_root_node(node, root_node);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to init the root node: err %d\n",
+				  err);
+			goto fail_create_generic_tree;
+		}
+
+		tree->root = &tree->root_buffer;
+
+		ssdfs_memcpy(tree->root,
+			     0, sizeof(struct ssdfs_btree_inline_root_node),
+			     root_node,
+			     0, sizeof(struct ssdfs_btree_inline_root_node),
+			     sizeof(struct ssdfs_btree_inline_root_node));
+
+		atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_INITIALIZED);
+
+fail_create_generic_tree:
+		if (unlikely(err)) {
+			atomic_set(&tree->state,
+				   SSDFS_EXTENTS_BTREE_CORRUPTED);
+			ssdfs_btree_destroy(tree->generic_tree);
+			tree->generic_tree = NULL;
+			goto 
finish_tree_init; + } + } else if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + atomic64_set(&tree->forks_count, + le32_to_cpu(raw_inode.count_of.forks)); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(atomic64_read(&tree->forks_count) > 1); +#else + if (atomic64_read(&tree->forks_count) > 1) { + err = -EIO; + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_ERR("corrupted on-disk raw inode: " + "forks_count %llu\n", + (u64)atomic64_read(&tree->forks_count)); + goto finish_tree_init; + } +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!tree->inline_forks) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_WARN("undefined inline forks pointer\n"); + goto finish_tree_init; + } else { + ssdfs_memcpy(tree->inline_forks, 0, inline_forks_size, + &raw_inode.internal, 0, inline_forks_size, + inline_forks_size); + } + + atomic_set(&tree->type, SSDFS_INLINE_FORKS_ARRAY); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_INITIALIZED); + } else if (flags & SSDFS_INODE_HAS_INLINE_EXTENTS) { + atomic64_set(&tree->forks_count, + le32_to_cpu(raw_inode.count_of.forks)); + + if (!tree->inline_forks) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_WARN("undefined inline forks pointer\n"); + goto finish_tree_init; + } else { + ssdfs_memcpy(tree->inline_forks, 0, inline_forks_size, + &raw_inode.internal, 0, inline_forks_size, + inline_forks_size); + } + + atomic_set(&tree->type, SSDFS_INLINE_FORKS_ARRAY); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_INITIALIZED); + } else + BUG(); + +finish_tree_init: + up_write(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_extents_btree_object(tree); + + return err; +} + +/* + * ssdfs_migrate_inline2generic_tree() - convert inline tree into generic + * @tree: extents tree + * + * This method tries to convert the inline tree into generic one. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EFAULT - the tree is empty. 
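+ *
+ * For illustration (an editorial aside, not part of the on-disk
+ * format): the hash range covered by the inline forks is derived as
+ *
+ *	start_hash = le64_to_cpu(forks[0].start_offset);
+ *	end_hash = le64_to_cpu(forks[count - 1].start_offset) +
+ *		   le64_to_cpu(forks[count - 1].blks_count) - 1;
+ *
+ * e.g. inline forks {start 0, blks 8} and {start 64, blks 16} cover
+ * the range [0, 79], which is then inserted into the freshly created
+ * generic btree in one batch by ssdfs_btree_add_range().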
+ */ +static +int ssdfs_migrate_inline2generic_tree(struct ssdfs_extents_btree_info *tree) +{ + struct ssdfs_raw_fork inline_forks[SSDFS_INLINE_FORKS_COUNT]; + struct ssdfs_btree_search *search; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + s64 forks_count, forks_capacity; + int private_flags; + u64 start_hash = 0, end_hash = 0; + u64 blks_count = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + forks_count = atomic64_read(&tree->forks_count); + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + private_flags = atomic_read(&tree->owner->private_flags); + + forks_capacity = SSDFS_INLINE_FORKS_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + forks_capacity--; + if (private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) { + SSDFS_ERR("the extents tree is generic\n"); + return -ERANGE; + } + + if (forks_count > forks_capacity) { + SSDFS_WARN("extents tree is corrupted: " + "forks_count %lld, forks_capacity %lld\n", + forks_count, forks_capacity); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } else if (forks_count == 0) { + SSDFS_DBG("empty tree\n"); + return -EFAULT; + } else if (forks_count < forks_capacity) { + SSDFS_WARN("forks_count %lld, forks_capacity %lld\n", + forks_count, forks_capacity); + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree->inline_forks || tree->generic_tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(inline_forks, 0xFF, fork_size * SSDFS_INLINE_FORKS_COUNT); + ssdfs_memcpy(inline_forks, 0, fork_size * forks_capacity, + tree->inline_forks, 0, fork_size * forks_capacity, + fork_size * forks_capacity); + tree->inline_forks = NULL; + + tree->generic_tree = &tree->buffer.tree; + tree->inline_forks = NULL; + + atomic64_set(&tree->forks_count, 0); + + err = ssdfs_btree_create(tree->fsi, + tree->owner->vfs_inode.i_ino, + &ssdfs_extents_btree_desc_ops, + &ssdfs_extents_btree_ops, + &tree->buffer.tree); + if (unlikely(err)) { + SSDFS_ERR("fail to create generic tree: err %d\n", + err); + goto recover_inline_tree; + } + + start_hash = le64_to_cpu(inline_forks[0].start_offset); + if (forks_count > 1) { + end_hash = + le64_to_cpu(inline_forks[forks_count - 1].start_offset); + blks_count = + le64_to_cpu(inline_forks[forks_count - 1].blks_count); + } else { + end_hash = start_hash; + blks_count = le64_to_cpu(inline_forks[0].blks_count); + } + + if (blks_count == 0 || blks_count >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid blks_count %llu\n", + blks_count); + goto destroy_generic_tree; + } + + end_hash += blks_count - 1; + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto destroy_generic_tree; + } + + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + 
SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = start_hash; + search->request.end.hash = end_hash; + search->request.count = forks_count; + + err = ssdfs_btree_find_item(&tree->buffer.tree, search); + if (err == -ENODATA) { + /* expected error */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find item: " + "start_hash %llx, end_hash %llx, err %d\n", + start_hash, end_hash, err); + goto finish_add_range; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + goto finish_add_range; + } + + if (search->result.buf) { + err = -ERANGE; + SSDFS_ERR("search->result.buf %p\n", + search->result.buf); + goto finish_add_range; + } + + if (forks_count == 1) { + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf_size = sizeof(struct ssdfs_raw_fork); + search->result.items_in_buffer = forks_count; + search->result.buf = &search->raw.fork; + ssdfs_memcpy(&search->raw.fork, 0, search->result.buf_size, + inline_forks, 0, search->result.buf_size, + search->result.buf_size); + } else { + err = ssdfs_btree_search_alloc_result_buf(search, + forks_count * sizeof(struct ssdfs_raw_fork)); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory for buffer\n"); + goto finish_add_range; + } + ssdfs_memcpy(search->result.buf, 0, search->result.buf_size, + inline_forks, 0, search->result.buf_size, + search->result.buf_size); + search->result.items_in_buffer = (u16)forks_count; + } + + search->request.type = SSDFS_BTREE_SEARCH_ADD_RANGE; + + err = ssdfs_btree_add_range(&tree->buffer.tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add the range into tree: " + "start_hash %llx, end_hash %llx, err %d\n", + start_hash, end_hash, err); + goto finish_add_range; + } + +finish_add_range: + ssdfs_btree_search_free(search); + + if (unlikely(err)) + goto destroy_generic_tree; + + err = ssdfs_btree_synchronize_root_node(tree->generic_tree, + tree->root); + if (unlikely(err)) { + SSDFS_ERR("fail to synchronize the root node: " + "err %d\n", err); + goto destroy_generic_tree; + } + + atomic_set(&tree->type, SSDFS_PRIVATE_EXTENTS_BTREE); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + + atomic_or(SSDFS_INODE_HAS_EXTENTS_BTREE, + &tree->owner->private_flags); + atomic_and(~SSDFS_INODE_HAS_INLINE_EXTENTS, + &tree->owner->private_flags); + return 0; + +destroy_generic_tree: + ssdfs_btree_destroy(&tree->buffer.tree); + +recover_inline_tree: + ssdfs_memcpy(tree->buffer.forks, + 0, fork_size * SSDFS_INLINE_FORKS_COUNT, + inline_forks, + 0, fork_size * SSDFS_INLINE_FORKS_COUNT, + fork_size * SSDFS_INLINE_FORKS_COUNT); + tree->inline_forks = tree->buffer.forks; + tree->generic_tree = NULL; + atomic64_set(&tree->forks_count, forks_count); + return err; +} + +/* + * ssdfs_extents_tree_flush() - save modified extents tree + * @fsi: pointer on shared file system object + * @ii: pointer on in-core SSDFS inode + * + * This method tries to flush inode's extents btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
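+ *
+ * For the inline case, the serialization logic below reduces to the
+ * following sketch (helper names are illustrative; the constants are
+ * the patch's own):
+ *
+ *	if (forks_count == 0)
+ *		memset(inline_area, 0xFF, area_size);
+ *	else if (forks_count == 1)
+ *		copy_one_fork_into_raw_inode();
+ *	else if (forks_count == SSDFS_INLINE_FORKS_COUNT)
+ *		has_xattr_btree ? migrate_to_generic_then_flush()
+ *				: copy_all_forks_into_raw_inode();
+ *	else
+ *		treat_as_corrupted();	-- returns -ERANGE
+ *
+ * The xattr btree root shares the inode's internal area, so a full
+ * inline forks array cannot be serialized next to it and the tree is
+ * converted into a generic btree first.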
+ */ +int ssdfs_extents_tree_flush(struct ssdfs_fs_info *fsi, + struct ssdfs_inode_info *ii) +{ + struct ssdfs_extents_btree_info *tree; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + size_t inline_forks_size = fork_size * SSDFS_INLINE_FORKS_COUNT; + int flags; + u64 forks_count = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !ii); + BUG_ON(!rwsem_is_locked(&ii->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p, ii %p, ino %lu\n", + fsi, ii, ii->vfs_inode.i_ino); +#else + SSDFS_DBG("fsi %p, ii %p, ino %lu\n", + fsi, ii, ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + tree = SSDFS_EXTREE(ii); + if (!tree) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extents tree is absent: ino %lu\n", + ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ERANGE; + } + + ssdfs_debug_extents_btree_object(tree); + + flags = atomic_read(&ii->private_flags); + + down_write(&tree->lock); + + err = ssdfs_extents_tree_commit_logs_now(tree); + if (unlikely(err)) { + SSDFS_ERR("fail to commit logs: " + "ino %lu, err %d\n", + ii->vfs_inode.i_ino, err); + goto finish_extents_tree_flush; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_DIRTY: + /* need to flush */ + break; + + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + /* do nothing */ + goto finish_extents_tree_flush; + + case SSDFS_EXTENTS_BTREE_CORRUPTED: + err = -EOPNOTSUPP; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extents btree corrupted: ino %lu\n", + ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + goto finish_extents_tree_flush; + + default: + err = -ERANGE; + SSDFS_WARN("unexpected state of tree %#x\n", + atomic_read(&tree->state)); + goto finish_extents_tree_flush; + } + + forks_count = atomic64_read(&tree->forks_count); + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + if (!tree->inline_forks) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_WARN("undefined inline forks pointer\n"); + goto finish_extents_tree_flush; + } + + if (forks_count == 0) { + flags = atomic_read(&ii->private_flags); + + if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + memset(&ii->raw_inode.internal, 0xFF, + fork_size); + } else { + memset(&ii->raw_inode.internal, 0xFF, + inline_forks_size); + } + } else if (forks_count == 1) { + flags = atomic_read(&ii->private_flags); + + if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + ssdfs_memcpy(&ii->raw_inode.internal, + 0, fork_size, + tree->inline_forks, + 0, fork_size, + fork_size); + } else { + ssdfs_memcpy(&ii->raw_inode.internal, + 0, inline_forks_size, + tree->inline_forks, + 0, inline_forks_size, + inline_forks_size); + } + } else if (forks_count == SSDFS_INLINE_FORKS_COUNT) { + flags = atomic_read(&ii->private_flags); + + if (flags & SSDFS_INODE_HAS_XATTR_BTREE) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("tree should be converted: " + "ino %lu\n", + ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + ssdfs_memcpy(&ii->raw_inode.internal, + 0, inline_forks_size, + tree->inline_forks, + 0, inline_forks_size, + inline_forks_size); + } + + if (err == -EAGAIN) { + err = ssdfs_migrate_inline2generic_tree(tree); + if (unlikely(err)) { + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_ERR("fail to convert tree: " + "err %d\n", err); + goto finish_extents_tree_flush; + } else + goto try_generic_tree_flush; + } + } else { + err = -ERANGE; + atomic_set(&tree->state, + 
SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_WARN("invalid forks_count %llu\n", + (u64)atomic64_read(&tree->forks_count)); + goto finish_extents_tree_flush; + } + + atomic_or(SSDFS_INODE_HAS_INLINE_EXTENTS, + &ii->private_flags); + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: +try_generic_tree_flush: + if (!tree->generic_tree) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_WARN("undefined generic tree pointer\n"); + goto finish_extents_tree_flush; + } + + err = ssdfs_btree_flush(tree->generic_tree); + if (unlikely(err)) { + SSDFS_ERR("fail to flush extents btree: " + "ino %lu, err %d\n", + ii->vfs_inode.i_ino, err); + goto finish_generic_tree_flush; + } + + if (!tree->root) { + err = -ERANGE; + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_WARN("undefined root node pointer\n"); + goto finish_generic_tree_flush; + } + + ssdfs_memcpy(&ii->raw_inode.internal[0].area1.extents_root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + tree->root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + + atomic_or(SSDFS_INODE_HAS_EXTENTS_BTREE, + &ii->private_flags); + +finish_generic_tree_flush: + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid type of tree %#x\n", + atomic_read(&tree->type)); + goto finish_extents_tree_flush; + } + + ii->raw_inode.count_of.forks = cpu_to_le32((u32)forks_count); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_INITIALIZED); + +finish_extents_tree_flush: + up_write(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("RAW INODE DUMP\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + &ii->raw_inode, + sizeof(struct ssdfs_inode)); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_prepare_volume_extent() - convert requested byte stream into extent + * @fsi: pointer on shared file system object + * @req: request object + * + * This method tries to convert logical byte stream into extent of blocks. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - unable to convert byte stream into extent. 
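+ *
+ * The conversion itself is shift arithmetic (an illustrative aside,
+ * assuming 4 KiB logical blocks, i.e. log_pagesize == 12):
+ *
+ *	requested_blk = logical_offset >> log_pagesize;
+ *	requested_len = (data_bytes + pagesize - 1) >> log_pagesize;
+ *
+ * e.g. logical_offset 10000 and data_bytes 5000 give requested_blk 2
+ * and requested_len 2; the fork found for requested_blk is then walked
+ * extent by extent until the extent containing the block is reached.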
+ */ +int ssdfs_prepare_volume_extent(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *tree; + struct ssdfs_btree_search *search; + struct ssdfs_raw_fork *fork = NULL; + struct ssdfs_raw_extent *extent = NULL; + u32 pagesize = fsi->pagesize; + u64 seg_id; + u32 logical_blk = U32_MAX, len; + u64 start_blk; + u64 blks_count; + u64 requested_blk, requested_len; + u64 processed_blks = 0; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX); + + SSDFS_DBG("fsi %p, req %p, ino %llu, " + "logical_offset %llu, data_bytes %u, " + "cno %llu, parent_snapshot %llu\n", + fsi, req, req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, + req->extent.cno, + req->extent.parent_snapshot); +#endif /* CONFIG_SSDFS_DEBUG */ + + ii = SSDFS_I(req->result.pvec.pages[0]->mapping->host); + + tree = SSDFS_EXTREE(ii); + if (!tree) { + down_write(&ii->lock); + err = ssdfs_extents_tree_create(fsi, ii); + up_write(&ii->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to create extents tree: " + "err %d\n", err); + return err; + } else + tree = SSDFS_EXTREE(ii); + } + + requested_blk = req->extent.logical_offset >> fsi->log_pagesize; + requested_len = (req->extent.data_bytes + pagesize - 1) >> + fsi->log_pagesize; + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_extents_tree_find_fork(tree, requested_blk, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + requested_blk, err); + goto finish_prepare_volume_extent; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + goto finish_prepare_volume_extent; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid buffer state %#x\n", + search->result.buf_state); + goto finish_prepare_volume_extent; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("empty result buffer pointer\n"); + goto finish_prepare_volume_extent; + } + + if (search->result.items_in_buffer == 0) { + err = -ERANGE; + SSDFS_ERR("items_in_buffer %u\n", + search->result.items_in_buffer); + goto finish_prepare_volume_extent; + } + + fork = (struct ssdfs_raw_fork *)search->result.buf; + start_blk = le64_to_cpu(fork->start_offset); + blks_count = le64_to_cpu(fork->blks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_blk %llu, blks_count %llu\n", + start_blk, blks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; extent = NULL, i++) { + if (processed_blks >= blks_count) + break; + + extent = &fork->extents[i]; + + seg_id = le64_to_cpu(extent->seg_id); + logical_blk = le32_to_cpu(extent->logical_blk); + len = le32_to_cpu(extent->len); + + if (seg_id == U64_MAX || logical_blk == U32_MAX || + len == U32_MAX) { + err = -ERANGE; + SSDFS_ERR("corrupted extent: index %d\n", i); + goto finish_prepare_volume_extent; + } + + if (len == 0) { + err = -ERANGE; + SSDFS_ERR("corrupted extent: index %d\n", i); + goto finish_prepare_volume_extent; + } + + if ((start_blk + processed_blks) <= 
requested_blk &&
+		    requested_blk < (start_blk + processed_blks + len)) {
+			u64 diff = requested_blk - (start_blk + processed_blks);
+
+			logical_blk += (u32)diff;
+			len -= (u32)diff;
+			len = min_t(u32, len, requested_len);
+			break;
+		}
+
+		processed_blks += len;
+	}
+
+	if (!extent) {
+		err = -ENODATA;
+		SSDFS_DBG("extent hasn't been found\n");
+		goto finish_prepare_volume_extent;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(logical_blk >= U16_MAX);
+	BUG_ON(len >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_request_define_segment(seg_id, req);
+	ssdfs_request_define_volume_extent((u16)logical_blk, (u16)len, req);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("logical_blk %u, len %u\n",
+		  logical_blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_prepare_volume_extent:
+	ssdfs_btree_search_free(search);
+	return err;
+}
+
+/*
+ * ssdfs_recommend_migration_extent() - recommend migration extent
+ * @fsi: pointer on shared file system object
+ * @req: request object
+ * @fragment: recommended fragment [out]
+ *
+ * This method tries to find an extent recommended for migration.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOMEM - fail to allocate memory.
+ * %-ENODATA - unable to find a relevant extent.
+ */
+int ssdfs_recommend_migration_extent(struct ssdfs_fs_info *fsi,
+				     struct ssdfs_segment_request *req,
+				     struct ssdfs_zone_fragment *fragment)
+{
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_extents_btree_info *tree;
+	struct ssdfs_btree_search *search;
+	struct ssdfs_raw_fork *fork = NULL;
+	struct ssdfs_raw_extent *found = NULL;
+	size_t item_size = sizeof(struct ssdfs_raw_extent);
+	u64 start_blk;
+	u64 blks_count;
+	u64 seg_id;
+	u32 logical_blk = U32_MAX, len;
+	u64 requested_blk, requested_len;
+	u64 processed_blks = 0;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req || !fragment);
+	BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX);
+
+	SSDFS_DBG("fsi %p, req %p, ino %llu, "
+		  "logical_offset %llu, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu\n",
+		  fsi, req, req->extent.ino,
+		  req->extent.logical_offset,
+		  req->extent.data_bytes,
+		  req->extent.cno,
+		  req->extent.parent_snapshot);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(fragment, 0xFF, sizeof(struct ssdfs_zone_fragment));
+
+	ii = SSDFS_I(req->result.pvec.pages[0]->mapping->host);
+
+	tree = SSDFS_EXTREE(ii);
+	if (!tree) {
+		down_write(&ii->lock);
+		err = ssdfs_extents_tree_create(fsi, ii);
+		up_write(&ii->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create extents tree: "
+				  "err %d\n", err);
+			return err;
+		} else
+			tree = SSDFS_EXTREE(ii);
+	}
+
+	requested_blk = req->extent.logical_offset >> fsi->log_pagesize;
+	requested_len = (req->extent.data_bytes + fsi->pagesize - 1) >>
+				fsi->log_pagesize;
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ENOMEM;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	err = ssdfs_extents_tree_find_fork(tree, requested_blk, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find the fork: "
+			  "blk %llu, err %d\n",
+			  requested_blk, err);
+		goto finish_recommend_migration_extent;
+	}
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		/* expected state */
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid result state %#x\n",
+			  search->result.state);
+		goto finish_recommend_migration_extent;
+	}
+
+	switch 
(search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid buffer state %#x\n", + search->result.buf_state); + goto finish_recommend_migration_extent; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("empty result buffer pointer\n"); + goto finish_recommend_migration_extent; + } + + if (search->result.items_in_buffer == 0) { + err = -ERANGE; + SSDFS_ERR("items_in_buffer %u\n", + search->result.items_in_buffer); + goto finish_recommend_migration_extent; + } + + fork = (struct ssdfs_raw_fork *)search->result.buf; + start_blk = le64_to_cpu(fork->start_offset); + blks_count = le64_to_cpu(fork->blks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_blk %llu, blks_count %llu\n", + start_blk, blks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; found = NULL, i++) { + if (processed_blks >= blks_count) + break; + + found = &fork->extents[i]; + + seg_id = le64_to_cpu(found->seg_id); + logical_blk = le32_to_cpu(found->logical_blk); + len = le32_to_cpu(found->len); + + if (seg_id == U64_MAX || logical_blk == U32_MAX || + len == U32_MAX) { + err = -ERANGE; + SSDFS_ERR("corrupted extent: index %d\n", i); + goto finish_recommend_migration_extent; + } + + if (len == 0) { + err = -ERANGE; + SSDFS_ERR("corrupted extent: index %d\n", i); + goto finish_recommend_migration_extent; + } + + if (req->place.start.seg_id == seg_id) { + if (logical_blk <= req->place.start.blk_index && + req->place.start.blk_index < (logical_blk + len)) { + /* extent has been found */ + break; + } + } + + processed_blks += len; + found = NULL; + } + + if (!found) { + err = -ENODATA; + SSDFS_DBG("extent hasn't been found\n"); + goto finish_recommend_migration_extent; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(logical_blk >= U16_MAX); + BUG_ON(len >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (logical_blk == req->place.start.blk_index && + req->place.len == len) { + err = -ENODATA; + SSDFS_DBG("extent hasn't been found\n"); + goto finish_recommend_migration_extent; + } else { + fragment->ino = ii->vfs_inode.i_ino; + fragment->logical_blk_offset = start_blk + processed_blks; + ssdfs_memcpy(&fragment->extent, 0, item_size, + found, 0, item_size, + item_size); + } + +finish_recommend_migration_extent: + ssdfs_btree_search_free(search); + return err; +} + +/* + * ssdfs_extents_tree_has_logical_block() - check that block exists + * @blk_offset: offset of logical block into file + * @inode: pointer on VFS inode + */ +bool ssdfs_extents_tree_has_logical_block(u64 blk_offset, struct inode *inode) +{ + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *tree; + struct ssdfs_btree_search *search; + ino_t ino; + bool is_found = false; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!inode); +#endif /* CONFIG_SSDFS_DEBUG */ + + ii = SSDFS_I(inode); + ino = inode->i_ino; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, blk_offset %llu\n", + ino, blk_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = SSDFS_EXTREE(ii); + if (!tree) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extents tree is absent: ino %lu\n", + ii->vfs_inode.i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + return false; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return false; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_extents_tree_find_fork(tree, blk_offset, search); + if (err == 
-ENODATA)
+		is_found = false;
+	else if (unlikely(err)) {
+		is_found = false;
+		SSDFS_ERR("fail to find the fork: "
+			  "blk %llu, err %d\n",
+			  blk_offset, err);
+	} else
+		is_found = true;
+
+	ssdfs_btree_search_free(search);
+
+	return is_found;
+}
+
+/*
+ * ssdfs_extents_tree_add_extent() - add extent into extents tree
+ * @inode: pointer on VFS inode
+ * @req: pointer on segment request [in]
+ *
+ * This method tries to add an extent into extents tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOSPC - extents tree is unable to add requested block(s).
+ * %-EEXIST - extent exists in the tree.
+ */
+int ssdfs_extents_tree_add_extent(struct inode *inode,
+				  struct ssdfs_segment_request *req)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_extents_btree_info *tree;
+	struct ssdfs_btree_search *search;
+	struct ssdfs_raw_extent extent;
+	ino_t ino;
+	u64 requested_blk;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!inode || !req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ii = SSDFS_I(inode);
+	ino = inode->i_ino;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("ino %lu, logical_offset %llu, "
+		  "seg_id %llu, start_blk %u, len %u\n",
+		  ino, req->extent.logical_offset,
+		  req->place.start.seg_id,
+		  req->place.start.blk_index, req->place.len);
+#else
+	SSDFS_DBG("ino %lu, logical_offset %llu, "
+		  "seg_id %llu, start_blk %u, len %u\n",
+		  ino, req->extent.logical_offset,
+		  req->place.start.seg_id,
+		  req->place.start.blk_index, req->place.len);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	tree = SSDFS_EXTREE(ii);
+	if (!tree) {
+		down_write(&ii->lock);
+		err = ssdfs_extents_tree_create(fsi, ii);
+		up_write(&ii->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to create extents tree: "
+				  "err %d\n", err);
+			return err;
+		} else
+			tree = SSDFS_EXTREE(ii);
+	}
+
+	requested_blk = req->extent.logical_offset >> fsi->log_pagesize;
+	extent.seg_id = cpu_to_le64(req->place.start.seg_id);
+	extent.logical_blk = cpu_to_le32(req->place.start.blk_index);
+	extent.len = cpu_to_le32(req->place.len);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("requested_blk %llu, "
+		  "extent (seg_id %llu, logical_blk %u, len %u)\n",
+		  requested_blk,
+		  le64_to_cpu(extent.seg_id),
+		  le32_to_cpu(extent.logical_blk),
+		  le32_to_cpu(extent.len));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ERANGE;
+	}
+
+	ssdfs_btree_search_init(search);
+	err = __ssdfs_extents_tree_add_extent(tree, requested_blk,
+					      &extent, search);
+	ssdfs_btree_search_free(search);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add block into the tree: "
+			  "blk %llu, err %d\n",
+			  requested_blk, err);
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/*
+ * ssdfs_extents_tree_truncate() - truncate extents tree
+ * @inode: pointer on VFS inode
+ *
+ * This method tries to truncate extents tree.
+ *
+ * The key trick of the truncate operation is that it is possible
+ * to keep the inline forks in the inode and to place the whole
+ * hierarchy into the shared extents tree. This is the case of
+ * deleting the whole file, or practically the whole file. The
+ * shared tree's thread will be responsible for the real
+ * invalidation in the background. If we truncate the file
+ * partially, then we can simply correct the file's length and
+ * delegate the responsibility for truncating all invalidated
+ * extents of the tree to the shared extents tree's thread.
+ *
+ * Usually, if we need to truncate some file, then we need to find
+ * the position of the extent that will be truncated. As a result,
+ * we know the whole hierarchy path from the root node down to the
+ * leaf one. So, all forks/extents after the truncated one should
+ * be added into the pre-invalidated list and become obsolete in
+ * the leaf node. Likewise, all related index records should be
+ * deleted from the parent nodes and placed into the
+ * pre-invalidated list of the shared extents tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ */
+int ssdfs_extents_tree_truncate(struct inode *inode)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_extents_btree_info *tree;
+	struct ssdfs_btree_search *search;
+	ino_t ino;
+	loff_t size;
+	u64 blk_offset;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!inode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ii = SSDFS_I(inode);
+	ino = inode->i_ino;
+	size = i_size_read(inode);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("ino %lu, size %llu\n",
+		  ino, (u64)size);
+#else
+	SSDFS_DBG("ino %lu, size %llu\n",
+		  ino, (u64)size);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	tree = SSDFS_EXTREE(ii);
+	if (!tree) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("extents tree is absent: ino %lu\n",
+			  ii->vfs_inode.i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOENT;
+	}
+
+	blk_offset = (u64)size + fsi->pagesize - 1;
+	blk_offset >>= fsi->log_pagesize;
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ERANGE;
+	}
+
+	ssdfs_btree_search_init(search);
+	err = ssdfs_extents_tree_truncate_extent(tree, blk_offset, 0, search);
+	ssdfs_btree_search_free(search);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to truncate the tree: "
+			  "blk %llu, err %d\n",
+			  blk_offset, err);
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+/******************************************************************************
+ *                    EXTENTS TREE OBJECT FUNCTIONALITY                       *
+ ******************************************************************************/
+
+/*
+ * need_initialize_extent_btree_search() - check necessity to init the search
+ * @blk: logical block number
+ * @search: search object
+ */
+static inline
+bool need_initialize_extent_btree_search(u64 blk,
+					 struct ssdfs_btree_search *search)
+{
+	return need_initialize_btree_search(search) ||
+		search->request.start.hash != blk;
+}
+
+/*
+ * ssdfs_extents_tree_find_inline_fork() - find an inline fork in the tree
+ * @tree: extents tree
+ * @blk: logical block number
+ * @search: search object
+ *
+ * This method tries to find a fork for the requested @blk.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOENT - fork is empty (no extents).
+ * %-ENODATA - item hasn't been found.
+ */
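The inline lookup below walks at most SSDFS_INLINE_FORKS_COUNT forks and classifies the requested block against each fork's half-open range [start_offset, start_offset + blks_count). A stand-alone version of just that classification (names are illustrative):

	#include <stdint.h>

	enum lookup_result {
		BEFORE_FORK,	/* possible place found: -ENODATA */
		INSIDE_FORK,	/* valid item */
		AFTER_FORK	/* keep scanning; out of range at the end */
	};

	enum lookup_result classify_blk(uint64_t blk, uint64_t start,
					uint64_t blks_count)
	{
		if (blk < start)
			return BEFORE_FORK;
		if (blk < start + blks_count)
			return INSIDE_FORK;
		return AFTER_FORK;
	}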
+static
+int ssdfs_extents_tree_find_inline_fork(struct ssdfs_extents_btree_info *tree,
+					u64 blk,
+					struct ssdfs_btree_search *search)
+{
+	size_t fork_size = sizeof(struct ssdfs_raw_fork);
+	s64 forks_count;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, blk %llu, search %p\n",
+		  tree, blk, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (atomic_read(&tree->type) != SSDFS_INLINE_FORKS_ARRAY) {
+		SSDFS_ERR("invalid tree type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	ssdfs_btree_search_free_result_buf(search);
+
+	forks_count = atomic64_read(&tree->forks_count);
+
+	if (forks_count < 0) {
+		SSDFS_ERR("invalid forks_count %lld\n",
+			  forks_count);
+		return -ERANGE;
+	} else if (forks_count == 0) {
+		SSDFS_DBG("empty tree\n");
+		search->result.state = SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND;
+		search->result.err = -ENOENT;
+		search->result.start_index = 0;
+		search->result.count = 0;
+		search->result.search_cno = ssdfs_current_cno(tree->fsi->sb);
+		search->result.buf_state =
+				SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE;
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(search->result.buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+		search->result.buf = NULL;
+		search->result.buf_size = 0;
+		search->result.items_in_buffer = 0;
+
+		ssdfs_debug_btree_search_object(search);
+
+		return -ENOENT;
+	} else if (forks_count > SSDFS_INLINE_FORKS_COUNT) {
+		SSDFS_ERR("invalid forks_count %lld\n",
+			  forks_count);
+		return -ERANGE;
+	}
+
+	if (!tree->inline_forks) {
+		SSDFS_ERR("inline forks haven't been initialized\n");
+		return -ERANGE;
+	}
+
+	search->result.start_index = 0;
+
+	for (i = 0; i < forks_count; i++) {
+		struct ssdfs_raw_fork *fork;
+		u64 start;
+		u64 blks_count;
+
+		search->result.state = SSDFS_BTREE_SEARCH_UNKNOWN_RESULT;
+
+		fork = &tree->inline_forks[i];
+		start = le64_to_cpu(fork->start_offset);
+		blks_count = le64_to_cpu(fork->blks_count);
+
+		if (start >= U64_MAX || blks_count >= U64_MAX) {
+			SSDFS_ERR("invalid fork state: "
+				  "start_offset %llu, blks_count %llu\n",
+				  start, blks_count);
+			return -ERANGE;
+		}
+
+		ssdfs_memcpy(&search->raw.fork, 0, fork_size,
+			     fork, 0, fork_size,
+			     fork_size);
+
+		search->result.err = 0;
+		search->result.start_index = (u16)i;
+		search->result.count = 1;
+		search->request.count = 1;
+		search->result.search_cno = ssdfs_current_cno(tree->fsi->sb);
+
+		switch (search->result.buf_state) {
+		case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!search->result.buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+			search->result.buf_state =
+					SSDFS_BTREE_SEARCH_INLINE_BUFFER;
+			search->result.buf = &search->raw.fork;
+			search->result.buf_size = sizeof(struct ssdfs_raw_fork);
+			search->result.items_in_buffer = 1;
+			break;
+
+		case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!search->result.buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+			ssdfs_btree_search_free_result_buf(search);
+			search->result.buf_state =
+					SSDFS_BTREE_SEARCH_INLINE_BUFFER;
+			search->result.buf = &search->raw.fork;
+			search->result.buf_size = sizeof(struct ssdfs_raw_fork);
+			search->result.items_in_buffer = 1;
+			break;
+
+		default:
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(search->result.buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+			search->result.buf_state =
+					SSDFS_BTREE_SEARCH_INLINE_BUFFER;
+			search->result.buf = &search->raw.fork;
+			search->result.buf_size = sizeof(struct ssdfs_raw_fork);
+			search->result.items_in_buffer = 1;
+			break;
+		}
+
+		if (blk < start) {
err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + goto finish_search_inline_fork; + } else if (start <= blk && blk < (start + blks_count)) { + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + goto finish_search_inline_fork; + } + } + + err = -ENODATA; + search->result.err = -ENODATA; + search->result.state = SSDFS_BTREE_SEARCH_OUT_OF_RANGE; + + ssdfs_debug_btree_search_object(search); + +finish_search_inline_fork: + return err; +} + +/* + * ssdfs_extents_tree_find_fork() - find a fork in the tree + * @tree: extents tree + * @blk: logical block number + * @search: search object + * + * This method tries to find a fork for the requested @blk. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - item hasn't been found + */ +int ssdfs_extents_tree_find_fork(struct ssdfs_extents_btree_info *tree, + u64 blk, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, blk %llu, search %p\n", + tree, blk, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_extent_btree_search(blk, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = blk; + search->request.end.hash = blk; + search->request.count = 1; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + down_read(&tree->lock); + err = ssdfs_extents_tree_find_inline_fork(tree, blk, search); + up_read(&tree->lock); + + if (err == -ENODATA || err == -ENOENT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find the inline fork: " + "blk %llu\n", + blk); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the inline fork: " + "blk %llu, err %d\n", + blk, err); + } + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: + down_read(&tree->lock); + err = ssdfs_btree_find_item(tree->generic_tree, search); + up_read(&tree->lock); + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find the fork: " + "blk %llu\n", + blk); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extents tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + return err; +} + +/* + * ssdfs_add_head_extent_into_fork() - add head extent into the fork + * @blk: logical block number + * @extent: raw extent + * @fork: raw fork + * + * This method tries to add @extent into the head of fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - need to add a new fork in the tree. 
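+ *
+ * An illustrative restatement of the merge test in the function body:
+ * the new extent (seg2, lblk2, len2) is merged into the current first
+ * extent (seg1, lblk1, len1) only when
+ *
+ *	seg1 == seg2 &&			   same segment
+ *	(lblk2 + len2) == lblk1 &&	   physically adjacent
+ *	(U32_MAX - len2) > len1		   merged length cannot overflow
+ *
+ * otherwise the extent occupies its own slot, and -ENOSPC is returned
+ * when all SSDFS_INLINE_EXTENTS_COUNT slots are already in use.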
+ */ +static +int ssdfs_add_head_extent_into_fork(u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_raw_fork *fork) +{ + struct ssdfs_raw_extent *cur = NULL; + size_t desc_size = sizeof(struct ssdfs_raw_extent); + u64 seg1, seg2; + u32 lblk1, lblk2; + u32 len1, len2; + u64 blks_count, counted_blks; + int valid_extents; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !fork); + + SSDFS_DBG("blk %llu, extent %p, fork %p\n", + blk, extent, fork); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk >= U64_MAX) { + SSDFS_ERR("invalid blk %llu\n", blk); + return -EINVAL; + } + + blks_count = le64_to_cpu(fork->blks_count); + + seg2 = le64_to_cpu(extent->seg_id); + lblk2 = le32_to_cpu(extent->logical_blk); + len2 = le32_to_cpu(extent->len); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg2 >= U64_MAX); + BUG_ON(lblk2 >= U32_MAX); + BUG_ON(len2 >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blks_count == 0 || blks_count >= U64_MAX) { + fork->start_offset = cpu_to_le64(blk); + fork->blks_count = cpu_to_le64(len2); + cur = &fork->extents[0]; + ssdfs_memcpy(cur, 0, sizeof(struct ssdfs_raw_extent), + extent, 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + return 0; + } else if (le64_to_cpu(fork->start_offset) >= U64_MAX) { + SSDFS_ERR("corrupted fork: " + "start_offset %llu, blks_count %llu\n", + le64_to_cpu(fork->start_offset), + blks_count); + return -ERANGE; + } + + if ((blk + len2) != le64_to_cpu(fork->start_offset)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add the hole into fork: " + "blk %llu, len %u, start_offset %llu\n", + blk, len2, + le64_to_cpu(fork->start_offset)); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + counted_blks = 0; + valid_extents = 0; + for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) { + u32 len; + + cur = &fork->extents[i]; + len = le32_to_cpu(cur->len); + + if (len >= U32_MAX) + break; + else { + counted_blks += len; + valid_extents++; + } + } + + if (counted_blks != blks_count) { + SSDFS_ERR("corrupted fork: " + "counted_blks %llu, blks_count %llu\n", + counted_blks, blks_count); + return -ERANGE; + } + + if (valid_extents > SSDFS_INLINE_EXTENTS_COUNT || + valid_extents == 0) { + SSDFS_ERR("invalid valid_extents count %d\n", + valid_extents); + return -ERANGE; + } + + cur = &fork->extents[0]; + + seg1 = le64_to_cpu(cur->seg_id); + lblk1 = le32_to_cpu(cur->logical_blk); + len1 = le32_to_cpu(cur->len); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg1 >= U64_MAX); + BUG_ON(lblk1 >= U32_MAX); + BUG_ON(len1 >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (seg1 == seg2 && (lblk2 + len2) == lblk1) { + if ((U32_MAX - len2) <= len1) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to merge to extents: " + "len1 %u, len2 %u\n", + len1, len2); +#endif /* CONFIG_SSDFS_DEBUG */ + goto add_extent_into_fork; + } + + cur->logical_blk = cpu_to_le32(lblk2); + le32_add_cpu(&cur->len, len2); + } else { +add_extent_into_fork: + if (valid_extents == SSDFS_INLINE_EXTENTS_COUNT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add extent: " + "valid_extents %u\n", + valid_extents); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + ssdfs_memmove(&fork->extents[1], 0, valid_extents * desc_size, + &fork->extents[0], 0, valid_extents * desc_size, + valid_extents * desc_size); + ssdfs_memcpy(cur, 0, desc_size, + extent, 0, desc_size, + desc_size); + } + + fork->start_offset = cpu_to_le64(blk); + le64_add_cpu(&fork->blks_count, len2); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FORK: start_offset %llu, blks_count %llu\n", + 
le64_to_cpu(fork->start_offset), + le64_to_cpu(fork->blks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_add_tail_extent_into_fork() - add tail extent into the fork + * @blk: logical block number + * @extent: raw extent + * @fork: raw fork + * + * This method tries to add @extent into the tail of fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOSPC - need to add a new fork in the tree. + */ +static +int ssdfs_add_tail_extent_into_fork(u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_raw_fork *fork) +{ + struct ssdfs_raw_extent *cur = NULL; + u64 seg1, seg2; + u32 lblk1, lblk2; + u32 len1, len2; + u64 start_offset; + u64 blks_count, counted_blks; + int valid_extents; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !fork); + + SSDFS_DBG("blk %llu, extent %p, fork %p\n", + blk, extent, fork); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (blk >= U64_MAX) { + SSDFS_ERR("invalid blk %llu\n", blk); + return -EINVAL; + } + + blks_count = le64_to_cpu(fork->blks_count); + + seg2 = le64_to_cpu(extent->seg_id); + lblk2 = le32_to_cpu(extent->logical_blk); + len2 = le32_to_cpu(extent->len); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg2 >= U64_MAX); + BUG_ON(lblk2 >= U32_MAX); + BUG_ON(len2 >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + start_offset = le64_to_cpu(fork->start_offset); + + if (blks_count == 0 || blks_count >= U64_MAX) { + fork->start_offset = cpu_to_le64(blk); + fork->blks_count = cpu_to_le64(len2); + cur = &fork->extents[0]; + ssdfs_memcpy(cur, 0, sizeof(struct ssdfs_raw_extent), + extent, 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + return 0; + } else if (start_offset >= U64_MAX) { + SSDFS_ERR("corrupted fork: " + "start_offset %llu, blks_count %llu\n", + start_offset, blks_count); + return -ERANGE; + } + + if ((start_offset + blks_count) != blk) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add the hole into fork: " + "blk %llu, len %u, start_offset %llu\n", + blk, len2, start_offset); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + counted_blks = 0; + valid_extents = 0; + for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) { + u32 len; + + cur = &fork->extents[i]; + len = le32_to_cpu(cur->len); + + if (len >= U32_MAX) + break; + else { + counted_blks += len; + valid_extents++; + } + } + + if (counted_blks != blks_count) { + SSDFS_ERR("corrupted fork: " + "counted_blks %llu, blks_count %llu\n", + counted_blks, blks_count); + return -ERANGE; + } + + if (valid_extents > SSDFS_INLINE_EXTENTS_COUNT || + valid_extents == 0) { + SSDFS_ERR("invalid valid_extents count %d\n", + valid_extents); + return -ERANGE; + } + + cur = &fork->extents[valid_extents - 1]; + + seg1 = le64_to_cpu(cur->seg_id); + lblk1 = le32_to_cpu(cur->logical_blk); + len1 = le32_to_cpu(cur->len); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(seg1 >= U64_MAX); + BUG_ON(lblk1 >= U32_MAX); + BUG_ON(len1 >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (seg1 == seg2 && (lblk1 + len1) == lblk2) { + if ((U32_MAX - len2) <= len1) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to merge to extents: " + "len1 %u, len2 %u\n", + len1, len2); +#endif /* CONFIG_SSDFS_DEBUG */ + goto add_extent_into_fork; + } + + le32_add_cpu(&cur->len, len2); + } else { +add_extent_into_fork: + if (valid_extents == SSDFS_INLINE_EXTENTS_COUNT) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to add extent: " + "valid_extents %u\n", + valid_extents); +#endif /* CONFIG_SSDFS_DEBUG */ + 
return -ENOSPC; + } + + cur = &fork->extents[valid_extents]; + ssdfs_memcpy(cur, 0, sizeof(struct ssdfs_raw_extent), + extent, 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + } + + le64_add_cpu(&fork->blks_count, len2); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FORK: start_offset %llu, blks_count %llu\n", + le64_to_cpu(fork->start_offset), + le64_to_cpu(fork->blks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_add_extent_into_fork() - add extent into the fork + * @blk: logical block number + * @extent: raw extent + * @search: search object + * + * This method tries to add @extent into the fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - fork doesn't exist. + * %-ENOSPC - need to add a new fork in the tree. + * %-EEXIST - extent exists in the fork. + */ +static +int ssdfs_add_extent_into_fork(u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork *fork; + u64 start_offset; + u64 blks_count; + u32 len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !search); + + SSDFS_DBG("blk %llu, extent %p, search %p\n", + blk, extent, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_EMPTY_RESULT: + SSDFS_DBG("no fork in search object\n"); + return -ENODATA; + + case SSDFS_BTREE_SEARCH_VALID_ITEM: + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search object state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search buffer state %#x\n", + search->result.buf_state); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENODATA; + } + + if (search->result.buf_size != sizeof(struct ssdfs_raw_fork) || + search->result.items_in_buffer != 1) { + SSDFS_ERR("invalid search buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + fork = &search->raw.fork; + start_offset = le64_to_cpu(fork->start_offset); + blks_count = le64_to_cpu(fork->blks_count); + len = le32_to_cpu(extent->len); + + if (start_offset >= U64_MAX || blks_count >= U64_MAX) { + SSDFS_ERR("invalid fork state: " + "start_offset %llu, blks_count %llu\n", + start_offset, blks_count); + return -ERANGE; + } + + if (blk >= U64_MAX || len >= U32_MAX) { + SSDFS_ERR("invalid extent: " + "blk %llu, len %u\n", + blk, len); + return -ERANGE; + } + + if (start_offset <= blk && blk < (start_offset + blks_count)) { + SSDFS_ERR("extent exists in the fork: " + "fork (start %llu, blks_count %llu), " + "extent (blk %llu, len %u)\n", + start_offset, blks_count, + blk, len); + return -EEXIST; + } + + if (start_offset < (blk + len) && + (blk + len) < (start_offset + blks_count)) { + SSDFS_ERR("extent exists in the fork: " + "fork (start %llu, blks_count %llu), " + "extent (blk %llu, len %u)\n", + start_offset, blks_count, + blk, len); + return -EEXIST; + } + + if (blk < start_offset && (blk + len) < start_offset) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need to add the fork: " + "fork (start %llu, blks_count %llu), " + "extent (blk %llu, len %u)\n", + start_offset, blks_count, + blk, len); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + if (blk > (start_offset + blks_count)) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("need 
to add the fork: "
+			  "fork (start %llu, blks_count %llu), "
+			  "extent (blk %llu, len %u)\n",
+			  start_offset, blks_count,
+			  blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOSPC;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fork (start %llu, blks_count %llu), "
+		  "extent (blk %llu, len %u)\n",
+		  start_offset, blks_count,
+		  blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((blk + len) == start_offset) {
+		err = ssdfs_add_head_extent_into_fork(blk, extent, fork);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("need to add the fork: "
+				  "fork (start %llu, blks_count %llu), "
+				  "extent (blk %llu, len %u)\n",
+				  start_offset, blks_count,
+				  blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to add the head extent into fork: "
+				  "fork (start %llu, blks_count %llu), "
+				  "extent (blk %llu, len %u)\n",
+				  start_offset, blks_count,
+				  blk, len);
+			return err;
+		}
+	} else if ((start_offset + blks_count) == blk) {
+		err = ssdfs_add_tail_extent_into_fork(blk, extent, fork);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("need to add the fork: "
+				  "fork (start %llu, blks_count %llu), "
+				  "extent (blk %llu, len %u)\n",
+				  start_offset, blks_count,
+				  blk, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to add the tail extent into fork: "
+				  "fork (start %llu, blks_count %llu), "
+				  "extent (blk %llu, len %u)\n",
+				  start_offset, blks_count,
+				  blk, len);
+			return err;
+		}
+	} else {
+		SSDFS_ERR("invalid extent: "
+			  "fork (start %llu, blks_count %llu), "
+			  "extent (blk %llu, len %u)\n",
+			  start_offset, blks_count,
+			  blk, len);
+		return -ERANGE;
+	}
+
+	/* Now the fork's start block reflects the necessary change */
+	start_offset = le64_to_cpu(fork->start_offset);
+	blks_count = le64_to_cpu(fork->blks_count);
+	search->request.start.hash = start_offset;
+	search->request.end.hash = start_offset + blks_count - 1;
+	search->request.count = (int)blks_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("search (request.start.hash %llx, "
+		  "request.end.hash %llx, request.count %d)\n",
+		  search->request.start.hash,
+		  search->request.end.hash,
+		  search->request.count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_empty_fork() - prepare empty fork
+ * @blk: block number
+ * @search: search object
+ *
+ * This method tries to prepare an empty fork for @blk inside the
+ * @search object.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
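+ *
+ * For illustration only (a sketch of the post-condition, derived from
+ * the body below): on success the search object's inline buffer holds
+ * a fork whose bytes are all set to 0xFF (the "empty" pattern) except
+ *
+ *	search->raw.fork.start_offset = cpu_to_le64(blk);
+ *	search->raw.fork.blks_count = cpu_to_le64(0);
+ *
+ * and SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM is raised in
+ * search->request.flags.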
+ */ +static +int ssdfs_prepare_empty_fork(u64 blk, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("blk %llu, search %p\n", + blk, search); + SSDFS_DBG("search->result: (state %#x, err %d, " + "start_index %u, count %u, buf_state %#x, buf %p, " + "buf_size %zu, items_in_buffer %u)\n", + search->result.state, + search->result.err, + search->result.start_index, + search->result.count, + search->result.buf_state, + search->result.buf, + search->result.buf_size, + search->result.items_in_buffer); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.err = 0; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.start_index >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.fork; + search->result.buf_size = sizeof(struct ssdfs_raw_fork); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_btree_search_free_result_buf(search); + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.fork; + search->result.buf_size = sizeof(struct ssdfs_raw_fork); + search->result.items_in_buffer = 1; + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.fork; + search->result.buf_size = sizeof(struct ssdfs_raw_fork); + search->result.items_in_buffer = 1; + break; + } + + memset(&search->raw.fork, 0xFF, sizeof(struct ssdfs_raw_fork)); + + search->raw.fork.start_offset = cpu_to_le64(blk); + search->raw.fork.blks_count = cpu_to_le64(0); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result: (state %#x, err %d, " + "start_index %u, count %u, buf_state %#x, buf %p, " + "buf_size %zu, items_in_buffer %u)\n", + search->result.state, + search->result.err, + search->result.start_index, + search->result.count, + search->result.buf_state, + search->result.buf, + search->result.buf_size, + search->result.items_in_buffer); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.flags |= SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM; + + return 0; +} diff --git a/fs/ssdfs/extents_tree.h b/fs/ssdfs/extents_tree.h new file mode 100644 index 000000000000..a1524ff21f36 --- /dev/null +++ b/fs/ssdfs/extents_tree.h @@ -0,0 +1,171 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/extents_tree.h - extents tree declarations. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved. 
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_EXTENTS_TREE_H
+#define _SSDFS_EXTENTS_TREE_H
+
+#define SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY	(16)
+#define SSDFS_COMMIT_QUEUE_THRESHOLD		(32)
+
+/*
+ * struct ssdfs_commit_queue - array of segment IDs
+ * @ids: array of segment IDs
+ * @count: number of items in the queue
+ * @capacity: maximum number of available positions in the queue
+ */
+struct ssdfs_commit_queue {
+	u64 *ids;
+	u32 count;
+	u32 capacity;
+};
+
+/*
+ * struct ssdfs_extents_btree_info - extents btree info
+ * @type: extents btree type
+ * @state: extents btree state
+ * @forks_count: count of the forks in the whole extents tree
+ * @lock: extents btree lock
+ * @generic_tree: pointer on generic btree object
+ * @inline_forks: pointer on inline forks array
+ * @buffer.tree: piece of memory for generic btree object
+ * @buffer.forks: piece of memory for the inline forks
+ * @root: pointer on root node
+ * @root_buffer: buffer for root node
+ * @updated_segs: updated segments queue
+ * @desc: b-tree descriptor
+ * @owner: pointer on owner inode object
+ * @fsi: pointer on shared file system object
+ *
+ * A newly created inode tries to store extents into inline
+ * forks. Every fork contains three extents. The raw on-disk
+ * inode has an internal private area that is able to contain
+ * either two inline forks or the root nodes of the extents btree
+ * and the extended attributes btree. If the inode has no extended
+ * attributes and the number of extents is fewer than six, then
+ * everything can be stored in the inline forks. Otherwise, a real
+ * extents btree has to be created.
+ */
+struct ssdfs_extents_btree_info {
+	atomic_t type;
+	atomic_t state;
+	atomic64_t forks_count;
+
+	struct rw_semaphore lock;
+	struct ssdfs_btree *generic_tree;
+	struct ssdfs_raw_fork *inline_forks;
+	union {
+		struct ssdfs_btree tree;
+		struct ssdfs_raw_fork forks[SSDFS_INLINE_FORKS_COUNT];
+	} buffer;
+	struct ssdfs_btree_inline_root_node *root;
+	struct ssdfs_btree_inline_root_node root_buffer;
+	struct ssdfs_commit_queue updated_segs;
+
+	struct ssdfs_extents_btree_descriptor desc;
+	struct ssdfs_inode_info *owner;
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Extents tree types */
+enum {
+	SSDFS_EXTENTS_BTREE_UNKNOWN_TYPE,
+	SSDFS_INLINE_FORKS_ARRAY,
+	SSDFS_PRIVATE_EXTENTS_BTREE,
+	SSDFS_EXTENTS_BTREE_TYPE_MAX
+};
+
+/* Extents tree states */
+enum {
+	SSDFS_EXTENTS_BTREE_UNKNOWN_STATE,
+	SSDFS_EXTENTS_BTREE_CREATED,
+	SSDFS_EXTENTS_BTREE_INITIALIZED,
+	SSDFS_EXTENTS_BTREE_DIRTY,
+	SSDFS_EXTENTS_BTREE_CORRUPTED,
+	SSDFS_EXTENTS_BTREE_STATE_MAX
+};
+
+/*
+ * Extents tree API
+ */
+int ssdfs_extents_tree_create(struct ssdfs_fs_info *fsi,
+			      struct ssdfs_inode_info *ii);
+int ssdfs_extents_tree_init(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_inode_info *ii);
+void ssdfs_extents_tree_destroy(struct ssdfs_inode_info *ii);
+int ssdfs_extents_tree_flush(struct ssdfs_fs_info *fsi,
+			     struct ssdfs_inode_info *ii);
+int ssdfs_extents_tree_add_updated_seg_id(struct ssdfs_extents_btree_info *tree,
+					  u64 seg_id);
+
+int ssdfs_prepare_volume_extent(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_request *req);
+int ssdfs_recommend_migration_extent(struct ssdfs_fs_info *fsi,
+				     struct ssdfs_segment_request *req,
+				     struct ssdfs_zone_fragment *fragment);
+bool ssdfs_extents_tree_has_logical_block(u64 blk_offset, struct inode *inode);
+int ssdfs_extents_tree_add_extent(struct inode *inode,
+				  struct ssdfs_segment_request *req);
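+
+/*
+ * Typical call sequence (an illustrative sketch only, inferred from the
+ * declarations in this header; error handling elided):
+ *
+ *	err = ssdfs_extents_tree_create(fsi, ii);
+ *	...
+ *	err = ssdfs_extents_tree_add_extent(inode, req);
+ *	...
+ *	err = ssdfs_extents_tree_flush(fsi, ii);
+ *	...
+ *	ssdfs_extents_tree_destroy(ii);
+ */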
+int ssdfs_extents_tree_move_extent(struct ssdfs_extents_btree_info *tree,
+				   u64 blk,
+				   struct ssdfs_raw_extent *old_extent,
+				   struct ssdfs_raw_extent *new_extent,
+				   struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_truncate(struct inode *inode);
+
+/*
+ * Extents tree internal API
+ */
+int ssdfs_extents_tree_find_fork(struct ssdfs_extents_btree_info *tree,
+				 u64 blk,
+				 struct ssdfs_btree_search *search);
+int __ssdfs_extents_tree_add_extent(struct ssdfs_extents_btree_info *tree,
+				    u64 blk,
+				    struct ssdfs_raw_extent *extent,
+				    struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_change_extent(struct ssdfs_extents_btree_info *tree,
+				     u64 blk,
+				     struct ssdfs_raw_extent *extent,
+				     struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_truncate_extent(struct ssdfs_extents_btree_info *tree,
+				       u64 blk, u32 new_len,
+				       struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_delete_extent(struct ssdfs_extents_btree_info *tree,
+				     u64 blk,
+				     struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_delete_all(struct ssdfs_extents_btree_info *tree);
+int __ssdfs_extents_btree_node_get_fork(struct pagevec *pvec,
+					u32 area_offset,
+					u32 area_size,
+					u32 node_size,
+					u16 item_index,
+					struct ssdfs_raw_fork *fork);
+
+void ssdfs_debug_extents_btree_object(struct ssdfs_extents_btree_info *tree);
+
+/*
+ * Extents btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+					ssdfs_extents_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_extents_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_extents_btree_node_ops;
+
+#endif /* _SSDFS_EXTENTS_TREE_H */

From patchwork Sat Feb 25 01:09:18 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151972
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 67/76] ssdfs: extents b-tree specialized operations
Date: Fri, 24 Feb 2023 17:09:18 -0800
Message-Id: <20230225010927.813929-68-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Extents b-tree implements API:
(1) create - create extents b-tree
(2) destroy - destroy extents b-tree
(3) flush - flush dirty extents b-tree
(4) prepare_volume_extent - convert requested offset into extent
(5) recommend_migration_extent - find extent recommended for migration
(6) add_extent - add extent into the extents b-tree
(7) move_extent - move extent from one segment into another one
(8) truncate - truncate extents b-tree

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/extents_tree.c | 3519 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 3519 insertions(+)

diff --git a/fs/ssdfs/extents_tree.c b/fs/ssdfs/extents_tree.c
index a13e7d773e7d..f978ef0cca12 100644
--- a/fs/ssdfs/extents_tree.c
+++ b/fs/ssdfs/extents_tree.c
@@ -3368,3 +3368,3522 @@ int ssdfs_prepare_empty_fork(u64 blk,
 	return 0;
 }
+
+/*
+ * ssdfs_extents_tree_add_inline_fork() - add the inline fork into the tree
+ * @tree: extents tree
+ * @search: search object
+ *
+ * This method tries to add the inline fork into the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOSPC - inline tree has no room for the new fork.
+ * %-EEXIST - fork exists in the tree.
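+ *
+ * Note (illustrative, derived from the body below): the inline capacity
+ * is SSDFS_INLINE_FORKS_COUNT forks, and one fork less when the owner
+ * inode carries the SSDFS_INODE_HAS_XATTR_BTREE flag:
+ *
+ *	forks_capacity = SSDFS_INLINE_FORKS_COUNT;
+ *	if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE)
+ *		forks_capacity--;
+ *
+ * Reaching this capacity is reported as %-ENOSPC, which the add path
+ * treats as the trigger for migrating into a generic b-tree.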
+ */ +static +int ssdfs_extents_tree_add_inline_fork(struct ssdfs_extents_btree_info *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork *cur; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + s64 forks_count, forks_capacity; + int private_flags; + u64 start_hash; + u16 start_index; + u64 hash1, hash2; + u64 len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->inline_forks) { + SSDFS_ERR("empty inline tree %p\n", + tree->inline_forks); + return -ERANGE; + } + + forks_count = atomic64_read(&tree->forks_count); + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + private_flags = atomic_read(&tree->owner->private_flags); + + forks_capacity = SSDFS_INLINE_FORKS_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + forks_capacity--; + if (private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) { + SSDFS_ERR("the extents tree is generic\n"); + return -ERANGE; + } + + if (forks_count > forks_capacity) { + SSDFS_WARN("extents tree is corrupted: " + "forks_count %llu, forks_capacity %llu\n", + (u64)forks_count, (u64)forks_capacity); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } else if (forks_count == forks_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inline tree hasn't room for the new fork: " + "forks_count %llu, forks_capacity %llu\n", + (u64)forks_count, (u64)forks_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + start_hash = search->request.start.hash; + if (start_hash != le64_to_cpu(search->raw.fork.start_offset)) { + SSDFS_ERR("corrupted fork: " + "start_hash %llx, " + "fork (start %llu, blks_count %llu)\n", + start_hash, + le64_to_cpu(search->raw.fork.start_offset), + le64_to_cpu(search->raw.fork.blks_count)); + return -ERANGE; + } + + start_index = search->result.start_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u, forks_count %lld, " + "forks_capacity %lld\n", + start_index, forks_count, forks_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (forks_count == 0) { + if (start_index != 0) { + SSDFS_ERR("invalid start_index %u\n", + start_index); + return -ERANGE; + } + + cur = &tree->inline_forks[start_index]; + + ssdfs_memcpy(cur, 0, fork_size, + &search->raw.fork, 0, fork_size, + fork_size); + } else { + if (start_index >= forks_capacity) { + SSDFS_ERR("start_index %u >= forks_capacity %llu\n", + start_index, (u64)forks_capacity); + return -ERANGE; + } + + cur = 
&tree->inline_forks[start_index]; + + if ((start_index + 1) <= forks_count) { + err = ssdfs_memmove(tree->inline_forks, + (start_index + 1) * fork_size, + forks_capacity * fork_size, + tree->inline_forks, + start_index * fork_size, + forks_capacity * fork_size, + (forks_count - start_index) * + fork_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + ssdfs_memcpy(cur, 0, fork_size, + &search->raw.fork, 0, fork_size, + fork_size); + + hash1 = le64_to_cpu(search->raw.fork.start_offset); + len = le64_to_cpu(search->raw.fork.blks_count); + + cur = &tree->inline_forks[start_index + 1]; + hash2 = le64_to_cpu(cur->start_offset); + + if (!((hash1 + len) <= hash2)) { + SSDFS_WARN("fork is corrupted: " + "hash1 %llu, len %llu, " + "hash2 %llu\n", + hash1, len, hash2); + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } + } else { + ssdfs_memcpy(cur, 0, fork_size, + &search->raw.fork, 0, fork_size, + fork_size); + + if (start_index > 0) { + cur = &tree->inline_forks[start_index - 1]; + + hash1 = le64_to_cpu(cur->start_offset); + len = le64_to_cpu(cur->blks_count); + hash2 = + le64_to_cpu(search->raw.fork.start_offset); + + if (!((hash1 + len) <= hash2)) { + SSDFS_WARN("fork is corrupted: " + "hash1 %llu, len %llu, " + "hash2 %llu\n", + hash1, len, hash2); + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } + } + } + } + + forks_count = atomic64_inc_return(&tree->forks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", forks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (forks_count > forks_capacity) { + SSDFS_WARN("forks_count is too much: " + "count %lld, capacity %lld\n", + forks_count, forks_capacity); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } + + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + return 0; +} + +/* + * ssdfs_extents_tree_add_fork() - add the fork into the tree + * @tree: extents tree + * @search: search object + * + * This method tries to add the generic fork into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - fork exists in the tree. 
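+ *
+ * Note (a sketch of the flow, based on the body below): the fork is
+ * inserted through the generic b-tree machinery and the root node is
+ * synchronized afterwards:
+ *
+ *	err = ssdfs_btree_add_item(tree->generic_tree, search);
+ *	...
+ *	err = ssdfs_btree_synchronize_root_node(tree->generic_tree,
+ *						tree->root);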
+ */ +static +int ssdfs_extents_tree_add_fork(struct ssdfs_extents_btree_info *tree, + struct ssdfs_btree_search *search) +{ + s64 forks_count; + u64 start_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_PRIVATE_EXTENTS_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->generic_tree) { + SSDFS_ERR("empty generic tree %p\n", + tree->generic_tree); + return -ERANGE; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + case SSDFS_BTREE_SEARCH_OBSOLETE_RESULT: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + start_hash = search->request.start.hash; + if (start_hash != le64_to_cpu(search->raw.fork.start_offset)) { + SSDFS_ERR("corrupted fork: " + "start_hash %llx, " + "fork (start %llu, blks_count %llu)\n", + start_hash, + le64_to_cpu(search->raw.fork.start_offset), + le64_to_cpu(search->raw.fork.blks_count)); + return -ERANGE; + } + + err = ssdfs_btree_add_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add the fork into the tree: " + "err %d\n", err); + return err; + } + + forks_count = atomic64_read(&tree->forks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", forks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (forks_count >= S64_MAX) { + SSDFS_WARN("forks_count is too much\n"); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } + + err = ssdfs_btree_synchronize_root_node(tree->generic_tree, + tree->root); + if (unlikely(err)) { + SSDFS_ERR("fail to synchronize the root node: " + "err %d\n", err); + return err; + } + + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + return 0; +} + +/* + * ssdfs_invalidate_inline_tail_forks() - invalidate inline tail forks + * @tree: extents tree + * @search: search object + * + * This method tries to invalidate inline tail forks. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
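+ *
+ * Note (illustrative summary of the loop below): every fork after
+ * search->result.start_index is walked from the tail and each of its
+ * extents is queued as pre-invalid in the shared extents tree:
+ *
+ *	err = ssdfs_shextree_add_pre_invalid_extent(shextree, ino, extent);
+ *
+ * presumably so that the actual invalidation can happen later,
+ * outside of this call.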
+ */ +static +int ssdfs_invalidate_inline_tail_forks(struct ssdfs_extents_btree_info *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_raw_fork *cur; + ino_t ino; + s64 forks_count; + int lower_bound, upper_bound; + int i, j; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + if (search->request.type != SSDFS_BTREE_SEARCH_INVALIDATE_TAIL) { + SSDFS_DBG("nothing should be done\n"); + return 0; + } + + ino = tree->owner->vfs_inode.i_ino; + forks_count = atomic64_read(&tree->forks_count); + + lower_bound = search->result.start_index + 1; + upper_bound = forks_count - 1; + + for (i = upper_bound; i >= lower_bound; i--) { + u64 calculated = 0; + u64 blks_count; + + cur = &tree->inline_forks[i]; + + if (atomic64_read(&tree->forks_count) == 0) { + SSDFS_ERR("invalid forks_count\n"); + return -ERANGE; + } else + atomic64_dec(&tree->forks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", + atomic64_read(&tree->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + blks_count = le64_to_cpu(cur->blks_count); + if (blks_count == 0 || blks_count >= U64_MAX) { + memset(cur, 0xFF, sizeof(struct ssdfs_raw_fork)); + continue; + } + + for (j = SSDFS_INLINE_EXTENTS_COUNT - 1; j >= 0; j--) { + struct ssdfs_raw_extent *extent; + u32 len; + + extent = &cur->extents[j]; + len = le32_to_cpu(extent->len); + + if (len == 0 || len >= U32_MAX) { + memset(extent, 0xFF, + sizeof(struct ssdfs_raw_extent)); + continue; + } + + if ((calculated + len) > blks_count) { + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_ERR("corrupted extent: " + "calculated %llu, len %u, " + "blks %llu\n", + calculated, len, blks_count); + return -ERANGE; + } + + err = ssdfs_shextree_add_pre_invalid_extent(shextree, + ino, + extent); + if (unlikely(err)) { + SSDFS_ERR("fail to add pre-invalid " + "(seg_id %llu, blk %u, " + "len %u), err %d\n", + le64_to_cpu(extent->seg_id), + le32_to_cpu(extent->logical_blk), + len, err); + return err; + } + + calculated += len; + + memset(extent, 0xFF, sizeof(struct ssdfs_raw_extent)); + } + + if (calculated != blks_count) { + atomic_set(&tree->state, + SSDFS_EXTENTS_BTREE_CORRUPTED); + SSDFS_ERR("calculated %llu != blks %llu\n", + calculated, blks_count); + return -ERANGE; + } + } + + return 0; +} + +/* + * ssdfs_extents_tree_change_inline_fork() - change inline fork + * @tree: extents tree + * @search: search object + * + * This method tries to change the existing inline fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - fork doesn't exist in the tree. 
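+ *
+ * Note (derived from the checks below): for the
+ * SSDFS_BTREE_SEARCH_INVALIDATE_TAIL request type the start hash is
+ * expected to point right after the fork:
+ *
+ *	start_hash == start_offset + blks_count
+ *
+ * while any other request type expects the hash to fall inside it:
+ *
+ *	start_offset <= start_hash < start_offset + blks_count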
+ */
+static
+int ssdfs_extents_tree_change_inline_fork(struct ssdfs_extents_btree_info *tree,
+					  struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_shared_extents_tree *shextree;
+	struct ssdfs_raw_fork *cur;
+	size_t fork_size = sizeof(struct ssdfs_raw_fork);
+	ino_t ino;
+	u64 start_hash;
+	int private_flags;
+	s64 forks_count, forks_capacity;
+	u16 start_index;
+	u64 start_offset;
+	u64 blks_count;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !tree->fsi || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, search %p\n",
+		  tree, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tree->fsi;
+	shextree = fsi->shextree;
+
+	if (!shextree) {
+		SSDFS_ERR("shared extents tree is absent\n");
+		return -ERANGE;
+	}
+
+	/* check the owner before dereferencing it */
+	if (!tree->owner) {
+		SSDFS_ERR("empty owner inode\n");
+		return -ERANGE;
+	}
+
+	ino = tree->owner->vfs_inode.i_ino;
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_FORKS_ARRAY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_EXTENTS_BTREE_CREATED:
+	case SSDFS_EXTENTS_BTREE_INITIALIZED:
+	case SSDFS_EXTENTS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	if (!tree->inline_forks) {
+		SSDFS_ERR("empty inline tree %p\n",
+			  tree->inline_forks);
+		return -ERANGE;
+	}
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+	case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND:
+	case SSDFS_BTREE_SEARCH_OUT_OF_RANGE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid search object state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	start_hash = search->request.start.hash;
+	start_offset = le64_to_cpu(search->raw.fork.start_offset);
+	blks_count = le64_to_cpu(search->raw.fork.blks_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, fork (start %llu, blks_count %llu)\n",
+		  start_hash, start_offset, blks_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->request.type) {
+	case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL:
+		if ((start_offset + blks_count) != start_hash) {
+			SSDFS_ERR("invalid request: "
+				  "start_hash %llx, "
+				  "fork (start %llu, blks_count %llu)\n",
+				  start_hash, start_offset, blks_count);
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		if (start_hash < start_offset ||
+		    start_hash >= (start_offset + blks_count)) {
+			SSDFS_ERR("corrupted fork: "
+				  "start_hash %llx, "
+				  "fork (start %llu, blks_count %llu)\n",
+				  start_hash, start_offset, blks_count);
+			return -ERANGE;
+		}
+		break;
+	}
+
+	forks_count = atomic64_read(&tree->forks_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BEFORE: forks_count %lld\n",
+		  atomic64_read(&tree->forks_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	private_flags = atomic_read(&tree->owner->private_flags);
+
+	forks_capacity = SSDFS_INLINE_FORKS_COUNT;
+	if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE)
+		forks_capacity--;
+	if (private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) {
+		SSDFS_ERR("the extents tree is generic\n");
+		return -ERANGE;
+	}
+
+	if (forks_count > forks_capacity) {
+		SSDFS_WARN("extents tree is corrupted: "
+			   "forks_count %lld, forks_capacity %lld\n",
+			   forks_count,
forks_capacity); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } else if (forks_count == 0) { + SSDFS_ERR("empty tree\n"); + return -ENODATA; + } + + start_index = search->result.start_index; + + if (start_index >= forks_count) { + SSDFS_ERR("start_index %u >= forks_count %lld\n", + start_index, forks_count); + return -ENODATA; + } + + cur = &tree->inline_forks[start_index]; + ssdfs_memcpy(cur, 0, fork_size, + &search->raw.fork, 0, fork_size, + fork_size); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + + err = ssdfs_invalidate_inline_tail_forks(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate inline tail forks: " + "err %d\n", err); + return err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("AFTER: forks_count %lld\n", + atomic64_read(&tree->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_extents_tree_change_fork() - change the fork + * @tree: extents tree + * @search: search object + * + * This method tries to change the existing generic fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - fork doesn't exist in the tree. + */ +static +int ssdfs_extents_tree_change_fork(struct ssdfs_extents_btree_info *tree, + struct ssdfs_btree_search *search) +{ + u64 start_hash; + u64 start_offset; + u64 blks_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_PRIVATE_EXTENTS_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->generic_tree) { + SSDFS_ERR("empty generic tree %p\n", + tree->generic_tree); + return -ERANGE; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* continue logic */ + break; + + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* extent has been merged into the existing fork */ + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + search->result.err = 0; + break; + + default: + SSDFS_ERR("invalid search object state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + start_hash = search->request.start.hash; + start_offset = le64_to_cpu(search->raw.fork.start_offset); + blks_count = le64_to_cpu(search->raw.fork.blks_count); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + if (start_hash < start_offset || + start_hash >= (start_offset + blks_count)) { + SSDFS_ERR("corrupted fork: " + "start_hash %llx, " + "fork (start %llu, blks_count %llu)\n", + start_hash, start_offset, blks_count); + return -ERANGE; + } + break; + + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + if (start_hash != (start_offset + blks_count)) { + SSDFS_ERR("corrupted fork: " + "start_hash %llx, " + "fork (start %llu, blks_count %llu)\n", + start_hash, 
start_offset, blks_count); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("unexpected request type %#x\n", + search->request.type); + return -ERANGE; + } + + err = ssdfs_btree_change_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change the fork into the tree: " + "err %d\n", err); + return err; + } + + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + return 0; +} + +/* + * ssdfs_extents_tree_add_extent_nolock() - add extent into the tree + * @tree: extents tree + * @blk: logical block number + * @extent: new extent + * @search: search object + * + * This method tries to add @extent into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - extent exists in the tree. + */ +static +int ssdfs_extents_tree_add_extent_nolock(struct ssdfs_extents_btree_info *tree, + u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + s64 forks_count; + u32 init_flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + u32 len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !extent || !search); + BUG_ON(!tree->owner); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, blk %llu\n", + tree, search, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + forks_count = atomic64_read(&tree->forks_count); + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + err = ssdfs_extents_tree_find_inline_fork(tree, blk, search); + if (err == -ENOENT) { + /* + * Fork doesn't exist for requested extent. + * It needs to create a new fork. + */ + } else if (err == -ENODATA) { + /* + * Fork doesn't contain the requested extent. + * It needs to add a new extent. 
+			 */
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find the inline fork: "
+				  "blk %llu, err %d\n",
+				  blk, err);
+			goto finish_add_extent;
+		} else {
+			err = -EEXIST;
+			SSDFS_ERR("block exists already: "
+				  "blk %llu, err %d\n",
+				  blk, err);
+			goto finish_add_extent;
+		}
+
+		if (err == -ENOENT) {
+add_new_inline_fork:
+			ssdfs_debug_btree_search_object(search);
+
+			if (forks_count > 0)
+				search->result.start_index += 1;
+
+			err = ssdfs_prepare_empty_fork(blk, search);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to prepare empty fork: "
+					  "err %d\n",
+					  err);
+				goto finish_add_extent;
+			}
+
+			ssdfs_debug_btree_search_object(search);
+
+			err = ssdfs_add_extent_into_fork(blk, extent, search);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to add extent into fork: "
+					  "err %d\n",
+					  err);
+				goto finish_add_extent;
+			}
+
+			ssdfs_debug_btree_search_object(search);
+
+			search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM;
+			err = ssdfs_extents_tree_add_inline_fork(tree, search);
+			if (err == -ENOSPC) {
+				err = ssdfs_migrate_inline2generic_tree(tree);
+				if (unlikely(err)) {
+					SSDFS_ERR("fail to migrate the tree: "
+						  "err %d\n",
+						  err);
+					goto finish_add_extent;
+				} else {
+					/*
+					 * derive the extent's length here:
+					 * @len has not been initialized yet
+					 * on this path
+					 */
+					len = le32_to_cpu(extent->len);
+
+					ssdfs_btree_search_init(search);
+					search->request.type =
+						SSDFS_BTREE_SEARCH_FIND_ITEM;
+					search->request.flags = init_flags;
+					search->request.start.hash = blk;
+					search->request.end.hash = blk + len - 1;
+					search->request.count = 1;
+					goto try_to_add_into_generic_tree;
+				}
+			} else if (unlikely(err)) {
+				SSDFS_ERR("fail to add fork: err %d\n", err);
+				goto finish_add_extent;
+			}
+		} else {
+			err = ssdfs_add_extent_into_fork(blk, extent, search);
+			if (err == -ENOSPC) {
+				/* try to add a new fork */
+				goto add_new_inline_fork;
+			} else if (unlikely(err)) {
+				SSDFS_ERR("fail to add extent into fork: "
+					  "err %d\n",
+					  err);
+				goto finish_add_extent;
+			}
+
+			search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM;
+			err = ssdfs_extents_tree_change_inline_fork(tree,
+								    search);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to change fork: err %d\n", err);
+				goto finish_add_extent;
+			}
+		}
+		break;
+
+	case SSDFS_PRIVATE_EXTENTS_BTREE:
+try_to_add_into_generic_tree:
+		err = ssdfs_btree_find_item(tree->generic_tree, search);
+		if (err == -ENOENT) {
+			/*
+			 * Fork doesn't exist for requested extent.
+			 * It needs to create a new fork.
+			 */
+		} else if (err == -ENODATA) {
+			/*
+			 * Fork doesn't contain the requested extent.
+			 * It needs to add a new extent.
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_add_extent; + } else { + err = -EEXIST; + SSDFS_ERR("block exists already: " + "blk %llu, err %d\n", + blk, err); + goto finish_add_extent; + } + + if (err == -ENOENT) { +add_new_generic_fork: + ssdfs_debug_btree_search_object(search); + + if (forks_count > 0) + search->result.start_index += 1; + + err = ssdfs_prepare_empty_fork(blk, search); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare empty fork: " + "err %d\n", + err); + goto finish_add_extent; + } + + err = ssdfs_add_extent_into_fork(blk, extent, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent into fork: " + "err %d\n", + err); + goto finish_add_extent; + } + + search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM; + err = ssdfs_extents_tree_add_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add fork: err %d\n", err); + goto finish_add_extent; + } + } else { + err = ssdfs_add_extent_into_fork(blk, extent, search); + if (err == -ENOSPC || err == -ENODATA) { + /* try to add a new fork */ + goto add_new_generic_fork; + } else if (unlikely(err)) { + SSDFS_ERR("fail to add extent into fork: " + "err %d\n", + err); + goto finish_add_extent; + } + + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + err = ssdfs_extents_tree_change_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change fork: err %d\n", err); + goto finish_add_extent; + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extents tree type %#x\n", + atomic_read(&tree->type)); + goto finish_add_extent; + } + +finish_add_extent: + return err; +} + +/* + * __ssdfs_extents_tree_add_extent() - add extent into the tree + * @tree: extents tree + * @blk: logical block number + * @extent: new extent + * @search: search object + * + * This method tries to add @extent into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - extent exists in the tree. 
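+ *
+ * Note (summarizing the body below): the owner inode's lock is taken
+ * before the tree's lock and both are held across the whole insertion:
+ *
+ *	down_write(&ii->lock);
+ *	down_write(&tree->lock);
+ *	...
+ *	up_write(&tree->lock);
+ *	up_write(&ii->lock);
+ *
+ * Any other path taking both locks has to follow the same order to
+ * avoid an ABBA deadlock.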
+ */ +int __ssdfs_extents_tree_add_extent(struct ssdfs_extents_btree_info *tree, + u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + struct ssdfs_inode_info *ii; + u32 init_flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + u32 len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !extent || !search); + BUG_ON(!tree->owner); + + SSDFS_DBG("tree %p, search %p, blk %llu\n", + tree, search, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + ii = tree->owner; + + down_write(&ii->lock); + down_write(&tree->lock); + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + len = le32_to_cpu(extent->len); + + if (len == 0) { + err = -ERANGE; + SSDFS_ERR("empty extent\n"); + goto finish_add_extent; + } + + if (need_initialize_extent_btree_search(blk, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = init_flags; + search->request.start.hash = blk; + search->request.end.hash = blk + len - 1; + /* no information about forks count */ + search->request.count = 0; + } + + ssdfs_debug_btree_search_object(search); + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + goto finish_add_extent; + }; + + err = ssdfs_extents_tree_add_extent_nolock(tree, blk, extent, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent: " + "blk %llu, err %d\n", + blk, err); + goto finish_add_extent; + } + +finish_add_extent: + up_write(&tree->lock); + up_write(&ii->lock); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + ssdfs_debug_extents_btree_object(tree); + + return err; +} + +/* + * ssdfs_change_extent_in_fork() - change extent in the fork + * @blk: logical block number + * @extent: extent object + * @search: search object + * + * This method tries to change @extent in the fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - extent doesn't exist in the fork. 
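+ *
+ * Note (an illustrative reading of the body below): when the new
+ * extent's length differs from the replaced one, the fork's blks_count
+ * is adjusted by the difference; for example, when the old extent is
+ * shorter:
+ *
+ *	len_diff = len1 - len2;
+ *	blks_count += len_diff;
+ *	fork->blks_count = cpu_to_le64(blks_count);
+ *
+ * where len1 is the new extent's length and len2 the old one's.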
+ */ +static +int ssdfs_change_extent_in_fork(u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork *fork; + struct ssdfs_raw_extent *cur_extent = NULL; + struct ssdfs_raw_extent buf; + u64 start_offset; + u64 blks_count; + u32 len1, len2, len_diff; + u64 cur_blk; + int i; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !search); + + SSDFS_DBG("blk %llu, extent %p, search %p\n", + blk, extent, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_EMPTY_RESULT: + SSDFS_DBG("no fork in search object\n"); + return -ENODATA; + + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search object state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid search buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + if (search->result.buf_size != sizeof(struct ssdfs_raw_fork) || + search->result.items_in_buffer != 1) { + SSDFS_ERR("invalid search buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + fork = &search->raw.fork; + start_offset = le64_to_cpu(fork->start_offset); + blks_count = le64_to_cpu(fork->blks_count); + len1 = le32_to_cpu(extent->len); + + if (start_offset >= U64_MAX || blks_count >= U64_MAX) { + SSDFS_ERR("invalid fork state: " + "start_offset %llu, blks_count %llu\n", + start_offset, blks_count); + return -ERANGE; + } + + if (blk >= U64_MAX || len1 >= U32_MAX) { + SSDFS_ERR("invalid extent: " + "blk %llu, len %u\n", + blk, len1); + return -ERANGE; + } + + if (start_offset <= blk && blk < (start_offset + blks_count)) { + /* + * Expected state + */ + } else { + SSDFS_ERR("extent is out of fork: \n" + "fork (start %llu, blks_count %llu), " + "extent (blk %llu, len %u)\n", + start_offset, blks_count, + blk, len1); + return -ENODATA; + } + + cur_blk = le64_to_cpu(fork->start_offset); + for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) { + len2 = le32_to_cpu(fork->extents[i].len); + + if (cur_blk == blk) { + /* extent is found */ + cur_extent = &fork->extents[i]; + break; + } else if (blk < cur_blk) { + SSDFS_ERR("invalid extent: " + "blk %llu, cur_blk %llu\n", + blk, cur_blk); + return -ERANGE; + } else if (len2 >= U32_MAX || len2 == 0) { + /* empty extent */ + break; + } else { + /* it needs to check the next extent */ + cur_blk += len2; + } + } + + if (!cur_extent) { + SSDFS_ERR("fail to find the extent: blk %llu\n", + blk); + return -ENODATA; + } + + if (le32_to_cpu(extent->len) == 0) { + SSDFS_ERR("empty extent: " + "seg_id %llu, logical_blk %u, len %u\n", + le64_to_cpu(extent->seg_id), + le32_to_cpu(extent->logical_blk), + le32_to_cpu(extent->len)); + return -ERANGE; + } + + ssdfs_memcpy(&buf, 0, sizeof(struct ssdfs_raw_extent), + cur_extent, 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + ssdfs_memcpy(cur_extent, 0, sizeof(struct ssdfs_raw_extent), + extent, 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + + len2 = le32_to_cpu(buf.len); + + if (len2 < len1) { + /* old extent is shorter */ + len_diff = len1 - len2; + blks_count += len_diff; + fork->blks_count = cpu_to_le64(blks_count); + } else if (len2 > len1) { + /* old extent is larger */ + len_diff = len2 - len1; + + if (blks_count <= len_diff) { + SSDFS_ERR("blks_count %llu <= len_diff %u\n", + blks_count, len_diff); + return -ERANGE; + } + + blks_count -= len_diff; + fork->blks_count = 
cpu_to_le64(blks_count); + } + + return 0; +} + +/* + * ssdfs_extents_tree_change_extent() - change extent in the tree + * @tree: extents tree + * @blk: logical block number + * @extent: extent object + * @search: search object + * + * This method tries to change @extent in the @tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - extent doesn't exist in the tree. + */ +int ssdfs_extents_tree_change_extent(struct ssdfs_extents_btree_info *tree, + u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !extent || !search); + + SSDFS_DBG("tree %p, search %p, blk %llu\n", + tree, search, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_extent_btree_search(blk, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = blk; + search->request.end.hash = blk; + search->request.count = 1; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + down_write(&tree->lock); + + err = ssdfs_extents_tree_find_inline_fork(tree, blk, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the inline fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_inline_fork; + } + + err = ssdfs_change_extent_in_fork(blk, extent, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change extent in fork: err %d\n", + err); + goto finish_change_inline_fork; + } + + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + + err = ssdfs_extents_tree_change_inline_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change inline fork: err %d\n", err); + goto finish_change_inline_fork; + } + +finish_change_inline_fork: + up_write(&tree->lock); + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: + down_read(&tree->lock); + + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_generic_fork; + } + + err = ssdfs_change_extent_in_fork(blk, extent, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change extent in fork: err %d\n", + err); + goto finish_change_generic_fork; + } + + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + + err = ssdfs_extents_tree_change_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change fork: err %d\n", err); + goto finish_change_generic_fork; + } + +finish_change_generic_fork: + up_read(&tree->lock); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extents tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + return err; +} + +/* + * ssdfs_shrink_found_extent() - shrink found extent + * @old_extent: old state of extent + * @found_extent: shrinking extent [in|out] + */ +static inline +int ssdfs_shrink_found_extent(struct ssdfs_raw_extent *old_extent, + struct 
ssdfs_raw_extent *found_extent)
+{
+	u64 seg_id1, seg_id2;
+	u32 logical_blk1, logical_blk2;
+	u32 len1, len2;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!old_extent || !found_extent);
+
+	SSDFS_DBG("old_extent %p, found_extent %p\n",
+		  old_extent, found_extent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	seg_id1 = le64_to_cpu(old_extent->seg_id);
+	logical_blk1 = le32_to_cpu(old_extent->logical_blk);
+	len1 = le32_to_cpu(old_extent->len);
+
+	seg_id2 = le64_to_cpu(found_extent->seg_id);
+	logical_blk2 = le32_to_cpu(found_extent->logical_blk);
+	len2 = le32_to_cpu(found_extent->len);
+
+	if (seg_id1 != seg_id2) {
+		SSDFS_ERR("invalid segment ID: "
+			  "old_extent (seg_id %llu, "
+			  "logical_blk %u, len %u), "
+			  "found_extent (seg_id %llu, "
+			  "logical_blk %u, len %u)\n",
+			  seg_id1, logical_blk1, len1,
+			  seg_id2, logical_blk2, len2);
+		return -ERANGE;
+	}
+
+	if (logical_blk1 != logical_blk2) {
+		SSDFS_ERR("invalid old extent: "
+			  "old_extent (seg_id %llu, "
+			  "logical_blk %u, len %u), "
+			  "found_extent (seg_id %llu, "
+			  "logical_blk %u, len %u)\n",
+			  seg_id1, logical_blk1, len1,
+			  seg_id2, logical_blk2, len2);
+		return -ERANGE;
+	} else {
+		if (len1 > len2) {
+			SSDFS_ERR("invalid length of old extent: "
+				  "old_extent (seg_id %llu, "
+				  "logical_blk %u, len %u), "
+				  "found_extent (seg_id %llu, "
+				  "logical_blk %u, len %u)\n",
+				  seg_id1, logical_blk1, len1,
+				  seg_id2, logical_blk2, len2);
+			return -ERANGE;
+		} else if (len1 < len2) {
+			/* shrink extent */
+			found_extent->len = cpu_to_le32(len2 - len1);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("extent is empty: "
+				  "old_extent (seg_id %llu, "
+				  "logical_blk %u, len %u), "
+				  "found_extent (seg_id %llu, "
+				  "logical_blk %u, len %u)\n",
+				  seg_id1, logical_blk1, len1,
+				  seg_id2, logical_blk2, len2);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENODATA;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_delete_extent_in_fork() - delete extent from the fork
+ * @blk: logical block number
+ * @search: search object
+ * @extent: deleted extent [out]
+ *
+ * This method tries to delete the extent from the fork and returns
+ * a copy of the deleted extent in @extent.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - extent doesn't exist in the tree.
+ * %-EFAULT - fail to create the hole in the fork.
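+ *
+ * Note (derived from the body below): the remaining extents are shifted
+ * left so the fork's extents array stays dense, blks_count is decreased
+ * by the deleted extent's length, and a fork that becomes empty is
+ * reported via %-ENODATA, presumably so the caller can remove the fork
+ * from the tree.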
+ */
+static
+int ssdfs_delete_extent_in_fork(u64 blk,
+				struct ssdfs_btree_search *search,
+				struct ssdfs_raw_extent *extent)
+{
+	struct ssdfs_raw_fork *fork;
+	struct ssdfs_raw_extent *cur_extent = NULL;
+	size_t extent_size = sizeof(struct ssdfs_raw_extent);
+	u64 start_offset;
+	u64 blks_count;
+	u64 cur_blk;
+	u32 len;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search || !extent);
+
+	SSDFS_DBG("blk %llu, search %p\n",
+		  blk, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_EMPTY_RESULT:
+		SSDFS_DBG("no fork in search object\n");
+		return -ENODATA;
+
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+	case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid search object state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		SSDFS_ERR("invalid search buffer state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_size != sizeof(struct ssdfs_raw_fork) ||
+	    search->result.items_in_buffer != 1) {
+		SSDFS_ERR("invalid search buffer state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	fork = &search->raw.fork;
+	start_offset = le64_to_cpu(fork->start_offset);
+	blks_count = le64_to_cpu(fork->blks_count);
+
+	if (start_offset >= U64_MAX || blks_count >= U64_MAX) {
+		SSDFS_ERR("invalid fork state: "
+			  "start_offset %llu, blks_count %llu\n",
+			  start_offset, blks_count);
+		return -ENODATA;
+	}
+
+	if (blk >= U64_MAX) {
+		SSDFS_ERR("invalid request: blk %llu\n",
+			  blk);
+		return -ERANGE;
+	}
+
+	if (start_offset <= blk && blk < (start_offset + blks_count)) {
+		/*
+		 * Expected state
+		 */
+	} else {
+		SSDFS_ERR("blk %llu is out of fork\n",
+			  blk);
+		return -ERANGE;
+	}
+
+	cur_blk = le64_to_cpu(fork->start_offset);
+	for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) {
+		len = le32_to_cpu(fork->extents[i].len);
+
+		if (cur_blk == blk) {
+			/* extent is found */
+			cur_extent = &fork->extents[i];
+			break;
+		} else if (blk < cur_blk) {
+			SSDFS_ERR("invalid extent: "
+				  "blk %llu, cur_blk %llu\n",
+				  blk, cur_blk);
+			return -ERANGE;
+		} else if (len >= U32_MAX || len == 0) {
+			/* empty extent */
+			break;
+		} else {
+			/* it needs to check the next extent */
+			cur_blk += len;
+		}
+	}
+
+	if (!cur_extent) {
+		SSDFS_ERR("fail to find the extent: blk %llu\n",
+			  blk);
+		return -ERANGE;
+	}
+
+	ssdfs_memcpy(extent, 0, extent_size,
+		     cur_extent, 0, extent_size,
+		     extent_size);
+
+	len = le32_to_cpu(fork->extents[i].len);
+
+	if (i < (SSDFS_INLINE_EXTENTS_COUNT - 1)) {
+		/*
+		 * Shift the rest of the extents to the left;
+		 * only (SSDFS_INLINE_EXTENTS_COUNT - i - 1) items
+		 * live after the deleted one, so moving
+		 * (SSDFS_INLINE_EXTENTS_COUNT - i) items would read
+		 * beyond the array.
+		 */
+		err = ssdfs_memmove(fork->extents,
+				    i * extent_size,
+				    SSDFS_INLINE_EXTENTS_COUNT * extent_size,
+				    fork->extents,
+				    (i + 1) * extent_size,
+				    SSDFS_INLINE_EXTENTS_COUNT * extent_size,
+				    (SSDFS_INLINE_EXTENTS_COUNT - i - 1) *
+					extent_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to move: err %d\n", err);
+			return err;
+		}
+
+		/* the last slot has been duplicated by the move */
+		memset(&fork->extents[SSDFS_INLINE_EXTENTS_COUNT - 1],
+			0xFF, extent_size);
+	} else {
+		memset(&fork->extents[i], 0xFF, extent_size);
+	}
+
+	if (len >= U32_MAX || len == 0) {
+		/*
+		 * Do nothing. Empty extent.
+		 */
+	} else if (blks_count < len) {
+		SSDFS_ERR("blks_count %llu < len %u\n",
+			  blks_count, len);
+		return -ERANGE;
+	} else {
+		/* subtract only a valid length */
+		blks_count -= len;
+	}
+
+	fork->blks_count = cpu_to_le64(blks_count);
+
+	if (blks_count == 0) {
+		fork->start_offset = cpu_to_le64(U64_MAX);
+		SSDFS_DBG("empty fork\n");
+		return -ENODATA;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_extents_tree_move_extent() - move extent (ZNS SSD)
+ * @tree: extents tree
+ * @blk: logical block number
+ * @old_extent: old extent object
+ * @new_extent: new extent object
+ * @search: search object
+ *
+ * This method tries to replace @old_extent with @new_extent in the @tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - extent doesn't exist in the tree.
+ */
+int ssdfs_extents_tree_move_extent(struct ssdfs_extents_btree_info *tree,
+				   u64 blk,
+				   struct ssdfs_raw_extent *old_extent,
+				   struct ssdfs_raw_extent *new_extent,
+				   struct ssdfs_btree_search *search)
+{
+	struct ssdfs_raw_extent extent;
+	u64 new_blk;
+	u32 len;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !old_extent || !new_extent || !search);
+
+	SSDFS_DBG("tree %p, search %p, blk %llu\n",
+		  tree, search, blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_EXTENTS_BTREE_CREATED:
+	case SSDFS_EXTENTS_BTREE_INITIALIZED:
+	case SSDFS_EXTENTS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM;
+
+	if (need_initialize_extent_btree_search(blk, search)) {
+		ssdfs_btree_search_init(search);
+		search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM;
+		search->request.flags =
+			SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE |
+			SSDFS_BTREE_SEARCH_HAS_VALID_COUNT;
+		search->request.start.hash = blk;
+		search->request.end.hash = blk;
+		search->request.count = 1;
+	}
+
+	down_write(&tree->lock);
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_FORKS_ARRAY:
+		err = ssdfs_extents_tree_find_inline_fork(tree, blk, search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find the inline fork: "
+				  "blk %llu, err %d\n",
+				  blk, err);
+			goto finish_change_fork;
+		}
+
+		err = ssdfs_delete_extent_in_fork(blk, search,
+						  &extent);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete extent in fork: err %d\n",
+				  err);
+			goto finish_change_fork;
+		}
+
+		err = ssdfs_shrink_found_extent(old_extent, &extent);
+		if (err == -ENODATA) {
+			/*
+			 * Extent is empty. Do nothing.
+ */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to shrink extent: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_fork; + } else { + len = le32_to_cpu(old_extent->len); + new_blk = blk + len; + + err = ssdfs_add_extent_into_fork(new_blk, + &extent, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent into fork: " + "err %d\n", + err); + goto finish_change_fork; + } + + ssdfs_debug_btree_search_object(search); + + search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM; + err = ssdfs_extents_tree_add_inline_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add fork: err %d\n", err); + goto finish_change_fork; + } + } + + err = ssdfs_extents_tree_add_extent_nolock(tree, blk, + new_extent, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_fork; + } + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_fork; + } + + err = ssdfs_delete_extent_in_fork(blk, search, + &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to delete extent in fork: err %d\n", + err); + goto finish_change_fork; + } + + err = ssdfs_shrink_found_extent(old_extent, &extent); + if (err == -ENODATA) { + /* + * Extent is empty. Do nothing. + */ + err = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to shrink extent: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_fork; + } else { + len = le32_to_cpu(old_extent->len); + new_blk = blk + len; + + err = ssdfs_add_extent_into_fork(new_blk, + &extent, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent into fork: " + "err %d\n", + err); + goto finish_change_fork; + } + + ssdfs_debug_btree_search_object(search); + + err = ssdfs_extents_tree_add_extent_nolock(tree, + new_blk, + &extent, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent: " + "blk %llu, err %d\n", + new_blk, err); + goto finish_change_fork; + } + } + + err = ssdfs_extents_tree_add_extent_nolock(tree, blk, + new_extent, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to add extent: " + "blk %llu, err %d\n", + blk, err); + goto finish_change_fork; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extents tree type %#x\n", + atomic_read(&tree->type)); + break; + } + +finish_change_fork: + up_write(&tree->lock); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + ssdfs_debug_extents_btree_object(tree); + + return err; +} + +/* + * ssdfs_truncate_extent_in_fork() - truncate the extent in the fork + * @blk: logical block number + * @new_len: new length of the extent + * @search: search object + * @fork: truncated fork [out] + * + * This method tries to truncate the extent in the fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - no extents in the fork. + * %-ENOSPC - invalid @new_len of the extent. + * %-EFAULT - extent doesn't exist in the fork. 
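+ *
+ * A worked example (illustrative numbers only): for a fork with
+ * start_offset 100 and blks_count 100 (blocks 100..199), a call with
+ * blk 150 and new_len 10 keeps blocks 100..159 in the fork and returns
+ * the cut-off tail (blocks 160..199) in @fork for pre-invalidation.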
+ */ +static +int ssdfs_truncate_extent_in_fork(u64 blk, u32 new_len, + struct ssdfs_btree_search *search, + struct ssdfs_raw_fork *fork) +{ + struct ssdfs_raw_fork *cur_fork; + struct ssdfs_raw_extent *cur_extent = NULL; + u64 start_offset; + u64 blks_count; + u32 len, len_diff; + u64 cur_blk; + u64 rest_len; + u32 logical_blk; + int i, j; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON( !search || !fork); + + SSDFS_DBG("blk %llu, new_len %u, search %p\n", + blk, new_len, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_EMPTY_RESULT: + SSDFS_DBG("no fork in search object\n"); + return -EFAULT; + + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid search object state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid search buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + if (search->result.buf_size != sizeof(struct ssdfs_raw_fork) || + search->result.items_in_buffer != 1) { + SSDFS_ERR("invalid search buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + memset(fork, 0xFF, sizeof(struct ssdfs_raw_fork)); + + cur_fork = &search->raw.fork; + start_offset = le64_to_cpu(cur_fork->start_offset); + blks_count = le64_to_cpu(cur_fork->blks_count); + + if (start_offset >= U64_MAX || blks_count >= U64_MAX) { + SSDFS_ERR("invalid fork state: " + "start_offset %llu, blks_count %llu\n", + start_offset, blks_count); + return -ERANGE; + } + + if (blks_count == 0) { + SSDFS_ERR("empty fork: blks_count %llu\n", + blks_count); + return -ENODATA; + } + + if (blk >= U64_MAX) { + SSDFS_ERR("invalid extent: blk %llu\n", + blk); + return -ERANGE; + } + + if (start_offset <= blk && blk < (start_offset + blks_count)) { + /* + * Expected state + */ + } else { + SSDFS_ERR("extent is out of fork: \n" + "fork (start %llu, blks_count %llu), " + "extent (blk %llu, len %u)\n", + start_offset, blks_count, + blk, new_len); + return -EFAULT; + } + + cur_blk = le64_to_cpu(cur_fork->start_offset); + for (i = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) { + len = le32_to_cpu(cur_fork->extents[i].len); + + if (len >= U32_MAX || len == 0) { + /* empty extent */ + break; + } else if (cur_blk <= blk && blk < (cur_blk + len)) { + /* extent is found */ + cur_extent = &cur_fork->extents[i]; + break; + } else if (blk < cur_blk) { + SSDFS_ERR("invalid extent: " + "blk %llu, cur_blk %llu\n", + blk, cur_blk); + return -EFAULT; + } else { + /* it needs to check the next extent */ + cur_blk += len; + } + } + + if (!cur_extent) { + SSDFS_ERR("fail to find the extent: blk %llu\n", + blk); + return -EFAULT; + } + + rest_len = blks_count - (blk - start_offset); + + if (new_len > rest_len) { + SSDFS_ERR("fail to grow extent's size: " + "rest_len %llu, new_len %u\n", + rest_len, new_len); + return -ENOSPC; + } else if (new_len == rest_len) { + SSDFS_WARN("nothing should be done: " + "rest_len %llu, new_len %u\n", + rest_len, new_len); + return 0; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(i >= SSDFS_INLINE_EXTENTS_COUNT); +#endif /* CONFIG_SSDFS_DEBUG */ + + fork->start_offset = cpu_to_le64(blk); + fork->blks_count = cpu_to_le64(0); + + for (j = 0; i < SSDFS_INLINE_EXTENTS_COUNT; i++) { + cur_extent = &cur_fork->extents[i]; + len = le32_to_cpu(cur_extent->len); + + if ((cur_blk + len) < blk) { + /* pass on this extent */ + continue; + } else if ((blk + new_len) <= cur_blk) { + ssdfs_memcpy(&fork->extents[j], + 
0, sizeof(struct ssdfs_raw_extent), + cur_extent, + 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + le64_add_cpu(&fork->blks_count, len); + j++; + + /* clear extent */ + memset(cur_extent, 0xFF, + sizeof(struct ssdfs_raw_extent)); + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON((blk - cur_blk) >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + len_diff = len - (u32)(blk - cur_blk); + + if (len_diff <= new_len) { + /* + * leave the extent unchanged + */ + } else { + len_diff = (cur_blk + len) - (blk + new_len); + + fork->extents[j].seg_id = cur_extent->seg_id; + logical_blk = + le32_to_cpu(cur_extent->logical_blk); + logical_blk += len - len_diff; + fork->extents[j].logical_blk = + cpu_to_le32(logical_blk); + fork->extents[j].len = cpu_to_le32(len_diff); + le64_add_cpu(&fork->blks_count, len_diff); + j++; + + /* shrink extent */ + cur_extent->len = cpu_to_le32(len - len_diff); + } + } + + cur_blk += len; + + if (cur_blk >= (start_offset + blks_count)) + break; + } + + blks_count -= rest_len - new_len; + + if (blks_count == 0) { + cur_fork->blks_count = cpu_to_le64(0); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("empty fork: blks_count %llu\n", + blks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return -ENODATA; + } else + cur_fork->blks_count = cpu_to_le64(blks_count); + + return 0; +} + +/* + * ssdfs_extents_tree_delete_inline_fork() - delete inline fork + * @tree: extents tree + * @search: search object + * + * This method tries to delete the inline fork from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - fork doesn't exist in the tree. + * %-ENOENT - no more forks in the tree. + */ +static +int ssdfs_extents_tree_delete_inline_fork(struct ssdfs_extents_btree_info *tree, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_raw_fork *fork; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + size_t inline_forks_size = fork_size * SSDFS_INLINE_FORKS_COUNT; + ino_t ino; + u64 start_hash; + s64 forks_count; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p\n", + tree, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + ino = tree->owner->vfs_inode.i_ino; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + if (!tree->inline_forks) { + SSDFS_ERR("empty inline tree %p\n", + tree->inline_forks); + return -ERANGE; + } + + if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) { + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + if (!search->result.buf) { + SSDFS_ERR("empty buffer pointer\n"); + return -ERANGE; + } + + start_hash = search->request.start.hash; + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + if (start_hash != 
le64_to_cpu(search->raw.fork.start_offset)) { + SSDFS_ERR("corrupted fork: " + "start_hash %llx, " + "fork (start %llu, blks_count %llu)\n", + start_hash, + le64_to_cpu(search->raw.fork.start_offset), + le64_to_cpu(search->raw.fork.blks_count)); + return -ERANGE; + } + break; + + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + if (start_hash >= le64_to_cpu(search->raw.fork.start_offset)) { + SSDFS_ERR("corrupted fork: " + "start_hash %llx, " + "fork (start %llu, blks_count %llu)\n", + start_hash, + le64_to_cpu(search->raw.fork.start_offset), + le64_to_cpu(search->raw.fork.blks_count)); + return -ERANGE; + } + break; + + default: + SSDFS_WARN("unexpected result state %#x\n", + search->result.state); + return -ERANGE; + } + + forks_count = atomic64_read(&tree->forks_count); + if (forks_count == 0) { + SSDFS_DBG("empty tree\n"); + return -ENOENT; + } else if (forks_count > SSDFS_INLINE_FORKS_COUNT) { + SSDFS_ERR("invalid forks count %lld\n", + forks_count); + return -ERANGE; + } else + atomic64_dec(&tree->forks_count); + + if (search->result.start_index >= forks_count) { + SSDFS_ERR("invalid search result: " + "start_index %u, forks_count %lld\n", + search->result.start_index, + forks_count); + return -ENODATA; + } + + err = ssdfs_invalidate_inline_tail_forks(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate inline tail forks: " + "err %d\n", err); + return err; + } + + if (search->result.start_index < (forks_count - 1)) { + u16 index = search->result.start_index; + + err = ssdfs_memmove(tree->inline_forks, + (size_t)index * fork_size, + inline_forks_size, + tree->inline_forks, + (size_t)(index + 1) * fork_size, + inline_forks_size, + (forks_count - index) * fork_size); + if (unlikely(err)) { + SSDFS_ERR("fail to move: err %d\n", err); + return err; + } + + index = forks_count - 1; + fork = &tree->inline_forks[index]; + memset(fork, 0xFF, sizeof(struct ssdfs_raw_fork)); + } + + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + + forks_count = atomic64_read(&tree->forks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", forks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (forks_count == 0) { + SSDFS_DBG("tree is empty now\n"); + } else if (forks_count < 0) { + SSDFS_WARN("invalid forks_count %lld\n", + forks_count); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_extents_tree_delete_fork() - delete generic fork + * @tree: extents tree + * @search: search object + * + * This method tries to delete the generic fork from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - fork doesn't exist in the tree. + * %-ENOENT - no more forks in the tree. 
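+ *
+ * NOTE: the fork is expected to be empty (blks_count is 0 or U64_MAX)
+ * by the moment of deletion; a fork that still accounts blocks is
+ * treated as an internal error.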
+ */
+static
+int ssdfs_extents_tree_delete_fork(struct ssdfs_extents_btree_info *tree,
+				   struct ssdfs_btree_search *search)
+{
+	u64 start_hash;
+	s64 forks_count;
+	u64 blks_count;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p, search %p\n",
+		  tree, search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_PRIVATE_EXTENTS_BTREE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_EXTENTS_BTREE_CREATED:
+	case SSDFS_EXTENTS_BTREE_INITIALIZED:
+	case SSDFS_EXTENTS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	if (!tree->generic_tree) {
+		SSDFS_ERR("empty generic tree %p\n",
+			  tree->generic_tree);
+		return -ERANGE;
+	}
+
+	if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) {
+		SSDFS_ERR("invalid search result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		return -ERANGE;
+	}
+
+	start_hash = search->request.start.hash;
+	if (start_hash != le64_to_cpu(search->raw.fork.start_offset)) {
+		SSDFS_ERR("corrupted fork: "
+			  "start_hash %llx, "
+			  "fork (start %llu, blks_count %llu)\n",
+			  start_hash,
+			  le64_to_cpu(search->raw.fork.start_offset),
+			  le64_to_cpu(search->raw.fork.blks_count));
+		return -ERANGE;
+	}
+
+	forks_count = atomic64_read(&tree->forks_count);
+	if (forks_count == 0) {
+		SSDFS_DBG("empty tree\n");
+		return -ENOENT;
+	}
+
+	if (search->result.start_index >= forks_count) {
+		SSDFS_ERR("invalid search result: "
+			  "start_index %u, forks_count %lld\n",
+			  search->result.start_index,
+			  forks_count);
+		return -ENODATA;
+	}
+
+	blks_count = le64_to_cpu(search->raw.fork.blks_count);
+	if (!(blks_count == 0 || blks_count >= U64_MAX)) {
+		SSDFS_ERR("fork isn't empty: "
+			  "blks_count %llu\n",
+			  blks_count);
+		return -ERANGE;
+	}
+
+	err = ssdfs_btree_delete_item(tree->generic_tree,
+				      search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete the fork from the tree: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	forks_count = atomic64_read(&tree->forks_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("forks_count %lld\n", forks_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (forks_count == 0) {
+		SSDFS_DBG("tree is empty now\n");
+		return -ENOENT;
+	} else if (forks_count < 0) {
+		SSDFS_WARN("invalid forks_count %lld\n",
+			   forks_count);
+		atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED);
+		return -ERANGE;
+	}
+
+	atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY);
+	return 0;
+}
+
+/*
+ * ssdfs_migrate_generic2inline_tree() - convert generic tree into inline
+ * @tree: extents tree
+ *
+ * This method tries to convert the generic tree into an inline one.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENOSPC - the tree cannot be converted into inline again.
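+ *
+ * NOTE: the conversion succeeds only when all forks fit into the inline
+ * array, i.e. forks_count does not exceed SSDFS_INLINE_FORKS_COUNT
+ * (minus one slot when the inode keeps an xattr btree root as well);
+ * otherwise -ENOSPC is returned and the generic tree stays in use.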
+ */ +static +int ssdfs_migrate_generic2inline_tree(struct ssdfs_extents_btree_info *tree) +{ + struct ssdfs_raw_fork inline_forks[SSDFS_INLINE_FORKS_COUNT]; + struct ssdfs_btree_search *search; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + s64 forks_count, forks_capacity; + u64 start_hash = 0, end_hash = 0; + u64 blks_count = 0; + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p\n", tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_PRIVATE_EXTENTS_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + forks_count = atomic64_read(&tree->forks_count); + + if (!tree->owner) { + SSDFS_ERR("empty owner inode\n"); + return -ERANGE; + } + + private_flags = atomic_read(&tree->owner->private_flags); + + forks_capacity = SSDFS_INLINE_FORKS_COUNT; + if (private_flags & SSDFS_INODE_HAS_XATTR_BTREE) + forks_capacity--; + + if (private_flags & SSDFS_INODE_HAS_INLINE_EXTENTS) { + SSDFS_ERR("the extents tree is not generic\n"); + return -ERANGE; + } + + if (forks_count > forks_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld > forks_capacity %lld\n", + forks_count, forks_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return -ENOSPC; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(tree->inline_forks || !tree->generic_tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree->generic_tree = NULL; + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_RANGE; + search->request.flags = 0; + search->request.start.hash = U64_MAX; + search->request.end.hash = U64_MAX; + search->request.count = 0; + + err = ssdfs_btree_get_head_range(&tree->buffer.tree, + forks_count, search); + if (unlikely(err)) { + SSDFS_ERR("fail to extract forks: " + "forks_count %lld, err %d\n", + forks_count, err); + goto finish_process_range; + } else if (forks_count != search->result.items_in_buffer) { + err = -ERANGE; + SSDFS_ERR("forks_count %lld != items_in_buffer %u\n", + forks_count, + search->result.items_in_buffer); + goto finish_process_range; + } + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + err = -ERANGE; + SSDFS_ERR("invalid search result's state %#x\n", + search->result.state); + goto finish_process_range; + } + + memset(inline_forks, 0xFF, fork_size * SSDFS_INLINE_FORKS_COUNT); + + if (search->result.buf_size != (fork_size * forks_count) || + search->result.items_in_buffer != forks_count) { + err = -ERANGE; + SSDFS_ERR("invalid search result: " + "buf_size %zu, items_in_buffer %u, " + "forks_count %lld\n", + search->result.buf_size, + search->result.items_in_buffer, + forks_count); + goto finish_process_range; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + ssdfs_memcpy(inline_forks, + 0, fork_size * SSDFS_INLINE_FORKS_COUNT, + &search->raw.fork, + 0, fork_size, + fork_size); + break; + + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + if (!search->result.buf) { + err = 
-ERANGE; + SSDFS_ERR("empty buffer\n"); + goto finish_process_range; + } + + err = ssdfs_memcpy(inline_forks, + 0, fork_size * SSDFS_INLINE_FORKS_COUNT, + search->result.buf, + 0, search->result.buf_size, + fork_size * forks_count); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + goto finish_process_range; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid buffer's state %#x\n", + search->result.buf_state); + goto finish_process_range; + } + + start_hash = le64_to_cpu(inline_forks[0].start_offset); + if (forks_count > 1) { + end_hash = + le64_to_cpu(inline_forks[forks_count - 1].start_offset); + blks_count = + le64_to_cpu(inline_forks[forks_count - 1].blks_count); + } else { + end_hash = start_hash; + blks_count = le64_to_cpu(inline_forks[0].blks_count); + } + + if (blks_count == 0 || blks_count >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid blks_count %llu\n", + blks_count); + goto finish_process_range; + } + + end_hash += blks_count - 1; + + search->request.type = SSDFS_BTREE_SEARCH_DELETE_RANGE; + search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT | + SSDFS_BTREE_SEARCH_NOT_INVALIDATE; + search->request.start.hash = start_hash; + search->request.end.hash = end_hash; + search->request.count = forks_count; + + err = ssdfs_btree_delete_range(&tree->buffer.tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete range: " + "start_hash %llx, end_hash %llx, count %u, " + "err %d\n", + search->request.start.hash, + search->request.end.hash, + search->request.count, + err); + goto finish_process_range; + } + + if (!is_ssdfs_btree_empty(&tree->buffer.tree)) { + err = -ERANGE; + SSDFS_WARN("extents tree is not empty\n"); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_CORRUPTED); + goto finish_process_range; + } + + err = ssdfs_btree_destroy_node_range(&tree->buffer.tree, + 0); + if (unlikely(err)) { + SSDFS_ERR("fail to destroy nodes' range: err %d\n", + err); + goto finish_process_range; + } + +finish_process_range: + ssdfs_btree_search_free(search); + + if (unlikely(err)) + return err; + + ssdfs_btree_destroy(&tree->buffer.tree); + + err = ssdfs_memcpy(tree->buffer.forks, + 0, fork_size * SSDFS_INLINE_FORKS_COUNT, + inline_forks, + 0, fork_size * SSDFS_INLINE_FORKS_COUNT, + fork_size * forks_count); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + atomic_set(&tree->type, SSDFS_INLINE_FORKS_ARRAY); + atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY); + tree->inline_forks = tree->buffer.forks; + + atomic64_set(&tree->forks_count, forks_count); + + atomic_and(~SSDFS_INODE_HAS_EXTENTS_BTREE, + &tree->owner->private_flags); + atomic_or(SSDFS_INODE_HAS_INLINE_EXTENTS, + &tree->owner->private_flags); + + return 0; +} + +/* + * ssdfs_extents_tree_truncate_extent() - truncate the extent in the tree + * @tree: extent tree + * @blk: logical block number + * @new_len: new length of the extent + * @search: search object + * + * This method tries to truncate the extent in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - extent doesn't exist in the tree. + * %-ENOSPC - invalid @new_len of the extent. + * %-EFAULT - fail to create the hole in the fork. 
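+ *
+ * A minimal caller-side sketch (illustrative only; the helpers are the
+ * ones used elsewhere in this file):
+ *
+ *	search = ssdfs_btree_search_alloc();
+ *	if (!search)
+ *		return -ENOMEM;
+ *	ssdfs_btree_search_init(search);
+ *	err = ssdfs_extents_tree_truncate_extent(tree, blk, new_len, search);
+ *	ssdfs_btree_search_free(search);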
+ */ +int ssdfs_extents_tree_truncate_extent(struct ssdfs_extents_btree_info *tree, + u64 blk, u32 new_len, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_raw_fork fork; + u64 blks_count; + ino_t ino; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + + SSDFS_DBG("tree %p, search %p, blk %llu, new_len %u\n", + tree, search, blk, new_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + ino = tree->owner->vfs_inode.i_ino; + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_extent_btree_search(blk, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = blk; + search->request.end.hash = blk; + search->request.count = 1; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + down_write(&tree->lock); + + err = ssdfs_extents_tree_find_inline_fork(tree, blk, search); + if (err == -ENODATA) { + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* hole case -> continue truncation */ + break; + + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* inflation case -> nothing has to be done */ + err = 0; + goto finish_truncate_inline_fork; + + default: + SSDFS_ERR("fail to find the inline fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_truncate_inline_fork; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the inline fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_truncate_inline_fork; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + search->request.type = + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL; + err = ssdfs_extents_tree_delete_inline_fork(tree, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: err %d\n", err); + goto finish_truncate_inline_fork; + } + break; + + case SSDFS_BTREE_SEARCH_VALID_ITEM: + err = ssdfs_truncate_extent_in_fork(blk, new_len, + search, &fork); + if (err == -ENODATA) { + /* + * The truncating fork is empty. 
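+				 * The whole fork is deleted below.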
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to change extent in fork: " + "err %d\n", + err); + goto finish_truncate_inline_fork; + } + + if (err == -ENODATA) { + search->request.type = + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL; + err = + ssdfs_extents_tree_delete_inline_fork(tree, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: " + "err %d\n", err); + goto finish_truncate_inline_fork; + } + } else { + search->request.type = + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL; + err = + ssdfs_extents_tree_change_inline_fork(tree, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to change fork: " + "err %d\n", err); + goto finish_truncate_inline_fork; + } + } + + blks_count = le64_to_cpu(fork.blks_count); + + if (blks_count == 0 || blks_count >= U64_MAX) { + /* + * empty fork -> do nothing + */ + } else { + err = + ssdfs_shextree_add_pre_invalid_fork(shextree, + ino, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-invalidate: " + "(start_offset %llu, " + "blks_count %llu), err %d\n", + le64_to_cpu(fork.start_offset), + le64_to_cpu(fork.blks_count), + err); + goto finish_truncate_inline_fork; + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + goto finish_truncate_inline_fork; + } + +finish_truncate_inline_fork: + up_write(&tree->lock); + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: + down_read(&tree->lock); + + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (err == -ENODATA) { + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* hole case -> continue truncation */ + break; + + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + if (is_last_leaf_node_found(search)) { + /* + * inflation case + * nothing has to be done + */ + err = 0; + goto finish_truncate_generic_fork; + } else { + /* + * hole case + * continue truncation + */ + } + break; + + default: + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_truncate_generic_fork; + } + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_truncate_generic_fork; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + search->request.type = + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL; + err = ssdfs_extents_tree_delete_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: err %d\n", err); + goto finish_truncate_generic_fork; + } + break; + + case SSDFS_BTREE_SEARCH_VALID_ITEM: + err = ssdfs_truncate_extent_in_fork(blk, new_len, + search, &fork); + if (err == -ENODATA) { + /* + * The truncating fork is empty. 
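+				 * The whole fork is deleted from the
+				 * generic tree below.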
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to change extent in fork: " + "err %d\n", err); + goto finish_truncate_generic_fork; + } + + if (err == -ENODATA) { + search->request.type = + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL; + err = ssdfs_extents_tree_delete_fork(tree, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: " + "err %d\n", err); + goto finish_truncate_generic_fork; + } + } else { + search->request.type = + SSDFS_BTREE_SEARCH_INVALIDATE_TAIL; + err = ssdfs_extents_tree_change_fork(tree, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to change fork: " + "err %d\n", err); + goto finish_truncate_generic_fork; + } + } + + blks_count = le64_to_cpu(fork.blks_count); + + if (blks_count == 0 || blks_count >= U64_MAX) { + /* + * empty fork -> do nothing + */ + } else { + err = + ssdfs_shextree_add_pre_invalid_fork(shextree, + ino, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-invalidate: " + "(start_offset %llu, " + "blks_count %llu), err %d\n", + le64_to_cpu(fork.start_offset), + le64_to_cpu(fork.blks_count), + err); + goto finish_truncate_generic_fork; + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + goto finish_truncate_generic_fork; + } + +finish_truncate_generic_fork: + up_read(&tree->lock); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + if (unlikely(err)) + return err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", + atomic64_read(&tree->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!err && + need_migrate_generic2inline_btree(tree->generic_tree, 1)) { + down_write(&tree->lock); + err = ssdfs_migrate_generic2inline_tree(tree); + up_write(&tree->lock); + + if (err == -ENOSPC) { + /* continue to use the generic tree */ + err = 0; + SSDFS_DBG("unable to re-create inline tree\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to re-create inline tree: " + "err %d\n", + err); + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extents tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + return err; +} + +static int +ssdfs_extents_tree_delete_inline_extent(struct ssdfs_extents_btree_info *tree, + u64 blk, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !extent || !search); + BUG_ON(!rwsem_is_locked(&tree->lock)); + + SSDFS_DBG("tree %p, search %p, blk %llu\n", + tree, search, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + err = ssdfs_extents_tree_find_inline_fork(tree, blk, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the inline fork: " + "blk %llu, err %d\n", + blk, err); + return err; + } + + err = ssdfs_delete_extent_in_fork(blk, search, extent); + if (err == -ENODATA) { + /* + * The fork doesn't contain any extents. 
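+		 * The emptied fork itself is deleted below.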
+ */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to delete extent in fork: err %d\n", + err); + return err; + } + + if (err == -ENODATA) { + search->request.type = SSDFS_BTREE_SEARCH_DELETE_ITEM; + err = ssdfs_extents_tree_delete_inline_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: err %d\n", err); + return err; + } + } else { + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + err = ssdfs_extents_tree_change_inline_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change fork: err %d\n", err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_extents_tree_delete_extent() - delete extent from the tree + * @tree: extents tree + * @blk: logical block number + * @search: search object + * + * This method tries to delete extent from the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENODATA - extent doesn't exist in the tree. + * %-EFAULT - fail to create the hole in the fork. + */ +int ssdfs_extents_tree_delete_extent(struct ssdfs_extents_btree_info *tree, + u64 blk, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_raw_extent extent; + ino_t ino; + u32 len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !search); + + SSDFS_DBG("tree %p, search %p, blk %llu\n", + tree, search, blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + ino = tree->owner->vfs_inode.i_ino; + + switch (atomic_read(&tree->state)) { + case SSDFS_EXTENTS_BTREE_CREATED: + case SSDFS_EXTENTS_BTREE_INITIALIZED: + case SSDFS_EXTENTS_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid extent tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_extent_btree_search(blk, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + search->request.start.hash = blk; + search->request.end.hash = blk; + search->request.count = 1; + } + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + down_write(&tree->lock); + + err = ssdfs_extents_tree_delete_inline_extent(tree, + blk, + &extent, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete inline extent: " + "blk %llu, err %d\n", + blk, err); + goto finish_delete_inline_extent; + } + + len = le32_to_cpu(extent.len); + + if (len == 0 || len >= U32_MAX) { + /* + * empty extent -> do nothing + */ + } else { + err = ssdfs_shextree_add_pre_invalid_extent(shextree, + ino, + &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to add pre-invalid extent " + "(seg_id %llu, blk %u, len %u), " + "err %d\n", + le64_to_cpu(extent.seg_id), + le32_to_cpu(extent.logical_blk), + len, err); + goto finish_delete_inline_extent; + } + } + +finish_delete_inline_extent: + up_write(&tree->lock); + break; + + case SSDFS_PRIVATE_EXTENTS_BTREE: + down_read(&tree->lock); + + err = ssdfs_btree_find_item(tree->generic_tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the fork: " + "blk %llu, err %d\n", + blk, err); + goto finish_delete_generic_extent; + } + + err = ssdfs_delete_extent_in_fork(blk, search, + &extent); + if 
(err == -ENODATA) { + /* + * The fork doesn't contain any extents. + */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to delete extent in fork: err %d\n", + err); + goto finish_delete_generic_extent; + } + + if (err == -ENODATA) { + search->request.type = SSDFS_BTREE_SEARCH_DELETE_ITEM; + err = ssdfs_extents_tree_delete_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: err %d\n", err); + goto finish_delete_generic_extent; + } + } else { + search->request.type = SSDFS_BTREE_SEARCH_CHANGE_ITEM; + err = ssdfs_extents_tree_change_fork(tree, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change fork: err %d\n", err); + goto finish_delete_generic_extent; + } + } + +finish_delete_generic_extent: + up_read(&tree->lock); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", + atomic64_read(&tree->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!err && + need_migrate_generic2inline_btree(tree->generic_tree, 1)) { + down_write(&tree->lock); + err = ssdfs_migrate_generic2inline_tree(tree); + up_write(&tree->lock); + + if (err == -ENOSPC) { + /* continue to use the generic tree */ + err = 0; + SSDFS_DBG("unable to re-create inline tree\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to re-create inline tree: " + "err %d\n", + err); + } + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid extents tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + return err; +} + +/* + * ssdfs_delete_all_inline_forks() - delete all inline forks + * @tree: extents tree + * + * This method tries to delete all inline forks in the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-ENOENT - empty tree. 
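+ *
+ * Every valid extent of every inline fork is added into the shared
+ * pre-invalidated extents tree first, and the inline array is wiped
+ * (filled with 0xFF) only after the accounted lengths have matched
+ * the forks' blks_count values.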
+ */
+static
+int ssdfs_delete_all_inline_forks(struct ssdfs_extents_btree_info *tree)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_shared_extents_tree *shextree;
+	struct ssdfs_raw_fork *fork;
+	struct ssdfs_raw_extent *extent;
+	u64 forks_count;
+	ino_t ino;
+	int i, j;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !tree->fsi);
+	BUG_ON(!rwsem_is_locked(&tree->lock));
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = tree->fsi;
+	shextree = fsi->shextree;
+
+	if (!shextree) {
+		SSDFS_ERR("shared extents tree is absent\n");
+		return -ERANGE;
+	}
+
+	ino = tree->owner->vfs_inode.i_ino;
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_FORKS_ARRAY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_EXTENTS_BTREE_CREATED:
+	case SSDFS_EXTENTS_BTREE_INITIALIZED:
+	case SSDFS_EXTENTS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	if (!tree->inline_forks) {
+		SSDFS_ERR("empty inline forks %p\n",
+			  tree->inline_forks);
+		return -ERANGE;
+	}
+
+	forks_count = atomic64_read(&tree->forks_count);
+	if (forks_count == 0) {
+		SSDFS_DBG("empty tree\n");
+		return -ENOENT;
+	} else if (forks_count > SSDFS_INLINE_FORKS_COUNT) {
+		atomic_set(&tree->state,
+			   SSDFS_EXTENTS_BTREE_CORRUPTED);
+		SSDFS_ERR("extents tree is corrupted: "
+			  "forks_count %llu\n",
+			  forks_count);
+		return -ERANGE;
+	}
+
+	for (i = 0; i < forks_count; i++) {
+		u64 blks_count;
+		u64 calculated = 0;
+
+		fork = &tree->inline_forks[i];
+		blks_count = le64_to_cpu(fork->blks_count);
+
+		if (blks_count == 0 || blks_count >= U64_MAX) {
+			atomic_set(&tree->state,
+				   SSDFS_EXTENTS_BTREE_CORRUPTED);
+			SSDFS_ERR("corrupted fork: blks_count %llu\n",
+				  blks_count);
+			return -ERANGE;
+		}
+
+		for (j = SSDFS_INLINE_EXTENTS_COUNT - 1; j >= 0; j--) {
+			u32 len;
+
+			extent = &fork->extents[j];
+			len = le32_to_cpu(extent->len);
+
+			if (len == 0 || len >= U32_MAX)
+				continue;
+
+			if ((calculated + len) > blks_count) {
+				atomic_set(&tree->state,
+					   SSDFS_EXTENTS_BTREE_CORRUPTED);
+				SSDFS_ERR("corrupted extent: "
+					  "calculated %llu, len %u, "
+					  "blks %llu\n",
+					  calculated, len, blks_count);
+				return -ERANGE;
+			}
+
+			err = ssdfs_shextree_add_pre_invalid_extent(shextree,
+								    ino,
+								    extent);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to add pre-invalid extent "
+					  "(seg_id %llu, blk %u, len %u), "
+					  "err %d\n",
+					  le64_to_cpu(extent->seg_id),
+					  le32_to_cpu(extent->logical_blk),
+					  len, err);
+				return err;
+			}
+
+			/* account the invalidated length */
+			calculated += len;
+		}
+
+		if (calculated != blks_count) {
+			atomic_set(&tree->state,
+				   SSDFS_EXTENTS_BTREE_CORRUPTED);
+			SSDFS_ERR("calculated %llu != blks_count %llu\n",
+				  calculated, blks_count);
+			return -ERANGE;
+		}
+	}
+
+	memset(tree->inline_forks, 0xFF,
+		sizeof(struct ssdfs_raw_fork) * SSDFS_INLINE_FORKS_COUNT);
+
+	atomic_set(&tree->state, SSDFS_EXTENTS_BTREE_DIRTY);
+	return 0;
+}
+
+/*
+ * ssdfs_extents_tree_delete_all() - delete all forks in the tree
+ * @tree: extents tree
+ *
+ * This method tries to delete all forks in the tree.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
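+ *
+ * Depending on the tree type, the method either wipes the inline forks
+ * array or delegates to ssdfs_btree_delete_all(); forks_count is reset
+ * to zero only when the deletion succeeds.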
+ */
+int ssdfs_extents_tree_delete_all(struct ssdfs_extents_btree_info *tree)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree);
+
+	SSDFS_DBG("tree %p\n", tree);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_EXTENTS_BTREE_CREATED:
+	case SSDFS_EXTENTS_BTREE_INITIALIZED:
+	case SSDFS_EXTENTS_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid extent tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_FORKS_ARRAY:
+		down_write(&tree->lock);
+		err = ssdfs_delete_all_inline_forks(tree);
+		if (!err)
+			atomic64_set(&tree->forks_count, 0);
+		up_write(&tree->lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("forks_count %lld\n",
+			  atomic64_read(&tree->forks_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete all inline forks: "
+				  "err %d\n",
+				  err);
+		}
+		break;
+
+	case SSDFS_PRIVATE_EXTENTS_BTREE:
+		down_write(&tree->lock);
+		err = ssdfs_btree_delete_all(tree->generic_tree);
+		if (!err) {
+			atomic64_set(&tree->forks_count, 0);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("forks_count %lld\n",
+				  atomic64_read(&tree->forks_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		up_write(&tree->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete all the forks: "
+				  "err %d\n",
+				  err);
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid extents tree type %#x\n",
+			  atomic_read(&tree->type));
+		break;
+	}
+
+	return err;
+}

From patchwork Sat Feb 25 01:09:19 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151973
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 68/76] ssdfs: search extent logic in extents b-tree node
Date: Fri, 24 Feb 2023 17:09:19 -0800
Message-Id: <20230225010927.813929-69-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

This patch implements lookup logic in extents b-tree node.
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/extents_tree.c | 3111 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 3111 insertions(+) diff --git a/fs/ssdfs/extents_tree.c b/fs/ssdfs/extents_tree.c index f978ef0cca12..4b183308eff5 100644 --- a/fs/ssdfs/extents_tree.c +++ b/fs/ssdfs/extents_tree.c @@ -6887,3 +6887,3114 @@ int ssdfs_extents_tree_delete_all(struct ssdfs_extents_btree_info *tree) return err; } + +/****************************************************************************** + * SPECIALIZED EXTENTS BTREE DESCRIPTOR OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_extents_btree_desc_init() - specialized btree descriptor init + * @fsi: pointer on shared file system object + * @tree: pointer on btree object + */ +static +int ssdfs_extents_btree_desc_init(struct ssdfs_fs_info *fsi, + struct ssdfs_btree *tree) +{ + struct ssdfs_extents_btree_info *tree_info = NULL; + struct ssdfs_btree_descriptor *desc; + u32 erasesize; + u32 node_size; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + u16 item_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !tree); + + SSDFS_DBG("fsi %p, tree %p\n", + fsi, tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + + erasesize = fsi->erasesize; + + desc = &tree_info->desc.desc; + + if (le32_to_cpu(desc->magic) != SSDFS_EXTENTS_BTREE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid magic %#x\n", + le32_to_cpu(desc->magic)); + goto finish_btree_desc_init; + } + + /* TODO: check flags */ + + if (desc->type != SSDFS_EXTENTS_BTREE) { + err = -EIO; + SSDFS_ERR("invalid btree type %#x\n", + desc->type); + goto finish_btree_desc_init; + } + + node_size = 1 << desc->log_node_size; + if (node_size < SSDFS_4KB || node_size > erasesize) { + err = -EIO; + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc->log_node_size, + node_size, erasesize); + goto finish_btree_desc_init; + } + + item_size = le16_to_cpu(desc->item_size); + + if (item_size != fork_size) { + err = -EIO; + SSDFS_ERR("invalid item size %u\n", + item_size); + goto finish_btree_desc_init; + } + + if (le16_to_cpu(desc->index_area_min_size) < (4 * fork_size)) { + err = -EIO; + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc->index_area_min_size)); + goto finish_btree_desc_init; + } + + err = ssdfs_btree_desc_init(fsi, tree, desc, (u8)item_size, item_size); + +finish_btree_desc_init: + if (unlikely(err)) { + SSDFS_ERR("fail to init btree descriptor: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_extents_btree_desc_flush() - specialized btree's descriptor flush + * @tree: pointer on btree object + */ +static +int ssdfs_extents_btree_desc_flush(struct ssdfs_btree *tree) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_extents_btree_info *tree_info = NULL; + struct ssdfs_btree_descriptor desc; + size_t fork_size = sizeof(struct ssdfs_raw_fork); + u32 erasesize; + u32 node_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi); + + SSDFS_DBG("owner_ino %llu, type %#x, state %#x\n", + tree->owner_ino, tree->type, + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + } 
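+
+	/*
+	 * Only the magic and the item size are extents-specific here;
+	 * the common helper ssdfs_btree_desc_flush() is expected to fill
+	 * and check the remaining descriptor fields.
+	 */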
+ + memset(&desc, 0xFF, sizeof(struct ssdfs_btree_descriptor)); + + desc.magic = cpu_to_le32(SSDFS_EXTENTS_BTREE_MAGIC); + desc.item_size = cpu_to_le16(fork_size); + + err = ssdfs_btree_desc_flush(tree, &desc); + if (unlikely(err)) { + SSDFS_ERR("invalid btree descriptor: err %d\n", + err); + return err; + } + + if (desc.type != SSDFS_EXTENTS_BTREE) { + SSDFS_ERR("invalid btree type %#x\n", + desc.type); + return -ERANGE; + } + + erasesize = fsi->erasesize; + node_size = 1 << desc.log_node_size; + + if (node_size < SSDFS_4KB || node_size > erasesize) { + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc.log_node_size, + node_size, erasesize); + return -ERANGE; + } + + if (le16_to_cpu(desc.index_area_min_size) < (4 * fork_size)) { + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc.index_area_min_size)); + return -ERANGE; + } + + ssdfs_memcpy(&tree_info->desc.desc, + 0, sizeof(struct ssdfs_btree_descriptor), + &desc, + 0, sizeof(struct ssdfs_btree_descriptor), + sizeof(struct ssdfs_btree_descriptor)); + + return 0; +} + +/****************************************************************************** + * SPECIALIZED EXTENTS BTREE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_extents_btree_create_root_node() - specialized root node creation + * @fsi: pointer on shared file system object + * @node: pointer on node object [out] + */ +static +int ssdfs_extents_btree_create_root_node(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *tree_info = NULL; + struct ssdfs_btree_inline_root_node tmp_buffer; + struct ssdfs_inode *raw_inode = NULL; + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !node); + + SSDFS_DBG("fsi %p, node %p\n", + fsi, node); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (atomic_read(&tree->state) != SSDFS_BTREE_UNKNOWN_STATE) { + SSDFS_ERR("unexpected tree state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + } + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + } + + if (!tree_info->owner) { + SSDFS_ERR("empty inode pointer\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rwsem_is_locked(&tree_info->owner->lock)); + BUG_ON(!rwsem_is_locked(&tree_info->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + private_flags = atomic_read(&tree_info->owner->private_flags); + + if (private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) { + switch (atomic_read(&tree_info->type)) { + case SSDFS_PRIVATE_EXTENTS_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + raw_inode = &tree_info->owner->raw_inode; + ssdfs_memcpy(&tmp_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + &raw_inode->internal[0].area1.extents_root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + } else { + switch (atomic_read(&tree_info->type)) { + case SSDFS_INLINE_FORKS_ARRAY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + memset(&tmp_buffer, 0xFF, + sizeof(struct 
ssdfs_btree_inline_root_node)); + + tmp_buffer.header.height = SSDFS_BTREE_LEAF_NODE_HEIGHT + 1; + tmp_buffer.header.items_count = 0; + tmp_buffer.header.flags = 0; + tmp_buffer.header.type = SSDFS_BTREE_ROOT_NODE; + tmp_buffer.header.upper_node_id = + cpu_to_le32(SSDFS_BTREE_ROOT_NODE_ID); + } + + ssdfs_memcpy(&tree_info->root_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + &tmp_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + tree_info->root = &tree_info->root_buffer; + + err = ssdfs_btree_create_root_node(node, tree_info->root); + if (unlikely(err)) { + SSDFS_ERR("fail to create root node: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_extents_btree_pre_flush_root_node() - specialized root node pre-flush + * @node: pointer on node object + */ +static +int ssdfs_extents_btree_pre_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_state_bitmap *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is clean\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + err = ssdfs_btree_pre_flush_root_node(node); + if (unlikely(err)) { + SSDFS_ERR("fail to pre-flush root node: " + "node_id %u, err %d\n", + node->node_id, err); + } + + up_write(&node->header_lock); + up_write(&node->full_lock); + + return err; +} + +/* + * ssdfs_extents_btree_flush_root_node() - specialized root node flush + * @node: pointer on node object + */ +static +int ssdfs_extents_btree_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *tree_info = NULL; + struct ssdfs_btree_inline_root_node tmp_buffer; + struct ssdfs_inode *raw_inode = NULL; + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node %p, node_id %u\n", + node, node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + } + + if (!tree_info->owner) { + SSDFS_ERR("empty inode pointer\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!rwsem_is_locked(&tree_info->owner->lock)); + 
BUG_ON(!rwsem_is_locked(&tree_info->lock)); +#endif /* CONFIG_SSDFS_DEBUG */ + + private_flags = atomic_read(&tree_info->owner->private_flags); + + if (private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) { + switch (atomic_read(&tree_info->type)) { + case SSDFS_PRIVATE_EXTENTS_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + if (!tree_info->root) { + SSDFS_ERR("root node pointer is NULL\n"); + return -ERANGE; + } + + ssdfs_btree_flush_root_node(node, tree_info->root); + ssdfs_memcpy(&tmp_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + tree_info->root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + + raw_inode = &tree_info->owner->raw_inode; + ssdfs_memcpy(&raw_inode->internal[0].area1.extents_root, + 0, sizeof(struct ssdfs_btree_inline_root_node), + &tmp_buffer, + 0, sizeof(struct ssdfs_btree_inline_root_node), + sizeof(struct ssdfs_btree_inline_root_node)); + } else { + err = -ERANGE; + SSDFS_ERR("extents tree is inline forks array\n"); + } + + return err; +} + +/* + * ssdfs_extents_btree_create_node() - specialized node creation + * @node: pointer on node object + */ +static +int ssdfs_extents_btree_create_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + void *addr[SSDFS_BTREE_NODE_BMAP_COUNT]; + size_t hdr_size = sizeof(struct ssdfs_extents_btree_node_header); + u32 node_size; + u32 items_area_size = 0; + u16 item_size = 0; + u16 index_size = 0; + u16 index_area_min_size; + u16 items_capacity = 0; + u16 index_capacity = 0; + u32 index_area_size = 0; + size_t bmap_bytes; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree); + WARN_ON(atomic_read(&node->state) != SSDFS_BTREE_NODE_CREATED); + + SSDFS_DBG("node_id %u, state %#x, type %#x\n", + node->node_id, atomic_read(&node->state), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + node_size = tree->node_size; + index_area_min_size = tree->index_area_min_size; + + node->node_ops = &ssdfs_extents_btree_node_ops; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_INDEX_NODE: + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid index area's state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_AREA_ABSENT: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items area's state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + break; + + case SSDFS_BTREE_HYBRID_NODE: + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid index area's state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items area's state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + break; + + case SSDFS_BTREE_LEAF_NODE: + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_AREA_ABSENT: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid index area's state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + switch 
(atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items area's state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + break; + + default: + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + return -ERANGE; + } + + down_write(&node->header_lock); + down_write(&node->bmap_array.lock); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_INDEX_NODE: + node->index_area.offset = (u32)hdr_size; + node->index_area.area_size = node_size - hdr_size; + + index_area_size = node->index_area.area_size; + index_size = node->index_area.index_size; + + node->index_area.index_capacity = index_area_size / index_size; + index_capacity = node->index_area.index_capacity; + + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + break; + + case SSDFS_BTREE_HYBRID_NODE: + node->index_area.offset = (u32)hdr_size; + + if (index_area_min_size == 0 || + index_area_min_size >= (node_size - hdr_size)) { + err = -ERANGE; + SSDFS_ERR("invalid index area desc: " + "index_area_min_size %u, " + "node_size %u, hdr_size %zu\n", + index_area_min_size, + node_size, hdr_size); + goto finish_create_node; + } + + node->index_area.area_size = index_area_min_size; + + index_area_size = node->index_area.area_size; + index_size = node->index_area.index_size; + node->index_area.index_capacity = index_area_size / index_size; + index_capacity = node->index_area.index_capacity; + + node->items_area.offset = node->index_area.offset + + node->index_area.area_size; + + if (node->items_area.offset >= node_size) { + err = -ERANGE; + SSDFS_ERR("invalid items area desc: " + "area_offset %u, node_size %u\n", + node->items_area.offset, + node_size); + goto finish_create_node; + } + + node->items_area.area_size = node_size - + node->items_area.offset; + node->items_area.free_space = node->items_area.area_size; + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, hdr_size %zu, free_space %u\n", + node_size, hdr_size, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_area_size = node->items_area.area_size; + item_size = node->items_area.item_size; + + node->items_area.items_count = 0; + node->items_area.items_capacity = items_area_size / item_size; + items_capacity = node->items_area.items_capacity; + + if (node->items_area.items_capacity == 0) { + err = -ERANGE; + SSDFS_ERR("items area's capacity %u\n", + node->items_area.items_capacity); + goto finish_create_node; + } + + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + + node->raw.extents_header.blks_count = cpu_to_le64(0); + node->raw.extents_header.forks_count = cpu_to_le32(0); + node->raw.extents_header.allocated_extents = cpu_to_le32(0); + node->raw.extents_header.valid_extents = cpu_to_le32(0); + node->raw.extents_header.max_extent_blks = cpu_to_le32(0); + break; + + case SSDFS_BTREE_LEAF_NODE: + node->items_area.offset = (u32)hdr_size; + node->items_area.area_size = node_size - hdr_size; + node->items_area.free_space = node->items_area.area_size; + node->items_area.item_size = tree->item_size; + node->items_area.min_item_size = 
tree->min_item_size; + node->items_area.max_item_size = tree->max_item_size; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, hdr_size %zu, free_space %u\n", + node_size, hdr_size, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_area_size = node->items_area.area_size; + item_size = node->items_area.item_size; + + node->items_area.items_count = 0; + node->items_area.items_capacity = items_area_size / item_size; + items_capacity = node->items_area.items_capacity; + + node->bmap_array.item_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + + node->raw.extents_header.blks_count = cpu_to_le64(0); + node->raw.extents_header.forks_count = cpu_to_le32(0); + node->raw.extents_header.allocated_extents = cpu_to_le32(0); + node->raw.extents_header.valid_extents = cpu_to_le32(0); + node->raw.extents_header.max_extent_blks = cpu_to_le32(0); + break; + + default: + err = -ERANGE; + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + goto finish_create_node; + } + + node->bmap_array.bits_count = index_capacity + items_capacity + 1; + + if (item_size > 0) + items_capacity = node_size / item_size; + else + items_capacity = 0; + + if (index_size > 0) + index_capacity = node_size / index_size; + else + index_capacity = 0; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + node->bmap_array.bmap_bytes = bmap_bytes; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_EXTENT_MAX_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_create_node; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, blks_count %llu, " + "forks_count %u, allocated_extents %u, " + "valid_extents %u, max_extent_blks %u\n", + node->node_id, + le64_to_cpu(node->raw.extents_header.blks_count), + le32_to_cpu(node->raw.extents_header.forks_count), + le32_to_cpu(node->raw.extents_header.allocated_extents), + le32_to_cpu(node->raw.extents_header.valid_extents), + le32_to_cpu(node->raw.extents_header.max_extent_blks)); + SSDFS_DBG("items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->items_area.items_count, + node->items_area.items_capacity, + node->items_area.start_hash, + node->items_area.end_hash); + SSDFS_DBG("index_count %u, index_capacity %u, " + "start_hash %llx, end_hash %llx\n", + node->index_area.index_count, + node->index_area.index_capacity, + node->index_area.start_hash, + node->index_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_create_node: + up_write(&node->bmap_array.lock); + up_write(&node->header_lock); + + if (unlikely(err)) + return err; + + err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate node's bitmaps: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + return err; + } + + down_write(&node->bmap_array.lock); + for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) { + spin_lock(&node->bmap_array.bmap[i].lock); + node->bmap_array.bmap[i].ptr = addr[i]; + addr[i] = NULL; + spin_unlock(&node->bmap_array.bmap[i].lock); + } + up_write(&node->bmap_array.lock); + + err = ssdfs_btree_node_allocate_content_space(node, node_size); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate content space: " + "node_size %u, err %d\n", + node_size, err); + return err; + } + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * ssdfs_extents_btree_init_node() - init extents tree's node + * @node: pointer on node object + * + * This method tries to init the node of 
extents btree. + * + * The node's bitmap is allocated with a possible resize of the + * node in mind: bits are reserved as if the index area and the + * items area could each grow up to the whole node size. This + * technique provides the opportunity not to resize or to shift + * the content of the bitmap later. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. + * %-EIO - invalid node's header content + */ +static +int ssdfs_extents_btree_init_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *tree_info = NULL; + struct ssdfs_extents_btree_node_header *hdr; + size_t hdr_size = sizeof(struct ssdfs_extents_btree_node_header); + void *addr[SSDFS_BTREE_NODE_BMAP_COUNT]; + struct page *page; + void *kaddr; + u64 start_hash, end_hash; + u32 node_size; + u16 item_size; + u64 parent_ino; + u32 forks_count; + u16 items_capacity; + u32 allocated_extents, valid_extents; + u64 calculated_extents; + u32 max_extent_blks; + u64 calculated_blks; + u64 blks_count; + u16 flags; + u8 index_size; + u16 index_capacity = 0; + size_t bmap_bytes; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + } + + if (atomic_read(&node->state) != SSDFS_BTREE_NODE_CONTENT_PREPARED) { + SSDFS_WARN("fail to init node: id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); + return -ERANGE; + } + + down_write(&node->full_lock); + + if (pagevec_count(&node->content.pvec) == 0) { + err = -ERANGE; + SSDFS_ERR("empty node's content: id %u\n", + node->node_id); + goto finish_init_node; + } + + page = node->content.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + kaddr = kmap_local_page(page); + + hdr = (struct ssdfs_extents_btree_node_header *)kaddr; + + if (!is_csum_valid(&hdr->node.check, hdr, hdr_size)) { + err = -EIO; + SSDFS_ERR("invalid checksum: node_id %u\n", + node->node_id); + goto finish_init_operation; + } + + if (le32_to_cpu(hdr->node.magic.common) != SSDFS_SUPER_MAGIC || + le16_to_cpu(hdr->node.magic.key) != SSDFS_EXTENTS_BNODE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid magic: common %#x, key %#x\n", + le32_to_cpu(hdr->node.magic.common), + le16_to_cpu(hdr->node.magic.key)); + goto finish_init_operation; + } + + down_write(&node->header_lock); + + ssdfs_memcpy(&node->raw.extents_header, 0, hdr_size, + hdr, 0, hdr_size, + hdr_size); + + err = ssdfs_btree_init_node(node, &hdr->node, + hdr_size); + if (unlikely(err)) { + SSDFS_ERR("fail to init node: id %u, err %d\n", + node->node_id, err); + goto finish_header_init; + } + + start_hash = le64_to_cpu(hdr->node.start_hash); + end_hash = le64_to_cpu(hdr->node.end_hash); + node_size = 1 << hdr->node.log_node_size; + index_size = hdr->node.index_size; + item_size = hdr->node.min_item_size; + items_capacity = le16_to_cpu(hdr->node.items_capacity); + parent_ino = le64_to_cpu(hdr->parent_ino); + forks_count = le32_to_cpu(hdr->forks_count); + allocated_extents =
le32_to_cpu(hdr->allocated_extents); + valid_extents = le32_to_cpu(hdr->valid_extents); + max_extent_blks = le32_to_cpu(hdr->max_extent_blks); + blks_count = le64_to_cpu(hdr->blks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, forks_count %u, " + "allocated_extents %u, valid_extents %u, " + "blks_count %llu\n", + start_hash, end_hash, forks_count, + allocated_extents, valid_extents, + blks_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (parent_ino != tree_info->owner->vfs_inode.i_ino) { + err = -EIO; + SSDFS_ERR("parent_ino %llu != ino %lu\n", + parent_ino, + tree_info->owner->vfs_inode.i_ino); + goto finish_header_init; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + /* do nothing */ + break; + + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + if (item_size == 0 || node_size % item_size) { + err = -EIO; + SSDFS_ERR("invalid size: item_size %u, node_size %u\n", + item_size, node_size); + goto finish_header_init; + } + + if (item_size != sizeof(struct ssdfs_raw_fork)) { + err = -EIO; + SSDFS_ERR("invalid item_size: " + "size %u, expected size %zu\n", + item_size, + sizeof(struct ssdfs_raw_fork)); + goto finish_header_init; + } + + if (items_capacity == 0 || + items_capacity > (node_size / item_size)) { + err = -EIO; + SSDFS_ERR("invalid items_capacity %u\n", + items_capacity); + goto finish_header_init; + } + + if (forks_count > items_capacity) { + err = -EIO; + SSDFS_ERR("forks_count %u > items_capacity %u\n", + forks_count, + items_capacity); + goto finish_header_init; + } + + if (valid_extents > allocated_extents) { + err = -EIO; + SSDFS_ERR("valid_extents %u > allocated_extents %u\n", + valid_extents, allocated_extents); + goto finish_header_init; + } + + calculated_extents = (u64)forks_count * + SSDFS_INLINE_EXTENTS_COUNT; + if (calculated_extents != allocated_extents) { + err = -EIO; + SSDFS_ERR("calculated_extents %llu != allocated_extents %u\n", + calculated_extents, allocated_extents); + goto finish_header_init; + } + + calculated_blks = (u64)valid_extents * max_extent_blks; + if (calculated_blks < blks_count) { + err = -EIO; + SSDFS_ERR("calculated_blks %llu < blks_count %llu\n", + calculated_blks, blks_count); + goto finish_header_init; + } + break; + + default: + BUG(); + } + + node->items_area.items_count = (u16)forks_count; + node->items_area.items_capacity = items_capacity; + +finish_header_init: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_init_operation; + + if (item_size > 0) + items_capacity = node_size / item_size; + else + items_capacity = 0; + + if (index_size > 0) + index_capacity = node_size / index_size; + else + index_capacity = 0; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_EXTENT_MAX_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_init_operation; + } + + err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate node's bitmaps: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + goto finish_init_operation; + } + + down_write(&node->bmap_array.lock); + + flags = atomic_read(&node->flags); + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + /* + * Reserve the whole node space as + * potential space for
indexes. + */ + index_capacity = node_size / index_size; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + } else if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) { + node->bmap_array.item_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + } else + BUG(); + + node->bmap_array.bits_count = index_capacity + items_capacity + 1; + node->bmap_array.bmap_bytes = bmap_bytes; + + ssdfs_btree_node_init_bmaps(node, addr); + + spin_lock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + bitmap_set(node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].ptr, + 0, forks_count); + spin_unlock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + + up_write(&node->bmap_array.lock); +finish_init_operation: + kunmap_local(kaddr); + + if (unlikely(err)) + goto finish_init_node; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", + atomic64_read(&tree_info->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_init_node: + up_write(&node->full_lock); + + return err; +} + +static +void ssdfs_extents_btree_destroy_node(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_extents_btree_add_node() - add node into extents btree + * @node: pointer on node object + * + * This method tries to finish the addition of a node into the extents btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_extents_btree_add_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + int type; + u16 items_capacity = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x, type %#x\n", + node->node_id, atomic_read(&node->state), + atomic_read(&node->type)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_CREATED: + case SSDFS_BTREE_NODE_DIRTY: + /* expected states */ + break; + + default: + SSDFS_WARN("invalid node: id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); + return -ERANGE; + } + + type = atomic_read(&node->type); + + switch (type) { + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + case SSDFS_BTREE_LEAF_NODE: + /* expected states */ + break; + + default: + SSDFS_WARN("invalid node type %#x\n", type); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + down_write(&node->header_lock); + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + items_capacity = node->items_area.items_capacity; + break; + default: + items_capacity = 0; + break; + } + + if (items_capacity == 0) { + if (type == SSDFS_BTREE_LEAF_NODE || + type == SSDFS_BTREE_HYBRID_NODE) { + err = -ERANGE; + SSDFS_ERR("invalid node state: " + "type %#x, items_capacity %u\n", + type, items_capacity); + goto finish_add_node; + } + } + +finish_add_node: + up_write(&node->header_lock); + + if (err) + return err; + + err = ssdfs_btree_update_parent_node_pointer(tree, node); + if (unlikely(err)) { + SSDFS_ERR("fail to update parent pointer: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + + return 0; +} + +static +int ssdfs_extents_btree_delete_node(struct ssdfs_btree_node *node) +{ + /* TODO: implement */ + SSDFS_DBG("TODO: implement %s\n", __func__); + return 0; + +/* + * TODO: it needs to add special free space descriptor in the + * index area for the case of deleted
nodes. The code of + * new item allocation should create an empty node + * with completely free items during passing through + * the index level. + */ + +/* + * TODO: node can be really deleted/invalidated. But the index + * area should contain the index for the deleted node with + * a special flag. In this case it will be clear that + * we have some capacity without real node allocation. + * If some item is added into the node then the node + * has to be allocated. It means that if you delete + * a node then the index hierarchy stays the same without + * any necessity to delete or modify it. + */ + + /* TODO: decrement nodes_count and/or leaf_nodes counters */ + /* TODO: decrease inodes_capacity and/or free_inodes */ +} + +/* + * ssdfs_extents_btree_pre_flush_node() - pre-flush node's header + * @node: pointer on node object + * + * This method tries to flush node's header. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_extents_btree_pre_flush_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_extents_btree_node_header extents_header; + size_t hdr_size = sizeof(struct ssdfs_extents_btree_node_header); + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *tree_info = NULL; + struct ssdfs_state_bitmap *bmap; + struct page *page; + u16 items_count; + u32 forks_count; + u32 allocated_extents; + u32 valid_extents; + u32 max_extent_blks; + u64 blks_count; + u64 calculated_extents; + u64 calculated_blks; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is clean\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + } + + down_write(&node->full_lock); + down_write(&node->header_lock); + + ssdfs_memcpy(&extents_header, 0, hdr_size, + &node->raw.extents_header, 0, hdr_size, + hdr_size); + + extents_header.node.magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC); + extents_header.node.magic.key = cpu_to_le16(SSDFS_EXTENTS_BNODE_MAGIC); + extents_header.node.magic.version.major = SSDFS_MAJOR_REVISION; + extents_header.node.magic.version.minor = SSDFS_MINOR_REVISION; + + err = ssdfs_btree_node_pre_flush_header(node, &extents_header.node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush generic header: " + "node_id %u, err %d\n", + node->node_id, err); + goto finish_extents_header_preparation; + } + + if (!tree_info->owner) { + err = -ERANGE; + SSDFS_WARN("fail to extract
parent_ino\n"); + goto finish_extents_header_preparation; + } + + extents_header.parent_ino = + cpu_to_le64(tree_info->owner->vfs_inode.i_ino); + + items_count = node->items_area.items_count; + forks_count = le32_to_cpu(extents_header.forks_count); + allocated_extents = le32_to_cpu(extents_header.allocated_extents); + valid_extents = le32_to_cpu(extents_header.valid_extents); + max_extent_blks = le32_to_cpu(extents_header.max_extent_blks); + blks_count = le64_to_cpu(extents_header.blks_count); + + if (forks_count != items_count) { + err = -ERANGE; + SSDFS_ERR("forks_count %u != items_count %u\n", + forks_count, items_count); + goto finish_extents_header_preparation; + } + + if (valid_extents > allocated_extents) { + err = -ERANGE; + SSDFS_ERR("valid_extents %u > allocated_extents %u\n", + valid_extents, allocated_extents); + goto finish_extents_header_preparation; + } + + calculated_extents = (u64)forks_count * SSDFS_INLINE_EXTENTS_COUNT; + if (calculated_extents != allocated_extents) { + err = -ERANGE; + SSDFS_ERR("calculated_extents %llu != allocated_extents %u\n", + calculated_extents, allocated_extents); + goto finish_extents_header_preparation; + } + + calculated_blks = (u64)valid_extents * max_extent_blks; + if (calculated_blks < blks_count) { + err = -ERANGE; + SSDFS_ERR("calculated_blks %llu < blks_count %llu\n", + calculated_blks, blks_count); + goto finish_extents_header_preparation; + } + + extents_header.node.check.bytes = cpu_to_le16((u16)hdr_size); + extents_header.node.check.flags = cpu_to_le16(SSDFS_CRC32); + + err = ssdfs_calculate_csum(&extents_header.node.check, + &extents_header, hdr_size); + if (unlikely(err)) { + SSDFS_ERR("unable to calculate checksum: err %d\n", err); + goto finish_extents_header_preparation; + } + + ssdfs_memcpy(&node->raw.extents_header, 0, hdr_size, + &extents_header, 0, hdr_size, + hdr_size); + +finish_extents_header_preparation: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_node_pre_flush; + + if (pagevec_count(&node->content.pvec) < 1) { + err = -ERANGE; + SSDFS_ERR("pagevec is empty\n"); + goto finish_node_pre_flush; + } + + page = node->content.pvec.pages[0]; + ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + &extents_header, 0, hdr_size, + hdr_size); + +finish_node_pre_flush: + up_write(&node->full_lock); + + return err; +} + +/* + * ssdfs_extents_btree_flush_node() - flush node + * @node: pointer on node object + * + * This method tries to flush node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
+ */ +static +int ssdfs_extents_btree_flush_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *tree_info = NULL; + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node %p, node_id %u\n", + node, node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + tree = node->tree; + if (!tree) { + SSDFS_ERR("node hasn't pointer on tree\n"); + return -ERANGE; + } + + if (tree->type != SSDFS_EXTENTS_BTREE) { + SSDFS_WARN("invalid tree type %#x\n", + tree->type); + return -ERANGE; + } else { + tree_info = container_of(tree, + struct ssdfs_extents_btree_info, + buffer.tree); + } + + private_flags = atomic_read(&tree_info->owner->private_flags); + + if (private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) { + switch (atomic_read(&tree_info->type)) { + case SSDFS_PRIVATE_EXTENTS_BTREE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree_info->type)); + return -ERANGE; + } + + err = ssdfs_btree_common_node_flush(node); + if (unlikely(err)) { + SSDFS_ERR("fail to flush node: " + "node_id %u, height %u, err %d\n", + node->node_id, + atomic_read(&node->height), + err); + } + } else { + err = -ERANGE; + SSDFS_ERR("extents tree is inline forks array\n"); + } + + return err; +} + +/****************************************************************************** + * SPECIALIZED EXTENTS BTREE NODE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_convert_lookup2item_index() - convert lookup into item index + * @node_size: size of the node in bytes + * @lookup_index: lookup index + */ +static inline +u16 ssdfs_convert_lookup2item_index(u32 node_size, u16 lookup_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, lookup_index %u\n", + node_size, lookup_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_convert_lookup2item_index(lookup_index, node_size, + sizeof(struct ssdfs_raw_fork), + SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE); +} + +/* + * ssdfs_convert_item2lookup_index() - convert item into lookup index + * @node_size: size of the node in bytes + * @item_index: item index + */ +static inline +u16 ssdfs_convert_item2lookup_index(u32 node_size, u16 item_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, item_index %u\n", + node_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_convert_item2lookup_index(item_index, node_size, + sizeof(struct ssdfs_raw_fork), + SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE); +} + +/* + * is_hash_for_lookup_table() - should item's hash be into lookup table? + * @node_size: size of the node in bytes + * @item_index: item index + */ +static inline +bool is_hash_for_lookup_table(u32 node_size, u16 item_index) +{ + u16 lookup_index; + u16 calculated; + + lookup_index = ssdfs_convert_item2lookup_index(node_size, item_index); + calculated = ssdfs_convert_lookup2item_index(node_size, lookup_index); + + return calculated == item_index; +} + +/* + * ssdfs_extents_btree_node_find_lookup_index() - find lookup index + * @node: node object + * @search: search object + * @lookup_index: lookup index [out] + * + * This method tries to find a lookup index for requested items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - lookup index doesn't exist for requested hash. 
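+ * + * The node's header keeps a lookup table of + * SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE hashes: every item whose index + * maps onto a lookup slot (see is_hash_for_lookup_table()) publishes + * its starting hash there. Consulting this table first narrows the + * following item-by-item search to a small portion of the node.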
+ */ +static +int ssdfs_extents_btree_node_find_lookup_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 *lookup_index) +{ + __le64 *lookup_table; + int array_size = SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search || !lookup_index); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + lookup_table = node->raw.extents_header.lookup_table; + err = ssdfs_btree_node_find_lookup_index_nolock(search, + lookup_table, + array_size, + lookup_index); + up_read(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u, err %d\n", + *lookup_index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_get_fork_hash_range() - get fork's hash range + * @kaddr: pointer on the fork object + * @start_hash: pointer on the value of starting hash [out] + * @end_hash: pointer on the value of ending hash [out] + */ +static +void ssdfs_get_fork_hash_range(void *kaddr, + u64 *start_hash, + u64 *end_hash) +{ + struct ssdfs_raw_fork *fork; + u64 blks_count; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!kaddr || !start_hash || !end_hash); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + fork = (struct ssdfs_raw_fork *)kaddr; + *start_hash = le64_to_cpu(fork->start_offset); + blks_count = le64_to_cpu(fork->blks_count); + + if (blks_count > 0) + *end_hash = *start_hash + blks_count - 1; + else + *end_hash = *start_hash; +} + +/* + * ssdfs_check_found_fork() - check found fork + * @fsi: pointer on shared file system object + * @search: search object + * @kaddr: pointer on the fork object + * @item_index: index of the item + * @start_hash: pointer on the value of starting hash [out] + * @end_hash: pointer on the value of ending hash [out] + * @found_index: pointer on the value with found index [out] + * + * This method tries to check the found fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - possible place was found. 
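+ * + * For example (hypothetical numbers): if the fork covers the blocks + * [100, 107] and the request is start_hash = end_hash = 103, then the + * fork contains the request and the result state becomes + * SSDFS_BTREE_SEARCH_VALID_ITEM. If the request starts from hash 110, + * then -EAGAIN is returned and the caller checks the next fork.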
+ */ +static +int ssdfs_check_found_fork(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + void *kaddr, + u16 item_index, + u64 *start_hash, + u64 *end_hash, + u16 *found_index) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !kaddr || !found_index); + BUG_ON(!start_hash || !end_hash); + + SSDFS_DBG("item_index %u\n", item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + *found_index = U16_MAX; + + ssdfs_get_fork_hash_range(kaddr, start_hash, end_hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("item_index %u, " + "search (start_hash %llx, end_hash %llx), " + "start_hash %llx, end_hash %llx\n", + item_index, + search->request.start.hash, + search->request.end.hash, + *start_hash, *end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*start_hash <= search->request.start.hash && + *end_hash >= search->request.end.hash) { + /* start_hash is inside the fork */ + *found_index = item_index; + + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + search->result.err = 0; + search->result.start_index = *found_index; + search->result.count = 1; + } else if (*start_hash > search->request.end.hash) { + *found_index = item_index; + + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = item_index; + search->result.count = 1; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: + ssdfs_btree_search_free_result_buf(search); + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + } else if ((*end_hash + 1) == search->request.start.hash) { + err = -EAGAIN; + *found_index = item_index + 1; + + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = *found_index; + search->result.count = 1; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: + ssdfs_btree_search_free_result_buf(search); + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + } else if (*end_hash < search->request.start.hash) { + err = -EAGAIN; + *found_index = item_index + 1; + } else if (*start_hash > search->request.start.hash && + *end_hash < search->request.end.hash) { + err = -ERANGE; + SSDFS_ERR("requested range is bigger than fork: " + "search (start_hash %llx, end_hash %llx), " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash, + *start_hash, *end_hash); + } else if (*start_hash > search->request.start.hash && + *end_hash > search->request.end.hash) { + err = -ERANGE; + SSDFS_ERR("requested range exists partially: " + "search (start_hash %llx, end_hash %llx), " + "start_hash %llx, end_hash %llx\n", + search->request.start.hash, + search->request.end.hash, + *start_hash, *end_hash); + } else if (*start_hash < search->request.start.hash && + *end_hash < search->request.end.hash) { + err = -ERANGE; + SSDFS_ERR("requested range exists partially: " + "search (start_hash %llx, end_hash %llx), " + "start_hash %llx, end_hash %llx\n", + 
search->request.start.hash, + search->request.end.hash, + *start_hash, *end_hash); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_index %u, err %d\n", + *found_index, err); + + ssdfs_debug_btree_search_object(search); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_prepare_forks_buffer() - prepare buffer for the forks + * @search: search object + * @found_index: found index of the item + * @start_hash: starting hash of the range + * @end_hash: ending hash of the range + * @items_count: count of items in the range + * @item_size: size of the item in bytes + * + * This method tries to prepare the buffers for the forks' range. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate the memory. + */ +static +int ssdfs_prepare_forks_buffer(struct ssdfs_btree_search *search, + u16 found_index, + u64 start_hash, + u64 end_hash, + u16 items_count, + size_t item_size) +{ + u16 found_forks = 0; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("found_index %u, start_hash %llx, end_hash %llx, " + "items_count %u, item_size %zu\n", + found_index, start_hash, end_hash, + items_count, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* continue logic */ + break; + + default: + /* + * Do not touch buffer. + * It contains prepared fork. + */ + search->result.count = search->result.items_in_buffer; + return 0; + } + + ssdfs_btree_search_free_result_buf(search); + + if (start_hash <= search->request.end.hash && + search->request.end.hash <= end_hash) { + /* use inline buffer */ + found_forks = 1; + } else { + /* use external buffer */ + if (found_index >= items_count) { + SSDFS_ERR("found_index %u >= items_count %u\n", + found_index, items_count); + return -ERANGE; + } + found_forks = items_count - found_index; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_forks %u\n", found_forks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (found_forks == 1) { + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.fork; + search->result.buf_size = item_size; + search->result.items_in_buffer = 0; + } else { + err = ssdfs_btree_search_alloc_result_buf(search, + item_size * found_forks); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory for buffer\n"); + return err; + } + } + + return 0; +} + +/* + * ssdfs_extract_found_fork() - extract found fork + * @fsi: pointer on shared file system object + * @search: search object + * @item_size: size of the item in bytes + * @kaddr: pointer on the fork object + * @start_hash: pointer on the value of starting hash [out] + * @end_hash: pointer on the value of ending hash [out] + * + * This method tries to extract the found fork. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
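+ * + * The fork at @kaddr is copied into the next free slot of the result + * buffer and items_in_buffer/count are incremented. It is done only for + * find/delete/invalidate requests; add/change requests keep the already + * prepared fork in the buffer untouched.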
+ */ +static +int ssdfs_extract_found_fork(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + size_t item_size, + void *kaddr, + u64 *start_hash, + u64 *end_hash) +{ + u32 calculated; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !kaddr); + BUG_ON(!start_hash || !end_hash); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_FIND_ITEM: + case SSDFS_BTREE_SEARCH_FIND_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* continue logic */ + break; + + default: + /* + * Do not touch buffer. + * It contains prepared fork. + */ + search->result.count = search->result.items_in_buffer; + return 0; + } + + calculated = search->result.items_in_buffer * item_size; + if (calculated >= search->result.buf_size) { + SSDFS_ERR("calculated %u >= buf_size %zu\n", + calculated, search->result.buf_size); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_get_fork_hash_range(kaddr, start_hash, end_hash); + ssdfs_memcpy(search->result.buf, calculated, + search->result.buf_size, + kaddr, 0, item_size, + item_size); + search->result.items_in_buffer++; + search->result.count++; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search (result.items_in_buffer %u, " + "result.count %u)\n", + search->result.items_in_buffer, + search->result.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (*start_hash <= search->request.start.hash && + *end_hash >= search->request.end.hash) { + /* start_hash is inside the fork */ + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + } else { + /* request is outside the fork */ + search->result.state = SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + } + + return 0; +} + +/* + * ssdfs_extract_range_by_lookup_index() - extract a range of items + * @node: pointer on node object + * @lookup_index: lookup index for requested range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. + */ +static +int ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node, + u16 lookup_index, + struct ssdfs_btree_search *search) +{ + int capacity = SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE; + size_t item_size = sizeof(struct ssdfs_raw_fork); + + return __ssdfs_extract_range_by_lookup_index(node, lookup_index, + capacity, item_size, + search, + ssdfs_check_found_fork, + ssdfs_prepare_forks_buffer, + ssdfs_extract_found_fork); +} + +/* + * ssdfs_btree_search_result_no_data() - prepare result state for no data case + * @node: pointer on node object + * @lookup_index: lookup index + * @search: pointer on search request object [in|out] + * + * This method prepares result state for no data case. 
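+ * + * The result is marked as possible place + * (SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND with err -ENODATA) and the + * start index is derived from @lookup_index. As a result, a following + * add operation knows the insert position inside the node.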
+ */ +static inline +void ssdfs_btree_search_result_no_data(struct ssdfs_btree_node *node, + u16 lookup_index, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.state = SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = + ssdfs_convert_lookup2item_index(node->node_size, + lookup_index); + search->result.count = search->request.count; + search->result.search_cno = ssdfs_current_cno(node->tree->fsi->sb); + + if (!is_btree_search_contains_new_item(search)) { + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + } +} + +/* + * ssdfs_extents_btree_node_find_range() - find a range of items into the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to find a range of items into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. + * %-ENOMEM - unable to allocate memory. 
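+ * + * The search sequence is: (1) read the items area's state and hash + * range under header_lock; (2) check that the requested hashes fall + * into the node; (3) find the lookup index; (4) extract the forks by + * the lookup index. The -ENODATA error still fills the result with a + * possible insert position.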
+ */ +static +int ssdfs_extents_btree_node_find_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + u16 items_count; + u16 items_capacity; + u64 start_hash; + u64 end_hash; + u16 lookup_index; + int res; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (items_capacity == 0 || items_count > items_capacity) { + SSDFS_ERR("corrupted node description: " + "items_count %u, items_capacity %u\n", + items_count, + items_capacity); + return -ERANGE; + } + + if (search->request.count > items_capacity) { + SSDFS_ERR("invalid request: " + "count %u, items_capacity %u\n", + search->request.count, + items_capacity); + return -ERANGE; + } + + res = ssdfs_btree_node_check_hash_range(node, + items_count, + items_capacity, + start_hash, + end_hash, + search); + if (res == -ENODATA) { + /* continue extract the fork */ + err = res; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx, err %d\n", + items_count, items_capacity, + start_hash, end_hash, err); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (res) { + SSDFS_ERR("items_count %u, items_capacity %u, " + "start_hash %llx, end_hash %llx, err %d\n", + items_count, items_capacity, + start_hash, end_hash, res); + return res; + } + + res = ssdfs_extents_btree_node_find_lookup_index(node, search, + &lookup_index); + if (res == -ENODATA) { + err = res; + ssdfs_btree_search_result_no_data(node, lookup_index, search); + /* continue extract the fork */ + } else if (unlikely(res)) { + SSDFS_ERR("fail to find the index: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + res); + return res; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(lookup_index >= SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + res = ssdfs_extract_range_by_lookup_index(node, lookup_index, + search); + search->request.count = search->result.count; + search->result.search_cno = ssdfs_current_cno(node->tree->fsi->sb); + + ssdfs_debug_btree_search_object(search); + + if (res == -ENODATA) { + err = res; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u is empty\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (res == -EAGAIN) { + err = -ENODATA; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u contains not all requested blocks: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx)\n", + node->node_id, + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + ssdfs_btree_search_result_no_data(node, lookup_index, search); + } else if (unlikely(res)) { + SSDFS_ERR("fail to extract 
range: " + "node %u (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + node->node_id, + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + err); + return res; + } + + search->request.flags &= ~SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM; + + return err; +} + +/* + * ssdfs_extents_btree_node_find_item() - find item into node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to find an item into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_extents_btree_node_find_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_extents_btree_node_find_range(node, search); +} + +static +int ssdfs_extents_btree_node_allocate_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EOPNOTSUPP; +} + +static +int ssdfs_extents_btree_node_allocate_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EOPNOTSUPP; +} + +/* + * __ssdfs_extents_btree_node_get_fork() - extract the fork from pagevec + * @pvec: pointer on pagevec + * @area_offset: area offset from the node's beginning + * @area_size: area size + * @node_size: size of the node + * @item_index: index of the fork in the node + * @fork: pointer on fork's buffer [out] + * + * This method tries to extract the fork from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
+ */ +int __ssdfs_extents_btree_node_get_fork(struct pagevec *pvec, + u32 area_offset, + u32 area_size, + u32 node_size, + u16 item_index, + struct ssdfs_raw_fork *fork) +{ + size_t item_size = sizeof(struct ssdfs_raw_fork); + u32 item_offset; + int page_index; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec || !fork); + + SSDFS_DBG("area_offset %u, area_size %u, item_index %u\n", + area_offset, area_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_offset = (u32)item_index * item_size; + if (item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + return -ERANGE; + } + + item_offset += area_offset; + if (item_offset >= node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node_size); + return -ERANGE; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(pvec)); + return -ERANGE; + } + + page = pvec->pages[page_index]; + err = ssdfs_memcpy_from_page(fork, 0, item_size, + page, item_offset, PAGE_SIZE, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_extents_btree_node_get_fork() - extract fork from the node + * @node: pointer on node object + * @area: items area descriptor + * @item_index: index of the fork + * @fork: pointer on extracted fork [out] + * + * This method tries to extract the fork from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_extents_btree_node_get_fork(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 item_index, + struct ssdfs_raw_fork *fork) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !fork); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_extents_btree_node_get_fork(&node->content.pvec, + area->offset, + area->area_size, + node->node_size, + item_index, + fork); +} + +/* + * is_requested_position_correct() - check that requested position is correct + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to check that requested position of a fork + * into the node is correct. + * + * RETURN: + * [success] + * + * %SSDFS_CORRECT_POSITION - requested position is correct. + * %SSDFS_SEARCH_LEFT_DIRECTION - correct position from the left. + * %SSDFS_SEARCH_RIGHT_DIRECTION - correct position from the right. + * + * [failure] - error code: + * + * %SSDFS_CHECK_POSITION_FAILURE - internal error. 
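+ * + * The fork at result.start_index is compared with the requested + * starting hash: a hash that falls into the fork's range confirms the + * position, a hash below start_offset sends the search to the left, + * any other hash sends the search to the right.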
+ */ +static +int is_requested_position_correct(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork fork; + u16 item_index; + u64 start_offset; + u64 blks_count; + u64 end_offset; + int direction = SSDFS_CHECK_POSITION_FAILURE; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = search->result.start_index; + if ((item_index + search->result.count) > area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->result.count); + return SSDFS_CHECK_POSITION_FAILURE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = item_index; + } + + + err = ssdfs_extents_btree_node_get_fork(node, area, + item_index, &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the fork: " + "item_index %u, err %d\n", + item_index, err); + return SSDFS_CHECK_POSITION_FAILURE; + } + + start_offset = le64_to_cpu(fork.start_offset); + blks_count = le64_to_cpu(fork.blks_count); + + if (start_offset >= U64_MAX || blks_count >= U64_MAX) { + SSDFS_ERR("invalid fork\n"); + return SSDFS_CHECK_POSITION_FAILURE; + } + + if (blks_count > 0) + end_offset = start_offset + blks_count - 1; + else + end_offset = start_offset; + + if (start_offset <= search->request.start.hash && + search->request.start.hash < end_offset) + direction = SSDFS_CORRECT_POSITION; + else if (search->request.start.hash < start_offset) + direction = SSDFS_SEARCH_LEFT_DIRECTION; + else + direction = SSDFS_SEARCH_RIGHT_DIRECTION; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_offset %llx, end_offset %llx, " + "search (start_hash %llx, end_hash %llx), " + "direction %#x\n", + start_offset, end_offset, + search->request.start.hash, + search->request.end.hash, + direction); +#endif /* CONFIG_SSDFS_DEBUG */ + + return direction; +} + +/* + * ssdfs_find_correct_position_from_left() - find position from the left + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to find a correct position of the fork + * from the left side of forks' sequence in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
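+ * + * The forks are inspected from result.start_index down to index 0 + * until a fork that contains the requested hash is found (its index is + * taken) or a fork that ends below the hash is found (the next index + * is taken).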
+ */ +static +int ssdfs_find_correct_position_from_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork fork; + int item_index; + u64 start_offset; + u64 blks_count; + u64 end_offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = search->result.start_index; + if ((item_index + search->request.count) > area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %d, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = (u16)item_index; + } + + if (area->items_count == 0) + return 0; + + for (; item_index >= 0; item_index--) { + err = ssdfs_extents_btree_node_get_fork(node, area, + (u16)item_index, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the fork: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + start_offset = le64_to_cpu(fork.start_offset); + blks_count = le64_to_cpu(fork.blks_count); + + if (blks_count > 0) + end_offset = start_offset + blks_count - 1; + else + end_offset = start_offset; + + if (start_offset <= search->request.start.hash && + search->request.start.hash < end_offset) { + search->result.start_index = (u16)item_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } else if (end_offset <= search->request.start.hash) { + search->result.start_index = (u16)(item_index + 1); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + } + + search->result.start_index = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_find_correct_position_from_right() - find position from the right + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to find a correct position of the fork + * from the right side of forks' sequence in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
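+ * + * Mirror of the previous method: the forks are inspected from + * result.start_index up to the last item until a fork that contains + * the requested hash is found or a fork that starts above the + * requested ending hash is found (then the previous index is taken).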
+ */ +static +int ssdfs_find_correct_position_from_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork fork; + int item_index; + u64 start_offset; + u64 blks_count; + u64 end_offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_index = search->result.start_index; + if ((item_index + search->result.count) > area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %d, count %u, " + "area->items_capacity %u\n", + item_index, search->result.count, + area->items_capacity); + return -ERANGE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = (u16)item_index; + } + + + for (; item_index < area->items_count; item_index++) { + err = ssdfs_extents_btree_node_get_fork(node, area, + (u16)item_index, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the fork: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + start_offset = le64_to_cpu(fork.start_offset); + blks_count = le64_to_cpu(fork.blks_count); + + if (blks_count > 0) + end_offset = start_offset + blks_count - 1; + else + end_offset = start_offset; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_offset %llx, end_offset %llx, " + "search (start_hash %llx, end_hash %llx)\n", + start_offset, end_offset, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_offset <= search->request.start.hash && + search->request.start.hash <= end_offset) { + search->result.start_index = (u16)item_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } else if (search->request.end.hash < start_offset) { + if (item_index == 0) { + search->result.start_index = + (u16)item_index; + } else { + search->result.start_index = + (u16)(item_index - 1); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; + } + } + + search->result.start_index = area->items_count; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.start_index %u\n", + search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_clean_lookup_table() - clean unused space of lookup table + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index + * + * This method tries to clean the unused space of lookup table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
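+ * + * All lookup slots after the slot of @start_index are set to 0xFF + * bytes (treated as invalid hashes), so stale hashes of deleted or + * moved items cannot satisfy a future lookup.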
+ */ +static +int ssdfs_clean_lookup_table(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index) +{ + __le64 *lookup_table; + u16 lookup_index; + u16 item_index; + u16 items_count; + u16 items_capacity; + u16 cleaning_indexes; + u32 cleaning_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, start_index %u\n", + node->node_id, start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_capacity = node->items_area.items_capacity; + if (start_index >= items_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u >= items_capacity %u\n", + start_index, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + lookup_table = node->raw.extents_header.lookup_table; + + lookup_index = ssdfs_convert_item2lookup_index(node->node_size, + start_index); + if (unlikely(lookup_index >= SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE)) { + SSDFS_ERR("invalid lookup_index %u\n", + lookup_index); + return -ERANGE; + } + + items_count = node->items_area.items_count; + item_index = ssdfs_convert_lookup2item_index(node->node_size, + lookup_index); + if (unlikely(item_index >= items_capacity)) { + SSDFS_ERR("item_index %u >= items_capacity %u\n", + item_index, items_capacity); + return -ERANGE; + } + + if (item_index != start_index) + lookup_index++; + + cleaning_indexes = + SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE - lookup_index; + cleaning_bytes = cleaning_indexes * sizeof(__le64); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u, cleaning_indexes %u, cleaning_bytes %u\n", + lookup_index, cleaning_indexes, cleaning_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&lookup_table[lookup_index], 0xFF, cleaning_bytes); + + return 0; +} + +/* + * ssdfs_correct_lookup_table() - correct lookup table of the node + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index of the range + * @range_len: number of items in the range + * + * This method tries to correct the lookup table of the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
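Correction is the inverse operation: for every item in the touched range that sits on a lookup-table boundary, the corresponding slot is refreshed with that fork's start_offset. A simplified sketch, assuming a hypothetical fixed stride of items per slot (the real stride comes from ssdfs_convert_item2lookup_index() and depends on node_size):

	#include <stdint.h>

	#define ITEMS_PER_SLOT	64U	/* hypothetical stride */

	/* Refresh slots whose boundary items fall inside [start, start + len). */
	static void correct_lookup_range(uint64_t *table,
					 const uint64_t *start_offsets,
					 unsigned int start, unsigned int len)
	{
		unsigned int i;

		for (i = start; i < start + len; i++) {
			if ((i % ITEMS_PER_SLOT) == 0)
				table[i / ITEMS_PER_SLOT] = start_offsets[i];
		}
	}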
+ */ +static +int ssdfs_correct_lookup_table(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len) +{ + __le64 *lookup_table; + struct ssdfs_raw_fork fork; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range_len == 0) { + SSDFS_WARN("range == 0\n"); + return -ERANGE; + } + + lookup_table = node->raw.extents_header.lookup_table; + + for (i = 0; i < range_len; i++) { + int item_index = start_index + i; + u16 lookup_index; + + if (is_hash_for_lookup_table(node->node_size, item_index)) { + lookup_index = + ssdfs_convert_item2lookup_index(node->node_size, + item_index); + + err = ssdfs_extents_btree_node_get_fork(node, area, + item_index, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to extract fork: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + lookup_table[lookup_index] = fork.start_offset; + } + } + + return 0; +} + +/* + * ssdfs_initialize_lookup_table() - initialize lookup table + * @node: pointer on node object + */ +static +void ssdfs_initialize_lookup_table(struct ssdfs_btree_node *node) +{ + __le64 *lookup_table; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u\n", node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + + lookup_table = node->raw.extents_header.lookup_table; + memset(lookup_table, 0xFF, + sizeof(__le64) * SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE); +} + +/* + * ssdfs_calculate_range_blocks() - calculate number of blocks in range + * @search: search object + * @valid_extents: number of valid extents in the range [out] + * @blks_count: number of blocks in the range [out] + * @max_extent_blks: maximal number of blocks in one extent [out] + * + * This method tries to calculate the @valid_extents, + * @blks_count, @max_extent_blks in the range. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
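This variant walks the forks in the flat search-result buffer rather than in the node, so the only addressing it needs is fixed-stride pointer arithmetic plus the buf_size == items * item_size sanity check. A minimal sketch of that access pattern (simplified types, hypothetical name):

	#include <stddef.h>
	#include <stdint.h>

	/* Return the i-th fixed-size item inside a flat buffer, or NULL. */
	static const void *nth_item(const void *buf, size_t buf_size,
				    size_t item_size, size_t items, size_t i)
	{
		if (!buf || i >= items || buf_size != items * item_size)
			return NULL;	/* mirrors the -ERANGE sanity checks */

		return (const uint8_t *)buf + i * item_size;
	}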
+ */ +static +int ssdfs_calculate_range_blocks(struct ssdfs_btree_search *search, + u32 *valid_extents, + u64 *blks_count, + u32 *max_extent_blks) +{ + struct ssdfs_raw_fork *fork; + size_t item_size = sizeof(struct ssdfs_raw_fork); + u32 items; + int i, j; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !valid_extents || !blks_count || !max_extent_blks); + + SSDFS_DBG("node_id %u\n", search->node.id); +#endif /* CONFIG_SSDFS_DEBUG */ + + *valid_extents = 0; + *blks_count = 0; + *max_extent_blks = 0; + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid buf_state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + if (!search->result.buf) { + SSDFS_ERR("buffer pointer is NULL\n"); + return -ERANGE; + } + + items = search->result.items_in_buffer; + if (search->result.buf_size != (items * item_size)) { + SSDFS_ERR("buf_size %zu, items_in_buffer %u, " + "item_size %zu\n", + search->result.buf_size, + items, item_size); + return -ERANGE; + } + + for (i = 0; i < items; i++) { + u64 blks; + u64 calculated = 0; + + fork = (struct ssdfs_raw_fork *)((u8 *)search->result.buf + + (i * item_size)); + + blks = le64_to_cpu(fork->blks_count); + if (blks >= U64_MAX || blks == 0) { + SSDFS_ERR("corrupted fork: blks_count %llu\n", + blks); + return -ERANGE; + } + + *blks_count += blks; + + for (j = 0; j < SSDFS_INLINE_EXTENTS_COUNT; j++) { + struct ssdfs_raw_extent *extent; + u32 len; + + extent = &fork->extents[j]; + len = le32_to_cpu(extent->len); + + if (len == 0 || len >= U32_MAX) + break; + + calculated += len; + *valid_extents += 1; + + if (*max_extent_blks < len) + *max_extent_blks = len; + } + + if (calculated != blks) { + SSDFS_ERR("calculated %llu != blks %llu\n", + calculated, blks); + return -ERANGE; + } + } + + return 0; +} + +/* + * ssdfs_calculate_range_blocks_in_node() - calculate number of blocks in range + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index of the range + * @range_len: number of items in the range + * @valid_extents: number of valid extents in the range [out] + * @blks_count: number of blocks in the range [out] + * @max_extent_blks: maximal number of blocks in one extent [out] + * + * This method tries to calculate the @valid_extents, + * @blks_count, @max_extent_blks in the range inside the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
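Whether the forks come from the buffer or from the node, the per-fork accounting is the same: sum the inline extent lengths, track the running count and maximum, and insist that the sum equals blks_count. A compact stand-in (the struct layout and the slot count are simplified stand-ins for the on-disk definitions, not the real ones):

	#include <stdint.h>

	#define INLINE_EXTENTS	3	/* illustrative slots per fork */

	struct sketch_extent { uint32_t len; };
	struct sketch_fork {
		uint64_t blks_count;
		struct sketch_extent extents[INLINE_EXTENTS];
	};

	/* Account one fork; -1 when the extents disagree with blks_count. */
	static int account_fork(const struct sketch_fork *fork, uint64_t *blks,
				uint32_t *valid, uint32_t *max_len)
	{
		uint64_t calculated = 0;
		int j;

		for (j = 0; j < INLINE_EXTENTS; j++) {
			uint32_t len = fork->extents[j].len;

			if (len == 0 || len == UINT32_MAX)
				break;	/* an empty slot terminates the list */

			calculated += len;
			*valid += 1;
			if (*max_len < len)
				*max_len = len;
		}

		if (calculated != fork->blks_count)
			return -1;	/* corrupted fork */

		*blks += fork->blks_count;
		return 0;
	}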
+ */
+static
+int ssdfs_calculate_range_blocks_in_node(struct ssdfs_btree_node *node,
+					struct ssdfs_btree_node_items_area *area,
+					u16 start_index, u16 range_len,
+					u32 *valid_extents,
+					u64 *blks_count,
+					u32 *max_extent_blks)
+{
+	struct ssdfs_raw_fork fork;
+	int i, j;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area);
+	BUG_ON(!valid_extents || !blks_count || !max_extent_blks);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("node_id %u, start_index %u, range_len %u\n",
+		  node->node_id, start_index, range_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*valid_extents = 0;
+	*blks_count = 0;
+	*max_extent_blks = 0;
+
+	if (range_len == 0) {
+		SSDFS_WARN("search->request.count == 0\n");
+		return -ERANGE;
+	}
+
+	if ((start_index + range_len) > area->items_count) {
+		SSDFS_ERR("invalid request: "
+			  "start_index %u, range_len %u, items_count %u\n",
+			  start_index, range_len, area->items_count);
+		return -ERANGE;
+	}
+
+	for (i = 0; i < range_len; i++) {
+		int item_index = (int)start_index + i;
+		u64 blks;
+		u64 calculated = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(item_index >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_extents_btree_node_get_fork(node, area,
+							(u16)item_index,
+							&fork);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract fork: "
+				  "item_index %d, err %d\n",
+				  item_index, err);
+			return err;
+		}
+
+		blks = le64_to_cpu(fork.blks_count);
+		if (blks >= U64_MAX || blks == 0) {
+			SSDFS_ERR("corrupted fork: blks_count %llu\n",
+				  blks);
+			return -ERANGE;
+		}
+
+		*blks_count += blks;
+
+		for (j = 0; j < SSDFS_INLINE_EXTENTS_COUNT; j++) {
+			struct ssdfs_raw_extent *extent;
+			u32 len;
+
+			extent = &fork.extents[j];
+			len = le32_to_cpu(extent->len);
+
+			if (len == 0 || len >= U32_MAX)
+				break;
+
+			calculated += len;
+			*valid_extents += 1;
+
+			if (*max_extent_blks < len)
+				*max_extent_blks = len;
+		}
+
+		if (calculated != blks) {
+			SSDFS_ERR("calculated %llu != blks %llu\n",
+				  calculated, blks);
+			return -ERANGE;
+		}
+	}
+
+	return 0;
+}

From patchwork Sat Feb 25 01:09:20 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151974
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 69/76] ssdfs: add/change/delete extent in extents b-tree node
Date: Fri, 24 Feb 2023 17:09:20 -0800
Message-Id: <20230225010927.813929-70-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

Implement logic of adding, changing, and deleting extents in extents b-tree.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/extents_tree.c | 3060 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 3060 insertions(+)

diff --git a/fs/ssdfs/extents_tree.c b/fs/ssdfs/extents_tree.c
index 4b183308eff5..77fb8cc60136 100644
--- a/fs/ssdfs/extents_tree.c
+++ b/fs/ssdfs/extents_tree.c
@@ -9998,3 +9998,3063 @@ int ssdfs_calculate_range_blocks_in_node(struct ssdfs_btree_node *node,
 	return 0;
 }
+
+/*
+ * __ssdfs_extents_btree_node_insert_range() - insert range of forks into node
+ * @node: pointer on node object
+ * @search: search object
+ *
+ * This method tries to insert the range of forks into the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
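At its core the insert path is a classic sorted-array insertion: find the position, shift the tail right, copy the new forks in, then fix up the header accounting. A standalone sketch of the shift-and-copy step that the function delegates to ssdfs_shift_range_right() and ssdfs_generic_insert_range() (generic item type; the caller is assumed to guarantee spare capacity):

	#include <stdint.h>
	#include <string.h>

	/* Shift items [pos, count) right by n slots and copy n new items in. */
	static void insert_items(void *base, size_t item_size, size_t count,
				 size_t pos, const void *src, size_t n)
	{
		uint8_t *p = (uint8_t *)base + pos * item_size;

		memmove(p + n * item_size, p, (count - pos) * item_size);
		memcpy(p, src, n * item_size);
	}

memmove() handles the overlapping source and destination; the real code additionally locks the affected item range first and marks it dirty afterwards.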
+ */ +static +int __ssdfs_extents_btree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_extents_btree_node_header *hdr; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_raw_fork fork; + size_t item_size = sizeof(struct ssdfs_raw_fork); + u16 item_index; + int free_items; + int direction; + u16 range_len; + u16 forks_count = 0; + u32 used_space; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + u64 old_hash; + u64 blks_count; + u32 valid_extents; + u32 max_extent_blks; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + + switch (tree->type) { + case SSDFS_EXTENTS_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + etree = container_of(tree, struct ssdfs_extents_btree_info, + buffer.tree); + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + old_hash = node->items_area.start_hash; + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + if (items_area.free_space > items_area.area_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("free_space %u > area_size %u\n", + items_area.free_space, items_area.area_size); + return -EFAULT; + } + + free_items = items_area.items_capacity - items_area.items_count; + if (unlikely(free_items < 0)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_WARN("invalid free_items %d\n", + free_items); + return -EFAULT; + } else if (free_items == 0) { + SSDFS_DBG("node hasn't free items\n"); + return -ENOSPC; + } + + if (((u64)free_items * item_size) > items_area.free_space) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid free_items: " + "free_items %d, item_size %zu, free_space %u\n", 
+ free_items, item_size, items_area.free_space); + return -EFAULT; + } + + item_index = search->result.start_index; + if ((item_index + search->request.count) >= items_area.items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + down_write(&node->full_lock); + + direction = is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_detect_affected_items; + } + + range_len = items_area.items_count - search->result.start_index; + forks_count = range_len + search->request.count; + + item_index = search->result.start_index; + if ((item_index + forks_count) > items_area.items_capacity) { + err = -ERANGE; + SSDFS_ERR("invalid forks_count: " + "item_index %u, forks_count %u, items_capacity %u\n", + item_index, forks_count, + items_area.items_capacity); + goto finish_detect_affected_items; + } + + err = ssdfs_lock_items_range(node, item_index, forks_count); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_detect_affected_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_insert_range; + + err = ssdfs_shift_range_right(node, &items_area, item_size, + item_index, range_len, + search->request.count); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to shift forks range: " + "start %u, count %u, err %d\n", + item_index, search->request.count, + err); + goto unlock_items_range; + } + + err = ssdfs_generic_insert_range(node, &items_area, + item_size, search); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to insert range: err %d\n", + err); + goto unlock_items_range; + } + + down_write(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_area.items_count %u, search->request.count %u\n", + node->items_area.items_count, + search->request.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + node->items_area.items_count += search->request.count; + if (node->items_area.items_count > node->items_area.items_capacity) { + err = -ERANGE; + SSDFS_ERR("items_count %u > items_capacity %u\n", + node->items_area.items_count, + node->items_area.items_capacity); + goto finish_items_area_correction; + } + + used_space = (u32)search->request.count * item_size; + if (used_space > node->items_area.free_space) { + err = -ERANGE; + SSDFS_ERR("used_space %u > free_space %u\n", + used_space, + node->items_area.free_space); + goto finish_items_area_correction; + } + node->items_area.free_space -= used_space; + + err = ssdfs_extents_btree_node_get_fork(node, &node->items_area, + 0, &fork); + if (unlikely(err)) 
{ + SSDFS_ERR("fail to get fork: err %d\n", err); + goto finish_items_area_correction; + } + start_hash = le64_to_cpu(fork.start_offset); + + err = ssdfs_extents_btree_node_get_fork(node, + &node->items_area, + node->items_area.items_count - 1, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to get fork: err %d\n", err); + goto finish_items_area_correction; + } + + end_hash = le64_to_cpu(fork.start_offset); + + blks_count = le64_to_cpu(fork.blks_count); + if (blks_count == 0 || blks_count >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid blks_count %llu\n", + blks_count); + goto finish_items_area_correction; + } + + end_hash += blks_count - 1; + + if (start_hash >= U64_MAX || end_hash >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + goto finish_items_area_correction; + } + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node->items_area: " + "start_hash %llx, end_hash %llx\n", + node->items_area.start_hash, + node->items_area.end_hash); + SSDFS_DBG("items_area.items_count %u, items_area.items_capacity %u\n", + node->items_area.items_count, + node->items_area.items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_correct_lookup_table(node, &node->items_area, + item_index, forks_count); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + + hdr = &node->raw.extents_header; + + le32_add_cpu(&hdr->forks_count, search->request.count); + le32_add_cpu(&hdr->allocated_extents, + search->request.count * SSDFS_INLINE_EXTENTS_COUNT); + + err = ssdfs_calculate_range_blocks(search, &valid_extents, + &blks_count, &max_extent_blks); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate range blocks: err %d\n", + err); + goto finish_items_area_correction; + } + + le32_add_cpu(&hdr->valid_extents, valid_extents); + le64_add_cpu(&hdr->blks_count, blks_count); + + if (le32_to_cpu(hdr->max_extent_blks) < max_extent_blks) + hdr->max_extent_blks = cpu_to_le32(max_extent_blks); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, forks_count %u, allocated_extents %u, " + "valid_extents %u, blks_count %llu\n", + node->node_id, + le32_to_cpu(hdr->forks_count), + le32_to_cpu(hdr->allocated_extents), + le32_to_cpu(hdr->valid_extents), + le64_to_cpu(hdr->blks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + atomic64_add(search->request.count, &etree->forks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", + atomic64_read(&etree->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + goto unlock_items_range; + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, forks_count); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, forks_count, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, forks_count); + +finish_insert_range: + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + if (items_area.items_count == 0) { + 
struct ssdfs_btree_index_key key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + key.index.hash = cpu_to_le64(start_hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "node_height %u, hash %llx\n", + le32_to_cpu(key.node_id), + key.node_type, + key.height, + le64_to_cpu(key.index.hash)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_add_index(node, &key); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", err); + return err; + } + } else if (old_hash != start_hash) { + struct ssdfs_btree_index_key old_key, new_key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + old_key.index.hash = cpu_to_le64(old_hash); + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_change_index(node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: err %d\n", + err); + return err; + } + } + break; + + default: + /* do nothing */ + break; + } + + return 0; +} + +/* + * ssdfs_extents_btree_node_insert_item() - insert item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to insert an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_extents_btree_node_insert_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA || search->result.err == -EAGAIN) { + search->result.err = 0; + /* + * Node doesn't contain an item. 
+ */ + } else if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result: (state %#x, err %d, " + "start_index %u, count %u, buf_state %#x, buf %p, " + "buf_size %zu, items_in_buffer %u)\n", + search->result.state, + search->result.err, + search->result.start_index, + search->result.count, + search->result.buf_state, + search->result.buf, + search->result.buf_size, + search->result.items_in_buffer); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_btree_search_contains_new_item(search)) { + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.fork; + search->result.buf_size = sizeof(struct ssdfs_raw_fork); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_size != + sizeof(struct ssdfs_raw_fork)); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + SSDFS_ERR("unexpected buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != + SSDFS_BTREE_SEARCH_INLINE_BUFFER); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + state = atomic_read(&node->items_area.state); + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + err = __ssdfs_extents_btree_node_insert_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert range: " + "node_id %u, err %d\n", + node->node_id, err); + return err; + } + + return 0; +} + +/* + * ssdfs_extents_btree_node_insert_range() - insert range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to insert a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. + */ +static +int ssdfs_extents_btree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("node_id %u, type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + node->node_id, + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err == -ENODATA) { + search->result.err = 0; + /* + * Node doesn't contain an item. 
+ */
+	} else if (search->result.err) {
+		SSDFS_WARN("invalid search result: err %d\n",
+			   search->result.err);
+		return search->result.err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.count <= 1);
+	BUG_ON(!search->result.buf);
+	BUG_ON(search->result.buf_state != SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state = atomic_read(&node->items_area.state);
+	if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		SSDFS_ERR("invalid area state %#x\n",
+			  state);
+		return -ERANGE;
+	}
+
+	err = __ssdfs_extents_btree_node_insert_range(node, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to insert range: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_change_item_only() - change fork in the node
+ * @node: pointer on node object
+ * @area: pointer on items area's descriptor
+ * @search: pointer on search request object
+ *
+ * This method tries to change an item in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
+ */
+static
+int ssdfs_change_item_only(struct ssdfs_btree_node *node,
+			   struct ssdfs_btree_node_items_area *area,
+			   struct ssdfs_btree_search *search)
+{
+	struct ssdfs_raw_fork fork;
+	size_t item_size = sizeof(struct ssdfs_raw_fork);
+	struct ssdfs_extents_btree_node_header *hdr;
+	u16 item_index;
+	u16 range_len;
+	u64 start_hash, end_hash;
+	u64 old_blks_count, blks_count, diff_blks_count;
+	u32 old_valid_extents, valid_extents, diff_valid_extents;
+	u32 old_max_extent_blks, max_extent_blks;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area || !search);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	range_len = search->result.count;
+
+	if (range_len == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("empty range\n");
+		return err;
+	}
+
+	item_index = search->result.start_index;
+	if ((item_index + range_len) > area->items_count) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, range_len %u, items_count %u\n",
+			  item_index, range_len,
+			  area->items_count);
+		return err;
+	}
+
+	err = ssdfs_calculate_range_blocks_in_node(node, area,
+						   item_index, range_len,
+						   &old_valid_extents,
+						   &old_blks_count,
+						   &old_max_extent_blks);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate range's blocks: "
+			  "node_id %u, item_index %u, range_len %u\n",
+			  node->node_id, item_index, range_len);
+		return err;
+	}
+
+	err = ssdfs_generic_insert_range(node, area,
+					 item_size, search);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to insert range: err %d\n",
+			  err);
+		return err;
+	}
+
+	down_write(&node->header_lock);
+
+	start_hash = node->items_area.start_hash;
+	end_hash = node->items_area.end_hash;
+
+	if (item_index == 0) {
+		err = ssdfs_extents_btree_node_get_fork(node,
+							&node->items_area,
+							item_index, &fork);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get fork: err %d\n", err);
+			goto finish_items_area_correction;
+		}
+		start_hash = le64_to_cpu(fork.start_offset);
+	}
+
+	if ((item_index + range_len) == node->items_area.items_count) {
+		err = 
ssdfs_extents_btree_node_get_fork(node,
+						&node->items_area,
+						item_index + range_len - 1,
+						&fork);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get fork: err %d\n", err);
+			goto finish_items_area_correction;
+		}
+
+		end_hash = le64_to_cpu(fork.start_offset);
+
+		blks_count = le64_to_cpu(fork.blks_count);
+		if (blks_count == 0 || blks_count >= U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid blks_count %llu\n",
+				  blks_count);
+			goto finish_items_area_correction;
+		}
+
+		end_hash += blks_count - 1;
+
+		if (start_hash >= U64_MAX || end_hash >= U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("start_hash %llx, end_hash %llx\n",
+				  start_hash, end_hash);
+			goto finish_items_area_correction;
+		}
+	} else if ((item_index + range_len) > node->items_area.items_count) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid range_len: "
+			  "item_index %u, range_len %u, items_count %u\n",
+			  item_index, range_len,
+			  node->items_area.items_count);
+		goto finish_items_area_correction;
+	}
+
+	node->items_area.start_hash = start_hash;
+	node->items_area.end_hash = end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_area: start_hash %llx, end_hash %llx\n",
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_correct_lookup_table(node, &node->items_area,
+					 item_index, range_len);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to correct lookup table: "
+			  "err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	err = ssdfs_calculate_range_blocks(search, &valid_extents,
+					   &blks_count, &max_extent_blks);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate range blocks: err %d\n",
+			  err);
+		goto finish_items_area_correction;
+	}
+
+	hdr = &node->raw.extents_header;
+
+	if (old_valid_extents < valid_extents) {
+		diff_valid_extents = valid_extents - old_valid_extents;
+		valid_extents = le32_to_cpu(hdr->valid_extents);
+
+		if (valid_extents >= (U32_MAX - diff_valid_extents)) {
+			err = -ERANGE;
+			SSDFS_ERR("valid_extents %u, diff_valid_extents %u\n",
+				  valid_extents, diff_valid_extents);
+			goto finish_items_area_correction;
+		}
+
+		valid_extents += diff_valid_extents;
+		hdr->valid_extents = cpu_to_le32(valid_extents);
+	} else if (old_valid_extents > valid_extents) {
+		diff_valid_extents = old_valid_extents - valid_extents;
+		valid_extents = le32_to_cpu(hdr->valid_extents);
+
+		if (valid_extents < diff_valid_extents) {
+			err = -ERANGE;
+			SSDFS_ERR("valid_extents %u < diff_valid_extents %u\n",
+				  valid_extents, diff_valid_extents);
+			goto finish_items_area_correction;
+		}
+
+		valid_extents -= diff_valid_extents;
+		hdr->valid_extents = cpu_to_le32(valid_extents);
+	}
+
+	if (old_blks_count < blks_count) {
+		diff_blks_count = blks_count - old_blks_count;
+		blks_count = le64_to_cpu(hdr->blks_count);
+
+		if (blks_count >= (U64_MAX - diff_blks_count)) {
+			err = -ERANGE;
+			SSDFS_ERR("blks_count %llu, diff_blks_count %llu\n",
+				  blks_count, diff_blks_count);
+			goto finish_items_area_correction;
+		}
+
+		blks_count += diff_blks_count;
+		hdr->blks_count = cpu_to_le64(blks_count);
+	} else if (old_blks_count > blks_count) {
+		diff_blks_count = old_blks_count - blks_count;
+		blks_count = le64_to_cpu(hdr->blks_count);
+
+		if (blks_count < diff_blks_count) {
+			err = -ERANGE;
+			SSDFS_ERR("blks_count %llu < diff_blks_count %llu\n",
+				  blks_count, diff_blks_count);
+			goto finish_items_area_correction;
+		}
+
+		blks_count -= diff_blks_count;
+		hdr->blks_count = cpu_to_le64(blks_count);
+	}
+
+	if (le32_to_cpu(hdr->max_extent_blks) < max_extent_blks)
+		hdr->max_extent_blks = 
cpu_to_le32(max_extent_blks); + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + + return err; +} + +/* + * ssdfs_invalidate_forks_range() - invalidate range of forks + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index of the fork + * @range_len: number of forks in the range + * + * This method tries to add the range of forks into + * pre-invalid queue of the shared extents tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_forks_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_raw_fork fork; + u64 ino; + u16 cur_index; + u16 i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + ino = node->tree->owner_ino; + + if ((start_index + range_len) > area->items_count) { + SSDFS_ERR("invalid request: " + "start_index %u, range_len %u\n", + start_index, range_len); + return -ERANGE; + } + + for (i = 0; i < range_len; i++) { + cur_index = start_index + i; + + err = ssdfs_extents_btree_node_get_fork(node, area, + cur_index, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to get fork: " + "cur_index %u, err %d\n", + cur_index, err); + return err; + } + + err = ssdfs_shextree_add_pre_invalid_fork(shextree, ino, &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to make the fork pre-invalid: " + "cur_index %u, err %d\n", + cur_index, err); + return err; + } + } + + return 0; +} + +/* + * ssdfs_define_first_invalid_index() - find the first index for hash + * @node: pointer on node object + * @hash: searching hash + * @start_index: found index [out] + * + * The method tries to find the index for the hash. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EAGAIN - unable to find an index. 
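The search itself has only three interesting outcomes: an exact hash hit, a nearest-slot hit, or nothing at all, in which case the caller climbs to the parent node (the -EAGAIN path). A toy linear stand-in that makes those outcomes explicit (illustrative only; the real ssdfs_find_index_by_hash() operates on on-disk index keys and its exact positioning rules are not reproduced here):

	#include <stdint.h>

	/* Find the first slot with hash >= the requested hash; -1 if none. */
	static int find_first_index(const uint64_t *hashes, unsigned int count,
				    uint64_t hash, unsigned int *pos)
	{
		unsigned int i;

		for (i = 0; i < count; i++) {
			if (hashes[i] >= hash) {
				*pos = i;
				return hashes[i] == hash ? 0 : 1;
			}
		}

		return -1;	/* caller retries the search in the parent */
	}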
+ */ +static +int ssdfs_define_first_invalid_index(struct ssdfs_btree_node *node, + u64 hash, u16 *start_index) +{ + bool node_locked_outside = false; + struct ssdfs_btree_node_index_area area; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !start_index); + + SSDFS_DBG("node_id %u, hash %llx\n", + node->node_id, hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + SSDFS_ERR("index area is absent\n"); + return -ERANGE; + } + + node_locked_outside = rwsem_is_locked(&node->full_lock); + + if (!node_locked_outside) { + /* lock node locally */ + down_read(&node->full_lock); + } + + down_read(&node->header_lock); + ssdfs_memcpy(&area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &node->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + err = ssdfs_find_index_by_hash(node, &area, hash, + start_index); + up_read(&node->header_lock); + + if (err == -EEXIST) { + /* hash == found hash */ + err = 0; + } else if (err == -ENODATA) { + err = -EAGAIN; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find an index: " + "node_id %u, hash %llx\n", + node->node_id, hash); +#endif /* CONFIG_SSDFS_DEBUG */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to find an index: " + "node_id %u, hash %llx, err %d\n", + node->node_id, hash, err); + } + + if (!node_locked_outside) { + /* unlock node locally */ + up_read(&node->full_lock); + } + + return err; +} + +/* + * ssdfs_invalidate_index_tail() - invalidate the tail of index sequence + * @node: pointer on node object + * @start_index: starting index + * + * The method tries to invalidate the tail of index sequence. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
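Tail invalidation happens in two passes, as the function below shows: first every doomed index is queued for background pre-invalidation, then the indexes are deleted from the highest slot downwards so the surviving prefix never has to shift. A schematic version (the queue and delete primitives are stand-ins for the shextree and b-tree helpers):

	#include <stdint.h>

	/* Stand-in: hand one victim index key to the background invalidator. */
	static void queue_pre_invalid(uint64_t key)
	{
		(void)key;	/* the real helper enqueues a descriptor */
	}

	/* Invalidate and drop every index slot from start to the end. */
	static void invalidate_index_tail(uint64_t *keys, unsigned int *count,
					  unsigned int start)
	{
		int i;

		/* Pass 1: queue all victims while the layout is intact. */
		for (i = (int)start; i < (int)*count; i++)
			queue_pre_invalid(keys[i]);

		/* Pass 2: delete back-to-front so earlier slots never move. */
		for (i = (int)*count - 1; i >= (int)start; i--)
			keys[i] = UINT64_MAX;	/* mark the slot unused */

		*count = start;
	}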
+ */ +static +int ssdfs_invalidate_index_tail(struct ssdfs_btree_node *node, + u16 start_index) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_shared_extents_tree *shextree; + struct ssdfs_btree *tree; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_btree_node_index_area index_area; + struct ssdfs_btree_index_key index; + int node_type; + int index_type = SSDFS_EXTENT_INFO_UNKNOWN_TYPE; + u64 ino; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u\n", + node->node_id, start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + shextree = fsi->shextree; + + if (!shextree) { + SSDFS_ERR("shared extents tree is absent\n"); + return -ERANGE; + } + + ino = node->tree->owner_ino; + + tree = node->tree; + switch (tree->type) { + case SSDFS_EXTENTS_BTREE: + index_type = SSDFS_EXTENT_INFO_INDEX_DESCRIPTOR; + break; + + case SSDFS_DENTRIES_BTREE: + index_type = SSDFS_EXTENT_INFO_DENTRY_INDEX_DESCRIPTOR; + break; + + default: + SSDFS_ERR("unsupported tree type %#x\n", + tree->type); + return -ERANGE; + } + + if (!is_ssdfs_btree_node_index_area_exist(node)) { + SSDFS_ERR("index area is absent\n"); + return -ERANGE; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + ssdfs_memcpy(&index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + &node->index_area, + 0, sizeof(struct ssdfs_btree_node_index_area), + sizeof(struct ssdfs_btree_node_index_area)); + up_read(&node->header_lock); + + if (is_ssdfs_btree_node_items_area_exist(node)) { + err = ssdfs_invalidate_forks_range(node, &items_area, + 0, items_area.items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate forks range: " + "node_id %u, range (start %u, count %u), " + "err %d\n", + node->node_id, 0, items_area.items_count, + err); + goto finish_invalidate_index_tail; + } + } + + err = ssdfs_lock_whole_index_area(node); + if (unlikely(err)) { + SSDFS_ERR("fail to lock source's index area: err %d\n", + err); + goto finish_invalidate_index_tail; + } + + if (start_index >= index_area.index_count) { + err = -ERANGE; + SSDFS_ERR("start_index %u >= index_count %u\n", + start_index, index_area.index_count); + goto finish_process_index_area; + } + + node_type = atomic_read(&node->type); + + for (i = start_index; i < index_area.index_count; i++) { + if (node_type == SSDFS_BTREE_ROOT_NODE) { + err = __ssdfs_btree_root_node_extract_index(node, + (u16)i, + &index); + } else { + err = ssdfs_btree_node_get_index(&node->content.pvec, + index_area.offset, + index_area.area_size, + node->node_size, + (u16)i, &index); + } + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to extract index: " + "node_id %u, index %d, err %d\n", + node->node_id, i, err); + goto finish_process_index_area; + } + + err = ssdfs_shextree_add_pre_invalid_index(shextree, + ino, + index_type, + &index); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to pre-invalid index: " + "index_id %d, err %d\n", + i, err); + goto finish_process_index_area; + } + } + + down_write(&node->header_lock); + + for (i = index_area.index_count - 1; i >= start_index; i--) { + if (node_type == SSDFS_BTREE_ROOT_NODE) { + err = 
ssdfs_btree_root_node_delete_index(node, + (u16)i); + } else { + err = ssdfs_btree_common_node_delete_index(node, + (u16)i); + } + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to delete index: " + "node_id %u, index %d, err %d\n", + node->node_id, i, err); + goto finish_index_deletion; + } + } + +finish_index_deletion: + up_write(&node->header_lock); + +finish_process_index_area: + ssdfs_unlock_whole_index_area(node); + +finish_invalidate_index_tail: + return err; +} + +/* + * __ssdfs_invalidate_items_area() - invalidate the items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index of the fork + * @range_len: number of forks in the range + * @search: pointer on search request object + * + * The method tries to invalidate the items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_invalidate_items_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *parent = NULL, *found = NULL; + struct ssdfs_extents_btree_node_header *hdr; + bool items_area_empty = false; + bool is_hybrid = false; + bool has_index_area = false; + bool index_area_empty = false; + int parent_type = SSDFS_BTREE_LEAF_NODE; + u64 hash; + spinlock_t *lock; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (range_len == 0) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("nothing should be done: range_len %u\n", + range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + if (((u32)start_index + range_len) > area->items_count) { + SSDFS_ERR("start_index %u, range_len %u, items_count %u\n", + start_index, range_len, + area->items_count); + return -ERANGE; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + is_hybrid = true; + break; + + case SSDFS_BTREE_LEAF_NODE: + is_hybrid = false; + break; + + default: + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + return -ERANGE; + } + + if (!(search->request.flags & SSDFS_BTREE_SEARCH_NOT_INVALIDATE)) { + err = ssdfs_invalidate_forks_range(node, area, + start_index, range_len); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to invalidate range of forks: " + "node_id %u, start_index %u, " + "range_len %u, err %d\n", + node->node_id, start_index, + range_len, err); + return err; + } + } + + down_write(&node->header_lock); + + hdr = &node->raw.extents_header; + if (node->items_area.items_count == range_len) { + items_area_empty = true; + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + has_index_area = true; + if (node->index_area.index_count == 0) + index_area_empty = true; + else + index_area_empty = false; + break; + + default: + has_index_area = false; + index_area_empty = false; + break; + } + + up_write(&node->header_lock); + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + + if (!(search->request.flags & SSDFS_BTREE_SEARCH_NOT_INVALIDATE)) + goto finish_invalidate_items_area; + + if (!items_area_empty) + goto finish_invalidate_items_area; + + switch 
(search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ALL: + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + parent = node; + + do { + lock = &parent->descriptor_lock; + spin_lock(lock); + parent = parent->parent_node; + spin_unlock(lock); + lock = NULL; + + if (!parent) { + SSDFS_ERR("node %u hasn't parent\n", + node->node_id); + return -ERANGE; + } + + parent_type = atomic_read(&parent->type); + switch (parent_type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + } while (parent_type != SSDFS_BTREE_ROOT_NODE); + + err = ssdfs_invalidate_root_node_hierarchy(parent); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate root node hierarchy: " + "err %d\n", err); + return -ERANGE; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + if (is_hybrid && has_index_area && !index_area_empty) { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + } else { + search->result.state = + SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; + } + + hash = search->request.start.hash; + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + err = ssdfs_define_first_invalid_index(node, hash, + &start_index); + if (err == -EAGAIN) { + err = 0; + /* continue to search */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to define first index: " + "err %d\n", err); + return err; + } else if (start_index >= U16_MAX) { + SSDFS_ERR("invalid start index\n"); + return -ERANGE; + } else { + found = node; + goto try_invalidate_tail; + } + break; + + case SSDFS_BTREE_NODE_AREA_ABSENT: + /* need to check the parent */ + break; + + default: + SSDFS_ERR("invalid index area: " + "node_id %u, state %#x\n", + node->node_id, + atomic_read(&node->index_area.state)); + return -ERANGE; + } + + parent = node; + + do { + lock = &parent->descriptor_lock; + spin_lock(lock); + parent = parent->parent_node; + spin_unlock(lock); + lock = NULL; + + if (!parent) { + SSDFS_ERR("node %u hasn't parent\n", + node->node_id); + return -ERANGE; + } + + parent_type = atomic_read(&parent->type); + switch (parent_type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + + switch (atomic_read(&parent->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + err = ssdfs_define_first_invalid_index(parent, + hash, + &start_index); + if (err == -EAGAIN) { + err = 0; + /* continue to search */ + } else if (unlikely(err)) { + SSDFS_ERR("fail to define first index: " + "err %d\n", err); + return err; + } else if (start_index >= U16_MAX) { + SSDFS_ERR("invalid start index\n"); + return -ERANGE; + } else { + found = parent; + goto try_invalidate_tail; + } + break; + + default: + SSDFS_ERR("index area is absent: " + "node_id %u, height %d\n", + parent->node_id, + atomic_read(&parent->height)); + return -ERANGE; + } + } while (parent_type != SSDFS_BTREE_ROOT_NODE); + + if (found == NULL) { + SSDFS_ERR("fail to find start index\n"); + return -ERANGE; + } + +try_invalidate_tail: + err = ssdfs_invalidate_index_tail(found, start_index); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate the index tail: " + "node_id %u, start_index %u, err %d\n", + 
found->node_id, start_index, err); + return err; + } + break; + + default: + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -ERANGE; + } + +finish_invalidate_items_area: + return 0; +} + +/* + * ssdfs_invalidate_whole_items_area() - invalidate the whole items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @search: pointer on search request object + * + * The method tries to invalidate the items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_whole_items_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area %p, search %p\n", + node->node_id, area, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invalidate_items_area(node, area, + 0, area->items_count, + search); +} + +/* + * ssdfs_invalidate_items_area_partially() - invalidate the items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index of the fork + * @range_len: number of forks in the range + * @search: pointer on search request object + * + * The method tries to invalidate the items area partially. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_items_area_partially(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invalidate_items_area(node, area, + start_index, range_len, + search); +} + +/* + * ssdfs_change_item_and_invalidate_tail() - change fork and invalidate tail + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @search: pointer on search request object + * + * This method tries to change an item in the node and invalidate + * the tail forks sequence. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
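The change-and-invalidate path is essentially a file truncation expressed on the fork array: rewrite the fork at the target position, then treat everything after it as dead. A minimal model of that shape (simplified fork record, hypothetical array wrapper):

	#include <stdint.h>

	struct fork_record { uint64_t start_offset; uint64_t blks_count; };

	/* Shrink the fork at pos and cut off every fork after it. */
	static void change_and_truncate(struct fork_record *forks,
					unsigned int *count, unsigned int pos,
					uint64_t new_blks_count)
	{
		forks[pos].blks_count = new_blks_count;
		*count = pos + 1;	/* forks past pos become invalid */
	}

The node's end_hash then becomes forks[pos].start_offset + new_blks_count - 1, which is exactly the recomputation the function performs under header_lock.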
+ */ +static +int ssdfs_change_item_and_invalidate_tail(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_fork fork; + size_t item_size = sizeof(struct ssdfs_raw_fork); + struct ssdfs_extents_btree_node_header *hdr; + u16 item_index; + u16 range_len; + u64 start_hash, end_hash; + u64 old_blks_count, blks_count, diff_blks_count; + u32 old_valid_extents, valid_extents, diff_valid_extents; + u32 old_max_extent_blks, max_extent_blks; + u16 invalidate_index, invalidate_range; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + range_len = search->result.count; + + if (range_len == 0) { + err = -ERANGE; + SSDFS_ERR("empty range\n"); + return err; + } + + item_index = search->result.start_index; + if ((item_index + range_len) > area->items_count) { + err = -ERANGE; + SSDFS_ERR("invalid request: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + area->items_count); + return err; + } + + err = ssdfs_calculate_range_blocks_in_node(node, area, + item_index, range_len, + &old_valid_extents, + &old_blks_count, + &old_max_extent_blks); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate range's blocks: " + "node_id %u, item_index %u, range_len %u\n", + node->node_id, item_index, range_len); + return err; + } + + err = ssdfs_generic_insert_range(node, area, + item_size, search); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to insert range: err %d\n", + err); + return err; + } + + invalidate_index = item_index + range_len; + invalidate_range = area->items_count - invalidate_index; + + err = ssdfs_invalidate_items_area_partially(node, area, + invalidate_index, + invalidate_range, + search); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to invalidate items range: err %d\n", + err); + return err; + } + + down_write(&node->header_lock); + + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + + err = ssdfs_extents_btree_node_get_fork(node, + &node->items_area, + item_index, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to get fork: err %d\n", err); + goto finish_items_area_correction; + } + + if (item_index == 0) + start_hash = le64_to_cpu(fork.start_offset); + + end_hash = le64_to_cpu(fork.start_offset); + + blks_count = le64_to_cpu(fork.blks_count); + if (blks_count == 0 || blks_count >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid blks_count %llu\n", + blks_count); + goto finish_items_area_correction; + } + + end_hash += blks_count - 1; + + if (start_hash >= U64_MAX || end_hash >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("start_hash %llx, end_hash %llx\n", + start_hash, end_hash); + goto finish_items_area_correction; + } + + err = ssdfs_correct_lookup_table(node, &node->items_area, + item_index, range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + + err = ssdfs_calculate_range_blocks(search, 
					&valid_extents,
+					&blks_count, &max_extent_blks);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate range blocks: err %d\n",
+			  err);
+		goto finish_items_area_correction;
+	}
+
+	hdr = &node->raw.extents_header;
+
+	if (old_valid_extents < valid_extents) {
+		diff_valid_extents = valid_extents - old_valid_extents;
+		valid_extents = le32_to_cpu(hdr->valid_extents);
+
+		if (valid_extents >= (U32_MAX - diff_valid_extents)) {
+			err = -ERANGE;
+			SSDFS_ERR("valid_extents %u, diff_valid_extents %u\n",
+				  valid_extents, diff_valid_extents);
+			goto finish_items_area_correction;
+		}
+
+		valid_extents += diff_valid_extents;
+		hdr->valid_extents = cpu_to_le32(valid_extents);
+	} else if (old_valid_extents > valid_extents) {
+		diff_valid_extents = old_valid_extents - valid_extents;
+		valid_extents = le32_to_cpu(hdr->valid_extents);
+
+		if (valid_extents < diff_valid_extents) {
+			err = -ERANGE;
+			SSDFS_ERR("valid_extents %u < diff_valid_extents %u\n",
+				  valid_extents, diff_valid_extents);
+			goto finish_items_area_correction;
+		}
+
+		valid_extents -= diff_valid_extents;
+		hdr->valid_extents = cpu_to_le32(valid_extents);
+	}
+
+	if (old_blks_count < blks_count) {
+		diff_blks_count = blks_count - old_blks_count;
+		blks_count = le64_to_cpu(hdr->blks_count);
+
+		if (blks_count >= (U64_MAX - diff_blks_count)) {
+			err = -ERANGE;
+			SSDFS_ERR("blks_count %llu, diff_blks_count %llu\n",
+				  blks_count, diff_blks_count);
+			goto finish_items_area_correction;
+		}
+
+		blks_count += diff_blks_count;
+		hdr->blks_count = cpu_to_le64(blks_count);
+	} else if (old_blks_count > blks_count) {
+		diff_blks_count = old_blks_count - blks_count;
+		blks_count = le64_to_cpu(hdr->blks_count);
+
+		if (blks_count < diff_blks_count) {
+			err = -ERANGE;
+			SSDFS_ERR("blks_count %llu < diff_blks_count %llu\n",
+				  blks_count, diff_blks_count);
+			goto finish_items_area_correction;
+		}
+
+		blks_count -= diff_blks_count;
+		hdr->blks_count = cpu_to_le64(blks_count);
+	}
+
+	if (le32_to_cpu(hdr->max_extent_blks) < max_extent_blks)
+		hdr->max_extent_blks = cpu_to_le32(max_extent_blks);
+
+finish_items_area_correction:
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+
+	return err;
+}
+
+/*
+ * ssdfs_extents_btree_node_change_item() - change item in the node
+ * @node: pointer on node object
+ * @search: pointer on search request object
+ *
+ * This method tries to change an item in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
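+ *
+ * Note: two request types are handled here.
+ * SSDFS_BTREE_SEARCH_CHANGE_ITEM changes a single fork in place,
+ * whereas SSDFS_BTREE_SEARCH_INVALIDATE_TAIL changes the fork and
+ * additionally invalidates every fork that follows it (as happens,
+ * for example, when a file is truncated).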
+ */ +static +int ssdfs_extents_btree_node_change_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + size_t item_size = sizeof(struct ssdfs_raw_fork); + struct ssdfs_btree_node_items_area items_area; + u16 item_index; + int direction; + u16 range_len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + if (is_btree_search_contains_new_item(search)) { + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.fork; + search->result.buf_size = sizeof(struct ssdfs_raw_fork); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_size != + sizeof(struct ssdfs_raw_fork)); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + SSDFS_ERR("unexpected buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + } else { +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != + SSDFS_BTREE_SEARCH_INLINE_BUFFER); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + down_write(&node->full_lock); + + direction = 
is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_define_changing_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_define_changing_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_define_changing_items; + } + + range_len = search->result.count; + + if (range_len == 0) { + err = -ERANGE; + SSDFS_ERR("empty range\n"); + goto finish_define_changing_items; + } + + item_index = search->result.start_index; + if ((item_index + range_len) > items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid request: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + items_area.items_count); + goto finish_define_changing_items; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* range_len doesn't need to be changed */ + break; + + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + range_len = items_area.items_count - item_index; + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid request type: %#x\n", + search->request.type); + goto finish_define_changing_items; + } + + err = ssdfs_lock_items_range(node, item_index, range_len); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_define_changing_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_change_item; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + err = ssdfs_change_item_only(node, &items_area, search); + break; + + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + err = ssdfs_change_item_and_invalidate_tail(node, &items_area, + search); + break; + + default: + BUG(); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to change item: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, range_len, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, range_len); + +finish_change_item: + up_read(&node->full_lock); + + return err; +} + +/* + * __ssdfs_extents_btree_node_delete_range() - delete range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-EAGAIN - continue deletion in the next node. 
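+ *
+ * If the requested range is distributed between several nodes, only
+ * the part of the range that resides in @node is deleted; the method
+ * then moves search->request.start.hash behind the processed items,
+ * decreases search->request.count, and returns -EAGAIN so that the
+ * caller can continue the deletion in the next node.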
+ */ +static +int __ssdfs_extents_btree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree *tree; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_extents_btree_node_header *hdr; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_raw_fork fork; + size_t item_size = sizeof(struct ssdfs_raw_fork); + u16 index_count = 0; + int free_items; + u16 item_index; + int direction; + u16 range_len; + u16 shift_range_len = 0; + u16 locked_len = 0; + u32 deleted_space, free_space; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + u64 old_hash; + u32 old_forks_count = 0, forks_count = 0; + u32 forks_diff; + u32 allocated_extents; + u32 valid_extents; + u64 blks_count; + u32 max_extent_blks; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + + switch (tree->type) { + case SSDFS_EXTENTS_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + etree = container_of(tree, struct ssdfs_extents_btree_info, + buffer.tree); + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + old_hash = node->items_area.start_hash; + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, + items_area.items_capacity, + items_area.items_count); + return -ERANGE; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_area: items_count %u, items_capacity %u, " + "area_size %u, free_space %u\n", + items_area.items_count, + items_area.items_capacity, + items_area.area_size, + items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if 
(items_area.free_space > items_area.area_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("free_space %u > area_size %u\n", + items_area.free_space, items_area.area_size); + return -EFAULT; + } + + free_items = items_area.items_capacity - items_area.items_count; + if (unlikely(free_items < 0)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_WARN("invalid free_items %d\n", + free_items); + return -EFAULT; + } + + if (((u64)free_items * item_size) > items_area.free_space) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid free_items: " + "free_items %d, item_size %zu, free_space %u\n", + free_items, item_size, items_area.free_space); + return -EFAULT; + } + + forks_count = items_area.items_count; + item_index = search->result.start_index; + + range_len = search->request.count; + if (range_len == 0) { + SSDFS_ERR("range_len == 0\n"); + return -ERANGE; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + if ((item_index + range_len) > items_area.items_count) { + SSDFS_ERR("invalid request: " + "item_index %d, count %u\n", + item_index, range_len); + return -ERANGE; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* request can be distributed between several nodes */ + break; + + default: + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -ERANGE; + } + + down_write(&node->full_lock); + + direction = is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_detect_affected_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_detect_affected_items; + } + + item_index = search->result.start_index; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + if ((item_index + range_len) > items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid forks_count: " + "item_index %u, forks_count %u, " + "items_count %u\n", + item_index, range_len, + items_area.items_count); + goto finish_detect_affected_items; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + /* request can be distributed between several nodes */ + range_len = min_t(unsigned int, range_len, + items_area.items_count - item_index); +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, item_index %u, " + "request.count %u, items_count %u\n", + node->node_id, item_index, + search->request.count, + items_area.items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + BUG(); + } + + locked_len = items_area.items_count - item_index; + + err = ssdfs_lock_items_range(node, item_index, locked_len); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + SSDFS_ERR("fail to lock items range\n"); + return 
-ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + SSDFS_ERR("fail to lock items range\n"); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_detect_affected_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete range: err %d\n", + err); + goto finish_delete_range; + } + + if (range_len == items_area.items_count) { + /* items area is empty */ + err = ssdfs_invalidate_whole_items_area(node, &items_area, + search); + } else { + err = ssdfs_invalidate_items_area_partially(node, &items_area, + item_index, + range_len, + search); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate items area: " + "node_id %u, start_index %u, " + "range_len %u, err %d\n", + node->node_id, item_index, + range_len, err); + goto finish_delete_range; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + /* continue to shift rest forks to left */ + break; + + case SSDFS_BTREE_SEARCH_DELETE_ALL: + case SSDFS_BTREE_SEARCH_INVALIDATE_TAIL: + err = ssdfs_set_node_header_dirty(node, + items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto finish_delete_range; + } + break; + + default: + BUG(); + } + + shift_range_len = locked_len - range_len; + if (shift_range_len != 0) { + err = ssdfs_shift_range_left(node, &items_area, item_size, + item_index + range_len, + shift_range_len, range_len); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to shift the range: " + "start %u, count %u, err %d\n", + item_index + range_len, + shift_range_len, + err); + goto finish_delete_range; + } + + err = __ssdfs_btree_node_clear_range(node, + &items_area, item_size, + item_index + shift_range_len, + range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to clear range: " + "start %u, count %u, err %d\n", + item_index + range_len, + shift_range_len, + err); + goto finish_delete_range; + } + } + + down_write(&node->header_lock); + + if (node->items_area.items_count < search->request.count) + node->items_area.items_count = 0; + else + node->items_area.items_count -= search->request.count; + + deleted_space = (u32)search->request.count * item_size; + free_space = node->items_area.free_space; + if ((free_space + deleted_space) > node->items_area.area_size) { + err = -ERANGE; + SSDFS_ERR("deleted_space %u, free_space %u, area_size %u\n", + deleted_space, + node->items_area.free_space, + node->items_area.area_size); + goto finish_items_area_correction; + } + node->items_area.free_space += deleted_space; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NEW STATE: node_id %u, " + "items_count %u, free_space %u\n", + node->node_id, + node->items_area.items_count, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->items_area.items_count == 0) { + start_hash = U64_MAX; + end_hash = U64_MAX; + } else { + err = ssdfs_extents_btree_node_get_fork(node, + &node->items_area, + 0, &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to get fork: err %d\n", err); + goto finish_items_area_correction; + } + start_hash = le64_to_cpu(fork.start_offset); + + err = ssdfs_extents_btree_node_get_fork(node, + &node->items_area, + node->items_area.items_count - 1, + &fork); + if (unlikely(err)) { + SSDFS_ERR("fail to get fork: err %d\n", err); + goto finish_items_area_correction; + } + end_hash = le64_to_cpu(fork.start_offset); + + blks_count = 
le64_to_cpu(fork.blks_count); + if (blks_count == 0 || blks_count >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid blks_count %llu\n", + blks_count); + goto finish_items_area_correction; + } + + end_hash += blks_count - 1; + } + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("items_area.start_hash %llx, " + "items_area.end_hash %llx\n", + node->items_area.start_hash, + node->items_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->items_area.items_count == 0) + ssdfs_initialize_lookup_table(node); + else { + err = ssdfs_clean_lookup_table(node, + &node->items_area, + node->items_area.items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to clean the rest of lookup table: " + "start_index %u, err %d\n", + node->items_area.items_count, err); + goto finish_items_area_correction; + } + + if (shift_range_len != 0) { + int start_index = + node->items_area.items_count - shift_range_len; + + if (start_index < 0) { + err = -ERANGE; + SSDFS_ERR("invalid start_index %d\n", + start_index); + goto finish_items_area_correction; + } + + err = ssdfs_correct_lookup_table(node, + &node->items_area, + start_index, + shift_range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + } + } + + hdr = &node->raw.extents_header; + old_forks_count = le32_to_cpu(hdr->forks_count); + + if (node->items_area.items_count == 0) { + hdr->forks_count = cpu_to_le32(0); + hdr->allocated_extents = cpu_to_le32(0); + hdr->valid_extents = cpu_to_le32(0); + hdr->blks_count = cpu_to_le64(0); + hdr->max_extent_blks = cpu_to_le32(0); + } else { + if (old_forks_count < search->request.count) { + hdr->forks_count = cpu_to_le32(0); + hdr->allocated_extents = cpu_to_le32(0); + hdr->valid_extents = cpu_to_le32(0); + hdr->blks_count = cpu_to_le64(0); + hdr->max_extent_blks = cpu_to_le32(0); + } else { + forks_count = le32_to_cpu(hdr->forks_count); + forks_count -= search->request.count; + hdr->forks_count = cpu_to_le32(forks_count); + + allocated_extents = le32_to_cpu(hdr->allocated_extents); + allocated_extents -= + search->request.count * + SSDFS_INLINE_EXTENTS_COUNT; + hdr->allocated_extents = cpu_to_le32(allocated_extents); + + err = ssdfs_calculate_range_blocks_in_node(node, + &node->items_area, + 0, forks_count, + &valid_extents, + &blks_count, + &max_extent_blks); + if (unlikely(err)) { + SSDFS_ERR("fail to calculate range's blocks: " + "node_id %u, item_index %u, " + "range_len %u\n", + node->node_id, 0, forks_count); + goto finish_items_area_correction; + } + + hdr->valid_extents = cpu_to_le32(valid_extents); + hdr->blks_count = cpu_to_le64(blks_count); + hdr->max_extent_blks = cpu_to_le32(max_extent_blks); + } + } + + forks_count = le32_to_cpu(hdr->forks_count); + forks_diff = old_forks_count - forks_count; + atomic64_sub(forks_diff, &etree->forks_count); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("forks_count %lld\n", + atomic64_read(&etree->forks_count)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto finish_items_area_correction; + } + + if (forks_count != 0) { + err = ssdfs_set_dirty_items_range(node, + 
items_area.items_capacity, + item_index, + old_forks_count - item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, + old_forks_count - item_index, + err); + goto finish_items_area_correction; + } + } + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + +finish_delete_range: + ssdfs_unlock_items_range(node, item_index, locked_len); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to delete range: err %d\n", + err); + return err; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + if (forks_count == 0) { + int state; + + down_read(&node->header_lock); + state = atomic_read(&node->index_area.state); + index_count = node->index_area.index_count; + end_hash = node->index_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (index_count <= 1 || end_hash == old_hash) { + err = ssdfs_btree_node_delete_index(node, + old_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "old_hash %llx, err %d\n", + old_hash, err); + return err; + } + + if (index_count > 0) + index_count--; + } + } else if (old_hash != start_hash) { + struct ssdfs_btree_index_key old_key, new_key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + old_key.index.hash = cpu_to_le64(old_hash); + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_change_index(node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: err %d\n", + err); + return err; + } + } + break; + + default: + /* do nothing */ + break; + } + + if (forks_count == 0 && index_count == 0) + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; + else + search->result.state = SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + if (search->request.type == SSDFS_BTREE_SEARCH_DELETE_RANGE) { + if (search->request.count > range_len) { + search->request.start.hash = items_area.end_hash; + search->request.count -= range_len; + SSDFS_DBG("continue to delete range\n"); + return -EAGAIN; + } + } + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_extents_btree_node_delete_item() - delete an item from node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete an item from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
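+ *
+ * Note: a single-item request is expected here
+ * (search->result.count == 1); the real work is delegated to
+ * __ssdfs_extents_btree_node_delete_range().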
+ */ +static +int ssdfs_extents_btree_node_delete_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p, result.count %u\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child, search->result.count); + + BUG_ON(search->result.count != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_extents_btree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete fork: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_extents_btree_node_delete_range() - delete range of items from node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_extents_btree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_extents_btree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete forks range: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_extents_btree_node_extract_range() - extract range of items from node + * @node: pointer on node object + * @start_index: starting index of the range + * @count: count of items in the range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no such range in the node. 
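+ *
+ * On success, the extracted forks are placed into the search result
+ * buffer, and search->request.start.hash / search->request.end.hash
+ * are set to the start_offset of the first and the last extracted
+ * fork, respectively.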
+ */
+static
+int ssdfs_extents_btree_node_extract_range(struct ssdfs_btree_node *node,
+					   u16 start_index, u16 count,
+					   struct ssdfs_btree_search *search)
+{
+	struct ssdfs_raw_fork *fork;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_index %u, count %u, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  start_index, count,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&node->full_lock);
+	err = __ssdfs_btree_node_extract_range(node, start_index, count,
+					sizeof(struct ssdfs_raw_fork),
+					search);
+	up_read(&node->full_lock);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to extract a range: "
+			  "start %u, count %u, err %d\n",
+			  start_index, count, err);
+		return err;
+	}
+
+	search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE |
+				SSDFS_BTREE_SEARCH_HAS_VALID_COUNT;
+	fork = (struct ssdfs_raw_fork *)search->result.buf;
+	search->request.start.hash = le64_to_cpu(fork->start_offset);
+	fork += search->result.count - 1;
+	search->request.end.hash = le64_to_cpu(fork->start_offset);
+	search->request.count = count;
+
+	return 0;
+}
+
+/*
+ * ssdfs_extents_btree_resize_items_area() - resize items area of the node
+ * @node: node object
+ * @new_size: new size of the items area
+ *
+ * This method tries to resize the items area of the node.
+ *
+ * TODO: It makes sense to allocate the bitmap taking into account
+ *       that the node can be resized. In other words, the bitmap
+ *       should be allocated as if both the index area and the items
+ *       area could each cover the whole node. This technique makes
+ *       it possible to avoid resizing or shifting the bitmap content.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
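+ *
+ * A minimal usage sketch (the new size value is hypothetical):
+ *
+ *	err = ssdfs_extents_btree_resize_items_area(node, PAGE_SIZE);
+ *	if (unlikely(err))
+ *		SSDFS_ERR("fail to resize items area: err %d\n", err);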
+ */ +static +int ssdfs_extents_btree_resize_items_area(struct ssdfs_btree_node *node, + u32 new_size) +{ + struct ssdfs_fs_info *fsi; + size_t item_size = sizeof(struct ssdfs_raw_fork); + size_t index_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + + SSDFS_DBG("node_id %u, new_size %u\n", + node->node_id, new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + index_size = le16_to_cpu(fsi->vh->extents_btree.desc.index_size); + + return __ssdfs_btree_node_resize_items_area(node, + item_size, + index_size, + new_size); +} + +void ssdfs_debug_extents_btree_object(struct ssdfs_extents_btree_info *tree) +{ +#ifdef CONFIG_SSDFS_DEBUG + int i, j; + + BUG_ON(!tree); + + SSDFS_DBG("EXTENTS TREE: type %#x, state %#x, " + "forks_count %llu, is_locked %d, " + "generic_tree %p, inline_forks %p, " + "root %p, owner %p, fsi %p\n", + atomic_read(&tree->type), + atomic_read(&tree->state), + (u64)atomic64_read(&tree->forks_count), + rwsem_is_locked(&tree->lock), + tree->generic_tree, + tree->inline_forks, + tree->root, + tree->owner, + tree->fsi); + + if (tree->generic_tree) { + /* debug dump of generic tree */ + ssdfs_debug_btree_object(tree->generic_tree); + } + + if (tree->inline_forks) { + for (i = 0; i < SSDFS_INLINE_FORKS_COUNT; i++) { + struct ssdfs_raw_fork *fork; + + fork = &tree->inline_forks[i]; + + SSDFS_DBG("INLINE FORK: index %d, " + "start_offset %llu, blks_count %llu\n", + i, + le64_to_cpu(fork->start_offset), + le64_to_cpu(fork->blks_count)); + + for (j = 0; j < SSDFS_INLINE_EXTENTS_COUNT; j++) { + struct ssdfs_raw_extent *extent; + + extent = &fork->extents[j]; + + SSDFS_DBG("EXTENT: index %d, " + "seg_id %llu, logical_blk %u, " + "len %u\n", + j, + le64_to_cpu(extent->seg_id), + le32_to_cpu(extent->logical_blk), + le32_to_cpu(extent->len)); + } + } + } + + if (tree->root) { + SSDFS_DBG("ROOT NODE HEADER: height %u, items_count %u, " + "flags %#x, type %#x, upper_node_id %u, " + "node_ids (left %u, right %u)\n", + tree->root->header.height, + tree->root->header.items_count, + tree->root->header.flags, + tree->root->header.type, + le32_to_cpu(tree->root->header.upper_node_id), + le32_to_cpu(tree->root->header.node_ids[0]), + le32_to_cpu(tree->root->header.node_ids[1])); + + for (i = 0; i < SSDFS_BTREE_ROOT_NODE_INDEX_COUNT; i++) { + struct ssdfs_btree_index *index; + + index = &tree->root->indexes[i]; + + SSDFS_DBG("NODE_INDEX: index %d, hash %llx, " + "seg_id %llu, logical_blk %u, len %u\n", + i, + le64_to_cpu(index->hash), + le64_to_cpu(index->extent.seg_id), + le32_to_cpu(index->extent.logical_blk), + le32_to_cpu(index->extent.len)); + } + } +#endif /* CONFIG_SSDFS_DEBUG */ +} + +const struct ssdfs_btree_descriptor_operations ssdfs_extents_btree_desc_ops = { + .init = ssdfs_extents_btree_desc_init, + .flush = ssdfs_extents_btree_desc_flush, +}; + +const struct ssdfs_btree_operations ssdfs_extents_btree_ops = { + .create_root_node = ssdfs_extents_btree_create_root_node, + .create_node = ssdfs_extents_btree_create_node, + .init_node = ssdfs_extents_btree_init_node, + .destroy_node = ssdfs_extents_btree_destroy_node, + .add_node = ssdfs_extents_btree_add_node, + .delete_node = ssdfs_extents_btree_delete_node, + .pre_flush_root_node = ssdfs_extents_btree_pre_flush_root_node, + .flush_root_node = ssdfs_extents_btree_flush_root_node, + .pre_flush_node = ssdfs_extents_btree_pre_flush_node, + .flush_node = ssdfs_extents_btree_flush_node, +}; + +const struct ssdfs_btree_node_operations ssdfs_extents_btree_node_ops = { + .find_item = 
ssdfs_extents_btree_node_find_item,
+	.find_range = ssdfs_extents_btree_node_find_range,
+	.extract_range = ssdfs_extents_btree_node_extract_range,
+	.allocate_item = ssdfs_extents_btree_node_allocate_item,
+	.allocate_range = ssdfs_extents_btree_node_allocate_range,
+	.insert_item = ssdfs_extents_btree_node_insert_item,
+	.insert_range = ssdfs_extents_btree_node_insert_range,
+	.change_item = ssdfs_extents_btree_node_change_item,
+	.delete_item = ssdfs_extents_btree_node_delete_item,
+	.delete_range = ssdfs_extents_btree_node_delete_range,
+	.resize_items_area = ssdfs_extents_btree_resize_items_area,
+};

From patchwork Sat Feb 25 01:09:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151975
From: Viacheslav Dubeyko <slava@dubeyko.com>
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
    bruno.banelli@sartura.hr, Viacheslav Dubeyko <slava@dubeyko.com>
Subject: [RFC PATCH 70/76] ssdfs: introduce invalidated extents b-tree
Date: Fri, 24 Feb 2023 17:09:21 -0800
Message-Id: <20230225010927.813929-71-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

A ZNS SSD operates in terms of zones. A zone can be:
(1) empty, (2) open (implicitly or explicitly), (3) closed, (4) full.
The number of open/active zones is limited by a device-specific
threshold. To stay within this limit, SSDFS maintains a current user
data segment for new data and a current user data segment that
receives updates for data residing in closed zones.

Every update of data in a closed zone requires:
(1) storing the updated data into the current segment for updated
    user data,
(2) updating the extents b-tree with the new data location,
(3) adding the invalidated extent of the closed zone into the
    invalidated extents b-tree.

The invalidated extents b-tree is responsible for:
(1) correcting the block bitmap of an erase block (closed zone) by
    marking the moved logical blocks as invalidated during erase
    block object initialization,
(2) collecting all invalidated extents of a closed zone.

If the total length of a closed zone's invalidated extents is equal
to the zone size, then the closed zone can be re-initialized or
erased.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
CC: Viacheslav Dubeyko <viacheslav.dubeyko@bytedance.com>
CC: Luka Perkov <luka.perkov@sartura.hr>
CC: Bruno Banelli <bruno.banelli@sartura.hr>
---
 fs/ssdfs/invalidated_extents_tree.c | 2523 +++++++++++++++++++++++++++
 fs/ssdfs/invalidated_extents_tree.h |   95 +
 2 files changed, 2618 insertions(+)
 create mode 100644 fs/ssdfs/invalidated_extents_tree.c
 create mode 100644 fs/ssdfs/invalidated_extents_tree.h

diff --git a/fs/ssdfs/invalidated_extents_tree.c b/fs/ssdfs/invalidated_extents_tree.c
new file mode 100644
index 000000000000..4cb5ffeac706
--- /dev/null
+++ b/fs/ssdfs/invalidated_extents_tree.c
@@ -0,0 +1,2523 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/invalidated_extents_tree.c - invalidated extents btree implementation.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "shared_dictionary.h" +#include "dentries_tree.h" +#include "invalidated_extents_tree.h" + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_invext_tree_page_leaks; +atomic64_t ssdfs_invext_tree_memory_leaks; +atomic64_t ssdfs_invext_tree_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_invext_tree_cache_leaks_increment(void *kaddr) + * void ssdfs_invext_tree_cache_leaks_decrement(void *kaddr) + * void *ssdfs_invext_tree_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_invext_tree_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_invext_tree_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_invext_tree_kfree(void *kaddr) + * struct page *ssdfs_invext_tree_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_invext_tree_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_invext_tree_free_page(struct page *page) + * void ssdfs_invext_tree_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(invext_tree) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(invext_tree) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_invext_tree_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_invext_tree_page_leaks, 0); + atomic64_set(&ssdfs_invext_tree_memory_leaks, 0); + atomic64_set(&ssdfs_invext_tree_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_invext_tree_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_invext_tree_page_leaks) != 0) { + SSDFS_ERR("INVALIDATED EXTENTS TREE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_invext_tree_page_leaks)); + } + + if (atomic64_read(&ssdfs_invext_tree_memory_leaks) != 0) { + SSDFS_ERR("INVALIDATED EXTENTS TREE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_invext_tree_memory_leaks)); + } + + if (atomic64_read(&ssdfs_invext_tree_cache_leaks) != 0) { + SSDFS_ERR("INVALIDATED EXTENTS TREE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_invext_tree_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +/****************************************************************************** + * INVALIDATED EXTENTS TREE OBJECT FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_invextree_create() - create invalidated extents btree + * @fsi: pointer on shared file system object + * + * This method tries to create invalidated extents btree object. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ENOMEM - unable to allocate memory. + * %-ERANGE - internal error. 
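+ *
+ * A minimal usage sketch (error handling elided):
+ *
+ *	err = ssdfs_invextree_create(fsi);
+ *	if (!err)
+ *		ssdfs_invextree_destroy(fsi);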
+ */ +int ssdfs_invextree_create(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_invextree_info *ptr; + size_t desc_size = sizeof(struct ssdfs_invextree_info); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("fsi %p\n", fsi); +#else + SSDFS_DBG("fsi %p\n", fsi); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + fsi->invextree = NULL; + + ptr = ssdfs_invext_tree_kzalloc(desc_size, GFP_KERNEL); + if (!ptr) { + SSDFS_ERR("fail to allocate invalidated extents tree\n"); + return -ENOMEM; + } + + atomic_set(&ptr->state, SSDFS_INVEXTREE_UNKNOWN_STATE); + + fsi->invextree = ptr; + ptr->fsi = fsi; + + err = ssdfs_btree_create(fsi, + SSDFS_INVALID_EXTENTS_BTREE_INO, + &ssdfs_invextree_desc_ops, + &ssdfs_invextree_ops, + &ptr->generic_tree); + if (unlikely(err)) { + SSDFS_ERR("fail to create invalidated extents tree: err %d\n", + err); + goto fail_create_invextree; + } + + init_rwsem(&ptr->lock); + + atomic64_set(&ptr->extents_count, 0); + + atomic_set(&ptr->state, SSDFS_INVEXTREE_CREATED); + + ssdfs_debug_invextree_object(ptr); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("DONE: create invalidated extents tree\n"); +#else + SSDFS_DBG("DONE: create invalidated extents tree\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + return 0; + +fail_create_invextree: + ssdfs_invext_tree_kfree(ptr); + fsi->invextree = NULL; + return err; +} + +/* + * ssdfs_invextree_destroy - destroy invalidated extents btree + * @fsi: pointer on shared file system object + */ +void ssdfs_invextree_destroy(struct ssdfs_fs_info *fsi) +{ + struct ssdfs_invextree_info *tree; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p\n", fsi->invextree); +#else + SSDFS_DBG("tree %p\n", fsi->invextree); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + if (!fsi->invextree) + return; + + tree = fsi->invextree; + + ssdfs_debug_invextree_object(tree); + + ssdfs_btree_destroy(&tree->generic_tree); + + ssdfs_invext_tree_kfree(tree); + fsi->invextree = NULL; + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ +} + +/* + * ssdfs_invextree_flush() - flush dirty invalidated extents btree + * @fsi: pointer on shared file system object + * + * This method tries to flush the dirty invalidated extents btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
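+ *
+ * Note: the caller is expected to hold fsi->volume_sem; the method
+ * checks this with a BUG_ON() under CONFIG_SSDFS_DEBUG.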
+ */ +int ssdfs_invextree_flush(struct ssdfs_fs_info *fsi) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->invextree); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); +#endif /* CONFIG_SSDFS_DEBUG */ + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("tree %p\n", fsi->invextree); +#else + SSDFS_DBG("tree %p\n", fsi->invextree); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + err = ssdfs_btree_flush(&fsi->invextree->generic_tree); + if (unlikely(err)) { + SSDFS_ERR("fail to flush invalidated extents btree: err %d\n", + err); + return err; + } + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#else + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_invextree_object(fsi->invextree); + + return 0; +} + +/****************************************************************************** + * INVALIDATED EXTENTS TREE OBJECT FUNCTIONALITY * + ******************************************************************************/ + +/* + * ssdfs_invextree_calculate_hash() - calculate hash value + * @fsi: pointer on shared file system object + * @seg_id: segment or zone ID + * @logical_blk: logical block index in the segment or zone + */ +static inline +u64 ssdfs_invextree_calculate_hash(struct ssdfs_fs_info *fsi, + u64 seg_id, u32 logical_blk) +{ + u64 hash = U64_MAX; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi); + + SSDFS_DBG("seg_id %llu, logical_block %u\n", + seg_id, logical_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + hash = seg_id; + hash *= fsi->pebs_per_seg; + hash *= fsi->pages_per_peb; + hash += logical_blk; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("seg_id %llu, logical_block %u, " + "pebs_per_seg %u, pages_per_peb %u, " + "hash %llx\n", + seg_id, logical_blk, + fsi->pebs_per_seg, + fsi->pages_per_peb, + hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + return hash; +} + +/* + * need_initialize_invextree_search() - check necessity to init the search + * @fsi: pointer on shared file system object + * @seg_id: segment or zone ID + * @logical_blk: logical block index in the segment or zone + * @search: search object + */ +static inline +bool need_initialize_invextree_search(struct ssdfs_fs_info *fsi, + u64 seg_id, u32 logical_blk, + struct ssdfs_btree_search *search) +{ + u64 hash = ssdfs_invextree_calculate_hash(fsi, seg_id, logical_blk); + + return need_initialize_btree_search(search) || + search->request.start.hash != hash; +} + +/* + * ssdfs_invextree_find() - find invalidated extent + * @tree: pointer on invalidated extents btree object + * @extent: searching range of invalidated logical blocks + * @search: pointer on search request object + * + * This method tries to find an invalidated extent. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. 
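+ *
+ * A worked example with a hypothetical volume geometry
+ * (pebs_per_seg == 2, pages_per_peb == 64): the extent
+ * {seg_id 3, logical_blk 5, len 4} is mapped to the hash range
+ * [3 * 2 * 64 + 5, 3 * 2 * 64 + 8], i.e. [389, 392].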
+ */ +int ssdfs_invextree_find(struct ssdfs_invextree_info *tree, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + u64 seg_id; + u32 logical_blk; + u32 len; + u64 start_hash; + u64 end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi || !extent || !search); + + SSDFS_DBG("tree %p, search %p, " + "extent (seg_id %llu, logical_blk %u, len %u)\n", + tree, search, + le64_to_cpu(extent->seg_id), + le32_to_cpu(extent->logical_blk), + le32_to_cpu(extent->len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + + seg_id = le64_to_cpu(extent->seg_id); + logical_blk = le32_to_cpu(extent->logical_blk); + len = le32_to_cpu(extent->len); + + start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, logical_blk); + end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); + + search->request.type = SSDFS_BTREE_SEARCH_FIND_RANGE; + + if (need_initialize_invextree_search(fsi, seg_id, logical_blk, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_RANGE; + search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE; + search->request.start.hash = start_hash; + search->request.end.hash = end_hash; + search->request.count = len; + } + + return ssdfs_btree_find_range(&tree->generic_tree, search); +} + +/* + * ssdfs_invextree_find_leaf_node() - find a leaf node in the tree + * @tree: invalidated extents tree + * @seg_id: segment or zone ID + * @search: search object + * + * This method tries to find a leaf node for the requested @seg_id. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +int ssdfs_invextree_find_leaf_node(struct ssdfs_invextree_info *tree, + u64 seg_id, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + u64 hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, seg_id %llu, search %p\n", + tree, seg_id, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + hash = ssdfs_invextree_calculate_hash(fsi, seg_id, 0); + + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + + if (need_initialize_invextree_search(fsi, seg_id, 0, search)) { + ssdfs_btree_search_init(search); + search->request.type = SSDFS_BTREE_SEARCH_FIND_ITEM; + search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE; + search->request.start.hash = hash; + search->request.end.hash = hash; + search->request.count = 1; + } + + err = ssdfs_btree_find_item(&tree->generic_tree, search); + if (err == -ENODATA) { + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + case SSDFS_BTREE_SEARCH_OUT_OF_RANGE: + case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected result's state %#x\n", + search->result.state); + goto finish_find_leaf_node; + } + + switch (search->node.state) { + case SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC: + case SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC: + /* expected state */ + err = 0; + break; + + default: + err = -ERANGE; + SSDFS_ERR("unexpected node state %#x\n", + search->node.state); + break; + } + } + +finish_find_leaf_node: + return err; +} + +/* + * ssdfs_invextree_get_start_hash() - get starting hash of the tree + * @tree: invalidated extents tree + * @start_hash: extracted start hash [out] + * + * This method tries to extract a start hash of the tree. 
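+ * The value is read from the index area of the root node and thus
+ * reflects the smallest invalidated extent hash stored in the tree.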
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ */
+int ssdfs_invextree_get_start_hash(struct ssdfs_invextree_info *tree,
+				   u64 *start_hash)
+{
+	struct ssdfs_btree_node *node;
+	s64 extents_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !start_hash);
+
+	SSDFS_DBG("tree %p, start_hash %p\n",
+		  tree, start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*start_hash = U64_MAX;
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_INVEXTREE_CREATED:
+	case SSDFS_INVEXTREE_INITIALIZED:
+	case SSDFS_INVEXTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid invalidated extents tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	extents_count = atomic64_read(&tree->extents_count);
+
+	if (extents_count < 0) {
+		SSDFS_WARN("invalid invalidated extents count: "
+			   "extents_count %lld\n",
+			   extents_count);
+		return -ERANGE;
+	} else if (extents_count == 0)
+		return -ENOENT;
+
+	down_read(&tree->lock);
+
+	err = ssdfs_btree_radix_tree_find(&tree->generic_tree,
+					  SSDFS_BTREE_ROOT_NODE_ID,
+					  &node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find root node in radix tree: "
+			  "err %d\n", err);
+		goto finish_get_start_hash;
+	} else if (!node) {
+		err = -ENOENT;
+		SSDFS_WARN("empty node pointer\n");
+		goto finish_get_start_hash;
+	}
+
+	down_read(&node->header_lock);
+	*start_hash = node->index_area.start_hash;
+	up_read(&node->header_lock);
+
+finish_get_start_hash:
+	up_read(&tree->lock);
+
+	if (*start_hash >= U64_MAX) {
+		/* warn about invalid hash code */
+		SSDFS_WARN("hash_code is invalid\n");
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_node_hash_range() - get node's hash range
+ * @tree: invalidated extents tree
+ * @search: search object
+ * @start_hash: extracted start hash [out]
+ * @end_hash: extracted end hash [out]
+ * @items_count: extracted number of items in node [out]
+ *
+ * This method tries to extract start hash, end hash,
+ * and items count in a node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
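+ *
+ * Note: the hashes reported here are invalidated extent hashes
+ * produced by ssdfs_invextree_calculate_hash(), not logical file
+ * offsets.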
+ */ +int ssdfs_invextree_node_hash_range(struct ssdfs_invextree_info *tree, + struct ssdfs_btree_search *search, + u64 *start_hash, u64 *end_hash, + u16 *items_count) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !start_hash || !end_hash || !items_count); + + SSDFS_DBG("search %p, start_hash %p, " + "end_hash %p, items_count %p\n", + search, start_hash, end_hash, items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = *end_hash = U64_MAX; + *items_count = 0; + + switch (atomic_read(&tree->state)) { + case SSDFS_INVEXTREE_CREATED: + case SSDFS_INVEXTREE_INITIALIZED: + case SSDFS_INVEXTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid invalidated extents tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + err = ssdfs_btree_node_get_hash_range(search, + start_hash, + end_hash, + items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to get hash range: err %d\n", + err); + goto finish_extract_hash_range; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, items_count %u\n", + *start_hash, *end_hash, *items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_extract_hash_range: + return err; +} + +/* + * ssdfs_invextree_extract_range() - extract range of items + * @tree: invalidated extents tree + * @start_index: start item index in the node + * @count: requested count of items + * @search: search object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOMEM - fail to allocate memory. + * %-ENOENT - unable to extract any items. + */ +int ssdfs_invextree_extract_range(struct ssdfs_invextree_info *tree, + u16 start_index, u16 count, + struct ssdfs_btree_search *search) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search); + + SSDFS_DBG("tree %p, start_index %u, count %u, search %p\n", + tree, start_index, count, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->state)) { + case SSDFS_INVEXTREE_CREATED: + case SSDFS_INVEXTREE_INITIALIZED: + case SSDFS_INVEXTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid invalidated extents tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + down_read(&tree->lock); + err = ssdfs_btree_extract_range(&tree->generic_tree, + start_index, count, + search); + up_read(&tree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract the range: " + "start_index %u, count %u, err %d\n", + start_index, count, err); + } + + return err; +} + +/* + * ssdfs_invextree_check_search_result() - check result of search + * @search: search object + */ +int ssdfs_invextree_check_search_result(struct ssdfs_btree_search *search) +{ + size_t desc_size = sizeof(struct ssdfs_raw_extent); + u16 items_count; + size_t buf_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected result's state %#x\n", + search->result.state); + return -ERANGE; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + if (!search->result.buf) { + SSDFS_ERR("buffer pointer is NULL\n"); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("unexpected buffer's state\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + 
BUG_ON(search->result.items_in_buffer >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_count = (u16)search->result.items_in_buffer; + + if (items_count == 0) { + SSDFS_ERR("items_in_buffer %u\n", + items_count); + return -ENOENT; + } else if (items_count != search->result.count) { + SSDFS_ERR("items_count %u != search->result.count %u\n", + items_count, search->result.count); + return -ERANGE; + } + + buf_size = desc_size * items_count; + + if (buf_size != search->result.buf_size) { + SSDFS_ERR("buf_size %zu != search->result.buf_size %zu\n", + buf_size, + search->result.buf_size); + return -ERANGE; + } + + return 0; +} + +/* + * ssdfs_invextree_get_next_hash() - get next node's starting hash + * @tree: invalidated extents tree + * @search: search object + * @next_hash: next node's starting hash [out] + */ +int ssdfs_invextree_get_next_hash(struct ssdfs_invextree_info *tree, + struct ssdfs_btree_search *search, + u64 *next_hash) +{ + u64 old_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !next_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + old_hash = le64_to_cpu(search->node.found_index.index.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search %p, next_hash %p, old (node %u, hash %llx)\n", + search, next_hash, search->node.id, old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&tree->lock); + err = ssdfs_btree_get_next_hash(&tree->generic_tree, search, next_hash); + up_read(&tree->lock); + + return err; +} + +/* + * ssdfs_prepare_invalidated_extent() - prepare invalidated extent + * @extent: invalidated extent + * @search: pointer on search request object + * + * This method tries to prepare an invalidated extent. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + */ +static +int ssdfs_prepare_invalidated_extent(struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + struct ssdfs_raw_extent *desc = NULL; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!extent || !search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE: + search->result.buf_state = SSDFS_BTREE_SEARCH_INLINE_BUFFER; +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + search->result.buf = &search->raw.invalidated_extent; + search->result.buf_size = sizeof(struct ssdfs_raw_extent); + search->result.items_in_buffer = 1; + break; + + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_size != + sizeof(struct ssdfs_raw_extent)); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + break; + + default: + SSDFS_ERR("unexpected buffer state %#x\n", + search->result.buf_state); + return -ERANGE; + } + + desc = &search->raw.invalidated_extent; + + ssdfs_memcpy(desc, 0, sizeof(struct ssdfs_raw_extent), + extent, 0, sizeof(struct ssdfs_raw_extent), + sizeof(struct ssdfs_raw_extent)); + + return 0; +} + +/* + * ssdfs_invextree_add() - add invalidated extent into the tree + * @tree: pointer on invalidated extents btree object + * @extent: invalidated extent + * @search: search object + * + * This method tries to add invalidated extent info into the tree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EEXIST - invalidated extent exists in the tree. 
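+ *
+ * NOTE: the item's position is derived from a hash of the
+ * extent's segment ID and logical block number
+ * (ssdfs_invextree_calculate_hash()), so %-EEXIST means that an
+ * extent with the same hash is already stored in the tree.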
+ */
+int ssdfs_invextree_add(struct ssdfs_invextree_info *tree,
+			struct ssdfs_raw_extent *extent,
+			struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	u64 hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !tree->fsi || !extent || !search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("tree %p, extent %p, search %p\n",
+		  tree, extent, search);
+#else
+	SSDFS_DBG("tree %p, extent %p, search %p\n",
+		  tree, extent, search);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	fsi = tree->fsi;
+	hash = ssdfs_invextree_calculate_hash(fsi,
+					le64_to_cpu(extent->seg_id),
+					le32_to_cpu(extent->logical_blk));
+
+	ssdfs_btree_search_init(search);
+	search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM;
+	search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE;
+	search->request.start.hash = hash;
+	search->request.end.hash = hash;
+	search->request.count = 1;
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_INVEXTREE_CREATED:
+	case SSDFS_INVEXTREE_INITIALIZED:
+	case SSDFS_INVEXTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid invalidated extents tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	down_read(&tree->lock);
+
+	err = ssdfs_btree_find_item(&tree->generic_tree, search);
+	if (err == -ENODATA) {
+		/*
+		 * Invalidated extent doesn't exist.
+		 */
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find the invalidated extent: "
+			  "seg_id %llu, logical_blk %u, err %d\n",
+			  le64_to_cpu(extent->seg_id),
+			  le32_to_cpu(extent->logical_blk),
+			  err);
+		goto finish_add_invalidated_extent;
+	}
+
+	if (err == -ENODATA) {
+		err = ssdfs_prepare_invalidated_extent(extent, search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare invalidated extent: "
+				  "err %d\n", err);
+			goto finish_add_invalidated_extent;
+		}
+
+		search->request.type = SSDFS_BTREE_SEARCH_ADD_ITEM;
+
+		switch (search->result.state) {
+		case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND:
+		case SSDFS_BTREE_SEARCH_OUT_OF_RANGE:
+		case SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE:
+			/* expected state */
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid search result's state %#x\n",
+				  search->result.state);
+			goto finish_add_invalidated_extent;
+		}
+
+		if (search->result.buf_state !=
+					SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid buf_state %#x\n",
+				  search->result.buf_state);
+			goto finish_add_invalidated_extent;
+		}
+
+		err = ssdfs_btree_add_item(&tree->generic_tree, search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add invalidated extent into the tree: "
+				  "err %d\n", err);
+			goto finish_add_invalidated_extent;
+		}
+
+		atomic_set(&tree->state, SSDFS_INVEXTREE_DIRTY);
+
+		ssdfs_btree_search_forget_parent_node(search);
+		ssdfs_btree_search_forget_child_node(search);
+	} else {
+		err = -EEXIST;
+		SSDFS_DBG("invalidated extent exists in the tree\n");
+		goto finish_add_invalidated_extent;
+	}
+
+finish_add_invalidated_extent:
+	up_read(&tree->lock);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	ssdfs_debug_invextree_object(tree);
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_delete() - delete invalidated extent from the tree
+ * @tree: invalidated extents tree
+ * @extent: invalidated extent
+ * @search: search object
+ *
+ * This method tries to delete invalidated extent from the tree.
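+ * The search request covers the whole extent: the start hash is
+ * computed for @extent->logical_blk and the end hash for its last
+ * block (@extent->logical_blk + @extent->len - 1).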
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL - invalid input.
+ * %-ERANGE - internal error.
+ * %-ENODATA - invalidated extent doesn't exist in the tree.
+ */
+int ssdfs_invextree_delete(struct ssdfs_invextree_info *tree,
+			   struct ssdfs_raw_extent *extent,
+			   struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_raw_extent *desc;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 len;
+	u64 start_hash;
+	u64 end_hash;
+	s64 extents_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !tree->fsi || !extent || !search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("tree %p, extent %p, search %p\n",
+		  tree, extent, search);
+#else
+	SSDFS_DBG("tree %p, extent %p, search %p\n",
+		  tree, extent, search);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_INVEXTREE_CREATED:
+	case SSDFS_INVEXTREE_INITIALIZED:
+	case SSDFS_INVEXTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid invalidated extents tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	seg_id = le64_to_cpu(extent->seg_id);
+	logical_blk = le32_to_cpu(extent->logical_blk);
+	len = le32_to_cpu(extent->len);
+
+	fsi = tree->fsi;
+	start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, logical_blk);
+	end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						  logical_blk + len - 1);
+
+	search->request.type = SSDFS_BTREE_SEARCH_DELETE_RANGE;
+
+	if (need_initialize_invextree_search(fsi, seg_id,
+					     logical_blk, search)) {
+		ssdfs_btree_search_init(search);
+		search->request.type = SSDFS_BTREE_SEARCH_DELETE_RANGE;
+		search->request.flags =
+			SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE;
+		search->request.start.hash = start_hash;
+		search->request.end.hash = end_hash;
+		search->request.count = len;
+	}
+
+	down_read(&tree->lock);
+
+	err = ssdfs_btree_find_item(&tree->generic_tree, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find the invalidated extent: "
+			  "seg_id %llu, err %d\n",
+			  seg_id, err);
+		goto finish_delete_invalidated_extent;
+	}
+
+	search->request.type = SSDFS_BTREE_SEARCH_DELETE_ITEM;
+
+	if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid search result's state %#x\n",
+			  search->result.state);
+		goto finish_delete_invalidated_extent;
+	}
+
+	if (search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid buf_state %#x\n",
+			  search->result.buf_state);
+		goto finish_delete_invalidated_extent;
+	}
+
+	desc = &search->raw.invalidated_extent;
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		if (seg_id != le64_to_cpu(desc->seg_id)) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid result state: "
+				  "seg_id1 %llu != seg_id2 %llu\n",
+				  seg_id, le64_to_cpu(desc->seg_id));
+			goto finish_delete_invalidated_extent;
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_WARN("unexpected result state %#x\n",
+			   search->result.state);
+		goto finish_delete_invalidated_extent;
+	}
+
+	extents_count = atomic64_read(&tree->extents_count);
+	if (extents_count == 0) {
+		err = -ENOENT;
+		SSDFS_DBG("empty tree\n");
+		goto finish_delete_invalidated_extent;
+	}
+
+	if (search->result.start_index >= extents_count) {
+		err = -ENODATA;
+		SSDFS_ERR("invalid search result: "
+			  "start_index %u, extents_count %lld\n",
+			  search->result.start_index,
+			  extents_count);
+		goto finish_delete_invalidated_extent;
+	}
+
+	err = ssdfs_btree_delete_item(&tree->generic_tree,
+				      search);
+	if
(unlikely(err)) { + SSDFS_ERR("fail to delete invalidated extent from the tree: " + "err %d\n", err); + goto finish_delete_invalidated_extent; + } + + atomic_set(&tree->state, SSDFS_INVEXTREE_DIRTY); + + ssdfs_btree_search_forget_parent_node(search); + ssdfs_btree_search_forget_child_node(search); + + extents_count = atomic64_read(&tree->extents_count); + + if (extents_count == 0) { + err = -ENOENT; + SSDFS_DBG("tree is empty now\n"); + goto finish_delete_invalidated_extent; + } else if (extents_count < 0) { + err = -ERANGE; + SSDFS_WARN("invalid extents_count %lld\n", + extents_count); + atomic_set(&tree->state, SSDFS_INVEXTREE_CORRUPTED); + goto finish_delete_invalidated_extent; + } + +finish_delete_invalidated_extent: + up_read(&tree->lock); + +#ifdef CONFIG_SSDFS_TRACK_API_CALL + SSDFS_ERR("finished\n"); +#endif /* CONFIG_SSDFS_TRACK_API_CALL */ + + ssdfs_debug_invextree_object(tree); + + return err; +} + +/****************************************************************************** + * SPECIALIZED INVALIDATED EXTENTS BTREE DESCRIPTOR OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_invextree_desc_init() - specialized btree descriptor init + * @fsi: pointer on shared file system object + * @tree: pointer on invalidated extents btree object + */ +static +int ssdfs_invextree_desc_init(struct ssdfs_fs_info *fsi, + struct ssdfs_btree *tree) +{ + struct ssdfs_btree_descriptor *desc; + u32 erasesize; + u32 node_size; + size_t desc_size = sizeof(struct ssdfs_raw_extent); + u16 item_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !tree); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p, tree %p\n", + fsi, tree); +#endif /* CONFIG_SSDFS_DEBUG */ + + erasesize = fsi->erasesize; + + desc = &fsi->vh->invextree.desc; + + if (le32_to_cpu(desc->magic) != SSDFS_INVEXT_BTREE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid magic %#x\n", + le32_to_cpu(desc->magic)); + goto finish_btree_desc_init; + } + + /* TODO: check flags */ + + if (desc->type != SSDFS_INVALIDATED_EXTENTS_BTREE) { + err = -EIO; + SSDFS_ERR("invalid btree type %#x\n", + desc->type); + goto finish_btree_desc_init; + } + + node_size = 1 << desc->log_node_size; + if (node_size < SSDFS_4KB || node_size > erasesize) { + err = -EIO; + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc->log_node_size, + node_size, erasesize); + goto finish_btree_desc_init; + } + + item_size = le16_to_cpu(desc->item_size); + + if (item_size != desc_size) { + err = -EIO; + SSDFS_ERR("invalid item size %u\n", + item_size); + goto finish_btree_desc_init; + } + + if (le16_to_cpu(desc->index_area_min_size) != (16 * desc_size)) { + err = -EIO; + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc->index_area_min_size)); + goto finish_btree_desc_init; + } + + err = ssdfs_btree_desc_init(fsi, tree, desc, (u8)item_size, item_size); + +finish_btree_desc_init: + if (unlikely(err)) { + SSDFS_ERR("fail to init btree descriptor: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_invextree_desc_flush() - specialized btree's descriptor flush + * @tree: pointer on invalidated extents btree object + */ +static +int ssdfs_invextree_desc_flush(struct ssdfs_btree *tree) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree_descriptor desc; + size_t desc_size = sizeof(struct ssdfs_raw_extent); + u32 erasesize; + u32 node_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !tree->fsi); + 
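+	/* the descriptor is written back into fsi->vh below,
+	 * hence the caller must hold volume_sem
+	 */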
BUG_ON(!rwsem_is_locked(&tree->fsi->volume_sem)); + + SSDFS_DBG("owner_ino %llu, type %#x, state %#x\n", + tree->owner_ino, tree->type, + atomic_read(&tree->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = tree->fsi; + + memset(&desc, 0xFF, sizeof(struct ssdfs_btree_descriptor)); + + desc.magic = cpu_to_le32(SSDFS_INVEXT_BTREE_MAGIC); + desc.item_size = cpu_to_le16(desc_size); + + err = ssdfs_btree_desc_flush(tree, &desc); + if (unlikely(err)) { + SSDFS_ERR("invalid btree descriptor: err %d\n", + err); + return err; + } + + if (desc.type != SSDFS_INVALIDATED_EXTENTS_BTREE) { + SSDFS_ERR("invalid btree type %#x\n", + desc.type); + return -ERANGE; + } + + erasesize = fsi->erasesize; + node_size = 1 << desc.log_node_size; + + if (node_size < SSDFS_4KB || node_size > erasesize) { + SSDFS_ERR("invalid node size: " + "log_node_size %u, node_size %u, erasesize %u\n", + desc.log_node_size, + node_size, erasesize); + return -ERANGE; + } + + if (le16_to_cpu(desc.index_area_min_size) != (16 * desc_size)) { + SSDFS_ERR("invalid index_area_min_size %u\n", + le16_to_cpu(desc.index_area_min_size)); + return -ERANGE; + } + + ssdfs_memcpy(&fsi->vh->invextree.desc, + 0, sizeof(struct ssdfs_btree_descriptor), + &desc, + 0, sizeof(struct ssdfs_btree_descriptor), + sizeof(struct ssdfs_btree_descriptor)); + + return 0; +} + +/****************************************************************************** + * SPECIALIZED INVALIDATED EXTENTS BTREE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_invextree_create_root_node() - specialized root node creation + * @fsi: pointer on shared file system object + * @node: pointer on node object [out] + */ +static +int ssdfs_invextree_create_root_node(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_node *node) +{ + struct ssdfs_btree_inline_root_node *root_node; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !fsi->vs || !node); + BUG_ON(!rwsem_is_locked(&fsi->volume_sem)); + + SSDFS_DBG("fsi %p, node %p\n", + fsi, node); +#endif /* CONFIG_SSDFS_DEBUG */ + + root_node = &fsi->vh->invextree.root_node; + err = ssdfs_btree_create_root_node(node, root_node); + if (unlikely(err)) { + SSDFS_ERR("fail to create root node: err %d\n", + err); + } + + return err; +} + +/* + * ssdfs_invextree_pre_flush_root_node() - specialized root node pre-flush + * @node: pointer on node object + */ +static +int ssdfs_invextree_pre_flush_root_node(struct ssdfs_btree_node *node) +{ + struct ssdfs_btree *tree; + struct ssdfs_state_bitmap *bmap; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node); + + SSDFS_DBG("node_id %u, state %#x\n", + node->node_id, atomic_read(&node->state)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->state)) { + case SSDFS_BTREE_NODE_DIRTY: + /* expected state */ + break; + + case SSDFS_BTREE_NODE_INITIALIZED: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node %u is clean\n", + node->node_id); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + + case SSDFS_BTREE_NODE_CORRUPTED: + SSDFS_WARN("node %u is corrupted\n", + node->node_id); + down_read(&node->bmap_array.lock); + bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP]; + spin_lock(&bmap->lock); + bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count); + spin_unlock(&bmap->lock); + up_read(&node->bmap_array.lock); + clear_ssdfs_btree_node_dirty(node); + return -EFAULT; + + default: + SSDFS_ERR("invalid node state %#x\n", + atomic_read(&node->state)); + return -ERANGE; + } + + tree = node->tree; + if (!tree) { + 
SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	if (tree->type != SSDFS_INVALIDATED_EXTENTS_BTREE) {
+		SSDFS_WARN("invalid tree type %#x\n",
+			   tree->type);
+		return -ERANGE;
+	}
+
+	down_write(&node->full_lock);
+	down_write(&node->header_lock);
+
+	err = ssdfs_btree_pre_flush_root_node(node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to pre-flush root node: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+	}
+
+	up_write(&node->header_lock);
+	up_write(&node->full_lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_flush_root_node() - specialized root node flush
+ * @node: pointer on node object
+ */
+static
+int ssdfs_invextree_flush_root_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree_inline_root_node *root_node;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree || !node->tree->fsi);
+	BUG_ON(!rwsem_is_locked(&node->tree->fsi->volume_sem));
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_ssdfs_btree_node_dirty(node)) {
+		SSDFS_WARN("node %u is not dirty\n",
+			   node->node_id);
+		return 0;
+	}
+
+	root_node = &node->tree->fsi->vh->invextree.root_node;
+	ssdfs_btree_flush_root_node(node, root_node);
+
+	return 0;
+}
+
+/*
+ * ssdfs_invextree_create_node() - specialized node creation
+ * @node: pointer on node object
+ */
+static
+int ssdfs_invextree_create_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	void *addr[SSDFS_BTREE_NODE_BMAP_COUNT];
+	size_t hdr_size = sizeof(struct ssdfs_invextree_node_header);
+	u32 node_size;
+	u32 items_area_size = 0;
+	u16 item_size = 0;
+	u16 index_size = 0;
+	u16 index_area_min_size;
+	u16 items_capacity = 0;
+	u16 index_capacity = 0;
+	u32 index_area_size = 0;
+	size_t bmap_bytes;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !node->tree);
+	WARN_ON(atomic_read(&node->state) != SSDFS_BTREE_NODE_CREATED);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	node_size = tree->node_size;
+	index_area_min_size = tree->index_area_min_size;
+
+	node->node_ops = &ssdfs_invextree_node_ops;
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_AREA_ABSENT:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_INDEX_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		switch (atomic_read(&node->index_area.state)) {
+		case SSDFS_BTREE_NODE_AREA_ABSENT:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid index area's state %#x\n",
+				  atomic_read(&node->index_area.state));
+			return -ERANGE;
+		}
+
+		switch (atomic_read(&node->items_area.state)) {
+		case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+			/* expected state */
+			break;
+
+		default:
+			SSDFS_ERR("invalid items area's state %#x\n",
+				  atomic_read(&node->items_area.state));
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n",
+			   atomic_read(&node->type));
+		return -ERANGE;
+	}
+
+	down_write(&node->header_lock);
+	down_write(&node->bmap_array.lock);
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_INDEX_NODE:
+		node->index_area.offset = (u32)hdr_size;
+		node->index_area.area_size = node_size - hdr_size;
+
+		index_area_size = node->index_area.area_size;
+		index_size = node->index_area.index_size;
+
+		node->index_area.index_capacity = index_area_size / index_size;
+		index_capacity = node->index_area.index_capacity;
+
+		node->bmap_array.index_start_bit =
+			SSDFS_BTREE_NODE_HEADER_INDEX + 1;
+		node->bmap_array.item_start_bit =
+			node->bmap_array.index_start_bit + index_capacity;
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		node->index_area.offset = (u32)hdr_size;
+
+		if (index_area_min_size == 0 ||
+		    index_area_min_size >= (node_size - hdr_size)) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid index area desc: "
+				  "index_area_min_size %u, "
+				  "node_size %u, hdr_size %zu\n",
+				  index_area_min_size,
+				  node_size, hdr_size);
+			goto finish_create_node;
+		}
+
+		node->index_area.area_size = index_area_min_size;
+
+		index_area_size = node->index_area.area_size;
+		index_size = node->index_area.index_size;
+		node->index_area.index_capacity = index_area_size / index_size;
+		index_capacity = node->index_area.index_capacity;
+
+		node->items_area.offset = node->index_area.offset +
+						node->index_area.area_size;
+
+		if (node->items_area.offset >= node_size) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid items area desc: "
+				  "area_offset %u, node_size %u\n",
+				  node->items_area.offset,
+				  node_size);
+			goto finish_create_node;
+		}
+
+		node->items_area.area_size = node_size -
+						node->items_area.offset;
+		node->items_area.free_space = node->items_area.area_size;
+		node->items_area.item_size = tree->item_size;
+		node->items_area.min_item_size = tree->min_item_size;
+		node->items_area.max_item_size = tree->max_item_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node_size %u, hdr_size %zu, free_space %u\n",
+			  node_size, hdr_size,
+			  node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		items_area_size = node->items_area.area_size;
+		item_size = node->items_area.item_size;
+
+		node->items_area.items_count = 0;
+		node->items_area.items_capacity = items_area_size / item_size;
+		items_capacity = node->items_area.items_capacity;
+
+		if (node->items_area.items_capacity == 0) {
+			err = -ERANGE;
+			SSDFS_ERR("items area's capacity %u\n",
+				  node->items_area.items_capacity);
+			goto finish_create_node;
+		}
+
+		node->items_area.end_hash = node->items_area.start_hash +
+					node->items_area.items_capacity - 1;
+
+		node->bmap_array.index_start_bit =
+			SSDFS_BTREE_NODE_HEADER_INDEX + 1;
+		node->bmap_array.item_start_bit =
+			node->bmap_array.index_start_bit + index_capacity;
+		break;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		node->items_area.offset = (u32)hdr_size;
+		node->items_area.area_size = node_size - hdr_size;
+		node->items_area.free_space = node->items_area.area_size;
+		node->items_area.item_size = tree->item_size;
+		node->items_area.min_item_size = tree->min_item_size;
+		node->items_area.max_item_size = tree->max_item_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node_size %u, hdr_size %zu, free_space %u\n",
+			  node_size, hdr_size,
+			  node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		items_area_size = node->items_area.area_size;
+		item_size = node->items_area.item_size;
+
+		node->items_area.items_count = 0;
+		node->items_area.items_capacity = items_area_size / item_size;
+		items_capacity = node->items_area.items_capacity;
+
+		node->items_area.end_hash = node->items_area.start_hash +
+					node->items_area.items_capacity - 1;
+
+		node->bmap_array.item_start_bit =
+			SSDFS_BTREE_NODE_HEADER_INDEX + 1;
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_WARN("invalid node type %#x\n",
+			   atomic_read(&node->type));
+		goto finish_create_node;
+	}
+
+	node->bmap_array.bits_count = index_capacity + items_capacity + 1;
+
+	if (item_size > 0)
+		items_capacity = node_size / item_size;
+	else
+		items_capacity = 0;
+
+	if (index_size > 0)
+		index_capacity = node_size / index_size;
+	else
+		index_capacity = 0;
+
+	bmap_bytes = index_capacity + items_capacity + 1;
+	bmap_bytes += BITS_PER_LONG;
+	bmap_bytes /= BITS_PER_BYTE;
+
+	node->bmap_array.bmap_bytes = bmap_bytes;
+
+	if (bmap_bytes == 0 || bmap_bytes > SSDFS_INVEXTREE_BMAP_SIZE) {
+		err = -EIO;
+		SSDFS_ERR("invalid bmap_bytes %zu\n",
+			  bmap_bytes);
+		goto finish_create_node;
+	}
+
+	node->raw.invextree_header.extents_count = cpu_to_le32(0);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, extents_count %u\n",
+		  node->node_id,
+		  le32_to_cpu(node->raw.invextree_header.extents_count));
+	SSDFS_DBG("items_count %u, items_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->items_area.items_count,
+		  node->items_area.items_capacity,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+	SSDFS_DBG("index_count %u, index_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->index_area.index_count,
+		  node->index_area.index_capacity,
+		  node->index_area.start_hash,
+		  node->index_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_create_node:
+	up_write(&node->bmap_array.lock);
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		return err;
+
+	err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate node's bitmaps: "
+			  "bmap_bytes %zu, err %d\n",
+			  bmap_bytes, err);
+		return err;
+	}
+
+	down_write(&node->bmap_array.lock);
+	for (i = 0; i < SSDFS_BTREE_NODE_BMAP_COUNT; i++) {
+		spin_lock(&node->bmap_array.bmap[i].lock);
+		node->bmap_array.bmap[i].ptr = addr[i];
+		addr[i] = NULL;
+		spin_unlock(&node->bmap_array.bmap[i].lock);
+	}
+	up_write(&node->bmap_array.lock);
+
+	err = ssdfs_btree_node_allocate_content_space(node, node_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate content space: "
+			  "node_size %u, err %d\n",
+			  node_size, err);
+		return err;
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_init_node() - init invalidated extents tree's node
+ * @node: pointer on node object
+ *
+ * This method tries to init the node of invalidated extents btree.
+ *
+ * The bitmap is allocated with a possible node resize in mind:
+ * both the index part and the items part of the bitmap are sized
+ * as if each of them covered the whole node. This technique makes
+ * it unnecessary to resize or to shift the content of the bitmap
+ * later.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOMEM - unable to allocate memory.
+ * %-ERANGE - internal error.
+ * %-EIO - invalid node's header content
+ */
+static
+int ssdfs_invextree_init_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	struct ssdfs_invextree_info *tree_info = NULL;
+	struct ssdfs_invextree_node_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_invextree_node_header);
+	void *addr[SSDFS_BTREE_NODE_BMAP_COUNT];
+	struct page *page;
+	void *kaddr;
+	u64 start_hash, end_hash;
+	u32 node_size;
+	u16 item_size;
+	u32 extents_count;
+	u16 items_capacity;
+	u32 items_count;
+	u16 free_space = 0;
+	u32 calculated_used_space;
+	u16 flags;
+	u8 index_size;
+	u32 index_area_size = 0;
+	u16 index_capacity = 0;
+	size_t bmap_bytes;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	if (node->tree->type == SSDFS_INVALIDATED_EXTENTS_BTREE) {
+		tree_info = container_of(tree,
+					 struct ssdfs_invextree_info,
+					 generic_tree);
+	} else {
+		SSDFS_ERR("invalid tree type %#x\n",
+			  node->tree->type);
+		return -ERANGE;
+	}
+
+	if (atomic_read(&node->state) != SSDFS_BTREE_NODE_CONTENT_PREPARED) {
+		SSDFS_WARN("fail to init node: id %u, state %#x\n",
+			   node->node_id, atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	down_read(&node->full_lock);
+
+	if (pagevec_count(&node->content.pvec) == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("empty node's content: id %u\n",
+			  node->node_id);
+		goto finish_init_node;
+	}
+
+	page = node->content.pvec.pages[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	kaddr = kmap_local_page(page);
+
+	hdr = (struct ssdfs_invextree_node_header *)kaddr;
+
+	if (!is_csum_valid(&hdr->node.check, hdr, hdr_size)) {
+		err = -EIO;
+		SSDFS_ERR("invalid checksum: node_id %u\n",
+			  node->node_id);
+		goto finish_init_operation;
+	}
+
+	if (le32_to_cpu(hdr->node.magic.common) != SSDFS_SUPER_MAGIC ||
+	    le16_to_cpu(hdr->node.magic.key) != SSDFS_INVEXT_BNODE_MAGIC) {
+		err = -EIO;
+		SSDFS_ERR("invalid magic: common %#x, key %#x\n",
+			  le32_to_cpu(hdr->node.magic.common),
+			  le16_to_cpu(hdr->node.magic.key));
+		goto finish_init_operation;
+	}
+
+	down_write(&node->header_lock);
+
+	ssdfs_memcpy(&node->raw.invextree_header, 0, hdr_size,
+		     hdr, 0, hdr_size,
+		     hdr_size);
+
+	err = ssdfs_btree_init_node(node, &hdr->node,
+				    hdr_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to init node: id %u, err %d\n",
+			  node->node_id, err);
+		goto finish_header_init;
+	}
+
+	flags = atomic_read(&node->flags);
+
+	start_hash = le64_to_cpu(hdr->node.start_hash);
+	end_hash = le64_to_cpu(hdr->node.end_hash);
+	node_size = 1 << hdr->node.log_node_size;
+	index_size = hdr->node.index_size;
+	item_size = hdr->node.min_item_size;
+	items_capacity = le16_to_cpu(hdr->node.items_capacity);
+	extents_count = le32_to_cpu(hdr->extents_count);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, end_hash %llx, "
+		  "items_capacity %u, extents_count %u\n",
+		  start_hash, end_hash,
+		  items_capacity, extents_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (item_size == 0 || node_size % item_size) {
+		err = -EIO;
+		SSDFS_ERR("invalid size: item_size %u, node_size %u\n",
+			  item_size, node_size);
+		goto finish_header_init;
+	}
+
+	if (item_size != sizeof(struct ssdfs_raw_extent)) {
+		err = -EIO;
+		SSDFS_ERR("invalid item_size: "
+			  "size %u, expected size %zu\n",
+			  item_size,
+			  sizeof(struct ssdfs_raw_extent));
+		goto finish_header_init;
+	}
+
+	calculated_used_space = hdr_size;
+	calculated_used_space += extents_count * item_size;
+
+	if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) {
+		index_area_size = 1 << hdr->node.log_index_area_size;
+		calculated_used_space += index_area_size;
+	}
+
+	switch (atomic_read(&node->type)) {
+	case SSDFS_BTREE_ROOT_NODE:
+		/* do nothing */
+		break;
+
+	case SSDFS_BTREE_INDEX_NODE:
+		if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) {
+			if (index_area_size != node->node_size) {
+				err = -EIO;
+				SSDFS_ERR("invalid index area's size: "
+					  "node_id %u, index_area_size %u, "
+					  "node_size %u\n",
+					  node->node_id,
+					  index_area_size,
+					  node->node_size);
+				goto finish_header_init;
+			}
+
+			calculated_used_space -= hdr_size;
+		} else {
+			err = -EIO;
+			SSDFS_ERR("invalid set of flags: "
+				  "node_id %u, flags %#x\n",
+				  node->node_id, flags);
+			goto finish_header_init;
+		}
+
+		free_space = 0;
+		break;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) {
+			/*
+			 * expected state
+			 */
+		} else {
+			err = -EIO;
+			SSDFS_ERR("invalid set of flags: "
+				  "node_id %u, flags %#x\n",
+				  node->node_id, flags);
+			goto finish_header_init;
+		}
+		fallthrough;
+	case SSDFS_BTREE_LEAF_NODE:
+		if (extents_count > 0 &&
+		    (start_hash >= U64_MAX || end_hash >= U64_MAX)) {
+			err = -EIO;
+			SSDFS_ERR("invalid hash range: "
+				  "start_hash %llx, end_hash %llx\n",
+				  start_hash, end_hash);
+			goto finish_header_init;
+		}
+
+		if (item_size == 0 || node_size % item_size) {
+			err = -EIO;
+			SSDFS_ERR("invalid size: item_size %u, node_size %u\n",
+				  item_size, node_size);
+			goto finish_header_init;
+		}
+
+		if (item_size != sizeof(struct ssdfs_raw_extent)) {
+			err = -EIO;
+			SSDFS_ERR("invalid item_size: "
+				  "size %u, expected size %zu\n",
+				  item_size,
+				  sizeof(struct ssdfs_raw_extent));
+			goto finish_header_init;
+		}
+
+		if (items_capacity == 0 ||
+		    items_capacity > (node_size / item_size)) {
+			err = -EIO;
+			SSDFS_ERR("invalid items_capacity %u\n",
+				  items_capacity);
+			goto finish_header_init;
+		}
+
+		if (extents_count > items_capacity) {
+			err = -EIO;
+			SSDFS_ERR("items_capacity %u != extents_count %u\n",
+				  items_capacity,
+				  extents_count);
+			goto finish_header_init;
+		}
+
+		free_space =
+			(u32)(items_capacity - extents_count) * item_size;
+		if (free_space > node->items_area.area_size) {
+			err = -EIO;
+			SSDFS_ERR("free_space %u > area_size %u\n",
+				  free_space,
+				  node->items_area.area_size);
+			goto finish_header_init;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_space %u, index_area_size %u, "
+		  "hdr_size %zu, extents_count %u, "
+		  "item_size %u\n",
+		  free_space, index_area_size, hdr_size,
+		  extents_count, item_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (free_space != (node_size - calculated_used_space)) {
+		err = -EIO;
+		SSDFS_ERR("free_space %u, node_size %u, "
+			  "calculated_used_space %u\n",
+			  free_space, node_size,
+			  calculated_used_space);
+		goto finish_header_init;
+	}
+
+	node->items_area.free_space = free_space;
+	node->items_area.items_count = (u16)extents_count;
+	node->items_area.items_capacity = items_capacity;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %u, items_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->items_area.items_count,
+		  node->items_area.items_capacity,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+	SSDFS_DBG("index_count %u, index_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->index_area.index_count,
+		  node->index_area.index_capacity,
+		  node->index_area.start_hash,
+		  node->index_area.end_hash);
+#endif /*
CONFIG_SSDFS_DEBUG */ + +finish_header_init: + up_write(&node->header_lock); + + if (unlikely(err)) + goto finish_init_operation; + + items_count = node_size / item_size; + + if (item_size > 0) + items_capacity = node_size / item_size; + else + items_capacity = 0; + + if (index_size > 0) + index_capacity = node_size / index_size; + else + index_capacity = 0; + + bmap_bytes = index_capacity + items_capacity + 1; + bmap_bytes += BITS_PER_LONG; + bmap_bytes /= BITS_PER_BYTE; + + if (bmap_bytes == 0 || bmap_bytes > SSDFS_INVEXTREE_BMAP_SIZE) { + err = -EIO; + SSDFS_ERR("invalid bmap_bytes %zu\n", + bmap_bytes); + goto finish_init_operation; + } + + err = ssdfs_btree_node_allocate_bmaps(addr, bmap_bytes); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate node's bitmaps: " + "bmap_bytes %zu, err %d\n", + bmap_bytes, err); + goto finish_init_operation; + } + + down_write(&node->bmap_array.lock); + + if (flags & SSDFS_BTREE_NODE_HAS_INDEX_AREA) { + node->bmap_array.index_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + /* + * Reserve the whole node space as + * potential space for indexes. + */ + index_capacity = node_size / index_size; + node->bmap_array.item_start_bit = + node->bmap_array.index_start_bit + index_capacity; + } else if (flags & SSDFS_BTREE_NODE_HAS_ITEMS_AREA) { + node->bmap_array.item_start_bit = + SSDFS_BTREE_NODE_HEADER_INDEX + 1; + } else + BUG(); + + node->bmap_array.bits_count = index_capacity + items_capacity + 1; + node->bmap_array.bmap_bytes = bmap_bytes; + + ssdfs_btree_node_init_bmaps(node, addr); + + spin_lock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + bitmap_set(node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].ptr, + 0, extents_count); + spin_unlock(&node->bmap_array.bmap[SSDFS_BTREE_NODE_ALLOC_BMAP].lock); + + up_write(&node->bmap_array.lock); +finish_init_operation: + kunmap_local(kaddr); + + if (unlikely(err)) + goto finish_init_node; + + atomic64_add((u64)extents_count, &tree_info->extents_count); + +finish_init_node: + up_read(&node->full_lock); + + ssdfs_debug_btree_node_object(node); + + return err; +} + +static +void ssdfs_invextree_destroy_node(struct ssdfs_btree_node *node) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +/* + * ssdfs_invextree_add_node() - add node into invalidated extents btree + * @node: pointer on node object + * + * This method tries to finish addition of node + * into invalidated extents btree. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
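+ *
+ * NOTE: no new space is allocated here; the method only checks
+ * the freshly added node's state and updates the parent node
+ * pointer.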
+ */
+static
+int ssdfs_invextree_add_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_btree *tree;
+	int type;
+	u16 items_capacity = 0;
+	u64 start_hash = U64_MAX;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&node->state)) {
+	case SSDFS_BTREE_NODE_CREATED:
+	case SSDFS_BTREE_NODE_INITIALIZED:
+	case SSDFS_BTREE_NODE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_WARN("invalid node: id %u, state %#x\n",
+			   node->node_id, atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	type = atomic_read(&node->type);
+
+	switch (type) {
+	case SSDFS_BTREE_INDEX_NODE:
+	case SSDFS_BTREE_HYBRID_NODE:
+	case SSDFS_BTREE_LEAF_NODE:
+		/* expected states */
+		break;
+
+	default:
+		SSDFS_WARN("invalid node type %#x\n", type);
+		return -ERANGE;
+	}
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	down_read(&node->header_lock);
+
+	switch (atomic_read(&node->items_area.state)) {
+	case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST:
+		items_capacity = node->items_area.items_capacity;
+		start_hash = node->items_area.start_hash;
+		break;
+	default:
+		items_capacity = 0;
+		break;
+	}
+
+	if (items_capacity == 0) {
+		if (type == SSDFS_BTREE_LEAF_NODE ||
+		    type == SSDFS_BTREE_HYBRID_NODE) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid node state: "
+				  "type %#x, items_capacity %u\n",
+				  type, items_capacity);
+			goto finish_add_node;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("node_id %u, extents_count %u\n",
+		  node->node_id,
+		  le32_to_cpu(node->raw.invextree_header.extents_count));
+	SSDFS_DBG("items_count %u, items_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->items_area.items_count,
+		  node->items_area.items_capacity,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+	SSDFS_DBG("index_count %u, index_capacity %u, "
+		  "start_hash %llx, end_hash %llx\n",
+		  node->index_area.index_count,
+		  node->index_area.index_capacity,
+		  node->index_area.start_hash,
+		  node->index_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_add_node:
+	up_read(&node->header_lock);
+
+	ssdfs_debug_btree_node_object(node);
+
+	if (err)
+		return err;
+
+	err = ssdfs_btree_update_parent_node_pointer(tree, node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to update parent pointer: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_invextree_delete_node(struct ssdfs_btree_node *node)
+{
+	/* TODO: implement */
+	SSDFS_DBG("TODO: implement\n");
+	return 0;
+
+/*
+ * TODO: it needs to add a special free space descriptor into the
+ *       index area for the case of deleted nodes. The code that
+ *       allocates new items should create an empty node with
+ *       completely free items while passing through the index
+ *       level.
+ */
+
+/*
+ * TODO: a node can really be deleted/invalidated. But the index
+ *       area should keep an index for the deleted node with a
+ *       special flag. Then it is clear that some capacity exists
+ *       without a real node allocation. If an item is added into
+ *       such a node, the node has to be allocated. It means that
+ *       deleting a node keeps the index hierarchy the same, with
+ *       no need to delete or modify it.
+ */
+}
+
+/*
+ * ssdfs_invextree_pre_flush_node() - pre-flush node's header
+ * @node: pointer on node object
+ *
+ * This method tries to prepare the node's header for flush.
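+ * The header is prepared in a local copy: the magic and checksum
+ * are recalculated and the copy is validated against the items
+ * area before it is stored back and written into the node's
+ * first memory page.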
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
+ */
+static
+int ssdfs_invextree_pre_flush_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_invextree_node_header invextree_header;
+	size_t hdr_size = sizeof(struct ssdfs_invextree_node_header);
+	struct ssdfs_btree *tree;
+	struct ssdfs_invextree_info *tree_info = NULL;
+	struct ssdfs_state_bitmap *bmap;
+	struct page *page;
+	u16 items_count;
+	u32 items_area_size;
+	u32 extents_count;
+	u32 used_space;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node_id %u, state %#x\n",
+		  node->node_id, atomic_read(&node->state));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_debug_btree_node_object(node);
+
+	switch (atomic_read(&node->state)) {
+	case SSDFS_BTREE_NODE_DIRTY:
+		/* expected state */
+		break;
+
+	case SSDFS_BTREE_NODE_INITIALIZED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node %u is clean\n",
+			  node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+
+	case SSDFS_BTREE_NODE_CORRUPTED:
+		SSDFS_WARN("node %u is corrupted\n",
+			   node->node_id);
+		down_read(&node->bmap_array.lock);
+		bmap = &node->bmap_array.bmap[SSDFS_BTREE_NODE_DIRTY_BMAP];
+		spin_lock(&bmap->lock);
+		bitmap_clear(bmap->ptr, 0, node->bmap_array.bits_count);
+		spin_unlock(&bmap->lock);
+		up_read(&node->bmap_array.lock);
+		clear_ssdfs_btree_node_dirty(node);
+		return -EFAULT;
+
+	default:
+		SSDFS_ERR("invalid node state %#x\n",
+			  atomic_read(&node->state));
+		return -ERANGE;
+	}
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	if (tree->type != SSDFS_INVALIDATED_EXTENTS_BTREE) {
+		SSDFS_WARN("invalid tree type %#x\n",
+			   tree->type);
+		return -ERANGE;
+	} else {
+		tree_info = container_of(tree,
+					 struct ssdfs_invextree_info,
+					 generic_tree);
+	}
+
+	down_write(&node->full_lock);
+	down_write(&node->header_lock);
+
+	ssdfs_memcpy(&invextree_header, 0, hdr_size,
+		     &node->raw.invextree_header, 0, hdr_size,
+		     hdr_size);
+
+	invextree_header.node.magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	invextree_header.node.magic.key =
+				cpu_to_le16(SSDFS_INVEXT_BNODE_MAGIC);
+	invextree_header.node.magic.version.major = SSDFS_MAJOR_REVISION;
+	invextree_header.node.magic.version.minor = SSDFS_MINOR_REVISION;
+
+	err = ssdfs_btree_node_pre_flush_header(node, &invextree_header.node);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to flush generic header: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		goto finish_invextree_header_preparation;
+	}
+
+	items_count = node->items_area.items_count;
+	items_area_size = node->items_area.area_size;
+	extents_count = le32_to_cpu(invextree_header.extents_count);
+
+	if (extents_count != items_count) {
+		err = -ERANGE;
+		SSDFS_ERR("extents_count %u != items_count %u\n",
+			  extents_count, items_count);
+		goto finish_invextree_header_preparation;
+	}
+
+	used_space = (u32)items_count * sizeof(struct ssdfs_raw_extent);
+
+	if (used_space > items_area_size) {
+		err = -ERANGE;
+		SSDFS_ERR("used_space %u > items_area_size %u\n",
+			  used_space, items_area_size);
+		goto finish_invextree_header_preparation;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("extents_count %u, "
+		  "items_area_size %u, item_size %zu\n",
+		  extents_count, items_area_size,
+		  sizeof(struct ssdfs_raw_extent));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	invextree_header.node.check.bytes = cpu_to_le16((u16)hdr_size);
+	invextree_header.node.check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&invextree_header.node.check,
+				   &invextree_header, hdr_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		goto finish_invextree_header_preparation;
+	}
+
+	ssdfs_memcpy(&node->raw.invextree_header, 0, hdr_size,
+		     &invextree_header, 0, hdr_size,
+		     hdr_size);
+
+finish_invextree_header_preparation:
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		goto finish_node_pre_flush;
+
+	if (pagevec_count(&node->content.pvec) < 1) {
+		err = -ERANGE;
+		SSDFS_ERR("pagevec is empty\n");
+		goto finish_node_pre_flush;
+	}
+
+	page = node->content.pvec.pages[0];
+	ssdfs_memcpy_to_page(page, 0, PAGE_SIZE,
+			     &invextree_header, 0, hdr_size,
+			     hdr_size);
+
+finish_node_pre_flush:
+	up_write(&node->full_lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_flush_node() - flush node
+ * @node: pointer on node object
+ *
+ * This method tries to flush node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
+ */
+static
+int ssdfs_invextree_flush_node(struct ssdfs_btree_node *node)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_btree *tree;
+	u64 fs_feature_compat;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+
+	SSDFS_DBG("node %p, node_id %u\n",
+		  node, node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	tree = node->tree;
+	if (!tree) {
+		SSDFS_ERR("node hasn't pointer on tree\n");
+		return -ERANGE;
+	}
+
+	if (tree->type != SSDFS_INVALIDATED_EXTENTS_BTREE) {
+		SSDFS_WARN("invalid tree type %#x\n",
+			   tree->type);
+		return -ERANGE;
+	}
+
+	fsi = node->tree->fsi;
+
+	spin_lock(&fsi->volume_state_lock);
+	fs_feature_compat = fsi->fs_feature_compat;
+	spin_unlock(&fsi->volume_state_lock);
+
+	if (fs_feature_compat & SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+		err = ssdfs_btree_common_node_flush(node);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to flush node: "
+				  "node_id %u, height %u, err %d\n",
+				  node->node_id,
+				  atomic_read(&node->height),
+				  err);
+		}
+	} else {
+		err = -EFAULT;
+		SSDFS_CRIT("invalidated extents tree is absent\n");
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	return err;
+}
diff --git a/fs/ssdfs/invalidated_extents_tree.h b/fs/ssdfs/invalidated_extents_tree.h
new file mode 100644
index 000000000000..1ba613504565
--- /dev/null
+++ b/fs/ssdfs/invalidated_extents_tree.h
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/invalidated_extents_tree.h - invalidated extents btree declarations.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cong Wang + */ + +#ifndef _SSDFS_INVALIDATED_EXTENTS_TREE_H +#define _SSDFS_INVALIDATED_EXTENTS_TREE_H + +/* + * struct ssdfs_invextree_info - invalidated extents tree object + * @state: invalidated extents btree state + * @lock: invalidated extents btree lock + * @generic_tree: generic btree description + * @extents_count: count of extents in the whole tree + * @fsi: pointer on shared file system object + */ +struct ssdfs_invextree_info { + atomic_t state; + struct rw_semaphore lock; + struct ssdfs_btree generic_tree; + + atomic64_t extents_count; + + struct ssdfs_fs_info *fsi; +}; + +/* Invalidated extents tree states */ +enum { + SSDFS_INVEXTREE_UNKNOWN_STATE, + SSDFS_INVEXTREE_CREATED, + SSDFS_INVEXTREE_INITIALIZED, + SSDFS_INVEXTREE_DIRTY, + SSDFS_INVEXTREE_CORRUPTED, + SSDFS_INVEXTREE_STATE_MAX +}; + +/* + * Invalidated extents tree API + */ +int ssdfs_invextree_create(struct ssdfs_fs_info *fsi); +void ssdfs_invextree_destroy(struct ssdfs_fs_info *fsi); +int ssdfs_invextree_flush(struct ssdfs_fs_info *fsi); + +int ssdfs_invextree_find(struct ssdfs_invextree_info *tree, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search); +int ssdfs_invextree_add(struct ssdfs_invextree_info *tree, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search); +int ssdfs_invextree_delete(struct ssdfs_invextree_info *tree, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search); + +/* + * Invalidated extents tree's internal API + */ +int ssdfs_invextree_find_leaf_node(struct ssdfs_invextree_info *tree, + u64 seg_id, + struct ssdfs_btree_search *search); +int ssdfs_invextree_get_start_hash(struct ssdfs_invextree_info *tree, + u64 *start_hash); +int ssdfs_invextree_node_hash_range(struct ssdfs_invextree_info *tree, + struct ssdfs_btree_search *search, + u64 *start_hash, u64 *end_hash, + u16 *items_count); +int ssdfs_invextree_extract_range(struct ssdfs_invextree_info *tree, + u16 start_index, u16 count, + struct ssdfs_btree_search *search); +int ssdfs_invextree_check_search_result(struct ssdfs_btree_search *search); +int ssdfs_invextree_get_next_hash(struct ssdfs_invextree_info *tree, + struct ssdfs_btree_search *search, + u64 *next_hash); + +void ssdfs_debug_invextree_object(struct ssdfs_invextree_info *tree); + +/* + * Invalidated extents btree specialized operations + */ +extern const struct ssdfs_btree_descriptor_operations ssdfs_invextree_desc_ops; +extern const struct ssdfs_btree_operations ssdfs_invextree_ops; +extern const struct ssdfs_btree_node_operations ssdfs_invextree_node_ops; + +#endif /* _SSDFS_INVALIDATED_EXTENTS_TREE_H */ From patchwork Sat Feb 25 01:09:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151976 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2A52C6FA8E for ; Sat, 25 Feb 2023 01:20:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229842AbjBYBUs (ORCPT ); Fri, 24 Feb 2023 20:20:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229689AbjBYBTk (ORCPT ); Fri, 24 Feb 2023 20:19:40 -0500 Received: from mail-oi1-x230.google.com 
(mail-oi1-x230.google.com [IPv6:2607:f8b0:4864:20::230]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D10A18694 for ; Fri, 24 Feb 2023 17:18:06 -0800 (PST) Received: by mail-oi1-x230.google.com with SMTP id bh20so794036oib.9 for ; Fri, 24 Feb 2023 17:18:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dubeyko-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=szi3RhYICdmVbKx10qLVhGmU5y2OVUgDDIy8ENOZq/I=; b=D7vekgyz0Kw4T/IsoyRn1WwpBKV1qZI0kdQg3yLM/b+Kg4Rje2zcO0VD0q3VrPyKf6 Y5nlrwI3J0OCZvSw2RaJX6mB4QfNz3skfCfLEVvqGOJCrWeddA/DVzMAj60a/mZxAA0E lPA4LvOpZw+WqEnpzq9nh0XfNQSYmRuooiMEjsKcnwPLUHgfP9UklUgr/cG2wprjmAR7 Yj63E8fiCwAOIs0IhuolH5QVDdd66uBj/eMmgWX8B2XinIyFs0sM8CKLLA02GUFfvOlr C7Y4iP6ttbKtdbUslpn20UkYEsYpGBXETkqT2VLJhShmXJHW+LRX/Xsa6pDnDpdbaZIG CCfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=szi3RhYICdmVbKx10qLVhGmU5y2OVUgDDIy8ENOZq/I=; b=xYxOdGvaosO3h+WDYh668HpbJ3U+rwdVarMIMbO+oSvXfIebm5XK2xrkDh3SnGsdAV OV7Wg/WvYMqCHfQznC2/lvYYeehPgq/h/AJc2vIdJ+eXgZFr73vUevIpUdYl6Vvl9XWl p7+12iRyy8BfPkCnL1SMvGsJ/EG0codcivc/yF0gldVUvNWgqNOOAuTKnKCFDSJU2VnM ziU2zT+lW/ndA6LwQcQ2Fl+imWDgLjKcGPQfuGPzbG4bOuA+vtE7SUeIhCoN40tCn5YH elFqzYalZVeN4EZ20UOIw73Ibq7S68GnC4WuiWBS4b/RfrtbfqqzZvyls9/Wo8MtXNBV pXgg== X-Gm-Message-State: AO0yUKU104mPLh8PAZRPBOcrnkRcklBnJr+IkhYASgEG9/uvF+Do6/lX G9L9JFZAb0DEMCRdQ2s/ikPwARzO3RfvkgM6 X-Google-Smtp-Source: AK7set/U1U4SubY8aihSUFO6SdRmchGkblQMG4sarb1NTLUAukVsmGxHIiPQqHe7lWdynGBP0HzHVA== X-Received: by 2002:a05:6808:687:b0:383:eaea:7dcb with SMTP id k7-20020a056808068700b00383eaea7dcbmr2767247oig.15.1677287885185; Fri, 24 Feb 2023 17:18:05 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.18.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:18:04 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 71/76] ssdfs: find item in invalidated extents b-tree Date: Fri, 24 Feb 2023 17:09:22 -0800 Message-Id: <20230225010927.813929-72-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement lookup logic in invalidated extents b-tree. 
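The lookup is keyed by hashes that ssdfs_invextree_calculate_hash()
derives from an extent's segment ID and logical block number. A
minimal calling sketch (assuming the search object helpers
ssdfs_btree_search_alloc()/ssdfs_btree_search_free() provided by
the btree core, and an already initialized tree):

	struct ssdfs_btree_search *search;
	struct ssdfs_raw_extent extent;
	int err;

	extent.seg_id = cpu_to_le64(seg_id);
	extent.logical_blk = cpu_to_le32(logical_blk);
	extent.len = cpu_to_le32(len);

	search = ssdfs_btree_search_alloc();
	if (!search)
		return -ENOMEM;

	/* find the invalidated extent by its (seg_id, blk) hash */
	err = ssdfs_invextree_find(tree, &extent, search);
	ssdfs_btree_search_free(search);
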
Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/invalidated_extents_tree.c | 1640 +++++++++++++++++++++++++++ 1 file changed, 1640 insertions(+) diff --git a/fs/ssdfs/invalidated_extents_tree.c b/fs/ssdfs/invalidated_extents_tree.c index 4cb5ffeac706..d7dc4156a20d 100644 --- a/fs/ssdfs/invalidated_extents_tree.c +++ b/fs/ssdfs/invalidated_extents_tree.c @@ -2521,3 +2521,1643 @@ int ssdfs_invextree_flush_node(struct ssdfs_btree_node *node) return err; } + +/****************************************************************************** + * SPECIALIZED INVALIDATED EXTENTS BTREE NODE OPERATIONS * + ******************************************************************************/ + +/* + * ssdfs_convert_lookup2item_index() - convert lookup into item index + * @node_size: size of the node in bytes + * @lookup_index: lookup index + */ +static inline +u16 ssdfs_convert_lookup2item_index(u32 node_size, u16 lookup_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, lookup_index %u\n", + node_size, lookup_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_convert_lookup2item_index(lookup_index, node_size, + sizeof(struct ssdfs_raw_extent), + SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE); +} + +/* + * ssdfs_convert_item2lookup_index() - convert item into lookup index + * @node_size: size of the node in bytes + * @item_index: item index + */ +static inline +u16 ssdfs_convert_item2lookup_index(u32 node_size, u16 item_index) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, item_index %u\n", + node_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_convert_item2lookup_index(item_index, node_size, + sizeof(struct ssdfs_raw_extent), + SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE); +} + +/* + * is_hash_for_lookup_table() - should item's hash be into lookup table? + * @node_size: size of the node in bytes + * @item_index: item index + */ +static inline +bool is_hash_for_lookup_table(u32 node_size, u16 item_index) +{ + u16 lookup_index; + u16 calculated; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_size %u, item_index %u\n", + node_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + lookup_index = ssdfs_convert_item2lookup_index(node_size, item_index); + calculated = ssdfs_convert_lookup2item_index(node_size, lookup_index); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u, calculated %u\n", + lookup_index, calculated); +#endif /* CONFIG_SSDFS_DEBUG */ + + return calculated == item_index; +} + +/* + * ssdfs_invextree_node_find_lookup_index() - find lookup index + * @node: node object + * @search: search object + * @lookup_index: lookup index [out] + * + * This method tries to find a lookup index for requested items. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - lookup index doesn't exist for requested hash. 
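+ *
+ * NOTE: the lookup table in the node's header mirrors the hash of
+ * every Nth item, so scanning the table narrows the subsequent
+ * search to a small slice of the items area. For example, if a
+ * node holds 256 extents and the table has 64 slots, only every
+ * fourth hash is mirrored and at most four items have to be
+ * checked after the table lookup.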
+ */ +static +int ssdfs_invextree_node_find_lookup_index(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search, + u16 *lookup_index) +{ + __le64 *lookup_table; + int array_size = SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search || !lookup_index); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + lookup_table = node->raw.invextree_header.lookup_table; + err = ssdfs_btree_node_find_lookup_index_nolock(search, + lookup_table, + array_size, + lookup_index); + up_read(&node->header_lock); + + return err; +} + +/* + * __ssdfs_check_extent_for_request() - check invalidated extent + * @fsi: pointer on shared file system object + * @extent: pointer on invalidated extent object + * @search: search object + * + * This method tries to check @extent for the @search request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - continue the search. + * %-ENODATA - possible place was found. + */ +static +int __ssdfs_check_extent_for_request(struct ssdfs_fs_info *fsi, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + u64 seg_id; + u32 logical_blk; + u32 len; + u64 start_hash; + u64 end_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !extent || !search); + + SSDFS_DBG("fsi %p, extent %p, search %p\n", + fsi, extent, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + seg_id = le64_to_cpu(extent->seg_id); + logical_blk = le32_to_cpu(extent->logical_blk); + len = le32_to_cpu(extent->len); + + start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("request (start_hash %llx, end_hash %llx), " + "extent (start_hash %llx, end_hash %llx)\n", + search->request.start.hash, + search->request.end.hash, + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (start_hash > search->request.end.hash) { + /* no data for request */ + err = -ENODATA; + } else if (search->request.start.hash > end_hash) { + /* continue the search */ + err = -EAGAIN; + } else { + /* + * extent is inside request [start_hash, end_hash] + */ + } + + return err; +} + +/* + * ssdfs_check_extent_for_request() - check invalidated extent + * @fsi: pointer on shared file system object + * @extent: pointer on invalidated extent object + * @search: search object + * + * This method tries to check @extent for the @search request. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-EINVAL - invalid input. + * %-ERANGE - internal error. + * %-EAGAIN - continue the search. + * %-ENODATA - possible place was found. 
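+ *
+ * The check reduces to an interval comparison between the request's
+ * [start_hash, end_hash] and the extent's own hash range. A worked
+ * example (hash values are illustrative only):
+ *
+ *	request: [0x100, 0x200]
+ *	extent:  [0x210, 0x220] -> start 0x210 > 0x200 -> -ENODATA
+ *	extent:  [0x080, 0x0f0] -> 0x100 > end 0x0f0   -> -EAGAIN
+ *	extent:  [0x1f0, 0x240] -> intervals overlap   -> 0 (found)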
+ */ +static +int ssdfs_check_extent_for_request(struct ssdfs_fs_info *fsi, + struct ssdfs_raw_extent *extent, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !extent || !search); + + SSDFS_DBG("fsi %p, extent %p, search %p\n", + fsi, extent, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_check_extent_for_request(fsi, extent, search); + if (err == -EAGAIN) { + /* continue the search */ + return err; + } else if (err == -ENODATA) { + search->result.err = -ENODATA; + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to check invalidated extent: err %d\n", + err); + return err; + } else { + /* valid item found */ + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + } + + return 0; +} + +/* + * ssdfs_get_extent_hash_range() - get extent's hash range + * @fsi: pointer on shared file system object + * @kaddr: pointer on extent object + * @start_hash: pointer on start_hash value [out] + * @end_hash: pointer on end_hash value [out] + */ +static +void ssdfs_get_extent_hash_range(struct ssdfs_fs_info *fsi, + void *kaddr, + u64 *start_hash, + u64 *end_hash) +{ + struct ssdfs_raw_extent *extent; + u64 seg_id; + u32 logical_blk; + u32 len; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !kaddr || !start_hash || !end_hash); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + extent = (struct ssdfs_raw_extent *)kaddr; + + seg_id = le64_to_cpu(extent->seg_id); + logical_blk = le32_to_cpu(extent->logical_blk); + len = le32_to_cpu(extent->len); + + *start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + *end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); +} + +/* + * ssdfs_check_found_extent() - check found invalidated extent + * @fsi: pointer on shared file system object + * @search: search object + * @kaddr: pointer on invalidated extent object + * @item_index: index of the extent + * @start_hash: pointer on start_hash value [out] + * @end_hash: pointer on end_hash value [out] + * @found_index: pointer on found index [out] + * + * This method tries to check the found invalidated extent. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - corrupted invalidated extent. + * %-EAGAIN - continue the search. + * %-ENODATA - possible place was found. 
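+ *
+ * The hash of an invalidated block is derived from its position on the
+ * volume. A minimal sketch of what ssdfs_invextree_calculate_hash() is
+ * expected to compute (the helper itself is not part of this patch, so
+ * the formula below is an assumption for illustration):
+ *
+ *	hash = seg_id * blocks_per_segment + logical_blk;
+ *
+ * i.e. a volume-wide logical block number, which keeps the extents of
+ * one segment adjacent in the b-tree's hash order.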
+ */ +static +int ssdfs_check_found_extent(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + void *kaddr, + u16 item_index, + u64 *start_hash, + u64 *end_hash, + u16 *found_index) +{ + struct ssdfs_raw_extent *extent; + u32 req_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !kaddr || !found_index); + BUG_ON(!start_hash || !end_hash); + + SSDFS_DBG("item_index %u\n", item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + *found_index = U16_MAX; + + extent = (struct ssdfs_raw_extent *)kaddr; + req_flags = search->request.flags; + + if (!(req_flags & SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE)) { + SSDFS_ERR("invalid request: hash is absent\n"); + return -ERANGE; + } + + ssdfs_get_extent_hash_range(fsi, kaddr, start_hash, end_hash); + + err = ssdfs_check_extent_for_request(fsi, extent, search); + if (err == -ENODATA) { + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = err; + search->result.start_index = item_index; + search->result.count = 1; + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: + ssdfs_btree_search_free_result_buf(search); + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + } else if (err == -EAGAIN) { + /* continue to search */ + err = 0; + *found_index = U16_MAX; + } else if (unlikely(err)) { + SSDFS_ERR("fail to check extent: err %d\n", + err); + } else { + *found_index = item_index; + search->result.state = + SSDFS_BTREE_SEARCH_VALID_ITEM; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "found_index %u\n", + *start_hash, *end_hash, + *found_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +/* + * ssdfs_prepare_extents_buffer() - prepare buffer for extents + * @search: search object + * @found_index: found index of invalidated extent + * @start_hash: starting hash + * @end_hash: ending hash + * @items_count: count of items in the sequence + * @item_size: size of the item + */ +static +int ssdfs_prepare_extents_buffer(struct ssdfs_btree_search *search, + u16 found_index, + u64 start_hash, + u64 end_hash, + u16 items_count, + size_t item_size) +{ + u16 found_extents = 0; + size_t buf_size = sizeof(struct ssdfs_raw_extent); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); + + SSDFS_DBG("found_index %u, start_hash %llx, end_hash %llx, " + "items_count %u, item_size %zu\n", + found_index, start_hash, end_hash, + items_count, item_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_btree_search_free_result_buf(search); + + if (start_hash == end_hash) { + /* use inline buffer */ + found_extents = 1; + } else { + /* use external buffer */ + if (found_index >= items_count) { + SSDFS_ERR("found_index %u >= items_count %u\n", + found_index, items_count); + return -ERANGE; + } + found_extents = items_count - found_index; + } + + if (found_extents == 1) { + search->result.buf_state = + SSDFS_BTREE_SEARCH_INLINE_BUFFER; + search->result.buf = &search->raw.invalidated_extent; + search->result.buf_size = buf_size; + search->result.items_in_buffer = 0; + } else { + if (search->result.buf) { + SSDFS_WARN("search->result.buf %p, " + "search->result.buf_state %#x\n", + search->result.buf, + search->result.buf_state); + } + + err = 
ssdfs_btree_search_alloc_result_buf(search, + buf_size * found_extents); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate memory for buffer\n"); + return err; + } + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("found_extents %u, " + "search->result.items_in_buffer %u\n", + found_extents, + search->result.items_in_buffer); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_extract_found_extent() - extract found extent + * @fsi: pointer on shared file system object + * @search: search object + * @item_size: size of the item + * @kaddr: pointer on invalidated extent + * @start_hash: pointer on start_hash value [out] + * @end_hash: pointer on end_hash value [out] + * + * This method tries to extract the found extent. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_extract_found_extent(struct ssdfs_fs_info *fsi, + struct ssdfs_btree_search *search, + size_t item_size, + void *kaddr, + u64 *start_hash, + u64 *end_hash) +{ + struct ssdfs_raw_extent *extent; + size_t buf_size = sizeof(struct ssdfs_raw_extent); + u32 calculated; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !search || !kaddr); + BUG_ON(!start_hash || !end_hash); + + SSDFS_DBG("kaddr %p\n", kaddr); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + *end_hash = U64_MAX; + + calculated = search->result.items_in_buffer * buf_size; + if (calculated > search->result.buf_size) { + SSDFS_ERR("calculated %u > buf_size %zu\n", + calculated, search->result.buf_size); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search->result.items_in_buffer %u, " + "calculated %u\n", + search->result.items_in_buffer, + calculated); + + BUG_ON(!search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + extent = (struct ssdfs_raw_extent *)kaddr; + ssdfs_get_extent_hash_range(fsi, extent, start_hash, end_hash); + + err = __ssdfs_check_extent_for_request(fsi, extent, search); + if (err == -ENODATA) { + SSDFS_DBG("current extent is out of requested range\n"); + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to check extent: err %d\n", + err); + return err; + } + + err = ssdfs_memcpy(search->result.buf, + calculated, search->result.buf_size, + extent, 0, item_size, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: calculated %u, " + "search->result.buf_size %zu, err %d\n", + calculated, search->result.buf_size, err); + return err; + } + + search->result.items_in_buffer++; + search->result.count++; + search->result.state = SSDFS_BTREE_SEARCH_VALID_ITEM; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, " + "search->result.count %u\n", + *start_hash, *end_hash, + search->result.count); +#endif /* CONFIG_SSDFS_DEBUG */ + + return 0; +} + +/* + * ssdfs_extract_range_by_lookup_index() - extract a range of items + * @node: pointer on node object + * @lookup_index: lookup index for requested range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. 
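+ *
+ * The heavy lifting is delegated to the generic helper
+ * __ssdfs_extract_range_by_lookup_index() with three item-specific
+ * callbacks. Conceptually (a sketch of the expected call order, not
+ * the helper's literal body):
+ *
+ *	item_index = ssdfs_convert_lookup2item_index(node_size,
+ *						     lookup_index);
+ *	ssdfs_check_found_extent()      - locate the first matching item
+ *	ssdfs_prepare_extents_buffer()  - inline buffer for one item,
+ *					  allocated buffer for a range
+ *	ssdfs_extract_found_extent()    - copy each item, advancing
+ *					  items_in_buffer and count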
+ */ +static +int ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node, + u16 lookup_index, + struct ssdfs_btree_search *search) +{ + int capacity = SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE; + size_t item_size = sizeof(struct ssdfs_raw_extent); + + return __ssdfs_extract_range_by_lookup_index(node, lookup_index, + capacity, item_size, + search, + ssdfs_check_found_extent, + ssdfs_prepare_extents_buffer, + ssdfs_extract_found_extent); +} + +/* + * ssdfs_invextree_node_find_range() - find a range of items into the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to find a range of items into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENODATA - requested range is out of the node. + * %-ENOMEM - unable to allocate memory. + */ +static +int ssdfs_invextree_node_find_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int state; + u16 items_count; + u16 items_capacity; + u64 start_hash; + u64 end_hash; + u16 lookup_index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&node->header_lock); + state = atomic_read(&node->items_area.state); + items_count = node->items_area.items_count; + items_capacity = node->items_area.items_capacity; + start_hash = node->items_area.start_hash; + end_hash = node->items_area.end_hash; + up_read(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("request (start_hash %llx, end_hash %llx), " + "node (start_hash %llx, end_hash %llx)\n", + search->request.start.hash, + search->request.end.hash, + start_hash, end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + + if (items_capacity == 0 || items_count > items_capacity) { + SSDFS_ERR("corrupted node description: " + "items_count %u, items_capacity %u\n", + items_count, + items_capacity); + return -ERANGE; + } + + if (search->request.count == 0 || + search->request.count > items_capacity) { + SSDFS_ERR("invalid request: " + "count %u, items_capacity %u\n", + search->request.count, + items_capacity); + return -ERANGE; + } + + err = ssdfs_btree_node_check_hash_range(node, + items_count, + items_capacity, + start_hash, + end_hash, + search); + if (err) + return err; + + err = ssdfs_invextree_node_find_lookup_index(node, search, + &lookup_index); + if (err == -ENODATA) { + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + /* do nothing */ + goto try_extract_range_by_lookup_index; + + default: + /* continue logic */ + break; + } + + search->result.state = + SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND; + search->result.err = -ENODATA; + search->result.start_index = + ssdfs_convert_lookup2item_index(node->node_size, + lookup_index); + search->result.count = search->request.count; + search->result.search_cno = + ssdfs_current_cno(node->tree->fsi->sb); + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_ADD_ITEM: + case SSDFS_BTREE_SEARCH_ADD_RANGE: + 
case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* do nothing */ + break; + + default: +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.buf); +#endif /* CONFIG_SSDFS_DEBUG */ + + search->result.buf_state = + SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE; + search->result.buf = NULL; + search->result.buf_size = 0; + search->result.items_in_buffer = 0; + break; + } + + return -ENODATA; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the index: " + "start_hash %llx, end_hash %llx, err %d\n", + search->request.start.hash, + search->request.end.hash, + err); + return err; + } + +try_extract_range_by_lookup_index: + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(lookup_index >= SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_extract_range_by_lookup_index(node, lookup_index, + search); + search->result.search_cno = ssdfs_current_cno(node->tree->fsi->sb); + + if (err == -EAGAIN) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node contains not all requested extents: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx)\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to extract range: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + err); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; + } else if (unlikely(err)) { + SSDFS_ERR("fail to extract range: " + "node (start_hash %llx, end_hash %llx), " + "request (start_hash %llx, end_hash %llx), " + "err %d\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + err); + return err; + } + + return 0; +} + +/* + * ssdfs_invextree_node_find_item() - find item into node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to find an item into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
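+ *
+ * Finding a single item is a degenerate range search, so the method
+ * only validates that exactly one item was requested and falls through
+ * to ssdfs_invextree_node_find_range(). A caller is expected to
+ * prepare the search object roughly like this (a sketch; the field
+ * names are taken from the code above):
+ *
+ *	search->request.count = 1;
+ *	search->request.start.hash = hash;
+ *	search->request.end.hash = hash;
+ *	search->request.flags = SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE;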
+ */ +static +int ssdfs_invextree_node_find_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->request.count != 1) { + SSDFS_ERR("invalid request state: " + "count %d, start_hash %llx, end_hash %llx\n", + search->request.count, + search->request.start.hash, + search->request.end.hash); + return -ERANGE; + } + + return ssdfs_invextree_node_find_range(node, search); +} + +static +int ssdfs_invextree_node_allocate_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EOPNOTSUPP; +} + +static +int ssdfs_invextree_node_allocate_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("operation is unavailable\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return -EOPNOTSUPP; +} + +/* + * __ssdfs_invextree_node_get_extent() - extract the invalidated extent + * @pvec: pointer on pagevec + * @area_offset: area offset from the node's beginning + * @area_size: area size + * @node_size: size of the node + * @item_index: index of the extent in the node + * @extent: pointer on extent's buffer [out] + * + * This method tries to extract the invalidated extent from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_invextree_node_get_extent(struct pagevec *pvec, + u32 area_offset, + u32 area_size, + u32 node_size, + u16 item_index, + struct ssdfs_raw_extent *extent) +{ + size_t item_size = sizeof(struct ssdfs_raw_extent); + u32 item_offset; + int page_index; + struct page *page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!pvec || !extent); + + SSDFS_DBG("area_offset %u, area_size %u, item_index %u\n", + area_offset, area_size, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + item_offset = (u32)item_index * item_size; + if (item_offset >= area_size) { + SSDFS_ERR("item_offset %u >= area_size %u\n", + item_offset, area_size); + return -ERANGE; + } + + item_offset += area_offset; + if (item_offset >= node_size) { + SSDFS_ERR("item_offset %u >= node_size %u\n", + item_offset, node_size); + return -ERANGE; + } + + page_index = item_offset >> PAGE_SHIFT; + + if (page_index > 0) + item_offset %= page_index * PAGE_SIZE; + + if (page_index >= pagevec_count(pvec)) { + SSDFS_ERR("invalid page_index: " + "index %d, pvec_size %u\n", + page_index, + pagevec_count(pvec)); + return -ERANGE; + } + + page = pvec->pages[page_index]; + + err = ssdfs_memcpy_from_page(extent, 0, item_size, + page, item_offset, PAGE_SIZE, + item_size); + if (unlikely(err)) { + SSDFS_ERR("fail to copy: err %d\n", err); + return err; + } + + return 0; +} + +/* + * ssdfs_invextree_node_get_extent() - extract extent from the node + * @node: pointer on node object + * @area: items area descriptor + * @item_index: index of the extent + * @extent: pointer on extracted extent [out] + * + * This method tries to extract the extent from the node. 
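+ *
+ * The raw extent is copied out of the node's page vector. The offset
+ * math, with a worked example for PAGE_SIZE 4096, item_size 16,
+ * area_offset 1024 and item_index 300 (numbers are illustrative):
+ *
+ *	item_offset = 300 * 16 + 1024 = 5824
+ *	page_index = 5824 >> PAGE_SHIFT = 1
+ *	in-page offset = 5824 % (1 * 4096) = 1728
+ *
+ * For page_index >= 1 the modulo by (page_index * PAGE_SIZE) subtracts
+ * exactly one multiple, because item_offset is always below
+ * (page_index + 1) * PAGE_SIZE.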
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invextree_node_get_extent(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 item_index, + struct ssdfs_raw_extent *extent) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !extent); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, item_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invextree_node_get_extent(&node->content.pvec, + area->offset, + area->area_size, + node->node_size, + item_index, + extent); +} + +/* + * is_requested_position_correct() - check that requested position is correct + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to check that requested position of an extent + * into the node is correct. + * + * RETURN: + * [success] + * + * %SSDFS_CORRECT_POSITION - requested position is correct. + * %SSDFS_SEARCH_LEFT_DIRECTION - correct position from the left. + * %SSDFS_SEARCH_RIGHT_DIRECTION - correct position from the right. + * + * [failure] - error code: + * + * %SSDFS_CHECK_POSITION_FAILURE - internal error. + */ +static +int is_requested_position_correct(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_raw_extent extent; + u16 item_index; + u64 seg_id; + u32 logical_blk; + u32 len; + u64 start_hash; + u64 end_hash; + int direction = SSDFS_CHECK_POSITION_FAILURE; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + item_index = search->result.start_index; + if ((item_index + search->request.count) > area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return SSDFS_CHECK_POSITION_FAILURE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = item_index; + } + + err = ssdfs_invextree_node_get_extent(node, area, item_index, &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the extent: " + "item_index %u, err %d\n", + item_index, err); + return SSDFS_CHECK_POSITION_FAILURE; + } + + seg_id = le64_to_cpu(extent.seg_id); + logical_blk = le32_to_cpu(extent.logical_blk); + len = le32_to_cpu(extent.len); + + start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); + + if (search->request.end.hash < start_hash) + direction = SSDFS_SEARCH_LEFT_DIRECTION; + else if (end_hash < search->request.start.hash) + direction = SSDFS_SEARCH_RIGHT_DIRECTION; + else + direction = SSDFS_CORRECT_POSITION; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("extent (start_hash %llx, end_hash %llx), " + "search (start_hash %llx, end_hash %llx), " + "direction %#x\n", + start_hash, end_hash, + search->request.start.hash, + search->request.end.hash, + direction); +#endif /* CONFIG_SSDFS_DEBUG */ + + return direction; +} + +/* + * ssdfs_find_correct_position_from_left() - find position from the left + * @node: pointer on node object + * @area: items area descriptor + * @search: 
search object + * + * This method tries to find a correct position of the extent + * from the left side of extents' sequence in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_find_correct_position_from_left(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_raw_extent extent; + int item_index; + u64 seg_id; + u32 logical_blk; + u64 start_hash; + u32 req_flags; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + item_index = search->result.start_index; + if ((item_index + search->request.count) >= area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = (u16)item_index; + return 0; + } + + req_flags = search->request.flags; + + for (; item_index >= 0; item_index--) { + err = ssdfs_invextree_node_get_extent(node, area, + (u16)item_index, + &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the extent: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + seg_id = le64_to_cpu(extent.seg_id); + logical_blk = le32_to_cpu(extent.logical_blk); + + start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + + if (search->request.start.hash == start_hash) { + search->result.start_index = (u16)item_index; + return 0; + } else if (start_hash < search->request.start.hash) { + search->result.start_index = (u16)(item_index + 1); + return 0; + } + } + + search->result.start_index = 0; + return 0; +} + +/* + * ssdfs_find_correct_position_from_right() - find position from the right + * @node: pointer on node object + * @area: items area descriptor + * @search: search object + * + * This method tries to find a correct position of the extent + * from the right side of extents' sequence in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
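+ *
+ * Together with is_requested_position_correct() and the from-left
+ * variant above, this implements a short bidirectional probe around
+ * result.start_index. The direction test mirrors the code:
+ *
+ *	if (request.end.hash < extent.start_hash)
+ *		scan left;
+ *	else if (extent.end_hash < request.start.hash)
+ *		scan right;
+ *	else
+ *		position is already correct;
+ *
+ * and each scan stops at the first extent that matches or passes the
+ * requested start hash.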
+ */ +static +int ssdfs_find_correct_position_from_right(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_raw_extent extent; + int item_index; + u64 seg_id; + u32 logical_blk; + u64 start_hash; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, item_index %u\n", + node->node_id, search->result.start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + item_index = search->result.start_index; + if ((item_index + search->request.count) >= area->items_capacity) { + SSDFS_ERR("invalid request: " + "item_index %u, count %u\n", + item_index, search->request.count); + return -ERANGE; + } + + if (item_index >= area->items_count) { + if (area->items_count == 0) + item_index = area->items_count; + else + item_index = area->items_count - 1; + + search->result.start_index = (u16)item_index; + + return 0; + } + + for (; item_index < area->items_count; item_index++) { + err = ssdfs_invextree_node_get_extent(node, area, + (u16)item_index, + &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the extent: " + "item_index %d, err %d\n", + item_index, err); + return err; + } + + seg_id = le64_to_cpu(extent.seg_id); + logical_blk = le32_to_cpu(extent.logical_blk); + + start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + + if (search->request.start.hash == start_hash) { + search->result.start_index = (u16)item_index; + return 0; + } else if (search->request.end.hash < start_hash) { + if (item_index == 0) { + search->result.start_index = + (u16)item_index; + } else { + search->result.start_index = + (u16)(item_index - 1); + } + return 0; + } + } + + search->result.start_index = area->items_count; + return 0; +} + +/* + * ssdfs_clean_lookup_table() - clean unused space of lookup table + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index + * + * This method tries to clean the unused space of lookup table. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
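+ *
+ * "Cleaning" resets every lookup slot past the last live item to the
+ * no-entry pattern (all 0xFF bytes, i.e. U64_MAX). A sketch of the
+ * effect for an illustrative 20-slot table where the items from
+ * @start_index on have been removed:
+ *
+ *	lookup_index = ssdfs_convert_item2lookup_index(node_size,
+ *						       start_index);
+ *	if the slot still covers a live item, advance lookup_index;
+ *	memset(&lookup_table[lookup_index], 0xFF,
+ *	       (20 - lookup_index) * sizeof(__le64));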
+ */ +static +int ssdfs_clean_lookup_table(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index) +{ + __le64 *lookup_table; + u16 lookup_index; + u16 item_index; + u16 items_count; + u16 items_capacity; + u16 cleaning_indexes; + u32 cleaning_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, start_index %u\n", + node->node_id, start_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_capacity = node->items_area.items_capacity; + if (start_index >= items_capacity) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_index %u >= items_capacity %u\n", + start_index, items_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + return 0; + } + + lookup_table = node->raw.invextree_header.lookup_table; + + lookup_index = ssdfs_convert_item2lookup_index(node->node_size, + start_index); + if (unlikely(lookup_index >= SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE)) { + SSDFS_ERR("invalid lookup_index %u\n", + lookup_index); + return -ERANGE; + } + + items_count = node->items_area.items_count; + item_index = ssdfs_convert_lookup2item_index(node->node_size, + lookup_index); + if (unlikely(item_index >= items_capacity)) { + SSDFS_ERR("item_index %u >= items_capacity %u\n", + item_index, items_capacity); + return -ERANGE; + } + + if (item_index != start_index) + lookup_index++; + + cleaning_indexes = SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE - lookup_index; + cleaning_bytes = cleaning_indexes * sizeof(__le64); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("lookup_index %u, cleaning_indexes %u, cleaning_bytes %u\n", + lookup_index, cleaning_indexes, cleaning_bytes); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset(&lookup_table[lookup_index], 0xFF, cleaning_bytes); + + return 0; +} + +/* + * ssdfs_correct_lookup_table() - correct lookup table of the node + * @node: pointer on node object + * @area: items area descriptor + * @start_index: starting index of the range + * @range_len: number of items in the range + * + * This method tries to correct the lookup table of the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. 
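+ *
+ * Only items whose index round-trips through the lookup conversion own
+ * a lookup slot, so correction touches a small subset of the range.
+ * A sketch of the per-item step (item2lookup() abbreviates
+ * ssdfs_convert_item2lookup_index()):
+ *
+ *	if (is_hash_for_lookup_table(node_size, item_index)) {
+ *		hash = hash of the item's first logical block;
+ *		lookup_table[item2lookup(item_index)] = hash;
+ *	}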
+ */
+static
+int ssdfs_correct_lookup_table(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *area,
+				u16 start_index, u16 range_len)
+{
+	struct ssdfs_fs_info *fsi;
+	__le64 *lookup_table;
+	struct ssdfs_raw_extent extent;
+	u64 seg_id;
+	u32 logical_blk;
+	u64 hash;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u, start_index %u, range_len %u\n",
+		  node->node_id, start_index, range_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = node->tree->fsi;
+
+	if (range_len == 0) {
+		SSDFS_DBG("range_len == 0\n");
+		return 0;
+	}
+
+	lookup_table = node->raw.invextree_header.lookup_table;
+
+	for (i = 0; i < range_len; i++) {
+		int item_index = start_index + i;
+		u16 lookup_index;
+
+		if (is_hash_for_lookup_table(node->node_size, item_index)) {
+			lookup_index =
+				ssdfs_convert_item2lookup_index(node->node_size,
+								item_index);
+
+			err = ssdfs_invextree_node_get_extent(node,
+							      area,
+							      item_index,
+							      &extent);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to extract extent: "
+					  "item_index %d, err %d\n",
+					  item_index, err);
+				return err;
+			}
+
+			seg_id = le64_to_cpu(extent.seg_id);
+			logical_blk = le32_to_cpu(extent.logical_blk);
+
+			hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+							      logical_blk);
+
+			lookup_table[lookup_index] = cpu_to_le64(hash);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_initialize_lookup_table() - initialize lookup table
+ * @node: pointer on node object
+ */
+static
+void ssdfs_initialize_lookup_table(struct ssdfs_btree_node *node)
+{
+	__le64 *lookup_table;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node);
+	BUG_ON(!rwsem_is_locked(&node->header_lock));
+
+	SSDFS_DBG("node_id %u\n", node->node_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	lookup_table = node->raw.invextree_header.lookup_table;
+	memset(lookup_table, 0xFF,
+		sizeof(__le64) * SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE);
+}
+
+/*
+ * ssdfs_show_extent_items() - show invalidated extent items
+ * @node: pointer on node object
+ * @items_area: items area descriptor
+ */
+static inline
+void ssdfs_show_extent_items(struct ssdfs_btree_node *node,
+			     struct ssdfs_btree_node_items_area *items_area)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_raw_extent extent;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 len;
+	u64 start_hash;
+	u64 end_hash;
+	int i;
+	int err;
+
+	fsi = node->tree->fsi;
+
+	for (i = 0; i < items_area->items_count; i++) {
+		err = ssdfs_invextree_node_get_extent(node,
+						      items_area,
+						      i,
+						      &extent);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get invalidated extent item: "
+				  "err %d\n", err);
+			return;
+		}
+
+		seg_id = le64_to_cpu(extent.seg_id);
+		logical_blk = le32_to_cpu(extent.logical_blk);
+		len = le32_to_cpu(extent.len);
+
+		start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+							    logical_blk);
+		end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+							logical_blk + len - 1);
+
+		SSDFS_ERR("index %d, seg_id %llu, logical_blk %u, len %u, "
+			  "start_hash %llx, end_hash %llx\n",
+			  i, seg_id, logical_blk, len,
+			  start_hash, end_hash);
+	}
+}
+
+/*
+ * ssdfs_invextree_node_do_insert_range() - insert range into node
+ * @tree: invalidated extents tree
+ * @node: pointer on node object
+ * @items_area: items area state
+ * @search: search object [in|out]
+ *
+ * This method tries to insert the range of extents into the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
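+ *
+ * The insert path in short (the step names match the calls below):
+ *
+ *	1. shift the tail [start_index, items_count) right by
+ *	   request.count positions (ssdfs_shift_range_right);
+ *	2. copy the prepared extents in (ssdfs_generic_insert_range);
+ *	3. under header_lock: grow items_count, shrink free_space,
+ *	   refresh the items area's start/end hash, correct the lookup
+ *	   table, and bump the extents_count counters.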
+ */
+static
+int ssdfs_invextree_node_do_insert_range(struct ssdfs_invextree_info *tree,
+				struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *items_area,
+				struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_invextree_node_header *hdr;
+	struct ssdfs_raw_extent extent;
+	size_t item_size = sizeof(struct ssdfs_raw_extent);
+	u16 item_index;
+	u16 range_len;
+	u32 used_space;
+	u64 seg_id;
+	u32 logical_blk;
+	u16 extents_count = 0;
+	u64 start_hash, end_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node || !items_area || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = node->tree->fsi;
+
+	item_index = search->result.start_index;
+	if ((item_index + search->request.count) > items_area->items_capacity) {
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, count %u\n",
+			  item_index, search->request.count);
+		return -ERANGE;
+	}
+
+	range_len = items_area->items_count - search->result.start_index;
+	extents_count = range_len + search->request.count;
+
+	err = ssdfs_shift_range_right(node, items_area, item_size,
+				      item_index, range_len,
+				      search->request.count);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to shift extents range: "
+			  "start %u, count %u, err %d\n",
+			  item_index, search->request.count,
+			  err);
+		return err;
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	err = ssdfs_generic_insert_range(node, items_area,
+					 item_size, search);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to insert item: err %d\n",
+			  err);
+		return err;
+	}
+
+	down_write(&node->header_lock);
+
+	node->items_area.items_count += search->request.count;
+	if (node->items_area.items_count > node->items_area.items_capacity) {
+		err = -ERANGE;
+		SSDFS_ERR("items_count %u > items_capacity %u\n",
+			  node->items_area.items_count,
+			  node->items_area.items_capacity);
+		goto finish_items_area_correction;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_capacity %u, items_count %u\n",
+		  items_area->items_capacity,
+		  items_area->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	used_space = (u32)search->request.count * item_size;
+	if (used_space > node->items_area.free_space) {
+		err = -ERANGE;
+		SSDFS_ERR("used_space %u > free_space %u\n",
+			  used_space,
+			  node->items_area.free_space);
+		goto finish_items_area_correction;
+	}
+	node->items_area.free_space -= used_space;
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      0, &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						    logical_blk);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      node->items_area.items_count - 1,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	end_hash = ssdfs_invextree_calculate_hash(fsi,
+						  seg_id,
+						  logical_blk);
+
+	if (start_hash >= U64_MAX || end_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		goto finish_items_area_correction;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BEFORE: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	node->items_area.start_hash = start_hash;
+	node->items_area.end_hash = end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("AFTER: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id, start_hash, end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_correct_lookup_table(node, &node->items_area,
+					 item_index, extents_count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to correct lookup table: "
+			  "err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	hdr = &node->raw.invextree_header;
+
+	le32_add_cpu(&hdr->extents_count, search->request.count);
+	atomic64_add(search->request.count, &tree->extents_count);
+
+finish_items_area_correction:
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+
+	return err;
+}

From patchwork Sat Feb 25 01:09:23 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151977
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 72/76] ssdfs: modification operations of invalidated extents b-tree
Date: Fri, 24 Feb 2023 17:09:23 -0800
Message-Id: <20230225010927.813929-73-slava@dubeyko.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Implement modification logic of invalidated extents b-tree.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/invalidated_extents_tree.c | 2900 +++++++++++++++++++++++++++
 1 file changed, 2900 insertions(+)

diff --git a/fs/ssdfs/invalidated_extents_tree.c b/fs/ssdfs/invalidated_extents_tree.c
index d7dc4156a20d..0d2a255b9551 100644
--- a/fs/ssdfs/invalidated_extents_tree.c
+++ b/fs/ssdfs/invalidated_extents_tree.c
@@ -4161,3 +4161,2903 @@ int ssdfs_invextree_node_do_insert_range(struct ssdfs_invextree_info *tree,
 	return err;
 }
+
+/*
+ * ssdfs_invextree_node_merge_range_left() - merge range with left extent
+ * @tree: invalidated extents tree
+ * @node: pointer on node object
+ * @items_area: items area state
+ * @search: search object [in|out]
+ *
+ * This method tries to merge a left extent with inserting range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
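+ *
+ * A worked example of the left merge (illustrative values): the node
+ * already holds {seg 5, blk 0, len 8} and the first prepared extent is
+ * {seg 5, blk 8, len 4}. The two are contiguous in hash order, so the
+ * existing item is widened instead of inserting a new one:
+ *
+ *	{seg 5, blk 0, len 8} + {seg 5, blk 8, len 4}
+ *		-> {seg 5, blk 0, len 12}
+ *
+ * Only request.count - 1 extents then need new slots in the node.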
+ */
+static
+int ssdfs_invextree_node_merge_range_left(struct ssdfs_invextree_info *tree,
+				struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *items_area,
+				struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_invextree_node_header *hdr;
+	struct ssdfs_raw_extent *prepared = NULL;
+	struct ssdfs_raw_extent extent;
+	size_t item_size = sizeof(struct ssdfs_raw_extent);
+	u16 item_index;
+	u16 range_len;
+	u32 used_space;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 added_extents = 0;
+	u16 extents_count = 0;
+	u32 len;
+	u64 start_hash, end_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node || !items_area || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = node->tree->fsi;
+
+	item_index = search->result.start_index;
+	if (item_index == 0) {
+		SSDFS_ERR("there is no item from the left\n");
+		return -ERANGE;
+	}
+
+	search->result.start_index--;
+
+	item_index = search->result.start_index;
+	if ((item_index + search->request.count) > items_area->items_capacity) {
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, count %u\n",
+			  item_index, search->request.count);
+		return -ERANGE;
+	}
+
+	range_len = items_area->items_count - search->result.start_index;
+	extents_count = range_len + search->request.count;
+
+	prepared = (struct ssdfs_raw_extent *)search->result.buf;
+	len = le32_to_cpu(prepared->len);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      items_area,
+					      item_index - 1,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		return err;
+	}
+
+	le32_add_cpu(&extent.len, len);
+
+	ssdfs_memcpy(search->result.buf, 0, search->result.buf_size,
+		     &extent, 0, item_size,
+		     item_size);
+
+	added_extents = search->request.count - 1;
+
+	if (search->request.count > 1) {
+		err = ssdfs_shift_range_right(node, items_area, item_size,
+					      item_index + 1, range_len - 1,
+					      added_extents);
+		if (unlikely(err)) {
+			atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+			SSDFS_ERR("fail to shift extents range: "
+				  "start %u, count %u, err %d\n",
+				  item_index, added_extents, err);
+			return err;
+		}
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	err = ssdfs_generic_insert_range(node, items_area,
+					 item_size, search);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to insert item: err %d\n",
+			  err);
+		return err;
+	}
+
+	down_write(&node->header_lock);
+
+	if (search->request.count > 1) {
+		node->items_area.items_count += added_extents;
+		if (node->items_area.items_count >
+					node->items_area.items_capacity) {
+			err = -ERANGE;
+			SSDFS_ERR("items_count %u > items_capacity %u\n",
+				  node->items_area.items_count,
+				  node->items_area.items_capacity);
+			goto finish_items_area_correction;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("items_capacity %u, items_count %u\n",
+			  items_area->items_capacity,
+			  items_area->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		used_space = added_extents * item_size;
+		if (used_space > node->items_area.free_space) {
+			err = -ERANGE;
+			SSDFS_ERR("used_space %u > free_space %u\n",
+				  used_space,
+				  node->items_area.free_space);
+			goto finish_items_area_correction;
+		}
+		node->items_area.free_space -= used_space;
+	}
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      0, &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						    logical_blk);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      node->items_area.items_count - 1,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						  logical_blk);
+
+	if (start_hash >= U64_MAX || end_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		goto finish_items_area_correction;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BEFORE: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	node->items_area.start_hash = start_hash;
+	node->items_area.end_hash = end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("AFTER: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id, start_hash, end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_correct_lookup_table(node, &node->items_area,
+					 item_index, extents_count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to correct lookup table: "
+			  "err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	if (search->request.count > 1) {
+		hdr = &node->raw.invextree_header;
+
+		le32_add_cpu(&hdr->extents_count, added_extents);
+		atomic64_add(added_extents, &tree->extents_count);
+	}
+
+finish_items_area_correction:
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_node_merge_range_right() - merge range with right extent
+ * @tree: invalidated extents tree
+ * @node: pointer on node object
+ * @items_area: items area state
+ * @search: search object [in|out]
+ *
+ * This method tries to merge a right extent with inserting range.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
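+ *
+ * The mirror case of the left merge (illustrative values): the last
+ * prepared extent {seg 5, blk 4, len 4} immediately precedes the
+ * node's extent {seg 5, blk 8, len 8}, so the existing item absorbs it
+ * by moving its start block back and growing:
+ *
+ *	{seg 5, blk 4, len 4} + {seg 5, blk 8, len 8}
+ *		-> {seg 5, blk 4, len 12}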
+ */
+static
+int ssdfs_invextree_node_merge_range_right(struct ssdfs_invextree_info *tree,
+				struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *items_area,
+				struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_invextree_node_header *hdr;
+	struct ssdfs_raw_extent extent;
+	struct ssdfs_raw_extent *prepared = NULL;
+	size_t item_size = sizeof(struct ssdfs_raw_extent);
+	u16 item_index;
+	u16 range_len;
+	u32 used_space;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 offset;
+	u32 len;
+	u32 added_extents = 0;
+	u16 extents_count = 0;
+	u64 start_hash, end_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node || !items_area || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = node->tree->fsi;
+
+	item_index = search->result.start_index + search->request.count - 1;
+	if (item_index >= items_area->items_capacity) {
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, items_capacity %u\n",
+			  item_index, items_area->items_capacity);
+		return -ERANGE;
+	}
+
+	range_len = items_area->items_count - search->result.start_index;
+	extents_count = range_len + search->request.count;
+
+	offset = (search->result.items_in_buffer - 1) *
+			sizeof(struct ssdfs_raw_extent);
+
+	prepared =
+		(struct ssdfs_raw_extent *)((u8 *)search->result.buf + offset);
+	len = le32_to_cpu(prepared->len);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      items_area,
+					      item_index,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		return err;
+	}
+
+	extent.logical_blk = prepared->logical_blk;
+	le32_add_cpu(&extent.len, len);
+
+	ssdfs_memcpy(search->result.buf, offset, search->result.buf_size,
+		     &extent, 0, item_size,
+		     item_size);
+
+	item_index = search->result.start_index + 1;
+	added_extents = search->request.count - 1;
+
+	if (search->request.count > 1) {
+		err = ssdfs_shift_range_right(node, items_area, item_size,
+					      item_index, range_len - 1,
+					      added_extents);
+		if (unlikely(err)) {
+			atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+			SSDFS_ERR("fail to shift extents range: "
+				  "start %u, count %u, err %d\n",
+				  item_index, added_extents, err);
+			return err;
+		}
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	err = ssdfs_generic_insert_range(node, items_area,
+					 item_size, search);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to insert item: err %d\n",
+			  err);
+		return err;
+	}
+
+	down_write(&node->header_lock);
+
+	if (search->request.count > 1) {
+		node->items_area.items_count += added_extents;
+		if (node->items_area.items_count >
+					node->items_area.items_capacity) {
+			err = -ERANGE;
+			SSDFS_ERR("items_count %u > items_capacity %u\n",
+				  node->items_area.items_count,
+				  node->items_area.items_capacity);
+			goto finish_items_area_correction;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("items_capacity %u, items_count %u\n",
+			  items_area->items_capacity,
+			  items_area->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		used_space = added_extents * item_size;
+		if (used_space > node->items_area.free_space) {
+			err = -ERANGE;
+			SSDFS_ERR("used_space %u > free_space %u\n",
+				  used_space,
+				  node->items_area.free_space);
+			goto finish_items_area_correction;
+		}
+		node->items_area.free_space -= used_space;
+	}
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      0, &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						    logical_blk);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      node->items_area.items_count - 1,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						  logical_blk);
+
+	if (start_hash >= U64_MAX || end_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		goto finish_items_area_correction;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BEFORE: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	node->items_area.start_hash = start_hash;
+	node->items_area.end_hash = end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("AFTER: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id, start_hash, end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_correct_lookup_table(node, &node->items_area,
+					 item_index, extents_count);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to correct lookup table: "
+			  "err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	if (search->request.count > 1) {
+		hdr = &node->raw.invextree_header;
+
+		le32_add_cpu(&hdr->extents_count, added_extents);
+		atomic64_add(added_extents, &tree->extents_count);
+	}
+
+finish_items_area_correction:
+	up_write(&node->header_lock);
+
+	if (unlikely(err))
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+
+	return err;
+}
+
+/*
+ * ssdfs_invextree_node_merge_left_and_right() - merge range with neighbours
+ * @tree: invalidated extents tree
+ * @node: pointer on node object
+ * @items_area: items area state
+ * @search: search object [in|out]
+ *
+ * This method tries to merge inserting range with left and right extents.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
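+ *
+ * Both neighbours absorb an end of the inserted range, so the net item
+ * count depends on request.count (a sketch of the three cases handled
+ * below):
+ *
+ *	count == 1: the single extent glues its neighbours together;
+ *		    the tail is shifted left and items_count drops by
+ *		    one;
+ *	count == 2: both extents dissolve into the neighbours; no
+ *		    shift, no change of items_count;
+ *	count >  2: the middle count - 2 extents still need slots;
+ *		    the tail is shifted right accordingly.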
+ */
+static int
+ssdfs_invextree_node_merge_left_and_right(struct ssdfs_invextree_info *tree,
+				struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *items_area,
+				struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_invextree_node_header *hdr;
+	struct ssdfs_raw_extent extent;
+	struct ssdfs_raw_extent *prepared = NULL;
+	size_t item_size = sizeof(struct ssdfs_raw_extent);
+	u16 item_index;
+	u16 range_len;
+	u32 used_space;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 offset;
+	u32 len;
+	int added_extents = 0;
+	u16 extents_count = 0;
+	u64 start_hash, end_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !node || !items_area || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = node->tree->fsi;
+
+	item_index = search->result.start_index;
+	if ((item_index + search->request.count) > items_area->items_capacity) {
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, count %u\n",
+			  item_index, search->request.count);
+		return -ERANGE;
+	}
+
+	prepared = (struct ssdfs_raw_extent *)search->result.buf;
+	len = le32_to_cpu(prepared->len);
+
+	if (item_index == 0) {
+		SSDFS_ERR("there is no item from the left\n");
+		return -ERANGE;
+	}
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      items_area,
+					      item_index - 1,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		return err;
+	}
+
+	le32_add_cpu(&extent.len, len);
+
+	ssdfs_memcpy(search->result.buf, 0, search->result.buf_size,
+		     &extent, 0, item_size,
+		     item_size);
+
+	item_index = search->result.start_index + search->request.count - 1;
+
+	offset = (search->result.items_in_buffer - 1) *
+			sizeof(struct ssdfs_raw_extent);
+
+	prepared =
+		(struct ssdfs_raw_extent *)((u8 *)search->result.buf + offset);
+	len = le32_to_cpu(prepared->len);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      items_area,
+					      item_index,
+					      &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		return err;
+	}
+
+	if (search->request.count > 1)
+		extent.logical_blk = prepared->logical_blk;
+
+	le32_add_cpu(&extent.len, len);
+
+	ssdfs_memcpy(search->result.buf, offset, search->result.buf_size,
+		     &extent, 0, item_size,
+		     item_size);
+
+	range_len = items_area->items_count - search->result.start_index;
+	extents_count = range_len + search->request.count;
+	added_extents = search->request.count - 2;
+
+	if (search->request.count <= 0) {
+		SSDFS_ERR("invalid request count %d\n",
+			  search->request.count);
+		return -ERANGE;
+	} else if (search->request.count == 1) {
+		err = ssdfs_shift_range_left(node, items_area, item_size,
+					     item_index, range_len,
+					     search->request.count);
+		if (unlikely(err)) {
+			atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+			SSDFS_ERR("fail to shift extents range: "
+				  "start %u, count %u, err %d\n",
+				  item_index, search->request.count, err);
+			return err;
+		}
+
+		item_index = search->result.start_index - 1;
+	} else if (search->request.count == 2) {
+		/* no shift */
+		item_index = search->result.start_index - 1;
+	} else {
+		item_index = search->result.start_index + 1;
+
+		err = ssdfs_shift_range_right(node, items_area, item_size,
+					      item_index, range_len - 1,
+					      added_extents);
+		if (unlikely(err)) {
+			atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+			SSDFS_ERR("fail to shift extents range: "
+				  "start %u, count %u, err %d\n",
+				  item_index, added_extents, err);
+			return err;
+		}
+
+		item_index = search->result.start_index - 1;
+	}
+
+	ssdfs_debug_btree_node_object(node);
+
+	err = ssdfs_generic_insert_range(node, items_area,
+					 item_size, search);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to insert item: err %d\n",
+			  err);
+		return err;
+	}
+
+	down_write(&node->header_lock);
+
+	if (search->request.count <= 0) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid request count %d\n",
+			  search->request.count);
+		goto finish_items_area_correction;
+	} else if (search->request.count == 1) {
+		if (node->items_area.items_count == 0) {
+			err = -ERANGE;
+			SSDFS_ERR("node's items area is empty\n");
+			goto finish_items_area_correction;
+		}
+
+		/* two items are merged into one */
+		node->items_area.items_count--;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("items_capacity %u, items_count %u\n",
+			  items_area->items_capacity,
+			  items_area->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		node->items_area.free_space += item_size;
+	} else if (search->request.count == 2) {
+		/*
+		 * Nothing has been added.
+		 */
+	} else {
+		node->items_area.items_count += added_extents;
+		if (node->items_area.items_count >
+					node->items_area.items_capacity) {
+			err = -ERANGE;
+			SSDFS_ERR("items_count %u > items_capacity %u\n",
+				  node->items_area.items_count,
+				  node->items_area.items_capacity);
+			goto finish_items_area_correction;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("items_capacity %u, items_count %u\n",
+			  items_area->items_capacity,
+			  items_area->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		used_space = added_extents * item_size;
+		if (used_space > node->items_area.free_space) {
+			err = -ERANGE;
+			SSDFS_ERR("used_space %u > free_space %u\n",
+				  used_space,
+				  node->items_area.free_space);
+			goto finish_items_area_correction;
+		}
+		node->items_area.free_space -= used_space;
+	}
+
+	err = ssdfs_invextree_node_get_extent(node,
+					      &node->items_area,
+					      0, &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						    logical_blk);
+
+	err = ssdfs_invextree_node_get_extent(node,
+					&node->items_area,
+					node->items_area.items_count - 1,
+					&extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		goto finish_items_area_correction;
+	}
+
+	seg_id = le64_to_cpu(extent.seg_id);
+	logical_blk = le32_to_cpu(extent.logical_blk);
+
+	end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+						  logical_blk);
+
+	if (start_hash >= U64_MAX || end_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("start_hash %llx, end_hash %llx\n",
+			  start_hash, end_hash);
+		goto finish_items_area_correction;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("BEFORE: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id,
+		  node->items_area.start_hash,
+		  node->items_area.end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	node->items_area.start_hash = start_hash;
+	node->items_area.end_hash = end_hash;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("AFTER: node_id %u, start_hash %llx, end_hash %llx\n",
+		  node->node_id, start_hash, end_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_correct_lookup_table(node,
&node->items_area, + item_index, extents_count); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + + if (search->request.count == 1) { + hdr = &node->raw.invextree_header; + + le32_add_cpu(&hdr->extents_count, added_extents); + atomic64_add(added_extents, &tree->extents_count); + } else if (search->request.count == 2) { + /* + * Nothing has been added. + */ + } else { + hdr = &node->raw.invextree_header; + + le32_add_cpu(&hdr->extents_count, added_extents); + atomic64_add(added_extents, &tree->extents_count); + } + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + + return err; +} + +/* + * __ssdfs_invextree_node_insert_range() - insert range into node + * @node: pointer on node object + * @search: search object + * + * This method tries to insert the range of extents into the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int __ssdfs_invextree_node_insert_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree *tree; + struct ssdfs_invextree_info *tree_info; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_raw_extent found; + struct ssdfs_raw_extent *prepared = NULL; + size_t item_size = sizeof(struct ssdfs_raw_extent); + u64 old_hash; + u64 start_hash = U64_MAX, end_hash = U64_MAX; + u64 cur_hash; + u16 item_index; + int free_items; + u16 range_len; + u16 extents_count = 0; + u64 seg_id1, seg_id2; + u32 logical_blk1, logical_blk2; + u32 len; + int direction; + bool need_merge_with_left = false; + bool need_merge_with_right = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + fsi = tree->fsi; + + switch (tree->type) { + case SSDFS_INVALIDATED_EXTENTS_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + tree_info = container_of(tree, + struct ssdfs_invextree_info, + generic_tree); + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + old_hash = node->items_area.start_hash; + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if (items_area.min_item_size != item_size || + 
	    items_area.max_item_size != item_size) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("min_item_size %u, max_item_size %u, "
+			  "item_size %zu\n",
+			  items_area.min_item_size, items_area.max_item_size,
+			  item_size);
+		return -EFAULT;
+	}
+
+	if (items_area.area_size == 0 ||
+	    items_area.area_size >= node->node_size) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("invalid area_size %u\n",
+			  items_area.area_size);
+		return -EFAULT;
+	}
+
+	if (items_area.free_space > items_area.area_size) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("free_space %u > area_size %u\n",
+			  items_area.free_space, items_area.area_size);
+		return -EFAULT;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_capacity %u, items_count %u\n",
+		  items_area.items_capacity,
+		  items_area.items_count);
+	SSDFS_DBG("items_area: start_hash %llx, end_hash %llx\n",
+		  items_area.start_hash, items_area.end_hash);
+	SSDFS_DBG("area_size %u, free_space %u\n",
+		  items_area.area_size,
+		  items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	free_items = items_area.items_capacity - items_area.items_count;
+	if (unlikely(free_items < 0)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_WARN("invalid free_items %d\n",
+			   free_items);
+		return -EFAULT;
+	} else if (free_items == 0) {
+		SSDFS_DBG("node has no free items\n");
+		return -ENOSPC;
+	}
+
+	if (((u64)free_items * item_size) > items_area.free_space) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("invalid free_items: "
+			  "free_items %d, item_size %zu, free_space %u\n",
+			  free_items, item_size, items_area.free_space);
+		return -EFAULT;
+	}
+
+	item_index = search->result.start_index;
+	if ((item_index + search->request.count) > items_area.items_capacity) {
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, count %u\n",
+			  item_index, search->request.count);
+		return -ERANGE;
+	}
+
+	down_write(&node->full_lock);
+
+	direction = is_requested_position_correct(node, &items_area,
+						  search);
+	switch (direction) {
+	case SSDFS_CORRECT_POSITION:
+		/* do nothing */
+		break;
+
+	case SSDFS_SEARCH_LEFT_DIRECTION:
+		err = ssdfs_find_correct_position_from_left(node, &items_area,
+							    search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find the correct position: "
+				  "err %d\n",
+				  err);
+			goto finish_detect_affected_items;
+		}
+		break;
+
+	case SSDFS_SEARCH_RIGHT_DIRECTION:
+		err = ssdfs_find_correct_position_from_right(node, &items_area,
+							     search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find the correct position: "
+				  "err %d\n",
+				  err);
+			goto finish_detect_affected_items;
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("fail to check requested position\n");
+		goto finish_detect_affected_items;
+	}
+
+	range_len = items_area.items_count - search->result.start_index;
+	extents_count = range_len + search->request.count;
+
+	item_index = search->result.start_index;
+	if ((item_index + extents_count) > items_area.items_capacity) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid extents_count: "
+			  "item_index %u, extents_count %u, "
+			  "items_capacity %u\n",
+			  item_index, extents_count,
+			  items_area.items_capacity);
+		goto finish_detect_affected_items;
+	}
+
+	if (items_area.items_count == 0)
+		goto lock_items_range;
+
+	start_hash = search->request.start.hash;
+	end_hash = search->request.end.hash;
+
+	if (item_index > 0) {
+		err = ssdfs_invextree_node_get_extent(node,
+						      &items_area,
+						      item_index - 1,
+						      &found);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get extent: err %d\n", err);
+
goto finish_detect_affected_items; + } + + seg_id1 = le64_to_cpu(found.seg_id); + logical_blk1 = le32_to_cpu(found.logical_blk); + len = le32_to_cpu(found.len); + + cur_hash = ssdfs_invextree_calculate_hash(fsi, seg_id1, + logical_blk1 + len - 1); + + if (cur_hash < start_hash) { + prepared = + (struct ssdfs_raw_extent *)search->result.buf; + + seg_id2 = le64_to_cpu(prepared->seg_id); + logical_blk2 = le32_to_cpu(prepared->logical_blk); + + if (seg_id1 == seg_id2 && + (logical_blk1 + len) == logical_blk2) { + /* + * Left and prepared extents need to be merged + */ + need_merge_with_left = true; + } + } else { + SSDFS_ERR("invalid range: item_index %u, " + "cur_hash %llx, " + "start_hash %llx, end_hash %llx\n", + item_index, cur_hash, + start_hash, end_hash); + + ssdfs_show_extent_items(node, &items_area); + + err = -ERANGE; + goto finish_detect_affected_items; + } + } + + if (item_index < items_area.items_count) { + err = ssdfs_invextree_node_get_extent(node, + &items_area, + item_index, + &found); + if (unlikely(err)) { + SSDFS_ERR("fail to get extent: err %d\n", err); + goto finish_detect_affected_items; + } + + seg_id1 = le64_to_cpu(found.seg_id); + logical_blk1 = le32_to_cpu(found.logical_blk); + + cur_hash = ssdfs_invextree_calculate_hash(fsi, seg_id1, + logical_blk1); + + if (end_hash < cur_hash) { + prepared = + (struct ssdfs_raw_extent *)search->result.buf; + prepared += search->result.items_in_buffer - 1; + + seg_id2 = le64_to_cpu(prepared->seg_id); + logical_blk2 = le32_to_cpu(prepared->logical_blk); + len = le32_to_cpu(prepared->len); + + if (seg_id1 == seg_id2 && + (logical_blk2 + len) == logical_blk1) { + /* + * Right and prepared extents need to be merged + */ + need_merge_with_right = true; + } + } else { + SSDFS_ERR("invalid range: item_index %u, " + "cur_hash %llx, " + "start_hash %llx, end_hash %llx\n", + item_index, cur_hash, + start_hash, end_hash); + + ssdfs_show_extent_items(node, &items_area); + + err = -ERANGE; + goto finish_detect_affected_items; + } + } + + if (need_merge_with_left) { + item_index -= 1; + search->result.start_index = item_index; + + range_len = items_area.items_count - search->result.start_index; + extents_count = range_len + search->request.count; + } + +lock_items_range: + err = ssdfs_lock_items_range(node, item_index, extents_count); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_detect_affected_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_insert_item; + + if (need_merge_with_left && need_merge_with_right) { + err = ssdfs_invextree_node_merge_left_and_right(tree_info, + node, + &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert range: err %d\n", err); + goto unlock_items_range; + } + } else if (need_merge_with_left) { + err = ssdfs_invextree_node_merge_range_left(tree_info, + node, + &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert range: err %d\n", err); + goto unlock_items_range; + } + } else if (need_merge_with_right) { + err = ssdfs_invextree_node_merge_range_right(tree_info, + node, + &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to insert range: err %d\n", err); + goto unlock_items_range; + } + } else { + err = ssdfs_invextree_node_do_insert_range(tree_info, node, + &items_area, + search); + if (unlikely(err)) { + 
SSDFS_ERR("fail to insert range: err %d\n", err); + goto unlock_items_range; + } + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, extents_count); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, extents_count, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, extents_count); + +finish_insert_item: + up_read(&node->full_lock); + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + if (items_area.items_count == 0) { + struct ssdfs_btree_index_key key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + key.index.hash = cpu_to_le64(start_hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_id %u, node_type %#x, " + "node_height %u, hash %llx\n", + le32_to_cpu(key.node_id), + key.node_type, + key.height, + le64_to_cpu(key.index.hash)); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_btree_node_add_index(node, &key); + if (unlikely(err)) { + SSDFS_ERR("fail to add index: err %d\n", err); + return err; + } + } else if (old_hash != start_hash) { + struct ssdfs_btree_index_key old_key, new_key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + old_key.index.hash = cpu_to_le64(old_hash); + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_change_index(node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: err %d\n", + err); + return err; + } + } + break; + + default: + /* do nothing */ + break; + } + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * ssdfs_invextree_node_insert_item() - insert item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to insert an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-ENOSPC - node hasn't free items. + * %-ENOMEM - fail to allocate memory. 
+ */
+static
+int ssdfs_invextree_node_insert_item(struct ssdfs_btree_node *node,
+				     struct ssdfs_btree_search *search)
+{
+	int state;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+	SSDFS_DBG("free_space %u\n", node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND:
+	case SSDFS_BTREE_SEARCH_OUT_OF_RANGE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.err == -ENODATA) {
+		search->result.err = 0;
+		/*
+		 * Node doesn't contain the requested item.
+		 */
+	} else if (search->result.err) {
+		SSDFS_WARN("invalid search result: err %d\n",
+			   search->result.err);
+		return search->result.err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.count != 1);
+	BUG_ON(!search->result.buf);
+	BUG_ON(search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state = atomic_read(&node->items_area.state);
+	if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		SSDFS_ERR("invalid area state %#x\n",
+			  state);
+		return -ERANGE;
+	}
+
+	err = __ssdfs_invextree_node_insert_range(node, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to insert item: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_space %u\n", node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_invextree_node_insert_range() - insert range of items
+ * @node: pointer on node object
+ * @search: pointer on search request object
+ *
+ * This method tries to insert a range of items in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-ENOSPC - node has no free items.
+ * %-ENOMEM - fail to allocate memory.
+ */
+static
+int ssdfs_invextree_node_insert_range(struct ssdfs_btree_node *node,
+				      struct ssdfs_btree_search *search)
+{
+	int state;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !search);
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+	SSDFS_DBG("free_space %u\n", node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND:
+	case SSDFS_BTREE_SEARCH_OUT_OF_RANGE:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	if (search->result.err == -ENODATA) {
+		/*
+		 * Node doesn't contain the items being inserted.
+		 */
+	} else if (search->result.err) {
+		SSDFS_WARN("invalid search result: err %d\n",
+			   search->result.err);
+		return search->result.err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.count < 1);
+	BUG_ON(!search->result.buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state = atomic_read(&node->items_area.state);
+	if (state != SSDFS_BTREE_NODE_ITEMS_AREA_EXIST) {
+		SSDFS_ERR("invalid area state %#x\n",
+			  state);
+		return -ERANGE;
+	}
+
+	err = __ssdfs_invextree_node_insert_range(node, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to insert range: "
+			  "node_id %u, err %d\n",
+			  node->node_id, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_space %u\n", node->items_area.free_space);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_change_item_only() - change extent in the node
+ * @node: pointer on node object
+ * @area: pointer on items area's descriptor
+ * @search: pointer on search request object
+ *
+ * This method tries to change an item in the node.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE - internal error.
+ * %-EFAULT - node is corrupted.
+ */
+static
+int ssdfs_change_item_only(struct ssdfs_btree_node *node,
+			   struct ssdfs_btree_node_items_area *area,
+			   struct ssdfs_btree_search *search)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_raw_extent extent;
+	size_t item_size = sizeof(struct ssdfs_raw_extent);
+	u16 range_len;
+	u16 item_index;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 len;
+	u64 start_hash, end_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!node || !area || !search);
+	BUG_ON(!rwsem_is_locked(&node->full_lock));
+
+	SSDFS_DBG("type %#x, flags %#x, "
+		  "start_hash %llx, end_hash %llx, "
+		  "state %#x, node_id %u, height %u, "
+		  "parent %p, child %p\n",
+		  search->request.type, search->request.flags,
+		  search->request.start.hash, search->request.end.hash,
+		  atomic_read(&node->state), node->node_id,
+		  atomic_read(&node->height), search->node.parent,
+		  search->node.child);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = node->tree->fsi;
+
+	range_len = search->request.count;
+
+	if (range_len == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("empty range\n");
+		return err;
+	}
+
+	item_index = search->result.start_index;
+	if ((item_index + range_len) > area->items_count) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid request: "
+			  "item_index %u, range_len %u, items_count %u\n",
+			  item_index, range_len,
+			  area->items_count);
+		return err;
+	}
+
+	err = ssdfs_invextree_node_get_extent(node, area, item_index, &extent);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get extent: err %d\n", err);
+		return err;
+	}
+
+	err = ssdfs_generic_insert_range(node, area,
+					 item_size, search);
+	if (unlikely(err)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("fail to insert range: err %d\n",
+			  err);
+		return err;
+	}
+
+	down_write(&node->header_lock);
+
+	start_hash = node->items_area.start_hash;
+	end_hash = node->items_area.end_hash;
+
+	if (item_index == 0) {
+		err = ssdfs_invextree_node_get_extent(node,
+						      &node->items_area,
+						      item_index,
+						      &extent);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get extent: err %d\n", err);
+			goto finish_items_area_correction;
+		}
+
+		seg_id = le64_to_cpu(extent.seg_id);
+		logical_blk = le32_to_cpu(extent.logical_blk);
+
+		start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id,
+							    logical_blk);
+	}
+
+	if ((item_index + range_len) == node->items_area.items_count) {
+		err = ssdfs_invextree_node_get_extent(node,
+						&node->items_area,
+						item_index + range_len - 1,
+ &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to get extent: err %d\n", err); + goto finish_items_area_correction; + } + + seg_id = le64_to_cpu(extent.seg_id); + logical_blk = le32_to_cpu(extent.logical_blk); + len = le32_to_cpu(extent.len); + + end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); + } else if ((item_index + range_len) > node->items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid range_len: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + node->items_area.items_count); + goto finish_items_area_correction; + } + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + + err = ssdfs_correct_lookup_table(node, &node->items_area, + item_index, range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + + return err; +} + +/* + * ssdfs_invextree_node_change_item() - change item in the node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to change an item in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_invextree_node_change_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node_items_area items_area; + size_t item_size = sizeof(struct ssdfs_raw_extent); + u16 item_index; + int direction; + u16 range_len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.count != 1); + BUG_ON(!search->result.buf); + BUG_ON(search->result.buf_state != SSDFS_BTREE_SEARCH_INLINE_BUFFER); + BUG_ON(search->result.items_in_buffer != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + node->node_id, items_area.items_capacity, + items_area.items_count); + return -EFAULT; + } + + if 
(items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + down_write(&node->full_lock); + + direction = is_requested_position_correct(node, &items_area, + search); + switch (direction) { + case SSDFS_CORRECT_POSITION: + /* do nothing */ + break; + + case SSDFS_SEARCH_LEFT_DIRECTION: + err = ssdfs_find_correct_position_from_left(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_define_changing_items; + } + break; + + case SSDFS_SEARCH_RIGHT_DIRECTION: + err = ssdfs_find_correct_position_from_right(node, &items_area, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the correct position: " + "err %d\n", + err); + goto finish_define_changing_items; + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("fail to check requested position\n"); + goto finish_define_changing_items; + } + + range_len = search->request.count; + + if (range_len == 0) { + err = -ERANGE; + SSDFS_ERR("empty range\n"); + goto finish_define_changing_items; + } + + item_index = search->result.start_index; + if ((item_index + range_len) > items_area.items_count) { + err = -ERANGE; + SSDFS_ERR("invalid request: " + "item_index %u, range_len %u, items_count %u\n", + item_index, range_len, + items_area.items_count); + goto finish_define_changing_items; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_CHANGE_ITEM: + /* expected type */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid request type: %#x\n", + search->request.type); + goto finish_define_changing_items; + } + + err = ssdfs_lock_items_range(node, item_index, range_len); + if (err == -ENOENT) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (err == -ENODATA) { + up_write(&node->full_lock); + wake_up_all(&node->wait_queue); + return -ERANGE; + } else if (unlikely(err)) + BUG(); + +finish_define_changing_items: + downgrade_write(&node->full_lock); + + if (unlikely(err)) + goto finish_change_item; + + err = ssdfs_change_item_only(node, &items_area, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change item: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto unlock_items_range; + } + + err = ssdfs_set_dirty_items_range(node, items_area.items_capacity, + item_index, range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, range_len, err); + goto unlock_items_range; + } + +unlock_items_range: + ssdfs_unlock_items_range(node, item_index, range_len); + +finish_change_item: + up_read(&node->full_lock); + + ssdfs_debug_btree_node_object(node); + + return err; +} + +/* + * __ssdfs_invalidate_items_area() - invalidate the items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index of the item + * @range_len: number of items in the range + * 
@search: pointer on search request object + * + * The method tries to invalidate the items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int __ssdfs_invalidate_items_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len, + struct ssdfs_btree_search *search) +{ + struct ssdfs_btree_node *parent = NULL; + bool is_hybrid = false; + bool has_index_area = false; + bool index_area_empty = false; + bool items_area_empty = false; + int parent_type = SSDFS_BTREE_LEAF_NODE; + spinlock_t *lock; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (((u32)start_index + range_len) > area->items_count) { + SSDFS_ERR("start_index %u, range_len %u, items_count %u\n", + start_index, range_len, + area->items_count); + return -ERANGE; + } + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + is_hybrid = true; + break; + + case SSDFS_BTREE_LEAF_NODE: + is_hybrid = false; + break; + + default: + SSDFS_WARN("invalid node type %#x\n", + atomic_read(&node->type)); + return -ERANGE; + } + + down_write(&node->header_lock); + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + if (node->items_area.items_count == range_len) + items_area_empty = true; + else + items_area_empty = false; + break; + + default: + items_area_empty = false; + break; + } + + switch (atomic_read(&node->index_area.state)) { + case SSDFS_BTREE_NODE_INDEX_AREA_EXIST: + has_index_area = true; + if (node->index_area.index_count == 0) + index_area_empty = true; + else + index_area_empty = false; + break; + + default: + has_index_area = false; + index_area_empty = false; + break; + } + + up_write(&node->header_lock); + + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + return err; + } + + switch (search->request.type) { + case SSDFS_BTREE_SEARCH_DELETE_ITEM: + case SSDFS_BTREE_SEARCH_DELETE_RANGE: + if (is_hybrid && has_index_area && !index_area_empty) { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + } else if (items_area_empty) { + search->result.state = + SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; + } else { + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + } + break; + + case SSDFS_BTREE_SEARCH_DELETE_ALL: + search->result.state = + SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + parent = node; + + do { + lock = &parent->descriptor_lock; + spin_lock(lock); + parent = parent->parent_node; + spin_unlock(lock); + lock = NULL; + + if (!parent) { + SSDFS_ERR("node %u hasn't parent\n", + node->node_id); + return -ERANGE; + } + + parent_type = atomic_read(&parent->type); + switch (parent_type) { + case SSDFS_BTREE_ROOT_NODE: + case SSDFS_BTREE_INDEX_NODE: + case SSDFS_BTREE_HYBRID_NODE: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid parent node's type %#x\n", + parent_type); + return -ERANGE; + } + } while (parent_type != SSDFS_BTREE_ROOT_NODE); + + err = ssdfs_invalidate_root_node_hierarchy(parent); + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate root node hierarchy: " + "err %d\n", err); + return -ERANGE; + } + break; + + default: + atomic_set(&node->state, + SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid request type %#x\n", + search->request.type); + return -ERANGE; + } + + 
return 0; +} + +/* + * ssdfs_invalidate_whole_items_area() - invalidate the whole items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @search: pointer on search request object + * + * The method tries to invalidate the items area. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_whole_items_area(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, area %p, search %p\n", + node->node_id, area, search); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invalidate_items_area(node, area, + 0, area->items_count, + search); +} + +/* + * ssdfs_invalidate_items_area_partially() - invalidate the items area + * @node: pointer on node object + * @area: pointer on items area's descriptor + * @start_index: starting index + * @range_len: number of items in the range + * @search: pointer on search request object + * + * The method tries to invalidate the items area partially. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + */ +static +int ssdfs_invalidate_items_area_partially(struct ssdfs_btree_node *node, + struct ssdfs_btree_node_items_area *area, + u16 start_index, u16 range_len, + struct ssdfs_btree_search *search) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !area || !search); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + + SSDFS_DBG("node_id %u, start_index %u, range_len %u\n", + node->node_id, start_index, range_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + return __ssdfs_invalidate_items_area(node, area, + start_index, range_len, + search); +} + +/* + * __ssdfs_invextree_node_delete_range() - delete range of items + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items in the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-EAGAIN - continue deletion in the next node. 
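+ *
+ * Note: a delete range request can be distributed between several
+ * nodes. If the current node contains only a part of the requested
+ * range, then the method deletes the available part, decreases
+ * search->request.count, sets search->request.start.hash to the end
+ * hash of this node's items area and returns %-EAGAIN, so that the
+ * caller can continue the deletion in the next node.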
+ */ +static +int __ssdfs_invextree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_btree *tree; + struct ssdfs_invextree_info *tree_info; + struct ssdfs_invextree_node_header *hdr; + struct ssdfs_btree_node_items_area items_area; + struct ssdfs_raw_extent extent; + size_t item_size = sizeof(struct ssdfs_raw_extent); + u16 index_count = 0; + int free_items; + u16 item_index; + int direction; + u16 range_len; + u16 shift_range_len = 0; + u16 locked_len = 0; + u32 deleted_space, free_space; + u64 seg_id; + u32 logical_blk; + u32 len; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + u64 old_hash; + u32 old_extents_count = 0, extents_count = 0; + u32 extents_diff; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + case SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + return -ERANGE; + } + + if (search->result.err) { + SSDFS_WARN("invalid search result: err %d\n", + search->result.err); + return search->result.err; + } + + switch (atomic_read(&node->items_area.state)) { + case SSDFS_BTREE_NODE_ITEMS_AREA_EXIST: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid items_area state %#x\n", + atomic_read(&node->items_area.state)); + return -ERANGE; + } + + tree = node->tree; + + switch (tree->type) { + case SSDFS_INVALIDATED_EXTENTS_BTREE: + /* expected btree type */ + break; + + default: + SSDFS_ERR("invalid btree type %#x\n", tree->type); + return -ERANGE; + } + + tree_info = container_of(tree, + struct ssdfs_invextree_info, + generic_tree); + + down_read(&node->header_lock); + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + old_hash = node->items_area.start_hash; + up_read(&node->header_lock); + + if (items_area.items_capacity == 0 || + items_area.items_capacity < items_area.items_count) { + SSDFS_ERR("invalid items accounting: " + "node_id %u, items_capacity %u, items_count %u\n", + search->node.id, + items_area.items_capacity, + items_area.items_count); + return -ERANGE; + } + + if (items_area.min_item_size != item_size || + items_area.max_item_size != item_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("min_item_size %u, max_item_size %u, " + "item_size %zu\n", + items_area.min_item_size, items_area.max_item_size, + item_size); + return -EFAULT; + } + + if (items_area.area_size == 0 || + items_area.area_size >= node->node_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("invalid area_size %u\n", + items_area.area_size); + return -EFAULT; + } + + if (items_area.free_space > items_area.area_size) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("free_space %u > area_size %u\n", + items_area.free_space, items_area.area_size); + return -EFAULT; + } + + 
	free_items = items_area.items_capacity - items_area.items_count;
+	if (unlikely(free_items < 0)) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_WARN("invalid free_items %d\n",
+			   free_items);
+		return -EFAULT;
+	}
+
+	if (((u64)free_items * item_size) > items_area.free_space) {
+		atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("invalid free_items: "
+			  "free_items %d, item_size %zu, free_space %u\n",
+			  free_items, item_size, items_area.free_space);
+		return -EFAULT;
+	}
+
+	extents_count = items_area.items_count;
+	item_index = search->result.start_index;
+
+	range_len = search->request.count;
+	if (range_len == 0) {
+		SSDFS_ERR("range_len == 0\n");
+		return -ERANGE;
+	}
+
+	switch (search->request.type) {
+	case SSDFS_BTREE_SEARCH_DELETE_ITEM:
+		if ((item_index + range_len) > items_area.items_count) {
+			SSDFS_ERR("invalid request: "
+				  "item_index %u, range_len %u, "
+				  "items_count %u\n",
+				  item_index, range_len,
+				  items_area.items_count);
+			return -ERANGE;
+		}
+		break;
+
+	case SSDFS_BTREE_SEARCH_DELETE_RANGE:
+	case SSDFS_BTREE_SEARCH_DELETE_ALL:
+		/* request can be distributed between several nodes */
+		break;
+
+	default:
+		atomic_set(&node->state,
+			   SSDFS_BTREE_NODE_CORRUPTED);
+		SSDFS_ERR("invalid request type %#x\n",
+			  search->request.type);
+		return -ERANGE;
+	}
+
+	down_write(&node->full_lock);
+
+	direction = is_requested_position_correct(node, &items_area,
+						  search);
+	switch (direction) {
+	case SSDFS_CORRECT_POSITION:
+		/* do nothing */
+		break;
+
+	case SSDFS_SEARCH_LEFT_DIRECTION:
+		err = ssdfs_find_correct_position_from_left(node, &items_area,
+							    search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find the correct position: "
+				  "err %d\n",
+				  err);
+			goto finish_detect_affected_items;
+		}
+		break;
+
+	case SSDFS_SEARCH_RIGHT_DIRECTION:
+		err = ssdfs_find_correct_position_from_right(node, &items_area,
+							     search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find the correct position: "
+				  "err %d\n",
+				  err);
+			goto finish_detect_affected_items;
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("fail to check requested position\n");
+		goto finish_detect_affected_items;
+	}
+
+	item_index = search->result.start_index;
+
+	switch (search->request.type) {
+	case SSDFS_BTREE_SEARCH_DELETE_ITEM:
+		if ((item_index + range_len) > items_area.items_count) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid extents_count: "
+				  "item_index %u, extents_count %u, "
+				  "items_count %u\n",
+				  item_index, range_len,
+				  items_area.items_count);
+			goto finish_detect_affected_items;
+		}
+		break;
+
+	case SSDFS_BTREE_SEARCH_DELETE_RANGE:
+	case SSDFS_BTREE_SEARCH_DELETE_ALL:
+		/* request can be distributed between several nodes */
+		range_len = min_t(unsigned int, range_len,
+				  items_area.items_count - item_index);
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("node_id %u, item_index %u, "
+			  "request.count %u, items_count %u\n",
+			  node->node_id, item_index,
+			  search->request.count,
+			  items_area.items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+
+	default:
+		BUG();
+	}
+
+	locked_len = items_area.items_count - item_index;
+
+	err = ssdfs_lock_items_range(node, item_index, locked_len);
+	if (err == -ENOENT) {
+		up_write(&node->full_lock);
+		wake_up_all(&node->wait_queue);
+		return -ERANGE;
+	} else if (err == -ENODATA) {
+		up_write(&node->full_lock);
+		wake_up_all(&node->wait_queue);
+		return -ERANGE;
+	} else if (unlikely(err))
+		BUG();
+
+finish_detect_affected_items:
+	downgrade_write(&node->full_lock);
+
+	if (unlikely(err))
+		goto finish_delete_range;
+
ssdfs_btree_node_clear_range(node, &node->items_area, + item_size, search); + if (unlikely(err)) { + SSDFS_ERR("fail to clear items range: err %d\n", + err); + goto finish_delete_range; + } + + if (range_len == items_area.items_count) { + /* items area is empty */ + err = ssdfs_invalidate_whole_items_area(node, &items_area, + search); + } else { + err = ssdfs_invalidate_items_area_partially(node, &items_area, + item_index, + range_len, + search); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to invalidate items area: " + "node_id %u, start_index %u, " + "range_len %u, err %d\n", + node->node_id, item_index, + range_len, err); + goto finish_delete_range; + } + + shift_range_len = locked_len - range_len; + if (shift_range_len != 0) { + err = ssdfs_shift_range_left(node, &items_area, item_size, + item_index + range_len, + shift_range_len, range_len); + if (unlikely(err)) { + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + SSDFS_ERR("fail to shift the range: " + "start %u, count %u, err %d\n", + item_index + range_len, + shift_range_len, + err); + goto finish_delete_range; + } + + err = __ssdfs_btree_node_clear_range(node, + &items_area, item_size, + item_index + shift_range_len, + range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to clear range: " + "start %u, count %u, err %d\n", + item_index + range_len, + shift_range_len, + err); + goto finish_delete_range; + } + } + + down_write(&node->header_lock); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("INITIAL STATE: node_id %u, " + "items_count %u, free_space %u\n", + node->node_id, + node->items_area.items_count, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->items_area.items_count < search->request.count) + node->items_area.items_count = 0; + else + node->items_area.items_count -= search->request.count; + + deleted_space = (u32)search->request.count * item_size; + free_space = node->items_area.free_space; + if ((free_space + deleted_space) > node->items_area.area_size) { + err = -ERANGE; + SSDFS_ERR("deleted_space %u, free_space %u, area_size %u\n", + deleted_space, + node->items_area.free_space, + node->items_area.area_size); + goto finish_items_area_correction; + } + node->items_area.free_space += deleted_space; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("NEW STATE: node_id %u, " + "items_count %u, free_space %u\n", + node->node_id, + node->items_area.items_count, + node->items_area.free_space); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->items_area.items_count == 0) { + start_hash = U64_MAX; + end_hash = U64_MAX; + } else { + err = ssdfs_invextree_node_get_extent(node, + &node->items_area, + 0, &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to get extent: err %d\n", err); + goto finish_items_area_correction; + } + + seg_id = le64_to_cpu(extent.seg_id); + logical_blk = le32_to_cpu(extent.logical_blk); + + start_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + + err = ssdfs_invextree_node_get_extent(node, + &node->items_area, + node->items_area.items_count - 1, + &extent); + if (unlikely(err)) { + SSDFS_ERR("fail to get extent: err %d\n", err); + goto finish_items_area_correction; + } + + seg_id = le64_to_cpu(extent.seg_id); + logical_blk = le32_to_cpu(extent.logical_blk); + len = le32_to_cpu(extent.len); + + end_hash = ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("BEFORE: node_id %u, items_area.start_hash %llx, " + "items_area.end_hash %llx\n", + node->node_id, + node->items_area.start_hash, + 
node->items_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + node->items_area.start_hash = start_hash; + node->items_area.end_hash = end_hash; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("AFTER: node_id %u, items_area.start_hash %llx, " + "items_area.end_hash %llx\n", + node->node_id, + node->items_area.start_hash, + node->items_area.end_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (node->items_area.items_count == 0) + ssdfs_initialize_lookup_table(node); + else { + err = ssdfs_clean_lookup_table(node, + &node->items_area, + node->items_area.items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to clean the rest of lookup table: " + "start_index %u, err %d\n", + node->items_area.items_count, err); + goto finish_items_area_correction; + } + + if (shift_range_len != 0) { + int start_index = + node->items_area.items_count - shift_range_len; + + if (start_index < 0) { + err = -ERANGE; + SSDFS_ERR("invalid start_index %d\n", + start_index); + goto finish_items_area_correction; + } + + err = ssdfs_correct_lookup_table(node, + &node->items_area, + start_index, + shift_range_len); + if (unlikely(err)) { + SSDFS_ERR("fail to correct lookup table: " + "err %d\n", err); + goto finish_items_area_correction; + } + } + } + + hdr = &node->raw.invextree_header; + + old_extents_count = le32_to_cpu(hdr->extents_count); + + if (node->items_area.items_count == 0) { + hdr->extents_count = cpu_to_le32(0); + } else { + if (old_extents_count < search->request.count) { + hdr->extents_count = cpu_to_le32(0); + } else { + extents_count = le32_to_cpu(hdr->extents_count); + extents_count -= search->request.count; + hdr->extents_count = cpu_to_le32(extents_count); + } + } + + extents_count = le32_to_cpu(hdr->extents_count); + extents_diff = old_extents_count - extents_count; + atomic64_sub(extents_diff, &tree_info->extents_count); + + ssdfs_memcpy(&items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + &node->items_area, + 0, sizeof(struct ssdfs_btree_node_items_area), + sizeof(struct ssdfs_btree_node_items_area)); + + err = ssdfs_set_node_header_dirty(node, items_area.items_capacity); + if (unlikely(err)) { + SSDFS_ERR("fail to set header dirty: err %d\n", + err); + goto finish_items_area_correction; + } + + if (extents_count != 0) { + err = ssdfs_set_dirty_items_range(node, + items_area.items_capacity, + item_index, + old_extents_count - item_index); + if (unlikely(err)) { + SSDFS_ERR("fail to set items range as dirty: " + "start %u, count %u, err %d\n", + item_index, + old_extents_count - item_index, + err); + goto finish_items_area_correction; + } + } + +finish_items_area_correction: + up_write(&node->header_lock); + + if (unlikely(err)) + atomic_set(&node->state, SSDFS_BTREE_NODE_CORRUPTED); + +finish_delete_range: + ssdfs_unlock_items_range(node, item_index, locked_len); + up_read(&node->full_lock); + + if (unlikely(err)) + return err; + + switch (atomic_read(&node->type)) { + case SSDFS_BTREE_HYBRID_NODE: + if (extents_count == 0) { + int state; + + down_read(&node->header_lock); + state = atomic_read(&node->index_area.state); + index_count = node->index_area.index_count; + end_hash = node->index_area.end_hash; + up_read(&node->header_lock); + + if (state != SSDFS_BTREE_NODE_INDEX_AREA_EXIST) { + SSDFS_ERR("invalid area state %#x\n", + state); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index_count %u, end_hash %llx, " + "old_hash %llx\n", + index_count, end_hash, old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (index_count <= 1 || end_hash == old_hash) { + err = 
ssdfs_btree_node_delete_index(node, + old_hash); + if (unlikely(err)) { + SSDFS_ERR("fail to delete index: " + "old_hash %llx, err %d\n", + old_hash, err); + return err; + } + + if (index_count > 0) + index_count--; + } + } else if (old_hash != start_hash) { + struct ssdfs_btree_index_key old_key, new_key; + + spin_lock(&node->descriptor_lock); + ssdfs_memcpy(&old_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + ssdfs_memcpy(&new_key, + 0, sizeof(struct ssdfs_btree_index_key), + &node->node_index, + 0, sizeof(struct ssdfs_btree_index_key), + sizeof(struct ssdfs_btree_index_key)); + spin_unlock(&node->descriptor_lock); + + old_key.index.hash = cpu_to_le64(old_hash); + new_key.index.hash = cpu_to_le64(start_hash); + + err = ssdfs_btree_node_change_index(node, + &old_key, &new_key); + if (unlikely(err)) { + SSDFS_ERR("fail to change index: err %d\n", + err); + return err; + } + } + break; + + default: + /* do nothing */ + break; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("node_type %#x, extents_count %u, index_count %u\n", + atomic_read(&node->type), + extents_count, index_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (extents_count == 0 && index_count == 0) + search->result.state = SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE; + else + search->result.state = SSDFS_BTREE_SEARCH_OBSOLETE_RESULT; + + if (search->request.type == SSDFS_BTREE_SEARCH_DELETE_RANGE) { + if (search->request.count > range_len) { + search->request.start.hash = items_area.end_hash; + search->request.count -= range_len; + return -EAGAIN; + } + } + + ssdfs_debug_btree_node_object(node); + + return 0; +} + +/* + * ssdfs_invextree_node_delete_item() - delete an item from node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete an item from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_invextree_node_delete_item(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p, " + "search->result.count %d\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child, + search->result.count); + + BUG_ON(search->result.count != 1); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_invextree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete extent: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_invextree_node_delete_range() - delete range of items from node + * @node: pointer on node object + * @search: pointer on search request object + * + * This method tries to delete a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. 
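+ * %-EAGAIN - continue deletion in the next node
+ *            (propagated from __ssdfs_invextree_node_delete_range()).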
+ */ +static +int ssdfs_invextree_node_delete_range(struct ssdfs_btree_node *node, + struct ssdfs_btree_search *search) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_hash %llx, end_hash %llx, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + search->request.start.hash, search->request.end.hash, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = __ssdfs_invextree_node_delete_range(node, search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete extents range: err %d\n", + err); + return err; + } + + return 0; +} + +/* + * ssdfs_invextree_node_extract_range() - extract range of items from node + * @node: pointer on node object + * @start_index: starting index of the range + * @count: count of items in the range + * @search: pointer on search request object + * + * This method tries to extract a range of items from the node. + * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + * %-ENOMEM - fail to allocate memory. + * %-ENODATA - no such range in the node. + */ +static +int ssdfs_invextree_node_extract_range(struct ssdfs_btree_node *node, + u16 start_index, u16 count, + struct ssdfs_btree_search *search) +{ + struct ssdfs_fs_info *fsi; + struct ssdfs_raw_extent *extent; + u64 seg_id; + u32 logical_blk; + u32 len; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !search); + + SSDFS_DBG("type %#x, flags %#x, " + "start_index %u, count %u, " + "state %#x, node_id %u, height %u, " + "parent %p, child %p\n", + search->request.type, search->request.flags, + start_index, count, + atomic_read(&node->state), node->node_id, + atomic_read(&node->height), search->node.parent, + search->node.child); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + + down_read(&node->full_lock); + err = __ssdfs_btree_node_extract_range(node, start_index, count, + sizeof(struct ssdfs_raw_extent), + search); + up_read(&node->full_lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to extract a range: " + "start %u, count %u, err %d\n", + start_index, count, err); + return err; + } + + search->request.flags = + SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE | + SSDFS_BTREE_SEARCH_HAS_VALID_COUNT; + + extent = (struct ssdfs_raw_extent *)search->result.buf; + + seg_id = le64_to_cpu(extent->seg_id); + logical_blk = le32_to_cpu(extent->logical_blk); + search->request.start.hash = + ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk); + + extent += search->result.count - 1; + + seg_id = le64_to_cpu(extent->seg_id); + logical_blk = le32_to_cpu(extent->logical_blk); + len = le32_to_cpu(extent->len); + search->request.end.hash = + ssdfs_invextree_calculate_hash(fsi, seg_id, + logical_blk + len - 1); + + search->request.count = count; + + return 0; +} + +/* + * ssdfs_invextree_resize_items_area() - resize items area of the node + * @node: node object + * @new_size: new size of the items area + * + * This method tries to resize the items area of the node. + * + * It makes sense to allocate the bitmap taking into + * account that the node will be resized. So, both the + * index area and the items area in the bitmap need to + * be sized for the whole node. This technique makes it + * possible to avoid resizing or shifting the bitmap's + * content.
+ * + * RETURN: + * [success] + * [failure] - error code: + * + * %-ERANGE - internal error. + * %-EFAULT - node is corrupted. + */ +static +int ssdfs_invextree_resize_items_area(struct ssdfs_btree_node *node, + u32 new_size) +{ + struct ssdfs_fs_info *fsi; + size_t item_size = sizeof(struct ssdfs_raw_extent); + size_t index_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!node || !node->tree || !node->tree->fsi); + BUG_ON(!rwsem_is_locked(&node->full_lock)); + BUG_ON(!rwsem_is_locked(&node->header_lock)); + + SSDFS_DBG("node_id %u, new_size %u\n", + node->node_id, new_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + fsi = node->tree->fsi; + index_size = le16_to_cpu(fsi->vh->invextree.desc.index_size); + + err = __ssdfs_btree_node_resize_items_area(node, + item_size, + index_size, + new_size); + if (unlikely(err)) { + SSDFS_ERR("fail to resize items area: " + "node_id %u, new_size %u, err %d\n", + node->node_id, new_size, err); + return err; + } + + return 0; +} + +void ssdfs_debug_invextree_object(struct ssdfs_invextree_info *tree) +{ +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree); + + SSDFS_DBG("INVALIDATED EXTENTS TREE: state %#x, " + "extents_count %llu, is_locked %d, fsi %p\n", + atomic_read(&tree->state), + (u64)atomic64_read(&tree->extents_count), + rwsem_is_locked(&tree->lock), + tree->fsi); + + ssdfs_debug_btree_object(&tree->generic_tree); +#endif /* CONFIG_SSDFS_DEBUG */ +} + +const struct ssdfs_btree_descriptor_operations ssdfs_invextree_desc_ops = { + .init = ssdfs_invextree_desc_init, + .flush = ssdfs_invextree_desc_flush, +}; + +const struct ssdfs_btree_operations ssdfs_invextree_ops = { + .create_root_node = ssdfs_invextree_create_root_node, + .create_node = ssdfs_invextree_create_node, + .init_node = ssdfs_invextree_init_node, + .destroy_node = ssdfs_invextree_destroy_node, + .add_node = ssdfs_invextree_add_node, + .delete_node = ssdfs_invextree_delete_node, + .pre_flush_root_node = ssdfs_invextree_pre_flush_root_node, + .flush_root_node = ssdfs_invextree_flush_root_node, + .pre_flush_node = ssdfs_invextree_pre_flush_node, + .flush_node = ssdfs_invextree_flush_node, +}; + +const struct ssdfs_btree_node_operations ssdfs_invextree_node_ops = { + .find_item = ssdfs_invextree_node_find_item, + .find_range = ssdfs_invextree_node_find_range, + .extract_range = ssdfs_invextree_node_extract_range, + .allocate_item = ssdfs_invextree_node_allocate_item, + .allocate_range = ssdfs_invextree_node_allocate_range, + .insert_item = ssdfs_invextree_node_insert_item, + .insert_range = ssdfs_invextree_node_insert_range, + .change_item = ssdfs_invextree_node_change_item, + .delete_item = ssdfs_invextree_node_delete_item, + .delete_range = ssdfs_invextree_node_delete_range, + .resize_items_area = ssdfs_invextree_resize_items_area, +};
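These three tables are the whole interface the invalidated extents tree exposes to the generic b-tree layer: node-level requests are dispatched through ssdfs_invextree_node_ops rather than by calling the static functions directly. A minimal sketch of such a dispatch site is below; it is not code from this series, and the node_ops back-pointer is an assumed field name used purely for illustration.

/*
 * Hypothetical dispatch sketch (not from the patch): how a generic
 * b-tree layer can route a delete-range request through the node
 * operations table registered above. For the invalidated extents
 * tree the indirect call resolves to
 * ssdfs_invextree_node_delete_range().
 */
static int sketch_btree_node_delete_range(struct ssdfs_btree_node *node,
					  struct ssdfs_btree_search *search)
{
	/* node_ops is an assumed back-pointer to the per-tree table */
	const struct ssdfs_btree_node_operations *ops = node->node_ops;

	if (!ops || !ops->delete_range)
		return -EOPNOTSUPP;

	return ops->delete_range(node, search);
}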
From patchwork Sat Feb 25 01:09:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viacheslav Dubeyko X-Patchwork-Id: 13151978 From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 73/76] ssdfs: implement inode operations support Date: Fri, 24 Feb 2023 17:09:24 -0800 Message-Id: <20230225010927.813929-74-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Patch implements operations: (1) ssdfs_read_inode - read raw inode (2) ssdfs_iget - get existing inode (3) ssdfs_new_inode - create new inode (4) ssdfs_getattr - get attributes (5) ssdfs_setattr - set attributes (6) ssdfs_truncate - truncate file (7) ssdfs_setsize - set file size (8) ssdfs_evict_inode - evict inode (9) ssdfs_write_inode - store dirty inode Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli
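Of the nine operations listed above, ssdfs_write_inode(), ssdfs_evict_inode() and ssdfs_getattr()/ssdfs_setattr() are VFS callbacks, while ssdfs_read_inode(), ssdfs_truncate() and ssdfs_setsize() are internal helpers. As a rough orientation, the callbacks would be wired up along the lines of the sketch below; the table name is illustrative only, the real registration lives elsewhere in the series (presumably SSDFS's super.c), and getattr/setattr are hooked through the per-type inode_operations that ssdfs_inode_setops() installs.

/*
 * Illustrative wiring only -- not a table from this patch.
 */
static const struct super_operations ssdfs_sops_sketch = {
	.write_inode	= ssdfs_write_inode,	/* store dirty raw inode via inodes b-tree */
	.evict_inode	= ssdfs_evict_inode,	/* truncate and deallocate on final unlink */
	.statfs		= ssdfs_statfs,		/* volume statistics */
};

On the read side, ssdfs_iget() is the entry point: it calls iget_locked() and, for an I_NEW inode, fills the in-core inode from the inodes b-tree via ssdfs_read_inode().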
--- fs/ssdfs/inode.c | 1190 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1190 insertions(+) create mode 100644 fs/ssdfs/inode.c diff --git a/fs/ssdfs/inode.c b/fs/ssdfs/inode.c new file mode 100644 index 000000000000..c9ba7288e357 --- /dev/null +++ b/fs/ssdfs/inode.c @@ -0,0 +1,1190 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/inode.c - inode handling routines. + * + * Copyright (c) 2014-2019 HGST, a Western Digital Company. + * http://www.hgst.com/ + * Copyright (c) 2014-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * + * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ * + * Created by HGST, San Jose Research Center, Storage Architecture Group + * + * Authors: Viacheslav Dubeyko + * + * Acknowledgement: Cyril Guyot + * Zvonimir Bandic + */ + +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "request_queue.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "extents_tree.h" +#include "inodes_tree.h" +#include "dentries_tree.h" +#include "xattr_tree.h" +#include "acl.h" +#include "xattr.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_inode_page_leaks; +atomic64_t ssdfs_inode_memory_leaks; +atomic64_t ssdfs_inode_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_inode_cache_leaks_increment(void *kaddr) + * void ssdfs_inode_cache_leaks_decrement(void *kaddr) + * void *ssdfs_inode_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_inode_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_inode_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_inode_kfree(void *kaddr) + * struct page *ssdfs_inode_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_inode_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_inode_free_page(struct page *page) + * void ssdfs_inode_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(inode) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(inode) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_inode_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_inode_page_leaks, 0); + atomic64_set(&ssdfs_inode_memory_leaks, 0); + atomic64_set(&ssdfs_inode_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_inode_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_inode_page_leaks) != 0) { + SSDFS_ERR("INODE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_inode_page_leaks)); + } + + if (atomic64_read(&ssdfs_inode_memory_leaks) != 0) { + SSDFS_ERR("INODE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_inode_memory_leaks)); + } + + if (atomic64_read(&ssdfs_inode_cache_leaks) != 0) { + SSDFS_ERR("INODE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_inode_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +bool is_raw_inode_checksum_correct(struct ssdfs_fs_info *fsi, + void *buf, size_t size) +{ + struct ssdfs_inode *raw_inode; + size_t raw_inode_size; + __le32 old_checksum; + bool is_valid = false; + + spin_lock(&fsi->inodes_tree->lock); + raw_inode_size = fsi->inodes_tree->raw_inode_size; + spin_unlock(&fsi->inodes_tree->lock); + + if (raw_inode_size != size) { + SSDFS_WARN("raw_inode_size %zu != size %zu\n", + raw_inode_size, size); + return false; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("RAW INODE DUMP:\n"); + print_hex_dump_bytes("", DUMP_PREFIX_OFFSET, + buf, size); + SSDFS_DBG("\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + raw_inode = (struct ssdfs_inode *)buf; + + old_checksum = raw_inode->checksum; + raw_inode->checksum = 0; + raw_inode->checksum = ssdfs_crc32_le(buf, size); + + is_valid = old_checksum == raw_inode->checksum; + + if (!is_valid) { + SSDFS_WARN("invalid inode's checksum: " + "stored %#x != calculated %#x\n", + le32_to_cpu(old_checksum), + le32_to_cpu(raw_inode->checksum)); + raw_inode->checksum = old_checksum; + } + + return is_valid; 
+} + +void ssdfs_set_inode_flags(struct inode *inode) +{ + unsigned int flags = SSDFS_I(inode)->flags; + unsigned int new_fl = 0; + + if (flags & FS_SYNC_FL) + new_fl |= S_SYNC; + if (flags & FS_APPEND_FL) + new_fl |= S_APPEND; + if (flags & FS_IMMUTABLE_FL) + new_fl |= S_IMMUTABLE; + if (flags & FS_NOATIME_FL) + new_fl |= S_NOATIME; + if (flags & FS_DIRSYNC_FL) + new_fl |= S_DIRSYNC; + inode_set_flags(inode, new_fl, S_SYNC | S_APPEND | S_IMMUTABLE | + S_NOATIME | S_DIRSYNC); +} + +static int ssdfs_inode_setops(struct inode *inode) +{ + if (S_ISREG(inode->i_mode)) { + inode->i_op = &ssdfs_file_inode_operations; + inode->i_fop = &ssdfs_file_operations; + inode->i_mapping->a_ops = &ssdfs_aops; + } else if (S_ISDIR(inode->i_mode)) { + inode->i_op = &ssdfs_dir_inode_operations; + inode->i_fop = &ssdfs_dir_operations; + inode->i_mapping->a_ops = &ssdfs_aops; + } else if (S_ISLNK(inode->i_mode)) { + inode->i_op = &ssdfs_symlink_inode_operations; + inode->i_mapping->a_ops = &ssdfs_aops; + inode_nohighmem(inode); + } else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) || + S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) { + inode->i_op = &ssdfs_special_inode_operations; + init_special_inode(inode, inode->i_mode, inode->i_rdev); + } else { + SSDFS_ERR("bogus i_mode %o for ino %lu\n", + inode->i_mode, (unsigned long)inode->i_ino); + return -EINVAL; + } + + return 0; +} + +static int ssdfs_read_inode(struct inode *inode) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_btree_search *search; + struct ssdfs_inode *raw_inode; + size_t raw_inode_size; + struct ssdfs_inode_info *ii = SSDFS_I(inode); + u16 private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_inodes_btree_find(fsi->inodes_tree, inode->i_ino, search); + if (unlikely(err)) { + SSDFS_ERR("fail to find the raw inode: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto finish_read_inode; + } + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid result state %#x\n", + search->result.state); + goto finish_read_inode; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid buffer state %#x\n", + search->result.buf_state); + goto finish_read_inode; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("empty result buffer pointer\n"); + goto finish_read_inode; + } + + if (search->result.items_in_buffer == 0) { + err = -ERANGE; + SSDFS_ERR("items_in_buffer %u\n", + search->result.items_in_buffer); + goto finish_read_inode; + } + + raw_inode = (struct ssdfs_inode *)search->result.buf; + raw_inode_size = + search->result.buf_size / search->result.items_in_buffer; + + if (!is_raw_inode_checksum_correct(fsi, raw_inode, raw_inode_size)) { + err = -EIO; + SSDFS_ERR("invalid inode's checksum: ino %lu\n", + inode->i_ino); + goto finish_read_inode; + } + + if (le16_to_cpu(raw_inode->magic) != SSDFS_INODE_MAGIC) { + err = -EIO; + SSDFS_ERR("invalid inode magic %#x\n", + le16_to_cpu(raw_inode->magic)); + goto finish_read_inode; + } + + if (le64_to_cpu(raw_inode->ino) != inode->i_ino) { 
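+ /*
+ * Sanity check: the raw inode returned by the inodes b-tree
+ * must describe the ino that was requested; a mismatch means
+ * the lookup returned a foreign record, so treat it as -EIO.
+ */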
+ err = -EIO; + SSDFS_ERR("raw_inode->ino %llu != i_ino %lu\n", + le64_to_cpu(raw_inode->ino), + inode->i_ino); + goto finish_read_inode; + } + + inode->i_mode = le16_to_cpu(raw_inode->mode); + ii->flags = le32_to_cpu(raw_inode->flags); + i_uid_write(inode, le32_to_cpu(raw_inode->uid)); + i_gid_write(inode, le32_to_cpu(raw_inode->gid)); + set_nlink(inode, le32_to_cpu(raw_inode->refcount)); + + inode->i_atime.tv_sec = le64_to_cpu(raw_inode->atime); + inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->ctime); + inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->mtime); + inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->atime_nsec); + inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->ctime_nsec); + inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->mtime_nsec); + + ii->birthtime.tv_sec = le64_to_cpu(raw_inode->birthtime); + ii->birthtime.tv_nsec = le32_to_cpu(raw_inode->birthtime_nsec); + ii->raw_inode_size = fsi->raw_inode_size; + + inode->i_generation = (u32)le64_to_cpu(raw_inode->generation); + + inode->i_size = le64_to_cpu(raw_inode->size); + inode->i_blkbits = fsi->log_pagesize; + inode->i_blocks = le64_to_cpu(raw_inode->blocks); + + private_flags = le16_to_cpu(raw_inode->private_flags); + atomic_set(&ii->private_flags, private_flags); + if (private_flags & ~SSDFS_INODE_PRIVATE_FLAGS_MASK) { + err = -EIO; + SSDFS_ERR("invalid set of private_flags %#x\n", + private_flags); + goto finish_read_inode; + } + + err = ssdfs_inode_setops(inode); + if (unlikely(err)) + goto finish_read_inode; + + down_write(&ii->lock); + + ii->parent_ino = le64_to_cpu(raw_inode->parent_ino); + ssdfs_set_inode_flags(inode); + ii->name_hash = le64_to_cpu(raw_inode->hash_code); + ii->name_len = le16_to_cpu(raw_inode->name_len); + + ssdfs_memcpy(&ii->raw_inode, + 0, sizeof(struct ssdfs_inode), + raw_inode, + 0, sizeof(struct ssdfs_inode), + sizeof(struct ssdfs_inode)); + + if (S_ISREG(inode->i_mode)) { + if (private_flags & ~SSDFS_IFREG_PRIVATE_FLAG_MASK) { + err = -EIO; + SSDFS_ERR("regular file: invalid private flags %#x\n", + private_flags); + goto unlock_mutable_fields; + } + + if (is_ssdfs_file_inline(ii)) { + err = ssdfs_allocate_inline_file_buffer(inode); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate inline buffer\n"); + goto unlock_mutable_fields; + } + + /* + * TODO: pre-fetch file's content in buffer + * (if inode size > 256 bytes) + */ + } else if (private_flags & SSDFS_INODE_HAS_INLINE_EXTENTS || + private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) { + err = ssdfs_extents_tree_create(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to create the extents tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + + err = ssdfs_extents_tree_init(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to init the extents tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + } + + if (private_flags & SSDFS_INODE_HAS_INLINE_XATTR || + private_flags & SSDFS_INODE_HAS_XATTR_BTREE) { + err = ssdfs_xattrs_tree_create(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to create the xattrs tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + + err = ssdfs_xattrs_tree_init(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to init the xattrs tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + } + } else if (S_ISDIR(inode->i_mode)) { + if (private_flags & ~SSDFS_IFDIR_PRIVATE_FLAG_MASK) { + err = -EIO; + SSDFS_ERR("folder: invalid private flags %#x\n", + private_flags); + goto 
unlock_mutable_fields; + } + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + err = ssdfs_dentries_tree_create(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to create the dentries tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + + err = ssdfs_dentries_tree_init(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to init the dentries tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + } + + if (private_flags & SSDFS_INODE_HAS_INLINE_XATTR || + private_flags & SSDFS_INODE_HAS_XATTR_BTREE) { + err = ssdfs_xattrs_tree_create(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to create the xattrs tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + + err = ssdfs_xattrs_tree_init(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to init the xattrs tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto unlock_mutable_fields; + } + } + } else if (S_ISLNK(inode->i_mode) || + S_ISCHR(inode->i_mode) || + S_ISBLK(inode->i_mode) || + S_ISFIFO(inode->i_mode) || + S_ISSOCK(inode->i_mode)) { + /* do nothing */ + } else { + err = -EINVAL; + SSDFS_ERR("bogus i_mode %o for ino %lu\n", + inode->i_mode, (unsigned long)inode->i_ino); + goto unlock_mutable_fields; + } + +unlock_mutable_fields: + up_write(&ii->lock); + +finish_read_inode: + ssdfs_btree_search_free(search); + return err; +} + +struct inode *ssdfs_iget(struct super_block *sb, ino_t ino) +{ + struct inode *inode; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = iget_locked(sb, ino); + if (unlikely(!inode)) { + err = -ENOMEM; + SSDFS_ERR("unable to obtain or to allocate inode %lu, err %d\n", + (unsigned long)ino, err); + return ERR_PTR(err); + } + + if (!(inode->i_state & I_NEW)) { + trace_ssdfs_iget(inode); + return inode; + } + + err = ssdfs_read_inode(inode); + if (unlikely(err)) { + SSDFS_ERR("unable to read inode %lu, err %d\n", + (unsigned long)ino, err); + goto bad_inode; + } + + unlock_new_inode(inode); + trace_ssdfs_iget(inode); + return inode; + +bad_inode: + iget_failed(inode); + trace_ssdfs_iget_exit(inode, err); + return ERR_PTR(err); +} + +static void ssdfs_init_raw_inode(struct ssdfs_inode_info *ii) +{ + struct ssdfs_inode *ri = &ii->raw_inode; + + ri->magic = cpu_to_le16(SSDFS_INODE_MAGIC); + ri->mode = cpu_to_le16(ii->vfs_inode.i_mode); + ri->flags = cpu_to_le32(ii->flags); + ri->uid = cpu_to_le32(i_uid_read(&ii->vfs_inode)); + ri->gid = cpu_to_le32(i_gid_read(&ii->vfs_inode)); + ri->atime = cpu_to_le64(ii->vfs_inode.i_atime.tv_sec); + ri->ctime = cpu_to_le64(ii->vfs_inode.i_ctime.tv_sec); + ri->mtime = cpu_to_le64(ii->vfs_inode.i_mtime.tv_sec); + ri->atime_nsec = cpu_to_le32(ii->vfs_inode.i_atime.tv_nsec); + ri->ctime_nsec = cpu_to_le32(ii->vfs_inode.i_ctime.tv_nsec); + ri->mtime_nsec = cpu_to_le32(ii->vfs_inode.i_mtime.tv_nsec); + ri->birthtime = cpu_to_le64(ii->birthtime.tv_sec); + ri->birthtime_nsec = cpu_to_le32(ii->birthtime.tv_nsec); + ri->generation = cpu_to_le64(ii->vfs_inode.i_generation); + ri->size = cpu_to_le64(i_size_read(&ii->vfs_inode)); + ri->blocks = cpu_to_le64(ii->vfs_inode.i_blocks); + ri->parent_ino = cpu_to_le64(ii->parent_ino); + ri->refcount = cpu_to_le32(ii->vfs_inode.i_nlink); + ri->checksum = 0; + ri->ino = cpu_to_le64(ii->vfs_inode.i_ino); + ri->hash_code = cpu_to_le64(ii->name_hash); + ri->name_len = cpu_to_le16(ii->name_len); +} + +static 
void ssdfs_init_inode(struct inode *dir, + struct inode *inode, + umode_t mode, + ino_t ino, + const struct qstr *qstr) +{ + struct super_block *sb = dir->i_sb; + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + + inode->i_ino = ino; + ii->parent_ino = dir->i_ino; + ii->birthtime = current_time(inode); + ii->raw_inode_size = fsi->raw_inode_size; + inode->i_mtime = inode->i_atime = inode->i_ctime = ii->birthtime; + inode_init_owner(&init_user_ns, inode, dir, mode); + ii->flags = ssdfs_mask_flags(mode, + SSDFS_I(dir)->flags & SSDFS_FL_INHERITED); + ssdfs_set_inode_flags(inode); + inode->i_generation = get_random_u32(); + inode->i_blkbits = fsi->log_pagesize; + i_size_write(inode, 0); + inode->i_blocks = 0; + set_nlink(inode, 1); + + down_write(&ii->lock); + ii->name_hash = ssdfs_generate_name_hash(qstr); + ii->name_len = (u16)qstr->len; + ssdfs_init_raw_inode(ii); + up_write(&ii->lock); +} + +struct inode *ssdfs_new_inode(struct inode *dir, umode_t mode, + const struct qstr *qstr) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(dir->i_sb); + struct super_block *sb = dir->i_sb; + struct inode *inode; + struct ssdfs_btree_search *search; + struct ssdfs_inodes_btree_info *itree; + ino_t ino; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir_ino %lu, mode %o\n", + (unsigned long)dir->i_ino, mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + itree = fsi->inodes_tree; + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto failed_new_inode; + } + + ssdfs_btree_search_init(search); + err = ssdfs_inodes_btree_allocate(itree, &ino, search); + ssdfs_btree_search_free(search); + + if (err == -ENOSPC) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to allocate an inode: " + "dir_ino %lu, err %d\n", + (unsigned long)dir->i_ino, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto failed_new_inode; + } else if (unlikely(err)) { + SSDFS_ERR("fail to allocate an inode: " + "dir_ino %lu, err %d\n", + (unsigned long)dir->i_ino, err); + goto failed_new_inode; + } + + inode = new_inode(sb); + if (unlikely(!inode)) { + err = -ENOMEM; + SSDFS_ERR("unable to allocate inode: err %d\n", err); + goto failed_new_inode; + } + + ssdfs_init_inode(dir, inode, mode, ino, qstr); + + err = ssdfs_inode_setops(inode); + if (unlikely(err)) { + SSDFS_ERR("fail to set inode's operations: " + "err %d\n", err); + goto bad_inode; + } + + if (insert_inode_locked(inode) < 0) { + err = -EIO; + SSDFS_ERR("inode number already in use: " + "ino %lu\n", + (unsigned long) ino); + goto bad_inode; + } + + err = ssdfs_init_acl(inode, dir); + if (err) { + SSDFS_ERR("fail to init ACL: " + "err %d\n", err); + goto fail_drop; + } + + err = ssdfs_init_security(inode, dir, qstr); + if (err) { + SSDFS_ERR("fail to init security xattr: " + "err %d\n", err); + goto fail_drop; + } + + mark_inode_dirty(inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("new inode %lu is created\n", + ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + trace_ssdfs_inode_new(inode); + return inode; + +fail_drop: + trace_ssdfs_inode_new_exit(inode, err); + clear_nlink(inode); + unlock_new_inode(inode); + iput(inode); + return ERR_PTR(err); + +bad_inode: + trace_ssdfs_inode_new_exit(inode, err); + make_bad_inode(inode); + iput(inode); + +failed_new_inode: + return ERR_PTR(err); +} + +int ssdfs_getattr(struct user_namespace *mnt_userns, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags) +{ + struct inode *inode = 
d_inode(path->dentry); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + unsigned int flags; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + flags = ii->flags & SSDFS_FL_USER_VISIBLE; + if (flags & SSDFS_APPEND_FL) + stat->attributes |= STATX_ATTR_APPEND; + if (flags & SSDFS_COMPR_FL) + stat->attributes |= STATX_ATTR_COMPRESSED; + if (flags & SSDFS_IMMUTABLE_FL) + stat->attributes |= STATX_ATTR_IMMUTABLE; + if (flags & SSDFS_NODUMP_FL) + stat->attributes |= STATX_ATTR_NODUMP; + + stat->attributes_mask |= (STATX_ATTR_APPEND | + STATX_ATTR_COMPRESSED | + STATX_ATTR_ENCRYPTED | + STATX_ATTR_IMMUTABLE | + STATX_ATTR_NODUMP); + + generic_fillattr(&init_user_ns, inode, stat); + return 0; +} + +static int ssdfs_truncate(struct inode *inode) +{ + struct ssdfs_inode_info *ii = SSDFS_I(inode); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || + S_ISLNK(inode->i_mode))) + return -EINVAL; + + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) + return -EPERM; + + if (is_ssdfs_file_inline(ii)) { + loff_t newsize = i_size_read(inode); + size_t inline_capacity = + ssdfs_inode_inline_file_capacity(inode); + + if (newsize > inline_capacity) { + SSDFS_ERR("newsize %llu > inline_capacity %zu\n", + (u64)newsize, inline_capacity); + return -E2BIG; + } else if (newsize == inline_capacity) { + /* do nothing */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("newsize %llu == inline_capacity %zu\n", + (u64)newsize, inline_capacity); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { + loff_t size = inline_capacity - newsize; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!ii->inline_file); +#endif /* CONFIG_SSDFS_DEBUG */ + + memset((u8 *)ii->inline_file + newsize, 0, size); + } + } else { + err = ssdfs_extents_tree_truncate(inode); + if (err == -ENOENT) { + err = 0; + SSDFS_DBG("extents tree is absent\n"); + } else if (unlikely(err)) { + SSDFS_ERR("fail to truncate extents tree: " + "err %d\n", + err); + return err; + } + } + + inode->i_mtime = inode->i_ctime = current_time(inode); + mark_inode_dirty(inode); + + return 0; +} + +int ssdfs_setsize(struct inode *inode, struct iattr *attr) +{ + loff_t oldsize = i_size_read(inode); + loff_t newsize = attr->ia_size; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || + S_ISLNK(inode->i_mode))) + return -EINVAL; + + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) + return -EPERM; + + inode_dio_wait(inode); + + if (newsize > oldsize) { + i_size_write(inode, newsize); + pagecache_isize_extended(inode, oldsize, newsize); + } else { + truncate_setsize(inode, newsize); + + err = ssdfs_truncate(inode); + if (err) + return err; + } + + inode->i_mtime = inode->i_ctime = current_time(inode); + mark_inode_dirty(inode); + return 0; +} + +int ssdfs_setattr(struct user_namespace *mnt_userns, + struct dentry *dentry, struct iattr *attr) +{ + struct inode *inode = dentry->d_inode; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = setattr_prepare(&init_user_ns, dentry, attr); + if (err) + return err; + + if (S_ISREG(inode->i_mode) && + attr->ia_valid & ATTR_SIZE && + attr->ia_size != inode->i_size) { + err = ssdfs_setsize(inode, attr); + if (err) + return err; + } + + if 
(attr->ia_valid) { + setattr_copy(&init_user_ns, inode, attr); + mark_inode_dirty(inode); + + if (attr->ia_valid & ATTR_MODE) { + err = posix_acl_chmod(&init_user_ns, + dentry, inode->i_mode); + } + } + + return err; +} + +/* + * This method does all fs work to be done when in-core inode + * is about to be gone, for whatever reason. + */ +void ssdfs_evict_inode(struct inode *inode) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_xattrs_btree_info *xattrs_tree; + ino_t ino = inode->i_ino; + bool want_delete = false; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu mode %o count %d nlink %u\n", + ino, inode->i_mode, + atomic_read(&inode->i_count), + inode->i_nlink); +#endif /* CONFIG_SSDFS_DEBUG */ + + xattrs_tree = SSDFS_XATTREE(SSDFS_I(inode)); + + if (!inode->i_nlink) { + err = filemap_flush(inode->i_mapping); + if (err) { + SSDFS_WARN("inode %lu flush error: %d\n", + ino, err); + } + } + + err = filemap_fdatawait(inode->i_mapping); + if (err) { + SSDFS_WARN("inode %lu fdatawait error: %d\n", + ino, err); + ssdfs_clear_dirty_pages(inode->i_mapping); + } + + if (!inode->i_nlink && !is_bad_inode(inode)) + want_delete = true; + else + want_delete = false; + + trace_ssdfs_inode_evict(inode); + + truncate_inode_pages_final(&inode->i_data); + + if (want_delete) { + sb_start_intwrite(inode->i_sb); + + i_size_write(inode, 0); + + err = ssdfs_truncate(inode); + if (err) { + SSDFS_WARN("fail to truncate inode: " + "ino %lu, err %d\n", + ino, err); + } + + if (xattrs_tree) { + err = ssdfs_xattrs_tree_delete_all(xattrs_tree); + if (err) { + SSDFS_WARN("fail to truncate xattrs tree: " + "ino %lu, err %d\n", + ino, err); + } + } + } + + clear_inode(inode); + + if (want_delete) { + err = ssdfs_inodes_btree_delete(fsi->inodes_tree, ino); + if (err) { + SSDFS_WARN("fail to deallocate raw inode: " + "ino %lu, err %d\n", + ino, err); + } + + sb_end_intwrite(inode->i_sb); + } +} + +/* + * This method is called when the VFS needs to write an + * inode to disc + */ +int ssdfs_write_inode(struct inode *inode, struct writeback_control *wbc) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + struct ssdfs_inode *ri = &ii->raw_inode; + struct ssdfs_inodes_btree_info *itree; + struct ssdfs_btree_search *search; + struct ssdfs_dentries_btree_descriptor dentries_btree; + bool has_save_dentries_desc = true; + size_t dentries_desc_size = + sizeof(struct ssdfs_dentries_btree_descriptor); + struct ssdfs_extents_btree_descriptor extents_btree; + bool has_save_extents_desc = true; + size_t extents_desc_size = + sizeof(struct ssdfs_extents_btree_descriptor); + struct ssdfs_xattr_btree_descriptor xattr_btree; + bool has_save_xattrs_desc = true; + size_t xattr_desc_size = + sizeof(struct ssdfs_xattr_btree_descriptor); + int private_flags; + size_t raw_inode_size; + ino_t ino; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_read(&fsi->volume_sem); + raw_inode_size = le16_to_cpu(fsi->vs->inodes_btree.desc.item_size); + ssdfs_memcpy(&dentries_btree, 0, dentries_desc_size, + &fsi->vh->dentries_btree, 0, dentries_desc_size, + dentries_desc_size); + ssdfs_memcpy(&extents_btree, 0, extents_desc_size, + &fsi->vh->extents_btree, 0, extents_desc_size, + extents_desc_size); + ssdfs_memcpy(&xattr_btree, 0, xattr_desc_size, + &fsi->vh->xattr_btree, 0, xattr_desc_size, + xattr_desc_size); + up_read(&fsi->volume_sem); + + if (raw_inode_size != sizeof(struct 
ssdfs_inode)) { + SSDFS_WARN("raw_inode_size %zu != size %zu\n", + raw_inode_size, + sizeof(struct ssdfs_inode)); + return -ERANGE; + } + + itree = fsi->inodes_tree; + ino = inode->i_ino; + + search = ssdfs_btree_search_alloc(); + if (!search) { + SSDFS_ERR("fail to allocate btree search object\n"); + return -ENOMEM; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_inodes_btree_find(itree, ino, search); + if (unlikely(err)) { +#ifdef CONFIG_SSDFS_TESTING + SSDFS_DBG("fail to find inode: " + "ino %lu, err %d\n", + ino, err); + err = 0; +#else + SSDFS_ERR("fail to find inode: " + "ino %lu, err %d\n", + ino, err); +#endif /* CONFIG_SSDFS_TESTING */ + goto free_search_object; + } + + down_write(&ii->lock); + + ssdfs_init_raw_inode(ii); + + if (S_ISREG(inode->i_mode) && ii->extents_tree) { + err = ssdfs_extents_tree_flush(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to flush extents tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto finish_write_inode; + } + + if (memcmp(&extents_btree, &ii->extents_tree->desc, + extents_desc_size) != 0) { + ssdfs_memcpy(&extents_btree, + 0, extents_desc_size, + &ii->extents_tree->desc, + 0, extents_desc_size, + extents_desc_size); + has_save_extents_desc = true; + } else + has_save_extents_desc = false; + } else if (S_ISDIR(inode->i_mode) && ii->dentries_tree) { + err = ssdfs_dentries_tree_flush(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to flush dentries tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto finish_write_inode; + } + + if (memcmp(&dentries_btree, &ii->dentries_tree->desc, + dentries_desc_size) != 0) { + ssdfs_memcpy(&dentries_btree, + 0, dentries_desc_size, + &ii->dentries_tree->desc, + 0, dentries_desc_size, + dentries_desc_size); + has_save_dentries_desc = true; + } else + has_save_dentries_desc = false; + } + + if (ii->xattrs_tree) { + err = ssdfs_xattrs_tree_flush(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to flush xattrs tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto finish_write_inode; + } + + if (memcmp(&xattr_btree, &ii->xattrs_tree->desc, + xattr_desc_size) != 0) { + ssdfs_memcpy(&xattr_btree, + 0, xattr_desc_size, + &ii->xattrs_tree->desc, + 0, xattr_desc_size, + xattr_desc_size); + has_save_xattrs_desc = true; + } else + has_save_xattrs_desc = false; + } + + private_flags = atomic_read(&ii->private_flags); + if (private_flags & ~SSDFS_INODE_PRIVATE_FLAGS_MASK) { + err = -ERANGE; + SSDFS_WARN("invalid set of private_flags %#x\n", + private_flags); + goto finish_write_inode; + } else + ri->private_flags = cpu_to_le16((u16)private_flags); + + ri->checksum = ssdfs_crc32_le(ri, raw_inode_size); + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid result's buffer state: " + "%#x\n", + search->result.buf_state); + goto finish_write_inode; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("invalid buffer\n"); + goto finish_write_inode; + } + + if (search->result.buf_size < raw_inode_size) { + err = -ERANGE; + SSDFS_ERR("buf_size %zu < raw_inode_size %zu\n", + search->result.buf_size, + raw_inode_size); + goto finish_write_inode; + } + + if (search->result.items_in_buffer != 1) { + SSDFS_WARN("unexpected value: " + "items_in_buffer %u\n", + search->result.items_in_buffer); + } + + ssdfs_memcpy(search->result.buf, 0, search->result.buf_size, + ri, 0, raw_inode_size, + raw_inode_size); + +finish_write_inode: +
up_write(&ii->lock); + + if (unlikely(err)) + goto free_search_object; + + if (has_save_dentries_desc || has_save_extents_desc || + has_save_xattrs_desc) { + down_write(&fsi->volume_sem); + ssdfs_memcpy(&fsi->vh->dentries_btree, + 0, dentries_desc_size, + &dentries_btree, + 0, dentries_desc_size, + dentries_desc_size); + ssdfs_memcpy(&fsi->vh->extents_btree, + 0, extents_desc_size, + &extents_btree, + 0, extents_desc_size, + extents_desc_size); + ssdfs_memcpy(&fsi->vh->xattr_btree, + 0, xattr_desc_size, + &xattr_btree, + 0, xattr_desc_size, + xattr_desc_size); + up_write(&fsi->volume_sem); + } + + err = ssdfs_inodes_btree_change(itree, ino, search); + if (unlikely(err)) { + SSDFS_ERR("fail to change inode: " + "ino %lu, err %d\n", + ino, err); + goto free_search_object; + } + +free_search_object: + ssdfs_btree_search_free(search); + + return err; +} + +/* + * This method is called when the VFS needs + * to get filesystem statistics + */ +int ssdfs_statfs(struct dentry *dentry, struct kstatfs *buf) +{ + struct super_block *sb = dentry->d_sb; + struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb); +#ifdef CONFIG_SSDFS_BLOCK_DEVICE + u64 id = huge_encode_dev(sb->s_bdev->bd_dev); +#endif + u64 nsegs; + u32 pages_per_seg; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu\n", (unsigned long)dentry->d_inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + buf->f_type = SSDFS_SUPER_MAGIC; + buf->f_bsize = sb->s_blocksize; + buf->f_frsize = buf->f_bsize; + + mutex_lock(&fsi->resize_mutex); + nsegs = fsi->nsegs; + mutex_unlock(&fsi->resize_mutex); + + pages_per_seg = fsi->pages_per_seg; + buf->f_blocks = nsegs * pages_per_seg; + + spin_lock(&fsi->volume_state_lock); + buf->f_bfree = fsi->free_pages; + spin_unlock(&fsi->volume_state_lock); + + buf->f_bavail = buf->f_bfree; + + spin_lock(&fsi->inodes_tree->lock); + buf->f_files = fsi->inodes_tree->allocated_inodes; + buf->f_ffree = fsi->inodes_tree->free_inodes; + spin_unlock(&fsi->inodes_tree->lock); + + buf->f_namelen = SSDFS_MAX_NAME_LEN; + +#ifdef CONFIG_SSDFS_MTD_DEVICE + buf->f_fsid.val[0] = SSDFS_SUPER_MAGIC; + buf->f_fsid.val[1] = fsi->mtd->index; +#elif defined(CONFIG_SSDFS_BLOCK_DEVICE) + buf->f_fsid.val[0] = (u32)id; + buf->f_fsid.val[1] = (u32)(id >> 32); +#else + BUILD_BUG(); +#endif + + return 0; +}
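That completes fs/ssdfs/inode.c. To make the ssdfs_statfs() math above concrete: capacity is reported as nsegs * pages_per_seg logical blocks of f_bsize bytes, and the free count is taken directly from fsi->free_pages. A quick userspace cross-check via statvfs(3); the mount point /mnt/ssdfs is hypothetical:

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
	struct statvfs st;

	/* /mnt/ssdfs is an assumed mount point for an SSDFS volume */
	if (statvfs("/mnt/ssdfs", &st) != 0) {
		perror("statvfs");
		return 1;
	}

	/* f_blocks mirrors nsegs * pages_per_seg; f_bfree mirrors free_pages */
	printf("total %llu blocks, free %llu blocks of %lu bytes\n",
	       (unsigned long long)st.f_blocks,
	       (unsigned long long)st.f_bfree,
	       st.f_frsize);
	return 0;
}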
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5HRbeo0lmE2tUHcCbFwHR3cWTCWG3Ajx43R3eGM9P/0=; b=YMYZO1/3ZX07TJ3XfZz1rpDxYeIMJYXcclSjRrrbt5bz/QIHi1eKOJ6oLV/H6nxvL9 mzQ3JmraUgG4GGTdeNlbuU8J0DjJT13uXS1sWkkMi9bRsc/OurfRWH44RIYZqpR2/exz CogixymZjo6lcXDLvykbY68xcfq7uZANjBxS1ilCWQY17npv0eVsqtD/RxJuM9IGnp0C 6kzKho6qDmd5/4Lw1EoAPBFoJPHUMxmwJJP0sjwtCMlphzyJ/Ls4eCCoFooa51dzDdxv IKNoM96FDEDV+O123za76R9prwZvNVTOea2X9YOnA9hz59D7zVx1gV15pmWjAnRuF0km FmIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5HRbeo0lmE2tUHcCbFwHR3cWTCWG3Ajx43R3eGM9P/0=; b=HmcPfaLGrlgJX/9M14wbI/F6XbD+tQuk/IgReLuwVGc03PGNjlXc/DxkHUWIeTFPDX FZTcG5SYjRkIp8BKftowBBXTjOyaLsovHrgupU/VOKmYj3rWhGSzsVrLCNGg6Rze+2/w KmgFYq78xEKJW3B4T24La6tLbQNFjp6JYNYsVAhMrBPXLe4RwJGP5tiUoWk5XX72oRw3 yngDADMXaYp2U9tVv9qITbWRkHCvX8TCwxyArzhekVD7DqDAX58jVBFisgyQ2x7182YQ hvrA0CUJKPBUwwV0/PONi4rD/Y/WKKGS1ee7vj71EySdf6mV93aU7XtxIv95JETdXb6+ 4qeQ== X-Gm-Message-State: AO0yUKULfEqjgYBlGqXPjcNP4TNpNC9WIRO/b0cYzFVXU/qPcGywX6BY UEWQPj9AzpwPQ6ERoLG77kc2TOCou1G3wkTy X-Google-Smtp-Source: AK7set9fCH46qmNUi9HhoqXdddZ34DpI9T1HymKHU8/JQfMjb653LnD9GlIWMkv2assgSAap5eLxMw== X-Received: by 2002:a05:6808:638c:b0:37a:cef7:ca15 with SMTP id ec12-20020a056808638c00b0037acef7ca15mr4969871oib.18.1677287891877; Fri, 24 Feb 2023 17:18:11 -0800 (PST) Received: from system76-pc.. (172-125-78-211.lightspeed.sntcca.sbcglobal.net. [172.125.78.211]) by smtp.gmail.com with ESMTPSA id q3-20020acac003000000b0037d74967ef6sm363483oif.44.2023.02.24.17.18.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Feb 2023 17:18:11 -0800 (PST) From: Viacheslav Dubeyko To: linux-fsdevel@vger.kernel.org Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr, bruno.banelli@sartura.hr, Viacheslav Dubeyko Subject: [RFC PATCH 74/76] ssdfs: implement directory operations support Date: Fri, 24 Feb 2023 17:09:25 -0800 Message-Id: <20230225010927.813929-75-slava@dubeyko.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com> References: <20230225010927.813929-1-slava@dubeyko.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Implement directory operations. Signed-off-by: Viacheslav Dubeyko CC: Viacheslav Dubeyko CC: Luka Perkov CC: Bruno Banelli --- fs/ssdfs/dir.c | 2071 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2071 insertions(+) create mode 100644 fs/ssdfs/dir.c diff --git a/fs/ssdfs/dir.c b/fs/ssdfs/dir.c new file mode 100644 index 000000000000..c73393872aae --- /dev/null +++ b/fs/ssdfs/dir.c @@ -0,0 +1,2071 @@ +// SPDX-License-Identifier: BSD-3-Clause-Clear +/* + * SSDFS -- SSD-oriented File System. + * + * fs/ssdfs/dir.c - folder operations. + * + * Copyright (c) 2019-2023 Viacheslav Dubeyko + * http://www.ssdfs.org/ + * All rights reserved. 
+ * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "dentries_tree.h" +#include "shared_dictionary.h" +#include "xattr.h" +#include "acl.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_dir_page_leaks; +atomic64_t ssdfs_dir_memory_leaks; +atomic64_t ssdfs_dir_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_dir_cache_leaks_increment(void *kaddr) + * void ssdfs_dir_cache_leaks_decrement(void *kaddr) + * void *ssdfs_dir_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_dir_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_dir_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_dir_kfree(void *kaddr) + * struct page *ssdfs_dir_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_dir_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_dir_free_page(struct page *page) + * void ssdfs_dir_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(dir) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(dir) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_dir_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_dir_page_leaks, 0); + atomic64_set(&ssdfs_dir_memory_leaks, 0); + atomic64_set(&ssdfs_dir_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_dir_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_dir_page_leaks) != 0) { + SSDFS_ERR("DIR: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_dir_page_leaks)); + } + + if (atomic64_read(&ssdfs_dir_memory_leaks) != 0) { + SSDFS_ERR("DIR: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_dir_memory_leaks)); + } + + if (atomic64_read(&ssdfs_dir_cache_leaks) != 0) { + SSDFS_ERR("DIR: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_dir_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +static unsigned char +ssdfs_filetype_table[SSDFS_FT_MAX] = { + [SSDFS_FT_UNKNOWN] = DT_UNKNOWN, + [SSDFS_FT_REG_FILE] = DT_REG, + [SSDFS_FT_DIR] = DT_DIR, + [SSDFS_FT_CHRDEV] = DT_CHR, + [SSDFS_FT_BLKDEV] = DT_BLK, + [SSDFS_FT_FIFO] = DT_FIFO, + [SSDFS_FT_SOCK] = DT_SOCK, + [SSDFS_FT_SYMLINK] = DT_LNK, +}; + +int ssdfs_inode_by_name(struct inode *dir, + const struct qstr *child, + ino_t *ino) +{ + struct ssdfs_inode_info *ii = SSDFS_I(dir); + struct ssdfs_btree_search *search; + struct ssdfs_dir_entry *raw_dentry; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + int private_flags; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir_ino %lu, target_name %s\n", + (unsigned long)dir->i_ino, + child->name); +#endif /* CONFIG_SSDFS_DEBUG */ + + *ino = 0; + private_flags = atomic_read(&ii->private_flags); + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + down_read(&ii->lock); + + if (!ii->dentries_tree) { + err = -ERANGE; + SSDFS_WARN("dentries tree absent!!!\n"); + goto finish_search_dentry; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto finish_search_dentry; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_dentries_tree_find(ii->dentries_tree, + 
child->name, + child->len, + search); + if (err == -ENODATA) { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu hasn't child %s\n", + (unsigned long)dir->i_ino, + child->name); +#endif /* CONFIG_SSDFS_DEBUG */ + goto dentry_is_not_available; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the dentry: " + "dir %lu, child %s\n", + (unsigned long)dir->i_ino, + child->name); + goto dentry_is_not_available; + } + + if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) { + err = -ERANGE; + SSDFS_ERR("invalid result's state %#x\n", + search->result.state); + goto dentry_is_not_available; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + /* expected state */ + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid buffer state %#x\n", + search->result.buf_state); + goto dentry_is_not_available; + } + + if (!search->result.buf) { + err = -ERANGE; + SSDFS_ERR("buffer is absent\n"); + goto dentry_is_not_available; + } + + if (search->result.buf_size < dentry_size) { + err = -ERANGE; + SSDFS_ERR("buf_size %zu < dentry_size %zu\n", + search->result.buf_size, + dentry_size); + goto dentry_is_not_available; + } + + raw_dentry = (struct ssdfs_dir_entry *)search->result.buf; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(le64_to_cpu(raw_dentry->ino) >= U32_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + *ino = (ino_t)le64_to_cpu(raw_dentry->ino); + +dentry_is_not_available: + ssdfs_btree_search_free(search); + +finish_search_dentry: + up_read(&ii->lock); + } else { + err = -ENOENT; +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dentries tree is absent: " + "ino %lu\n", + (unsigned long)dir->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return err; +} + +/* + * The ssdfs_lookup() is called when the VFS needs + * to look up an inode in a parent directory. 
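+ * A missing name is not an error here: ssdfs_inode_by_name()'s
+ * -ENOENT is converted into a NULL inode, so d_splice_alias()
+ * installs a negative dentry instead of failing the lookup.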
+ */ +static struct dentry *ssdfs_lookup(struct inode *dir, struct dentry *target, + unsigned int flags) +{ + struct inode *inode = NULL; + ino_t ino; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, flags %#x\n", (unsigned long)dir->i_ino, flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (target->d_name.len > SSDFS_MAX_NAME_LEN) + return ERR_PTR(-ENAMETOOLONG); + + err = ssdfs_inode_by_name(dir, &target->d_name, &ino); + if (err == -ENOENT) { + err = 0; + ino = 0; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find the inode: " + "err %d\n", + err); + return ERR_PTR(err); + } + + if (ino) { + inode = ssdfs_iget(dir->i_sb, ino); + if (inode == ERR_PTR(-ESTALE)) { + SSDFS_ERR("deleted inode referenced: %lu\n", + (unsigned long)ino); + return ERR_PTR(-EIO); + } + } + + return d_splice_alias(inode, target); +} + +static int ssdfs_add_link(struct inode *dir, struct dentry *dentry, + struct inode *inode) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(dir->i_sb); + struct ssdfs_inode_info *dir_ii = SSDFS_I(dir); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + struct ssdfs_btree_search *search; + int private_flags; + bool is_locked_outside = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Created ino %lu with mode %o, nlink %d, nrpages %ld\n", + (unsigned long)inode->i_ino, inode->i_mode, + inode->i_nlink, inode->i_mapping->nrpages); +#endif /* CONFIG_SSDFS_DEBUG */ + + private_flags = atomic_read(&dir_ii->private_flags); + is_locked_outside = rwsem_is_locked(&dir_ii->lock); + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + if (!is_locked_outside) { + /* need to lock */ + down_read(&dir_ii->lock); + } + + if (!dir_ii->dentries_tree) { + err = -ERANGE; + SSDFS_WARN("dentries tree absent!!!\n"); + goto finish_add_link; + } + } else { + if (!is_locked_outside) { + /* need to lock */ + down_write(&dir_ii->lock); + } + + if (dir_ii->dentries_tree) { + err = -ERANGE; + SSDFS_WARN("dentries tree exists unexpectedly!!!\n"); + goto finish_create_dentries_tree; + } else { + err = ssdfs_dentries_tree_create(fsi, dir_ii); + if (unlikely(err)) { + SSDFS_ERR("fail to create the dentries tree: " + "ino %lu, err %d\n", + dir->i_ino, err); + goto finish_create_dentries_tree; + } + + atomic_or(SSDFS_INODE_HAS_INLINE_DENTRIES, + &dir_ii->private_flags); + } + +finish_create_dentries_tree: + if (!is_locked_outside) { + /* downgrade the lock */ + downgrade_write(&dir_ii->lock); + } + + if (unlikely(err)) + goto finish_add_link; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto finish_add_link; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_dentries_tree_add(dir_ii->dentries_tree, + &dentry->d_name, + ii, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add the dentry: " + "ino %lu, err %d\n", + inode->i_ino, err); + } else { + dir->i_mtime = dir->i_ctime = current_time(dir); + mark_inode_dirty(dir); + } + + ssdfs_btree_search_free(search); + +finish_add_link: + if (!is_locked_outside) { + /* need to unlock */ + up_read(&dir_ii->lock); + } + + return err; +} + +static int ssdfs_add_nondir(struct inode *dir, struct dentry *dentry, + struct inode *inode) +{ + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Created ino %lu with mode %o, nlink %d, nrpages %ld\n", + (unsigned long)inode->i_ino, inode->i_mode, + inode->i_nlink, inode->i_mapping->nrpages); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_add_link(dir, dentry, 
inode); + if (err) { + inode_dec_link_count(inode); + iget_failed(inode); + return err; + } + + unlock_new_inode(inode); + d_instantiate(dentry, inode); + return 0; +} + +/* + * The ssdfs_create() is called by the open(2) and + * creat(2) system calls. + */ +int ssdfs_create(struct user_namespace *mnt_userns, + struct inode *dir, struct dentry *dentry, + umode_t mode, bool excl) +{ + struct inode *inode; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, mode %o\n", (unsigned long)dir->i_ino, mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = ssdfs_new_inode(dir, mode, &dentry->d_name); + if (IS_ERR(inode)) { + err = PTR_ERR(inode); + goto failed_create; + } + + mark_inode_dirty(inode); + return ssdfs_add_nondir(dir, dentry, inode); + +failed_create: + return err; +} + +/* + * The ssdfs_mknod() is called by the mknod(2) system call + * to create a device (char, block) inode or a named pipe + * (FIFO) or socket. + */ +static int ssdfs_mknod(struct user_namespace *mnt_userns, + struct inode *dir, struct dentry *dentry, + umode_t mode, dev_t rdev) +{ + struct inode *inode; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, mode %o, rdev %#x\n", + (unsigned long)dir->i_ino, mode, rdev); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentry->d_name.len > SSDFS_MAX_NAME_LEN) + return -ENAMETOOLONG; + + inode = ssdfs_new_inode(dir, mode, &dentry->d_name); + if (IS_ERR(inode)) + return PTR_ERR(inode); + + init_special_inode(inode, mode, rdev); + + mark_inode_dirty(inode); + return ssdfs_add_nondir(dir, dentry, inode); +} + +/* + * Create symlink. + * The ssdfs_symlink() is called by the symlink(2) system call. + */ +static int ssdfs_symlink(struct user_namespace *mnt_userns, + struct inode *dir, struct dentry *dentry, + const char *target) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(dir->i_sb); + struct inode *inode; + size_t target_len = strlen(target) + 1; + size_t raw_inode_size; + size_t inline_len; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, target_len %zu\n", + (unsigned long)dir->i_ino, target_len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (target_len > dir->i_sb->s_blocksize) + return -ENAMETOOLONG; + + down_read(&fsi->volume_sem); + raw_inode_size = le16_to_cpu(fsi->vs->inodes_btree.desc.item_size); + up_read(&fsi->volume_sem); + + inline_len = offsetof(struct ssdfs_inode, internal); + + if (raw_inode_size <= inline_len) { + SSDFS_ERR("invalid raw inode size %zu\n", + raw_inode_size); + return -EFAULT; + } + + inline_len = raw_inode_size - inline_len; + + inode = ssdfs_new_inode(dir, S_IFLNK | S_IRWXUGO, &dentry->d_name); + if (IS_ERR(inode)) + return PTR_ERR(inode); + + if (target_len > inline_len) { + /* slow symlink */ + inode_nohighmem(inode); + + err = page_symlink(inode, target, target_len); + if (err) + goto out_fail; + } else { + /* fast symlink */ + down_write(&SSDFS_I(inode)->lock); + inode->i_link = (char *)SSDFS_I(inode)->raw_inode.internal; + memcpy(inode->i_link, target, target_len); + inode->i_size = target_len - 1; + atomic_or(SSDFS_INODE_HAS_INLINE_FILE, + &SSDFS_I(inode)->private_flags); + up_write(&SSDFS_I(inode)->lock); + } + + mark_inode_dirty(inode); + return ssdfs_add_nondir(dir, dentry, inode); + +out_fail: + inode_dec_link_count(inode); + iget_failed(inode); + return err; +} + +/* + * Create hardlink. + * The ssdfs_link() is called by the link(2) system call. 
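+ * Link creation is refused with -EMLINK once i_nlink reaches
+ * SSDFS_LINK_MAX, and with -EPERM for anything but regular files.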
+ */ +static int ssdfs_link(struct dentry *old_dentry, struct inode *dir, + struct dentry *dentry) +{ + struct inode *inode = d_inode(old_dentry); + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, inode %lu\n", + (unsigned long)dir->i_ino, (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (inode->i_nlink >= SSDFS_LINK_MAX) + return -EMLINK; + + if (!S_ISREG(inode->i_mode)) + return -EPERM; + + inode->i_ctime = current_time(inode); + inode_inc_link_count(inode); + ihold(inode); + + err = ssdfs_add_link(dir, dentry, inode); + if (err) { + inode_dec_link_count(inode); + iput(inode); + return err; + } + + d_instantiate(dentry, inode); + return 0; +} + +/* + * Set the first fragment of directory. + */ +static int ssdfs_make_empty(struct inode *inode, struct inode *parent) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + struct ssdfs_inode_info *parent_ii = SSDFS_I(parent); + struct ssdfs_btree_search *search; + int private_flags; + struct qstr dot = QSTR_INIT(".", 1); + struct qstr dotdot = QSTR_INIT("..", 2); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Created ino %lu with mode %o, nlink %d, nrpages %ld\n", + (unsigned long)inode->i_ino, inode->i_mode, + inode->i_nlink, inode->i_mapping->nrpages); +#endif /* CONFIG_SSDFS_DEBUG */ + + private_flags = atomic_read(&ii->private_flags); + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + down_read(&ii->lock); + + if (!ii->dentries_tree) { + err = -ERANGE; + SSDFS_WARN("dentries tree absent!!!\n"); + goto finish_make_empty_dir; + } + } else { + down_write(&ii->lock); + + if (ii->dentries_tree) { + err = -ERANGE; + SSDFS_WARN("dentries tree exists unexpectedly!!!\n"); + goto finish_create_dentries_tree; + } else { + err = ssdfs_dentries_tree_create(fsi, ii); + if (unlikely(err)) { + SSDFS_ERR("fail to create the dentries tree: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto finish_create_dentries_tree; + } + + atomic_or(SSDFS_INODE_HAS_INLINE_DENTRIES, + &ii->private_flags); + } + +finish_create_dentries_tree: + downgrade_write(&ii->lock); + + if (unlikely(err)) + goto finish_make_empty_dir; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto finish_make_empty_dir; + } + + ssdfs_btree_search_init(search); + + err = ssdfs_dentries_tree_add(ii->dentries_tree, + &dot, ii, search); + if (unlikely(err)) { + SSDFS_ERR("fail to add dentry: " + "ino %lu, err %d\n", + inode->i_ino, err); + goto free_search_object; + } + + err = ssdfs_dentries_tree_add(ii->dentries_tree, + &dotdot, parent_ii, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to add dentry: " + "ino %lu, err %d\n", + parent->i_ino, err); + goto free_search_object; + } + +free_search_object: + ssdfs_btree_search_free(search); + +finish_make_empty_dir: + up_read(&ii->lock); + + return err; +} + +/* + * Create subdirectory. + * The ssdfs_mkdir() is called by the mkdir(2) system call. 
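+ * The parent's link count is bumped for the new child's "..",
+ * and the child directory is created with its own dentries b-tree
+ * pre-populated with the "." and ".." entries (ssdfs_make_empty())
+ * before the new entry is added to the parent via ssdfs_add_link().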
+ */ +static int ssdfs_mkdir(struct user_namespace *mnt_userns, + struct inode *dir, struct dentry *dentry, umode_t mode) +{ + struct inode *inode; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, mode %o\n", + (unsigned long)dir->i_ino, mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (dentry->d_name.len > SSDFS_MAX_NAME_LEN) + return -ENAMETOOLONG; + + inode_inc_link_count(dir); + + inode = ssdfs_new_inode(dir, S_IFDIR | mode, &dentry->d_name); + err = PTR_ERR(inode); + if (IS_ERR(inode)) + goto out_dir; + + inode_inc_link_count(inode); + + err = ssdfs_make_empty(inode, dir); + if (err) + goto out_fail; + + err = ssdfs_add_link(dir, dentry, inode); + if (err) + goto out_fail; + + d_instantiate(dentry, inode); + unlock_new_inode(inode); + return 0; + +out_fail: + inode_dec_link_count(inode); + inode_dec_link_count(inode); + unlock_new_inode(inode); + iput(inode); +out_dir: + inode_dec_link_count(dir); + return err; +} + +/* + * Delete inode. + * The ssdfs_unlink() is called by the unlink(2) system call. + */ +static int ssdfs_unlink(struct inode *dir, struct dentry *dentry) +{ + struct ssdfs_inode_info *ii = SSDFS_I(dir); + struct inode *inode = d_inode(dentry); + struct ssdfs_btree_search *search; + int private_flags; + u64 name_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, inode %lu\n", + (unsigned long)dir->i_ino, (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + trace_ssdfs_unlink_enter(dir, dentry); + + private_flags = atomic_read(&ii->private_flags); + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + down_read(&ii->lock); + + if (!ii->dentries_tree) { + err = -ERANGE; + SSDFS_WARN("dentries tree absent!!!\n"); + goto finish_delete_dentry; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto finish_delete_dentry; + } + + ssdfs_btree_search_init(search); + + name_hash = ssdfs_generate_name_hash(&dentry->d_name); + if (name_hash >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid name hash\n"); + goto dentry_is_not_available; + } + + err = ssdfs_dentries_tree_delete(ii->dentries_tree, + name_hash, + inode->i_ino, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to delete the dentry: " + "name_hash %llx, ino %lu, err %d\n", + name_hash, inode->i_ino, err); + goto dentry_is_not_available; + } + +dentry_is_not_available: + ssdfs_btree_search_free(search); + +finish_delete_dentry: + up_read(&ii->lock); + + if (unlikely(err)) + goto finish_unlink; + } else { + err = -ENOENT; + SSDFS_ERR("dentries tree is absent\n"); + goto finish_unlink; + } + + mark_inode_dirty(dir); + mark_inode_dirty(inode); + inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(dir); + inode_dec_link_count(inode); + +finish_unlink: + trace_ssdfs_unlink_exit(inode, err); + return err; +} + +static inline bool ssdfs_empty_dir(struct inode *dir) +{ + struct ssdfs_inode_info *ii = SSDFS_I(dir); + bool is_empty = false; + int private_flags; + u64 dentries_count; + u64 threshold = 2; /* . and .. 
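entries, which ssdfs_make_empty() adds to every new folder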
*/ + + private_flags = atomic_read(&ii->private_flags); + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + down_read(&ii->lock); + + if (!ii->dentries_tree) { + SSDFS_WARN("dentries tree absent!!!\n"); + is_empty = true; + } else { + dentries_count = + atomic64_read(&ii->dentries_tree->dentries_count); + + if (dentries_count > threshold) { + /* not empty folder */ + is_empty = false; + } else if (dentries_count < threshold) { + SSDFS_WARN("unexpected dentries count %llu\n", + dentries_count); + is_empty = true; + } else + is_empty = true; + } + + up_read(&ii->lock); + } else { + /* dentries tree is absent */ + is_empty = true; + } + + return is_empty; +} + +/* + * Delete subdirectory. + * The ssdfs_rmdir() is called by the rmdir(2) system call. + */ +static int ssdfs_rmdir(struct inode *dir, struct dentry *dentry) +{ + struct inode *inode = d_inode(dentry); + int err = -ENOTEMPTY; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("dir %lu, subdir %lu\n", + (unsigned long)dir->i_ino, (unsigned long)inode->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ssdfs_empty_dir(inode)) { + err = ssdfs_unlink(dir, dentry); + if (!err) { + inode->i_size = 0; + inode_dec_link_count(inode); + inode_dec_link_count(dir); + } + } + + return err; +} + +enum { + SSDFS_FIRST_INODE_LOCK = 0, + SSDFS_SECOND_INODE_LOCK, + SSDFS_THIRD_INODE_LOCK, + SSDFS_FOURTH_INODE_LOCK, +}; + +static void lock_4_inodes(struct inode *inode1, struct inode *inode2, + struct inode *inode3, struct inode *inode4) +{ + down_write_nested(&SSDFS_I(inode1)->lock, SSDFS_FIRST_INODE_LOCK); + + if (inode2 != inode1) { + down_write_nested(&SSDFS_I(inode2)->lock, + SSDFS_SECOND_INODE_LOCK); + } + + if (inode3) { + down_write_nested(&SSDFS_I(inode3)->lock, + SSDFS_THIRD_INODE_LOCK); + } + + if (inode4) { + down_write_nested(&SSDFS_I(inode4)->lock, + SSDFS_FOURTH_INODE_LOCK); + } +} + +static void unlock_4_inodes(struct inode *inode1, struct inode *inode2, + struct inode *inode3, struct inode *inode4) +{ + if (inode4) + up_write(&SSDFS_I(inode4)->lock); + if (inode3) + up_write(&SSDFS_I(inode3)->lock); + if (inode1 != inode2) + up_write(&SSDFS_I(inode2)->lock); + up_write(&SSDFS_I(inode1)->lock); +} + +/* + * Regular rename. 
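+ * ssdfs_rename_target() covers the non-exchange cases. If the
+ * target dentry already exists, it is changed in place to point
+ * at the source inode; otherwise a fresh link is added to the new
+ * parent. The source dentry is then deleted from the old parent's
+ * dentries tree, and a moved directory gets its ".." entry
+ * repointed to the new parent. All four involved inodes are held
+ * locked in a fixed order (lock_4_inodes()) for the duration of
+ * the update.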
+ */ +static int ssdfs_rename_target(struct inode *old_dir, + struct dentry *old_dentry, + struct inode *new_dir, + struct dentry *new_dentry, + unsigned int flags) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(old_dir->i_sb); + struct ssdfs_inode_info *old_dir_ii = SSDFS_I(old_dir); + struct ssdfs_inode_info *new_dir_ii = SSDFS_I(new_dir); + struct inode *old_inode = d_inode(old_dentry); + struct ssdfs_inode_info *old_ii = SSDFS_I(old_inode); + struct inode *new_inode = d_inode(new_dentry); + struct ssdfs_btree_search *search; + struct qstr dotdot = QSTR_INIT("..", 2); + bool is_dir = S_ISDIR(old_inode->i_mode); + bool move = (new_dir != old_dir); + bool unlink = new_inode != NULL; + ino_t old_ino, old_parent_ino, new_ino; + struct timespec64 time; + u64 name_hash; + int err = -ENOENT; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_dir %lu, old_inode %lu, " + "new_dir %lu, new_inode %p\n", + (unsigned long)old_dir->i_ino, + (unsigned long)old_inode->i_ino, + (unsigned long)new_dir->i_ino, + new_inode); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_inode_by_name(old_dir, &old_dentry->d_name, &old_ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find old dentry: err %d\n", err); + goto out; + } else if (old_ino != old_inode->i_ino) { + err = -ERANGE; + SSDFS_ERR("invalid ino: found ino %lu != requested ino %lu\n", + old_ino, old_inode->i_ino); + goto out; + } + + if (S_ISDIR(old_inode->i_mode)) { + err = ssdfs_inode_by_name(old_inode, &dotdot, &old_parent_ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find parent dentry: err %d\n", err); + goto out; + } else if (old_parent_ino != old_dir->i_ino) { + err = -ERANGE; + SSDFS_ERR("invalid ino: " + "found ino %lu != requested ino %lu\n", + old_parent_ino, old_dir->i_ino); + goto out; + } + } + + if (!old_dir_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("old dir hasn't dentries tree\n"); + goto out; + } + + if (!new_dir_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("new dir hasn't dentries tree\n"); + goto out; + } + + if (S_ISDIR(old_inode->i_mode) && !old_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("old inode hasn't dentries tree\n"); + goto out; + } + + if (flags & RENAME_WHITEOUT) { + /* TODO: implement support */ + SSDFS_WARN("TODO: implement support of RENAME_WHITEOUT\n"); + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto out; + } + + ssdfs_btree_search_init(search); + + lock_4_inodes(old_dir, new_dir, old_inode, new_inode); + + if (new_inode) { + err = -ENOTEMPTY; + if (is_dir && !ssdfs_empty_dir(new_inode)) + goto finish_target_rename; + + err = ssdfs_inode_by_name(new_dir, &new_dentry->d_name, + &new_ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find new dentry: err %d\n", err); + goto finish_target_rename; + } else if (new_ino != new_inode->i_ino) { + err = -ERANGE; + SSDFS_ERR("invalid ino: " + "found ino %lu != requested ino %lu\n", + new_ino, new_inode->i_ino); + goto finish_target_rename; + } + + name_hash = ssdfs_generate_name_hash(&new_dentry->d_name); + + err = ssdfs_dentries_tree_change(new_dir_ii->dentries_tree, + name_hash, + new_inode->i_ino, + &old_dentry->d_name, + old_ii, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to update dentry: err %d\n", + err); + goto finish_target_rename; + } + } else { + err = ssdfs_add_link(new_dir, new_dentry, old_inode); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to add the link: err 
%d\n", + err); + goto finish_target_rename; + } + } + + name_hash = ssdfs_generate_name_hash(&old_dentry->d_name); + + err = ssdfs_dentries_tree_delete(old_dir_ii->dentries_tree, + name_hash, + old_inode->i_ino, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to delete the dentry: " + "name_hash %llx, ino %lu, err %d\n", + name_hash, old_inode->i_ino, err); + goto finish_target_rename; + } + + if (is_dir && move) { + /* update ".." directory entry info of old dentry */ + name_hash = ssdfs_generate_name_hash(&dotdot); + err = ssdfs_dentries_tree_change(old_ii->dentries_tree, + name_hash, old_dir->i_ino, + &dotdot, new_dir_ii, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to update dentry: err %d\n", + err); + goto finish_target_rename; + } + } + + old_ii->parent_ino = new_dir->i_ino; + + /* + * Like most other Unix systems, set the @i_ctime for inodes on a + * rename. + */ + time = current_time(old_dir); + old_inode->i_ctime = time; + mark_inode_dirty(old_inode); + + /* We must adjust parent link count when renaming directories */ + if (is_dir) { + if (move) { + /* + * @old_dir loses a link because we are moving + * @old_inode to a different directory. + */ + inode_dec_link_count(old_dir); + /* + * @new_dir only gains a link if we are not also + * overwriting an existing directory. + */ + if (!unlink) + inode_inc_link_count(new_dir); + } else { + /* + * @old_inode is not moving to a different directory, + * but @old_dir still loses a link if we are + * overwriting an existing directory. + */ + if (unlink) + inode_dec_link_count(old_dir); + } + } + + old_dir->i_mtime = old_dir->i_ctime = time; + new_dir->i_mtime = new_dir->i_ctime = time; + + /* + * And finally, if we unlinked a direntry which happened to have the + * same name as the moved direntry, we have to decrement @i_nlink of + * the unlinked inode and change its ctime. + */ + if (unlink) { + /* + * Directories cannot have hard-links, so if this is a + * directory, just clear @i_nlink. + */ + if (is_dir) { + clear_nlink(new_inode); + mark_inode_dirty(new_inode); + } else + inode_dec_link_count(new_inode); + new_inode->i_ctime = time; + } + +finish_target_rename: + unlock_4_inodes(old_dir, new_dir, old_inode, new_inode); + ssdfs_btree_search_free(search); + +out: +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("finished\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + return err; +} + +/* + * Cross-directory rename. 
+ */ +static int ssdfs_cross_rename(struct inode *old_dir, + struct dentry *old_dentry, + struct inode *new_dir, + struct dentry *new_dentry) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(old_dir->i_sb); + struct ssdfs_inode_info *old_dir_ii = SSDFS_I(old_dir); + struct ssdfs_inode_info *new_dir_ii = SSDFS_I(new_dir); + struct inode *old_inode = d_inode(old_dentry); + struct ssdfs_inode_info *old_ii = SSDFS_I(old_inode); + struct inode *new_inode = d_inode(new_dentry); + struct ssdfs_inode_info *new_ii = SSDFS_I(new_inode); + struct ssdfs_btree_search *search; + struct qstr dotdot = QSTR_INIT("..", 2); + ino_t old_ino, new_ino; + struct timespec64 time; + u64 name_hash; + int err = -ENOENT; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_dir %lu, old_inode %lu, new_dir %lu\n", + (unsigned long)old_dir->i_ino, + (unsigned long)old_inode->i_ino, + (unsigned long)new_dir->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_inode_by_name(old_dir, &old_dentry->d_name, &old_ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find old dentry: err %d\n", err); + goto out; + } else if (old_ino != old_inode->i_ino) { + err = -ERANGE; + SSDFS_ERR("invalid ino: found ino %lu != requested ino %lu\n", + old_ino, old_inode->i_ino); + goto out; + } + + err = ssdfs_inode_by_name(new_dir, &new_dentry->d_name, &new_ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find new dentry: err %d\n", err); + goto out; + } else if (new_ino != new_inode->i_ino) { + err = -ERANGE; + SSDFS_ERR("invalid ino: found ino %lu != requested ino %lu\n", + new_ino, new_inode->i_ino); + goto out; + } + + if (!old_dir_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("old dir hasn't dentries tree\n"); + goto out; + } + + if (!new_dir_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("new dir hasn't dentries tree\n"); + goto out; + } + + if (S_ISDIR(old_inode->i_mode) && !old_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("old inode hasn't dentries tree\n"); + goto out; + } + + if (S_ISDIR(new_inode->i_mode) && !new_ii->dentries_tree) { + err = -ERANGE; + SSDFS_ERR("new inode hasn't dentries tree\n"); + goto out; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto out; + } + + ssdfs_btree_search_init(search); + name_hash = ssdfs_generate_name_hash(&dotdot); + + lock_4_inodes(old_dir, new_dir, old_inode, new_inode); + + /* update ".." directory entry info of old dentry */ + if (S_ISDIR(old_inode->i_mode)) { + err = ssdfs_dentries_tree_change(old_ii->dentries_tree, + name_hash, old_dir->i_ino, + &dotdot, new_dir_ii, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to update dentry: err %d\n", + err); + goto finish_cross_rename; + } + } + + /* update ".." 
directory entry info of new dentry */ + if (S_ISDIR(new_inode->i_mode)) { + err = ssdfs_dentries_tree_change(new_ii->dentries_tree, + name_hash, new_dir->i_ino, + &dotdot, old_dir_ii, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to update dentry: err %d\n", + err); + goto finish_cross_rename; + } + } + + /* update directory entry info of old dir inode */ + name_hash = ssdfs_generate_name_hash(&old_dentry->d_name); + + err = ssdfs_dentries_tree_change(old_dir_ii->dentries_tree, + name_hash, old_inode->i_ino, + &new_dentry->d_name, new_ii, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to update dentry: err %d\n", + err); + goto finish_cross_rename; + } + + /* update directory entry info of new dir inode */ + name_hash = ssdfs_generate_name_hash(&new_dentry->d_name); + + err = ssdfs_dentries_tree_change(new_dir_ii->dentries_tree, + name_hash, new_inode->i_ino, + &old_dentry->d_name, old_ii, + search); + if (unlikely(err)) { + ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__, + "fail to update dentry: err %d\n", + err); + goto finish_cross_rename; + } + + old_ii->parent_ino = new_dir->i_ino; + new_ii->parent_ino = old_dir->i_ino; + + time = current_time(old_dir); + old_inode->i_ctime = time; + new_inode->i_ctime = time; + old_dir->i_mtime = old_dir->i_ctime = time; + new_dir->i_mtime = new_dir->i_ctime = time; + + if (old_dir != new_dir) { + if (S_ISDIR(old_inode->i_mode) && + !S_ISDIR(new_inode->i_mode)) { + inode_inc_link_count(new_dir); + inode_dec_link_count(old_dir); + } + else if (!S_ISDIR(old_inode->i_mode) && + S_ISDIR(new_inode->i_mode)) { + inode_dec_link_count(new_dir); + inode_inc_link_count(old_dir); + } + } + + mark_inode_dirty(old_inode); + mark_inode_dirty(new_inode); + +finish_cross_rename: + unlock_4_inodes(old_dir, new_dir, old_inode, new_inode); + ssdfs_btree_search_free(search); + +out: + return err; +} + +/* + * The ssdfs_rename() is called by the rename(2) system call + * to rename the object to have the parent and name given by + * the second inode and dentry. 
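+ * RENAME_EXCHANGE is delegated to ssdfs_cross_rename(); all other
+ * supported flag combinations are handled by ssdfs_rename_target().
+ * RENAME_WHITEOUT is recognized but not implemented yet.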
+ */ +static int ssdfs_rename(struct user_namespace *mnt_userns, + struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry, + unsigned int flags) +{ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("old_dir %lu, old_inode %lu, new_dir %lu\n", + (unsigned long)old_dir->i_ino, + (unsigned long)old_dentry->d_inode->i_ino, + (unsigned long)new_dir->i_ino); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT)) { + SSDFS_ERR("invalid flags %#x\n", flags); + return -EINVAL; + } + + if (flags & RENAME_EXCHANGE) { + return ssdfs_cross_rename(old_dir, old_dentry, + new_dir, new_dentry); + } + + return ssdfs_rename_target(old_dir, old_dentry, new_dir, new_dentry, + flags); +} + +static +int ssdfs_dentries_tree_get_start_hash(struct ssdfs_dentries_btree_info *tree, + u64 *start_hash) +{ + struct ssdfs_btree_index *index; + struct ssdfs_dir_entry *cur_dentry; + u64 dentries_count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !start_hash); + + SSDFS_DBG("tree %p, start_hash %p\n", + tree, start_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = U64_MAX; + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + dentries_count = atomic64_read(&tree->dentries_count); + + if (dentries_count < 2) { + SSDFS_WARN("folder is corrupted: " + "dentries_count %llu\n", + dentries_count); + return -ERANGE; + } else if (dentries_count == 2) + return -ENOENT; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + down_read(&tree->lock); + + if (!tree->inline_dentries) { + err = -ERANGE; + SSDFS_ERR("inline tree's pointer is empty\n"); + goto finish_process_inline_tree; + } + + cur_dentry = &tree->inline_dentries[0]; + *start_hash = le64_to_cpu(cur_dentry->hash_code); + +finish_process_inline_tree: + up_read(&tree->lock); + + if (*start_hash >= U64_MAX) { + /* warn about invalid hash code */ + SSDFS_WARN("inline array: hash_code is invalid\n"); + } + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + down_read(&tree->lock); + + if (!tree->root) { + err = -ERANGE; + SSDFS_ERR("root node pointer is NULL\n"); + goto finish_get_start_hash; + } + + index = &tree->root->indexes[SSDFS_ROOT_NODE_LEFT_LEAF_NODE]; + *start_hash = le64_to_cpu(index->hash); + +finish_get_start_hash: + up_read(&tree->lock); + + if (*start_hash >= U64_MAX) { + /* warn about invalid hash code */ + SSDFS_WARN("private dentry: hash_code is invalid\n"); + } + break; + + default: + err = -ERANGE; + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree->type)); + break; + } + + return err; +} + +static +int ssdfs_dentries_tree_get_next_hash(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search, + u64 *next_hash) +{ + u64 old_hash; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!tree || !search || !next_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + old_hash = le64_to_cpu(search->node.found_index.index.hash); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("search %p, next_hash %p, old (node %u, hash %llx)\n", + search, next_hash, search->node.id, old_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + SSDFS_DBG("inline dentries array is unsupported\n"); + return -ENOENT; + + case 
SSDFS_PRIVATE_DENTRIES_BTREE: + /* expected tree type */ + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + + down_read(&tree->lock); + err = ssdfs_btree_get_next_hash(tree->generic_tree, search, next_hash); + up_read(&tree->lock); + + return err; +} + +static +int ssdfs_dentries_tree_node_hash_range(struct ssdfs_dentries_btree_info *tree, + struct ssdfs_btree_search *search, + u64 *start_hash, u64 *end_hash, + u16 *items_count) +{ + struct ssdfs_dir_entry *cur_dentry; + u64 dentries_count; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search || !start_hash || !end_hash || !items_count); + + SSDFS_DBG("search %p, start_hash %p, " + "end_hash %p, items_count %p\n", + search, start_hash, end_hash, items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + + *start_hash = *end_hash = U64_MAX; + *items_count = 0; + + switch (atomic_read(&tree->state)) { + case SSDFS_DENTRIES_BTREE_CREATED: + case SSDFS_DENTRIES_BTREE_INITIALIZED: + case SSDFS_DENTRIES_BTREE_DIRTY: + /* expected state */ + break; + + default: + SSDFS_ERR("invalid dentries tree's state %#x\n", + atomic_read(&tree->state)); + return -ERANGE; + }; + + switch (atomic_read(&tree->type)) { + case SSDFS_INLINE_DENTRIES_ARRAY: + dentries_count = atomic64_read(&tree->dentries_count); + if (dentries_count >= U16_MAX) { + err = -ERANGE; + SSDFS_ERR("unexpected dentries count %llu\n", + dentries_count); + goto finish_extract_hash_range; + } + + *items_count = (u16)dentries_count; + + if (*items_count == 0) + goto finish_extract_hash_range; + + down_read(&tree->lock); + + if (!tree->inline_dentries) { + err = -ERANGE; + SSDFS_ERR("inline tree's pointer is empty\n"); + goto finish_process_inline_tree; + } + + cur_dentry = &tree->inline_dentries[0]; + *start_hash = le64_to_cpu(cur_dentry->hash_code); + + if (dentries_count > SSDFS_INLINE_DENTRIES_COUNT) { + err = -ERANGE; + SSDFS_ERR("dentries_count %llu > max_value %u\n", + dentries_count, + SSDFS_INLINE_DENTRIES_COUNT); + goto finish_process_inline_tree; + } + + cur_dentry = &tree->inline_dentries[dentries_count - 1]; + *end_hash = le64_to_cpu(cur_dentry->hash_code); + +finish_process_inline_tree: + up_read(&tree->lock); + break; + + case SSDFS_PRIVATE_DENTRIES_BTREE: + err = ssdfs_btree_node_get_hash_range(search, + start_hash, + end_hash, + items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to get hash range: err %d\n", + err); + goto finish_extract_hash_range; + } + break; + + default: + SSDFS_ERR("invalid tree type %#x\n", + atomic_read(&tree->type)); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_hash %llx, end_hash %llx, items_count %u\n", + *start_hash, *end_hash, *items_count); +#endif /* CONFIG_SSDFS_DEBUG */ + +finish_extract_hash_range: + return err; +} + +static +int ssdfs_dentries_tree_check_search_result(struct ssdfs_btree_search *search) +{ + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + u16 items_count; + size_t buf_size; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!search); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (search->result.state) { + case SSDFS_BTREE_SEARCH_VALID_ITEM: + /* expected state */ + break; + + default: + SSDFS_ERR("unexpected result's state %#x\n", + search->result.state); + return -ERANGE; + } + + switch (search->result.buf_state) { + case SSDFS_BTREE_SEARCH_INLINE_BUFFER: + case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER: + if (!search->result.buf) { + SSDFS_ERR("buffer pointer is NULL\n"); + return -ERANGE; + } + break; + + default: + SSDFS_ERR("unexpected buffer's 
state\n"); + return -ERANGE; + } + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(search->result.items_in_buffer >= U16_MAX); +#endif /* CONFIG_SSDFS_DEBUG */ + + items_count = (u16)search->result.items_in_buffer; + + if (items_count == 0) { + SSDFS_ERR("items_in_buffer %u\n", + items_count); + return -ENOENT; + } else if (items_count != search->result.count) { + SSDFS_ERR("items_count %u != search->result.count %u\n", + items_count, search->result.count); + return -ERANGE; + } + + buf_size = dentry_size * items_count; + + if (buf_size != search->result.buf_size) { + SSDFS_ERR("buf_size %zu != search->result.buf_size %zu\n", + buf_size, + search->result.buf_size); + return -ERANGE; + } + + return 0; +} + +static +bool is_invalid_dentry(struct ssdfs_dir_entry *dentry) +{ + u8 name_len; + bool is_invalid = false; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!dentry); + + SSDFS_DBG("dentry_type %#x, file_type %#x, " + "flags %#x, name_len %u, " + "hash_code %llx, ino %llu\n", + dentry->dentry_type, dentry->file_type, + dentry->flags, dentry->name_len, + le64_to_cpu(dentry->hash_code), + le64_to_cpu(dentry->ino)); +#endif /* CONFIG_SSDFS_DEBUG */ + + switch (dentry->dentry_type) { + case SSDFS_INLINE_DENTRY: + case SSDFS_REGULAR_DENTRY: + /* expected dentry type */ + break; + + default: + is_invalid = true; + SSDFS_ERR("invalid dentry type %#x\n", + dentry->dentry_type); + goto finish_check; + } + + if (dentry->file_type <= SSDFS_FT_UNKNOWN || + dentry->file_type >= SSDFS_FT_MAX) { + is_invalid = true; + SSDFS_ERR("invalid file type %#x\n", + dentry->file_type); + goto finish_check; + } + + if (dentry->flags & ~SSDFS_DENTRY_FLAGS_MASK) { + is_invalid = true; + SSDFS_ERR("invalid set of flags %#x\n", + dentry->flags); + goto finish_check; + } + + name_len = dentry->name_len; + + if (name_len > SSDFS_MAX_NAME_LEN) { + is_invalid = true; + SSDFS_ERR("invalid name_len %u\n", + name_len); + goto finish_check; + } + + if (le64_to_cpu(dentry->hash_code) >= U64_MAX) { + is_invalid = true; + SSDFS_ERR("invalid hash_code\n"); + goto finish_check; + } + + if (le64_to_cpu(dentry->ino) >= U32_MAX) { + is_invalid = true; + SSDFS_ERR("ino %llu is too huge\n", + le64_to_cpu(dentry->ino)); + goto finish_check; + } + +finish_check: + if (is_invalid) { + SSDFS_ERR("dentry_type %#x, file_type %#x, " + "flags %#x, name_len %u, " + "hash_code %llx, ino %llu\n", + dentry->dentry_type, dentry->file_type, + dentry->flags, dentry->name_len, + le64_to_cpu(dentry->hash_code), + le64_to_cpu(dentry->ino)); + } + + return is_invalid; +} + +/* + * The ssdfs_readdir() is called when the VFS needs + * to read the directory contents. 
+ */ +static int ssdfs_readdir(struct file *file, struct dir_context *ctx) +{ + struct inode *inode = file_inode(file); + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + struct qstr dot = QSTR_INIT(".", 1); + u64 dot_hash; + struct qstr dotdot = QSTR_INIT("..", 2); + u64 dotdot_hash; + struct ssdfs_shared_dict_btree_info *dict; + struct ssdfs_btree_search *search; + struct ssdfs_dir_entry *dentry; + size_t dentry_size = sizeof(struct ssdfs_dir_entry); + int private_flags; + u64 start_hash = U64_MAX; + u64 end_hash = U64_MAX; + u64 hash = U64_MAX; + u64 start_pos; + u16 items_count; + ino_t ino; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("file %p, ctx %p\n", file, ctx); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (ctx->pos < 0) { + SSDFS_DBG("ctx->pos %lld\n", ctx->pos); + return 0; + } + + dict = fsi->shdictree; + if (!dict) { + SSDFS_ERR("shared dictionary is absent\n"); + return -ERANGE; + } + + dot_hash = ssdfs_generate_name_hash(&dot); + dotdot_hash = ssdfs_generate_name_hash(&dotdot); + + private_flags = atomic_read(&ii->private_flags); + + if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES || + private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) { + down_read(&ii->lock); + if (!ii->dentries_tree) + err = -ERANGE; + up_read(&ii->lock); + + if (unlikely(err)) { + SSDFS_WARN("dentries tree is absent\n"); + return -ERANGE; + } + } else { + if (!S_ISDIR(inode->i_mode)) { + SSDFS_WARN("this is not folder!!!\n"); + return -EINVAL; + } + + down_read(&ii->lock); + if (ii->dentries_tree) + err = -ERANGE; + up_read(&ii->lock); + + if (unlikely(err)) { + SSDFS_WARN("dentries tree exists!!!!\n"); + return err; + } + } + + start_pos = ctx->pos; + + if (ctx->pos == 0) { + err = ssdfs_inode_by_name(inode, &dot, &ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find dentry: err %d\n", err); + goto out; + } + + if (!dir_emit_dot(file, ctx)) { + err = -ERANGE; + SSDFS_ERR("fail to emit dentry\n"); + goto out; + } + + ctx->pos = 1; + } + + if (ctx->pos == 1) { + err = ssdfs_inode_by_name(inode, &dotdot, &ino); + if (unlikely(err)) { + SSDFS_ERR("fail to find dentry: err %d\n", err); + goto out; + } + + if (!dir_emit_dotdot(file, ctx)) { + err = -ERANGE; + SSDFS_ERR("fail to emit dentry\n"); + goto out; + } + + ctx->pos = 2; + } + + if (ctx->pos >= 2) { + down_read(&ii->lock); + err = ssdfs_dentries_tree_get_start_hash(ii->dentries_tree, + &start_hash); + up_read(&ii->lock); + + if (err == -ENOENT) { + err = 0; + ctx->pos = 2; + goto out; + } else if (unlikely(err)) { + SSDFS_ERR("fail to get start root hash: err %d\n", err); + goto out; + } else if (start_hash >= U64_MAX) { + err = -ERANGE; + SSDFS_ERR("invalid hash value\n"); + goto out; + } + + ctx->pos = 2; + } + + search = ssdfs_btree_search_alloc(); + if (!search) { + err = -ENOMEM; + SSDFS_ERR("fail to allocate btree search object\n"); + goto out; + } + + do { + ssdfs_btree_search_init(search); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ctx->pos %llu, start_hash %llx\n", + (u64)ctx->pos, start_hash); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* allow readdir() to be interrupted */ + if (fatal_signal_pending(current)) { + err = -ERESTARTSYS; + goto out_free; + } + cond_resched(); + + down_read(&ii->lock); + + err = ssdfs_dentries_tree_find_leaf_node(ii->dentries_tree, + start_hash, + search); + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to find a leaf node: " + "hash %llx, err %d\n", + start_hash, err); +#endif /* CONFIG_SSDFS_DEBUG */ + goto 
finish_tree_processing; + } else if (unlikely(err)) { + SSDFS_ERR("fail to find a leaf node: " + "hash %llx, err %d\n", + start_hash, err); + goto finish_tree_processing; + } + + err = ssdfs_dentries_tree_node_hash_range(ii->dentries_tree, + search, + &start_hash, + &end_hash, + &items_count); + if (unlikely(err)) { + SSDFS_ERR("fail to get node's hash range: " + "err %d\n", err); + goto finish_tree_processing; + } + + if (items_count == 0) { + err = -ENOENT; + SSDFS_DBG("empty leaf node\n"); + goto finish_tree_processing; + } + + if (start_hash > end_hash) { + err = -ENOENT; + goto finish_tree_processing; + } + + err = ssdfs_dentries_tree_extract_range(ii->dentries_tree, + 0, items_count, + search); + if (unlikely(err)) { + SSDFS_ERR("fail to extract the range: " + "items_count %u, err %d\n", + items_count, err); + goto finish_tree_processing; + } + +finish_tree_processing: + up_read(&ii->lock); + + if (err == -ENODATA) { + err = 0; + goto out_free; + } else if (unlikely(err)) + goto out_free; + + err = ssdfs_dentries_tree_check_search_result(search); + if (unlikely(err)) { + SSDFS_ERR("corrupted search result: " + "err %d\n", err); + goto out_free; + } + + items_count = search->result.count; + + for (i = 0; i < items_count; i++) { + u8 *start_ptr = (u8 *)search->result.buf; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_pos %llu, ctx->pos %llu\n", + start_pos, ctx->pos); +#endif /* CONFIG_SSDFS_DEBUG */ + + dentry = (struct ssdfs_dir_entry *)(start_ptr + + (i * dentry_size)); + hash = le64_to_cpu(dentry->hash_code); + + if (ctx->pos < start_pos) { + if (dot_hash == hash || dotdot_hash == hash) { + /* skip counting */ + continue; + } else { + ctx->pos++; + continue; + } + } + + if (is_invalid_dentry(dentry)) { + err = -EIO; + SSDFS_ERR("found corrupted dentry\n"); + goto out_free; + } + + if (dot_hash == hash || dotdot_hash == hash) { + /* + * These items were created already. + * Simply skip the case. 
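+				 * ("." and ".." have been emitted at
+				 * positions 0 and 1 already.)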
+ */
+		} else if (dentry->flags & SSDFS_DENTRY_HAS_EXTERNAL_STRING) {
+			err = ssdfs_shared_dict_get_name(dict, hash,
+							 &search->name);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to extract the name: "
+					  "hash %llx, err %d\n",
+					  hash, err);
+				goto out_free;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ctx->pos %llu, name %s, "
+				  "name_len %zu, "
+				  "ino %llu, hash %llx\n",
+				  ctx->pos,
+				  search->name.str,
+				  search->name.len,
+				  le64_to_cpu(dentry->ino),
+				  hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (!dir_emit(ctx,
+			    search->name.str,
+			    search->name.len,
+			    (ino_t)le64_to_cpu(dentry->ino),
+			    ssdfs_filetype_table[dentry->file_type])) {
+				/* stopped for some reason */
+				err = 1;
+				goto out_free;
+			} else
+				ctx->pos++;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ctx->pos %llu, name %s, "
+				  "name_len %u, "
+				  "ino %llu, hash %llx\n",
+				  ctx->pos,
+				  dentry->inline_string,
+				  dentry->name_len,
+				  le64_to_cpu(dentry->ino),
+				  hash);
+			SSDFS_DBG("dentry %p, name %p\n",
+				  dentry, dentry->inline_string);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (!dir_emit(ctx,
+			    dentry->inline_string,
+			    dentry->name_len,
+			    (ino_t)le64_to_cpu(dentry->ino),
+			    ssdfs_filetype_table[dentry->file_type])) {
+				/* stopped for some reason */
+				err = 1;
+				goto out_free;
+			} else
+				ctx->pos++;
+		}
+	}
+
+	if (hash != end_hash) {
+		err = -ERANGE;
+		SSDFS_ERR("hash %llx != end_hash %llx\n",
+			  hash, end_hash);
+		goto out_free;
+	}
+
+	start_hash = end_hash + 1;
+
+	down_read(&ii->lock);
+	err = ssdfs_dentries_tree_get_next_hash(ii->dentries_tree,
+						search,
+						&start_hash);
+	up_read(&ii->lock);
+
+	ssdfs_btree_search_forget_parent_node(search);
+	ssdfs_btree_search_forget_child_node(search);
+
+	if (err == -ENOENT) {
+		err = 0;
+		ctx->pos = U64_MAX;
+		SSDFS_DBG("no more items in the folder\n");
+		goto out_free;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to get next hash: err %d\n",
+			  err);
+		goto out_free;
+	}
+	} while (start_hash < U64_MAX);
+
+out_free:
+	ssdfs_btree_search_free(search);
+
+out:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+const struct inode_operations ssdfs_dir_inode_operations = {
+	.create = ssdfs_create,
+	.lookup = ssdfs_lookup,
+	.link = ssdfs_link,
+	.unlink = ssdfs_unlink,
+	.symlink = ssdfs_symlink,
+	.mkdir = ssdfs_mkdir,
+	.rmdir = ssdfs_rmdir,
+	.mknod = ssdfs_mknod,
+	.rename = ssdfs_rename,
+	.setattr = ssdfs_setattr,
+	.listxattr = ssdfs_listxattr,
+	.get_inode_acl = ssdfs_get_acl,
+	.set_acl = ssdfs_set_acl,
+};
+
+const struct file_operations ssdfs_dir_operations = {
+	.read = generic_read_dir,
+	.iterate_shared = ssdfs_readdir,
+	.unlocked_ioctl = ssdfs_ioctl,
+	.fsync = ssdfs_fsync,
+	.llseek = generic_file_llseek,
+};

From patchwork Sat Feb 25 01:09:26 2023
X-Patchwork-Submitter: Viacheslav Dubeyko
X-Patchwork-Id: 13151981
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 75/76] ssdfs: implement file operations support
Date: Fri, 24 Feb 2023 17:09:26 -0800
Message-Id: <20230225010927.813929-76-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

Implement file operations support.

Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/ssdfs/file.c | 2523 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2523 insertions(+)
 create mode 100644 fs/ssdfs/file.c

diff --git a/fs/ssdfs/file.c b/fs/ssdfs/file.c
new file mode 100644
index 000000000000..24110db0e209
--- /dev/null
+++ b/fs/ssdfs/file.c
@@ -0,0 +1,2523 @@
+// SPDX-License-Identifier: BSD-3-Clause-Clear
+/*
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/file.c - file operations.
+ *
+ * Copyright (c) 2019-2023 Viacheslav Dubeyko
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ * + * Authors: Viacheslav Dubeyko + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "peb_mapping_queue.h" +#include "peb_mapping_table_cache.h" +#include "ssdfs.h" +#include "request_queue.h" +#include "offset_translation_table.h" +#include "page_array.h" +#include "page_vector.h" +#include "peb_container.h" +#include "segment_bitmap.h" +#include "segment.h" +#include "btree_search.h" +#include "btree_node.h" +#include "btree.h" +#include "inodes_tree.h" +#include "extents_tree.h" +#include "xattr.h" +#include "acl.h" +#include "peb_mapping_table.h" + +#include + +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING +atomic64_t ssdfs_file_page_leaks; +atomic64_t ssdfs_file_memory_leaks; +atomic64_t ssdfs_file_cache_leaks; +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +/* + * void ssdfs_file_cache_leaks_increment(void *kaddr) + * void ssdfs_file_cache_leaks_decrement(void *kaddr) + * void *ssdfs_file_kmalloc(size_t size, gfp_t flags) + * void *ssdfs_file_kzalloc(size_t size, gfp_t flags) + * void *ssdfs_file_kcalloc(size_t n, size_t size, gfp_t flags) + * void ssdfs_file_kfree(void *kaddr) + * struct page *ssdfs_file_alloc_page(gfp_t gfp_mask) + * struct page *ssdfs_file_add_pagevec_page(struct pagevec *pvec) + * void ssdfs_file_free_page(struct page *page) + * void ssdfs_file_pagevec_release(struct pagevec *pvec) + */ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + SSDFS_MEMORY_LEAKS_CHECKER_FNS(file) +#else + SSDFS_MEMORY_ALLOCATOR_FNS(file) +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ + +void ssdfs_file_memory_leaks_init(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + atomic64_set(&ssdfs_file_page_leaks, 0); + atomic64_set(&ssdfs_file_memory_leaks, 0); + atomic64_set(&ssdfs_file_cache_leaks, 0); +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +void ssdfs_file_check_memory_leaks(void) +{ +#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING + if (atomic64_read(&ssdfs_file_page_leaks) != 0) { + SSDFS_ERR("FILE: " + "memory leaks include %lld pages\n", + atomic64_read(&ssdfs_file_page_leaks)); + } + + if (atomic64_read(&ssdfs_file_memory_leaks) != 0) { + SSDFS_ERR("FILE: " + "memory allocator suffers from %lld leaks\n", + atomic64_read(&ssdfs_file_memory_leaks)); + } + + if (atomic64_read(&ssdfs_file_cache_leaks) != 0) { + SSDFS_ERR("FILE: " + "caches suffers from %lld leaks\n", + atomic64_read(&ssdfs_file_cache_leaks)); + } +#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */ +} + +enum { + SSDFS_BLOCK_BASED_REQUEST, + SSDFS_EXTENT_BASED_REQUEST, +}; + +enum { + SSDFS_CURRENT_THREAD_READ, + SSDFS_DELEGATE_TO_READ_THREAD, +}; + +static inline +bool can_file_be_inline(struct inode *inode, loff_t new_size) +{ + size_t capacity = ssdfs_inode_inline_file_capacity(inode); + + if (capacity == 0) + return false; + + if (capacity < new_size) + return false; + + return true; +} + +static inline +size_t ssdfs_inode_size_threshold(void) +{ + return sizeof(struct ssdfs_inode) - + offsetof(struct ssdfs_inode, internal); +} + +int ssdfs_allocate_inline_file_buffer(struct inode *inode) +{ + struct ssdfs_inode_info *ii = SSDFS_I(inode); + size_t threshold = ssdfs_inode_size_threshold(); + size_t inline_capacity; + + if (ii->inline_file) + return 0; + + inline_capacity = ssdfs_inode_inline_file_capacity(inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inline_capacity %zu, threshold %zu\n", + inline_capacity, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (inline_capacity < threshold) { + SSDFS_ERR("inline_capacity %zu < threshold %zu\n", + 
inline_capacity, threshold); + return -ERANGE; + } else if (inline_capacity == threshold) { + ii->inline_file = ii->raw_inode.internal; + } else { + ii->inline_file = + ssdfs_file_kzalloc(inline_capacity, GFP_KERNEL); + if (!ii->inline_file) { + SSDFS_ERR("fail to allocate inline buffer: " + "ino %lu, inline_capacity %zu\n", + inode->i_ino, inline_capacity); + return -ENOMEM; + } + } + + return 0; +} + +void ssdfs_destroy_inline_file_buffer(struct inode *inode) +{ + struct ssdfs_inode_info *ii = SSDFS_I(inode); + size_t threshold = ssdfs_inode_size_threshold(); + size_t inline_capacity; + + if (!ii->inline_file) + return; + + inline_capacity = ssdfs_inode_inline_file_capacity(inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inline_capacity %zu, threshold %zu\n", + inline_capacity, threshold); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (inline_capacity <= threshold) { + ii->inline_file = NULL; + } else { + ssdfs_file_kfree(ii->inline_file); + ii->inline_file = NULL; + } +} + +/* + * ssdfs_read_block_async() - read block async + * @fsi: pointer on shared file system object + * @req: request object + */ +static +int ssdfs_read_block_async(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_segment_info *si; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX); + + SSDFS_DBG("fsi %p, req %p\n", fsi, req); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_volume_extent(fsi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare volume extent: " + "ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, " + "parent_snapshot %llu, err %d\n", + req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, + req->extent.cno, + req->extent.parent_snapshot, + err); + return err; + } + + req->place.len = 1; + + si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE, + req->place.start.seg_id, U64_MAX); + if (unlikely(IS_ERR_OR_NULL(si))) { + SSDFS_ERR("fail to grab segment object: " + "seg %llu, err %ld\n", + req->place.start.seg_id, + PTR_ERR(si)); + return PTR_ERR(si); + } + + err = ssdfs_segment_read_block_async(si, SSDFS_REQ_ASYNC, req); + if (unlikely(err)) { + SSDFS_ERR("read request failed: " + "ino %llu, logical_offset %llu, size %u, err %d\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, err); + return err; + } + + ssdfs_segment_put_object(si); + + return 0; +} + +/* + * ssdfs_read_block_by_current_thread() - read block by current thread + * @fsi: pointer on shared file system object + * @req: request object + */ +static +int ssdfs_read_block_by_current_thread(struct ssdfs_fs_info *fsi, + struct ssdfs_segment_request *req) +{ + struct ssdfs_segment_info *si; + struct ssdfs_peb_container *pebc; + struct ssdfs_blk2off_table *table; + struct ssdfs_offset_position pos; + u16 logical_blk; + struct completion *end; + int i; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!fsi || !req); + BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX); + + SSDFS_DBG("fsi %p, req %p\n", fsi, req); +#endif /* CONFIG_SSDFS_DEBUG */ + + err = ssdfs_prepare_volume_extent(fsi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare volume extent: " + "ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, " + "parent_snapshot %llu, err %d\n", + req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, + req->extent.cno, + req->extent.parent_snapshot, + err); + return err; + } + + req->place.len = 1; + + si = 
ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE, + req->place.start.seg_id, U64_MAX); + if (unlikely(IS_ERR_OR_NULL(si))) { + SSDFS_ERR("fail to grab segment object: " + "seg %llu, err %d\n", + req->place.start.seg_id, err); + return PTR_ERR(si); + } + + ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ, + SSDFS_READ_PAGE, + SSDFS_REQ_SYNC, + req); + ssdfs_request_define_segment(si->seg_id, req); + + table = si->blk2off_table; + logical_blk = req->place.start.blk_index; + + err = ssdfs_blk2off_table_get_offset_position(table, logical_blk, &pos); + if (err == -EAGAIN) { + end = &table->full_init_end; + + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("blk2off init failed: " + "err %d\n", err); + goto finish_read_block; + } + + err = ssdfs_blk2off_table_get_offset_position(table, + logical_blk, + &pos); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to convert: " + "logical_blk %u, err %d\n", + logical_blk, err); + goto finish_read_block; + } + + pebc = &si->peb_array[pos.peb_index]; + + ssdfs_peb_read_request_cno(pebc); + + err = ssdfs_peb_read_page(pebc, req, &end); + if (err == -EAGAIN) { + err = SSDFS_WAIT_COMPLETION(end); + if (unlikely(err)) { + SSDFS_ERR("PEB init failed: " + "err %d\n", err); + goto forget_request_cno; + } + + err = ssdfs_peb_read_page(pebc, req, &end); + } + + if (unlikely(err)) { + SSDFS_ERR("fail to read page: err %d\n", + err); + goto forget_request_cno; + } + + for (i = 0; i < req->result.processed_blks; i++) + ssdfs_peb_mark_request_block_uptodate(pebc, req, i); + +forget_request_cno: + ssdfs_peb_finish_read_request_cno(pebc); + +finish_read_block: + req->result.err = err; + complete(&req->result.wait); + ssdfs_segment_put_object(si); + + return 0; +} + +static +int ssdfs_readpage_nolock(struct file *file, struct page *page, + int read_mode) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(file_inode(file)->i_sb); + struct inode *inode = file_inode(file); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + ino_t ino = file_inode(file)->i_ino; + pgoff_t index = page_index(page); + struct ssdfs_segment_request *req; + loff_t logical_offset; + loff_t data_bytes; + loff_t file_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %llu, read_mode %#x\n", + ino, (u64)index, read_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_offset = (loff_t)index << PAGE_SHIFT; + + file_size = i_size_read(file_inode(file)); + data_bytes = file_size - logical_offset; + data_bytes = min_t(loff_t, PAGE_SIZE, data_bytes); + + BUG_ON(data_bytes > U32_MAX); + + ssdfs_memzero_page(page, 0, PAGE_SIZE, PAGE_SIZE); + + if (logical_offset >= file_size) { + /* Reading beyond inode */ + SetPageUptodate(page); + ClearPageError(page); + flush_dcache_page(page); + return 0; + } + + if (is_ssdfs_file_inline(ii)) { + size_t inline_capacity = + ssdfs_inode_inline_file_capacity(inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("inline_capacity %zu, file_size %llu\n", + inline_capacity, file_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (file_size > inline_capacity) { + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + SSDFS_ERR("file_size %llu is greater capacity %zu\n", + file_size, inline_capacity); + return -E2BIG; + } + + err = ssdfs_memcpy_to_page(page, 0, PAGE_SIZE, + ii->inline_file, 0, inline_capacity, + data_bytes); + if (unlikely(err)) { + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + SSDFS_ERR("fail to copy file's content: " + "err %d\n", err); + return err; + } + + 
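+		/* inline payload copied in full; report the page as uptodate */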
SetPageUptodate(page); + ClearPageError(page); + flush_dcache_page(page); + return 0; + } + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + ssdfs_request_prepare_logical_extent(ino, + (u64)logical_offset, + (u32)data_bytes, + 0, 0, req); + + err = ssdfs_request_add_page(page, req); + if (err) { + SSDFS_ERR("fail to add page into request: " + "ino %lu, page_index %lu, err %d\n", + ino, index, err); + goto fail_read_page; + } + + switch (read_mode) { + case SSDFS_CURRENT_THREAD_READ: + err = ssdfs_read_block_by_current_thread(fsi, req); + if (err) { + SSDFS_ERR("fail to read block: err %d\n", err); + goto fail_read_page; + } + + err = SSDFS_WAIT_COMPLETION(&req->result.wait); + if (unlikely(err)) { + SSDFS_ERR("read request failed: " + "ino %lu, logical_offset %llu, " + "size %u, err %d\n", + ino, (u64)logical_offset, + (u32)data_bytes, err); + goto fail_read_page; + } + + if (req->result.err) { + SSDFS_ERR("read request failed: " + "ino %lu, logical_offset %llu, " + "size %u, err %d\n", + ino, (u64)logical_offset, + (u32)data_bytes, + req->result.err); + goto fail_read_page; + } + + ssdfs_put_request(req); + ssdfs_request_free(req); + break; + + case SSDFS_DELEGATE_TO_READ_THREAD: + err = ssdfs_read_block_async(fsi, req); + if (err) { + SSDFS_ERR("fail to read block: err %d\n", err); + goto fail_read_page; + } + break; + + default: + BUG(); + } + + return 0; + +fail_read_page: + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + ssdfs_put_request(req); + ssdfs_request_free(req); + + return err; +} + +static +int ssdfs_read_folio(struct file *file, struct folio *folio) +{ + struct page *page = &folio->page; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %lu\n", + file_inode(file)->i_ino, page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_account_locked_page(page); + err = ssdfs_readpage_nolock(file, page, SSDFS_CURRENT_THREAD_READ); + ssdfs_unlock_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %lu, page %p, " + "count %d, flags %#lx\n", + file_inode(file)->i_ino, page_index(page), + page, page_ref_count(page), page->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; +} + +static +struct ssdfs_segment_request * +ssdfs_issue_read_request(struct file *file, struct page *page) +{ + struct ssdfs_fs_info *fsi = SSDFS_FS_I(file_inode(file)->i_sb); + struct ssdfs_segment_request *req = NULL; + struct ssdfs_segment_info *si; + ino_t ino = file_inode(file)->i_ino; + pgoff_t index = page_index(page); + loff_t logical_offset; + loff_t data_bytes; + loff_t file_size; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %lu\n", + file_inode(file)->i_ino, page_index(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_offset = (loff_t)index << PAGE_SHIFT; + + file_size = i_size_read(file_inode(file)); + data_bytes = file_size - logical_offset; + data_bytes = min_t(loff_t, PAGE_SIZE, data_bytes); + + BUG_ON(data_bytes > U32_MAX); + + ssdfs_memzero_page(page, 0, PAGE_SIZE, PAGE_SIZE); + + if (logical_offset >= file_size) { + /* Reading beyond inode */ +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("Reading beyond inode: " + "logical_offset %llu, file_size %llu\n", + logical_offset, file_size); +#endif /* CONFIG_SSDFS_DEBUG */ + SetPageUptodate(page); + ClearPageError(page); + 
flush_dcache_page(page); + return ERR_PTR(-ENODATA); + } + + req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(req)) { + err = (req == NULL ? -ENOMEM : PTR_ERR(req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return req; + } + + ssdfs_request_init(req); + ssdfs_get_request(req); + + ssdfs_request_prepare_logical_extent(ino, + (u64)logical_offset, + (u32)data_bytes, + 0, 0, req); + + err = ssdfs_request_add_page(page, req); + if (err) { + SSDFS_ERR("fail to add page into request: " + "ino %lu, page_index %lu, err %d\n", + ino, index, err); + goto fail_issue_read_request; + } + + err = ssdfs_prepare_volume_extent(fsi, req); + if (unlikely(err)) { + SSDFS_ERR("fail to prepare volume extent: " + "ino %llu, logical_offset %llu, " + "data_bytes %u, cno %llu, " + "parent_snapshot %llu, err %d\n", + req->extent.ino, + req->extent.logical_offset, + req->extent.data_bytes, + req->extent.cno, + req->extent.parent_snapshot, + err); + goto fail_issue_read_request; + } + + req->place.len = 1; + + si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE, + req->place.start.seg_id, U64_MAX); + if (unlikely(IS_ERR_OR_NULL(si))) { + err = (si == NULL ? -ENOMEM : PTR_ERR(si)); + SSDFS_ERR("fail to grab segment object: " + "seg %llu, err %d\n", + req->place.start.seg_id, + err); + goto fail_issue_read_request; + } + + err = ssdfs_segment_read_block_async(si, SSDFS_REQ_ASYNC_NO_FREE, req); + if (unlikely(err)) { + SSDFS_ERR("read request failed: " + "ino %llu, logical_offset %llu, size %u, err %d\n", + req->extent.ino, req->extent.logical_offset, + req->extent.data_bytes, err); + goto fail_issue_read_request; + } + + ssdfs_segment_put_object(si); + + return req; + +fail_issue_read_request: + ClearPageUptodate(page); + ssdfs_clear_page_private(page, 0); + SetPageError(page); + ssdfs_put_request(req); + ssdfs_request_free(req); + + return ERR_PTR(err); +} + +static +int ssdfs_check_read_request(struct ssdfs_segment_request *req) +{ + wait_queue_head_t *wq = NULL; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!req); + + SSDFS_DBG("req %p\n", req); +#endif /* CONFIG_SSDFS_DEBUG */ + +check_req_state: + switch (atomic_read(&req->result.state)) { + case SSDFS_REQ_CREATED: + case SSDFS_REQ_STARTED: + wq = &req->private.wait_queue; + + err = wait_event_killable_timeout(*wq, + has_request_been_executed(req), + SSDFS_DEFAULT_TIMEOUT); + if (err < 0) + WARN_ON(err < 0); + else + err = 0; + + goto check_req_state; + break; + + case SSDFS_REQ_FINISHED: + /* do nothing */ + break; + + case SSDFS_REQ_FAILED: + err = req->result.err; + + if (!err) { + SSDFS_ERR("error code is absent: " + "req %p, err %d\n", + req, err); + err = -ERANGE; + } + + SSDFS_ERR("read request is failed: " + "err %d\n", err); + return err; + + default: + SSDFS_ERR("invalid result's state %#x\n", + atomic_read(&req->result.state)); + return -ERANGE; + } + + return 0; +} + +static +int ssdfs_wait_read_request_end(struct ssdfs_segment_request *req) +{ + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("req %p\n", req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!req) + return 0; + + err = ssdfs_check_read_request(req); + if (unlikely(err)) { + SSDFS_ERR("read request failed: " + "err %d\n", err); + } + + ssdfs_request_free(req); + + return err; +} + +struct ssdfs_readahead_env { + struct file *file; + struct ssdfs_segment_request **reqs; + unsigned count; + unsigned capacity; +}; + +static inline +int ssdfs_readahead_page(void *data, struct page *page) +{ + struct ssdfs_readahead_env *env = (struct ssdfs_readahead_env *)data; 
+ unsigned index; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %lu, page %p, " + "count %d, flags %#lx\n", + file_inode(env->file)->i_ino, page_index(page), + page, page_ref_count(page), page->flags); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (env->count >= env->capacity) { + SSDFS_ERR("count %u >= capacity %u\n", + env->count, env->capacity); + return -ERANGE; + } + + index = env->count; + + ssdfs_get_page(page); + ssdfs_account_locked_page(page); + + env->reqs[index] = ssdfs_issue_read_request(env->file, page); + if (IS_ERR_OR_NULL(env->reqs[index])) { + err = (env->reqs[index] == NULL ? -ENOMEM : + PTR_ERR(env->reqs[index])); + env->reqs[index] = NULL; + + if (err == -ENODATA) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("no data for the page: " + "index %d\n", index); +#endif /* CONFIG_SSDFS_DEBUG */ + } else { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("unable to issue request: " + "index %d, err %d\n", + index, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + SetPageError(page); + zero_user_segment(page, 0, PAGE_SIZE); + ssdfs_unlock_page(page); + ssdfs_put_page(page); + } + } else + env->count++; + + return err; +} + +/* + * The ssdfs_readahead() is called by the VM to read pages + * associated with the address_space object. The pages are + * consecutive in the page cache and are locked. + * The implementation should decrement the page refcount + * after starting I/O on each page. Usually the page will be + * unlocked by the I/O completion handler. The ssdfs_readahead() + * is only used for read-ahead, so read errors are ignored. + */ +static +void ssdfs_readahead(struct readahead_control *rac) +{ + struct inode *inode = file_inode(rac->file); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + struct ssdfs_readahead_env env; + struct page *page; + unsigned i; + int res; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, nr_pages %u\n", + file_inode(rac->file)->i_ino, readahead_count(rac)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_ssdfs_file_inline(ii)) { + /* do nothing */ + return; + } + + env.file = rac->file; + env.count = 0; + env.capacity = readahead_count(rac); + + env.reqs = ssdfs_file_kcalloc(readahead_count(rac), + sizeof(struct ssdfs_segment_request *), + GFP_KERNEL); + if (!env.reqs) { + SSDFS_ERR("fail to allocate requests array\n"); + return; + } + + while ((page = readahead_page(rac))) { + prefetchw(&page->flags); + err = ssdfs_readahead_page((void *)&env, page); + if (unlikely(err)) { + SSDFS_ERR("fail to process page: " + "index %u, err %d\n", + env.count, err); + break; + } + }; + + for (i = 0; i < readahead_count(rac); i++) { + res = ssdfs_wait_read_request_end(env.reqs[i]); + if (res) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("waiting has finished with issue: " + "index %u, err %d\n", + i, res); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + if (err == 0) + err = res; + + env.reqs[i] = NULL; + } + + if (env.reqs) + ssdfs_file_kfree(env.reqs); + + if (err) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("readahead fails: " + "ino %lu, nr_pages %u, err %d\n", + file_inode(rac->file)->i_ino, + readahead_count(rac), err); +#endif /* CONFIG_SSDFS_DEBUG */ + } + + return; +} + +/* + * ssdfs_update_block() - update block. 
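+ *
+ * Resolve the logical block described by the request into its current
+ * position on the volume and submit an update request to the owning
+ * segment, synchronously or asynchronously depending on wbc->sync_mode.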
+ * @fsi: pointer on shared file system object
+ * @req: request object
+ * @wbc: writeback control
+ */
+static
+int ssdfs_update_block(struct ssdfs_fs_info *fsi,
+		       struct ssdfs_segment_request *req,
+		       struct writeback_control *wbc)
+{
+	struct ssdfs_segment_info *si;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req);
+	BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX);
+
+	SSDFS_DBG("fsi %p, req %p\n", fsi, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_prepare_volume_extent(fsi, req);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare volume extent: "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, "
+			  "parent_snapshot %llu, err %d\n",
+			  req->extent.ino,
+			  req->extent.logical_offset,
+			  req->extent.data_bytes,
+			  req->extent.cno,
+			  req->extent.parent_snapshot,
+			  err);
+		return err;
+	}
+
+	req->place.len = 1;
+
+	si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE,
+				req->place.start.seg_id, U64_MAX);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  req->place.start.seg_id, err);
+		return err;
+	}
+
+	if (wbc->sync_mode == WB_SYNC_NONE) {
+		err = ssdfs_segment_update_block_async(si,
+						       SSDFS_REQ_ASYNC,
+						       req);
+	} else if (wbc->sync_mode == WB_SYNC_ALL)
+		err = ssdfs_segment_update_block_sync(si, req);
+	else
+		BUG();
+
+	if (unlikely(err)) {
+		SSDFS_ERR("update request failed: "
+			  "ino %llu, logical_offset %llu, size %u, err %d\n",
+			  req->extent.ino, req->extent.logical_offset,
+			  req->extent.data_bytes, err);
+		ssdfs_segment_put_object(si);
+		return err;
+	}
+
+	ssdfs_segment_put_object(si);
+
+	return 0;
+}
+
+/*
+ * ssdfs_update_extent() - update extent.
+ * @fsi: pointer on shared file system object
+ * @req: request object
+ * @wbc: writeback control
+ */
+static
+int ssdfs_update_extent(struct ssdfs_fs_info *fsi,
+			struct ssdfs_segment_request *req,
+			struct writeback_control *wbc)
+{
+	struct ssdfs_segment_info *si;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req);
+	BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX);
+
+	SSDFS_DBG("fsi %p, req %p\n", fsi, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_prepare_volume_extent(fsi, req);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare volume extent: "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, "
+			  "parent_snapshot %llu, err %d\n",
+			  req->extent.ino,
+			  req->extent.logical_offset,
+			  req->extent.data_bytes,
+			  req->extent.cno,
+			  req->extent.parent_snapshot,
+			  err);
+		return err;
+	}
+
+	si = ssdfs_grab_segment(fsi, SSDFS_USER_DATA_SEG_TYPE,
+				req->place.start.seg_id, U64_MAX);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  req->place.start.seg_id, err);
+		return err;
+	}
+
+	if (wbc->sync_mode == WB_SYNC_NONE) {
+		err = ssdfs_segment_update_extent_async(si,
+							SSDFS_REQ_ASYNC,
+							req);
+	} else if (wbc->sync_mode == WB_SYNC_ALL)
+		err = ssdfs_segment_update_extent_sync(si, req);
+	else
+		BUG();
+
+	if (unlikely(err)) {
+		SSDFS_ERR("update request failed: "
+			  "ino %llu, logical_offset %llu, size %u, err %d\n",
+			  req->extent.ino, req->extent.logical_offset,
+			  req->extent.data_bytes, err);
+		ssdfs_segment_put_object(si);
+		return err;
+	}
+
+	ssdfs_segment_put_object(si);
+
+	return 0;
+}
+
+static
+int ssdfs_issue_async_block_write_request(struct writeback_control *wbc,
+					  struct ssdfs_segment_request **req)
+{
+	struct page *page;
+	struct inode *inode;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_extents_btree_info *etree;
+	struct ssdfs_fs_info *fsi;
+	ino_t ino;
+	u64 logical_offset;
+	u32 data_bytes;
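+	/* Segment that ends up owning the data; it is recorded in the
+	 * extents tree's queue of updated segments after the request
+	 * has been issued.
+	 */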
+ u64 seg_id = U64_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!wbc || !req || !*req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(&(*req)->result.pvec) == 0) { + SSDFS_ERR("pagevec is empty\n"); + return -ERANGE; + } + + page = (*req)->result.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = page->mapping->host; + ii = SSDFS_I(inode); + fsi = SSDFS_FS_I(inode->i_sb); + ino = inode->i_ino; + logical_offset = (*req)->extent.logical_offset; + data_bytes = (*req)->extent.data_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, logical_offset %llu, " + "data_bytes %u, sync_mode %#x\n", + ino, logical_offset, data_bytes, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_add_block(page)) { + struct ssdfs_blk2off_range extent; + + err = ssdfs_segment_add_data_block_async(fsi, *req, + &seg_id, + &extent); + if (!err) { + err = ssdfs_extents_tree_add_extent(inode, *req); + if (err) { + SSDFS_ERR("fail to add extent: " + "ino %lu, page_index %llu, " + "err %d\n", + ino, (u64)page_index(page), + err); + return err; + } + + inode_add_bytes(inode, fsi->pagesize); + } + } else { + err = ssdfs_update_block(fsi, *req, wbc); + seg_id = (*req)->place.start.seg_id; + } + + if (err) { + SSDFS_ERR("fail to write page async: " + "ino %lu, page_index %llu, err %d\n", + ino, (u64)page_index(page), err); + return err; + } + + etree = SSDFS_EXTREE(ii); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!etree); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&etree->lock); + err = ssdfs_extents_tree_add_updated_seg_id(etree, seg_id); + up_write(&etree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to add updated segment in queue: " + "seg_id %llu, err %d\n", + seg_id, err); + return err; + } + + return 0; +} + +static +int ssdfs_issue_sync_block_write_request(struct writeback_control *wbc, + struct ssdfs_segment_request **req) +{ + struct page *page; + struct inode *inode; + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_fs_info *fsi; + ino_t ino; + u64 logical_offset; + u32 data_bytes; + u64 seg_id = U64_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!wbc || !req || !*req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(&(*req)->result.pvec) == 0) { + SSDFS_ERR("pagevec is empty\n"); + return -ERANGE; + } + + page = (*req)->result.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = page->mapping->host; + ii = SSDFS_I(inode); + fsi = SSDFS_FS_I(inode->i_sb); + ino = inode->i_ino; + logical_offset = (*req)->extent.logical_offset; + data_bytes = (*req)->extent.data_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, logical_offset %llu, " + "data_bytes %u, sync_mode %#x\n", + ino, logical_offset, data_bytes, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_add_block(page)) { + struct ssdfs_blk2off_range extent; + + err = ssdfs_segment_add_data_block_sync(fsi, *req, + &seg_id, + &extent); + if (!err) { + err = ssdfs_extents_tree_add_extent(inode, *req); + if (!err) + inode_add_bytes(inode, fsi->pagesize); + } + } else { + err = ssdfs_update_block(fsi, *req, wbc); + seg_id = (*req)->place.start.seg_id; + } + + if (err) { + SSDFS_ERR("fail to write page sync: " + "ino %lu, page_index %llu, err %d\n", + ino, (u64)page_index(page), err); + return err; + } + + etree = SSDFS_EXTREE(ii); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!etree); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&etree->lock); + err = 
ssdfs_extents_tree_add_updated_seg_id(etree, seg_id); + up_write(&etree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to add updated segment in queue: " + "seg_id %llu, err %d\n", + seg_id, err); + return err; + } + + return 0; +} + +static +int ssdfs_issue_async_extent_write_request(struct writeback_control *wbc, + struct ssdfs_segment_request **req) +{ + struct page *page; + struct inode *inode; + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_fs_info *fsi; + ino_t ino; + u64 logical_offset; + u32 data_bytes; + u64 seg_id = U64_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!wbc || !req || !*req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(&(*req)->result.pvec) == 0) { + SSDFS_ERR("pagevec is empty\n"); + return -ERANGE; + } + + page = (*req)->result.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = page->mapping->host; + ii = SSDFS_I(inode); + fsi = SSDFS_FS_I(inode->i_sb); + ino = inode->i_ino; + logical_offset = (*req)->extent.logical_offset; + data_bytes = (*req)->extent.data_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, logical_offset %llu, " + "data_bytes %u, sync_mode %#x\n", + ino, logical_offset, data_bytes, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_add_block(page)) { + struct ssdfs_blk2off_range extent; + + err = ssdfs_segment_add_data_extent_async(fsi, *req, + &seg_id, + &extent); + if (!err) { + u32 extent_bytes = data_bytes; + + err = ssdfs_extents_tree_add_extent(inode, *req); + if (err) { + SSDFS_ERR("fail to add extent: " + "ino %lu, page_index %llu, " + "err %d\n", + ino, (u64)page_index(page), err); + return err; + } + + if (fsi->pagesize > PAGE_SIZE) + extent_bytes += fsi->pagesize - 1; + else if (fsi->pagesize <= PAGE_SIZE) + extent_bytes += PAGE_SIZE - 1; + + extent_bytes >>= fsi->log_pagesize; + extent_bytes <<= fsi->log_pagesize; + + inode_add_bytes(inode, extent_bytes); + } + } else { + err = ssdfs_update_extent(fsi, *req, wbc); + seg_id = (*req)->place.start.seg_id; + } + + if (err) { + SSDFS_ERR("fail to write extent async: " + "ino %lu, page_index %llu, err %d\n", + ino, (u64)page_index(page), err); + return err; + } + + etree = SSDFS_EXTREE(ii); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!etree); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&etree->lock); + err = ssdfs_extents_tree_add_updated_seg_id(etree, seg_id); + up_write(&etree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to add updated segment in queue: " + "seg_id %llu, err %d\n", + seg_id, err); + return err; + } + + return 0; +} + +static +int ssdfs_issue_sync_extent_write_request(struct writeback_control *wbc, + struct ssdfs_segment_request **req) +{ + struct page *page; + struct inode *inode; + struct ssdfs_inode_info *ii; + struct ssdfs_extents_btree_info *etree; + struct ssdfs_fs_info *fsi; + ino_t ino; + u64 logical_offset; + u32 data_bytes; + u64 seg_id = U64_MAX; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!wbc || !req || !*req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (pagevec_count(&(*req)->result.pvec) == 0) { + SSDFS_ERR("pagevec is empty\n"); + return -ERANGE; + } + + page = (*req)->result.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = page->mapping->host; + ii = SSDFS_I(inode); + fsi = SSDFS_FS_I(inode->i_sb); + ino = inode->i_ino; + logical_offset = (*req)->extent.logical_offset; + data_bytes = (*req)->extent.data_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, 
logical_offset %llu, " + "data_bytes %u, sync_mode %#x\n", + ino, logical_offset, data_bytes, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (need_add_block(page)) { + struct ssdfs_blk2off_range extent; + + err = ssdfs_segment_add_data_extent_sync(fsi, *req, + &seg_id, + &extent); + if (!err) { + u32 extent_bytes = data_bytes; + + err = ssdfs_extents_tree_add_extent(inode, *req); + if (err) { + SSDFS_ERR("fail to add extent: " + "ino %lu, page_index %llu, " + "err %d\n", + ino, (u64)page_index(page), err); + return err; + } + + if (fsi->pagesize > PAGE_SIZE) + extent_bytes += fsi->pagesize - 1; + else if (fsi->pagesize <= PAGE_SIZE) + extent_bytes += PAGE_SIZE - 1; + + extent_bytes >>= fsi->log_pagesize; + extent_bytes <<= fsi->log_pagesize; + + inode_add_bytes(inode, extent_bytes); + } + } else { + err = ssdfs_update_extent(fsi, *req, wbc); + seg_id = (*req)->place.start.seg_id; + } + + if (err) { + SSDFS_ERR("fail to write page sync: " + "ino %lu, page_index %llu, err %d\n", + ino, (u64)page_index(page), err); + return err; + } + + etree = SSDFS_EXTREE(ii); + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!etree); +#endif /* CONFIG_SSDFS_DEBUG */ + + down_write(&etree->lock); + err = ssdfs_extents_tree_add_updated_seg_id(etree, seg_id); + up_write(&etree->lock); + + if (unlikely(err)) { + SSDFS_ERR("fail to add updated segment in queue: " + "seg_id %llu, err %d\n", + seg_id, err); + return err; + } + + return 0; +} + +static +int ssdfs_issue_write_request(struct writeback_control *wbc, + struct ssdfs_segment_request **req, + int req_type) +{ + struct ssdfs_fs_info *fsi; + struct inode *inode; + struct page *page; + ino_t ino; + u64 logical_offset; + u32 data_bytes; + int i; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!wbc || !req); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!*req) { + SSDFS_ERR("empty segment request\n"); + return -ERANGE; + } + + if (pagevec_count(&(*req)->result.pvec) == 0) { + SSDFS_ERR("pagevec is empty\n"); + return -ERANGE; + } + + page = (*req)->result.pvec.pages[0]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + inode = page->mapping->host; + fsi = SSDFS_FS_I(inode->i_sb); + ino = inode->i_ino; + logical_offset = (*req)->extent.logical_offset; + data_bytes = (*req)->extent.data_bytes; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, logical_offset %llu, " + "data_bytes %u, sync_mode %#x\n", + ino, logical_offset, data_bytes, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < pagevec_count(&(*req)->result.pvec); i++) { + page = (*req)->result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + set_page_writeback(page); + ssdfs_clear_dirty_page(page); + } + + if (wbc->sync_mode == WB_SYNC_NONE) { + if (req_type == SSDFS_BLOCK_BASED_REQUEST) + err = ssdfs_issue_async_block_write_request(wbc, req); + else if (req_type == SSDFS_EXTENT_BASED_REQUEST) + err = ssdfs_issue_async_extent_write_request(wbc, req); + else + BUG(); + + if (err) { + SSDFS_ERR("fail to write async: " + "ino %lu, err %d\n", + ino, err); + goto fail_issue_write_request; + } + + wake_up_all(&fsi->pending_wq); + + /* + * Async request is completely managed by flush thread. + * Forget request because next request will be allocated. 
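+	 * The flush thread also ends the page writeback state and
+	 * releases the request once the data has been persisted.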
+ */ + *req = NULL; + } else if (wbc->sync_mode == WB_SYNC_ALL) { + if (req_type == SSDFS_BLOCK_BASED_REQUEST) + err = ssdfs_issue_sync_block_write_request(wbc, req); + else if (req_type == SSDFS_EXTENT_BASED_REQUEST) + err = ssdfs_issue_sync_extent_write_request(wbc, req); + else + BUG(); + + if (err) { + SSDFS_ERR("fail to write sync: " + "ino %lu, err %d\n", + ino, err); + goto fail_issue_write_request; + } + + wake_up_all(&fsi->pending_wq); + + err = SSDFS_WAIT_COMPLETION(&(*req)->result.wait); + if (unlikely(err)) { + SSDFS_ERR("write request failed: " + "ino %lu, logical_offset %llu, size %u, " + "err %d\n", + ino, (u64)logical_offset, + (u32)data_bytes, err); + goto fail_issue_write_request; + } + + if ((*req)->result.err) { + err = (*req)->result.err; + SSDFS_ERR("write request failed: " + "ino %lu, logical_offset %llu, size %u, " + "err %d\n", + ino, (u64)logical_offset, (u32)data_bytes, + (*req)->result.err); + goto fail_issue_write_request; + } + + for (i = 0; i < pagevec_count(&(*req)->result.pvec); i++) { + page = (*req)->result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + clear_page_new(page); + SetPageUptodate(page); + ssdfs_clear_dirty_page(page); + + ssdfs_unlock_page(page); + end_page_writeback(page); + } + + ssdfs_put_request(*req); + ssdfs_request_free(*req); + *req = NULL; + } else + BUG(); + + return 0; + +fail_issue_write_request: + if (wbc->sync_mode == WB_SYNC_ALL) { + for (i = 0; i < pagevec_count(&(*req)->result.pvec); i++) { + page = (*req)->result.pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + BUG_ON(!page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!PageLocked(page)) { + SSDFS_WARN("page %p, PageLocked %#x\n", + page, PageLocked(page)); + ssdfs_lock_page(page); + } + + clear_page_new(page); + SetPageUptodate(page); + ClearPageDirty(page); + + ssdfs_unlock_page(page); + end_page_writeback(page); + } + + ssdfs_put_request(*req); + ssdfs_request_free(*req); + } + + return err; +} + +static +int __ssdfs_writepage(struct page *page, u32 len, + struct writeback_control *wbc, + struct ssdfs_segment_request **req) +{ + struct inode *inode = page->mapping->host; + ino_t ino = inode->i_ino; + pgoff_t index = page_index(page); + loff_t logical_offset; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %llu, len %u, sync_mode %#x\n", + ino, (u64)index, len, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + *req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(*req)) { + err = (*req == NULL ? 
-ENOMEM : PTR_ERR(*req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + return err; + } + + ssdfs_request_init(*req); + ssdfs_get_request(*req); + + (*req)->private.flags |= SSDFS_REQ_DONT_FREE_PAGES; + + logical_offset = (loff_t)index << PAGE_SHIFT; + ssdfs_request_prepare_logical_extent(ino, (u64)logical_offset, + len, 0, 0, *req); + + err = ssdfs_request_add_page(page, *req); + if (err) { + SSDFS_ERR("fail to add page into request: " + "ino %lu, page_index %lu, err %d\n", + ino, index, err); + goto free_request; + } + + return ssdfs_issue_write_request(wbc, req, SSDFS_BLOCK_BASED_REQUEST); + +free_request: + ssdfs_put_request(*req); + ssdfs_request_free(*req); + + return err; +} + +static +int __ssdfs_writepages(struct page *page, u32 len, + struct writeback_control *wbc, + struct ssdfs_segment_request **req) +{ + struct inode *inode = page->mapping->host; + ino_t ino = inode->i_ino; + pgoff_t index = page_index(page); + loff_t logical_offset; + bool need_create_request; + int err; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %llu, len %u, sync_mode %#x\n", + ino, (u64)index, len, wbc->sync_mode); +#endif /* CONFIG_SSDFS_DEBUG */ + + logical_offset = (loff_t)index << PAGE_SHIFT; + +try_add_page_into_request: + need_create_request = *req == NULL; + + if (need_create_request) { + *req = ssdfs_request_alloc(); + if (IS_ERR_OR_NULL(*req)) { + err = (*req == NULL ? -ENOMEM : PTR_ERR(*req)); + SSDFS_ERR("fail to allocate segment request: err %d\n", + err); + goto fail_write_pages; + } + + ssdfs_request_init(*req); + ssdfs_get_request(*req); + + (*req)->private.flags |= SSDFS_REQ_DONT_FREE_PAGES; + + err = ssdfs_request_add_page(page, *req); + if (err) { + SSDFS_ERR("fail to add page into request: " + "ino %lu, page_index %lu, err %d\n", + ino, index, err); + goto free_request; + } + + ssdfs_request_prepare_logical_extent(ino, (u64)logical_offset, + len, 0, 0, *req); + } else { + u64 upper_bound = (*req)->extent.logical_offset + + (*req)->extent.data_bytes; + u32 last_index; + struct page *last_page; + + if (pagevec_count(&(*req)->result.pvec) == 0) { + err = -ERANGE; + SSDFS_WARN("pagevec is empty\n"); + goto free_request; + } + + last_index = pagevec_count(&(*req)->result.pvec) - 1; + last_page = (*req)->result.pvec.pages[last_index]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("logical_offset %llu, upper_bound %llu, " + "last_index %u\n", + (u64)logical_offset, upper_bound, last_index); + + BUG_ON(!last_page); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (logical_offset == upper_bound && + can_be_merged_into_extent(last_page, page)) { + err = ssdfs_request_add_page(page, *req); + if (err) { + err = ssdfs_issue_write_request(wbc, req, + SSDFS_EXTENT_BASED_REQUEST); + if (err) + goto fail_write_pages; + + *req = NULL; + goto try_add_page_into_request; + } + + (*req)->extent.data_bytes += len; + } else { + err = ssdfs_issue_write_request(wbc, req, + SSDFS_EXTENT_BASED_REQUEST); + if (err) + goto fail_write_pages; + + *req = NULL; + goto try_add_page_into_request; + } + } + + return 0; + +free_request: + ssdfs_put_request(*req); + ssdfs_request_free(*req); + +fail_write_pages: + return err; +} + +/* writepage function prototype */ +typedef int (*ssdfs_writepagefn)(struct page *page, u32 len, + struct writeback_control *wbc, + struct ssdfs_segment_request **req); + +static +int ssdfs_writepage_wrapper(struct page *page, + struct writeback_control *wbc, + struct ssdfs_segment_request **req, + ssdfs_writepagefn writepage) +{ + struct inode *inode = 
page->mapping->host; + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + ino_t ino = inode->i_ino; + pgoff_t index = page_index(page); + loff_t i_size = i_size_read(inode); + pgoff_t end_index = i_size >> PAGE_SHIFT; + int len = i_size & (PAGE_SIZE - 1); + loff_t cur_blk; + bool is_new_blk = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, page_index %llu, " + "i_size %llu, len %d\n", + ino, (u64)index, + (u64)i_size, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (inode->i_sb->s_flags & SB_RDONLY) { + /* + * It means that filesystem was remounted in read-only + * mode because of error or metadata corruption. But we + * have dirty pages that try to be flushed in background. + * So, here we simply discard this dirty page. + */ + err = -EROFS; + goto discard_page; + } + + /* Is the page fully outside @i_size? (truncate in progress) */ + if (index > end_index || (index == end_index && !len)) { + err = 0; + goto finish_write_page; + } + + if (is_ssdfs_file_inline(ii)) { + size_t inline_capacity = + ssdfs_inode_inline_file_capacity(inode); + + if (len > inline_capacity) { + err = -ENOSPC; + SSDFS_ERR("len %d is greater capacity %zu\n", + len, inline_capacity); + goto discard_page; + } + + set_page_writeback(page); + + err = ssdfs_memcpy_from_page(ii->inline_file, + 0, inline_capacity, + page, + 0, PAGE_SIZE, + len); + if (unlikely(err)) { + SSDFS_ERR("fail to copy file's content: " + "err %d\n", err); + goto discard_page; + } + + inode_add_bytes(inode, len); + + clear_page_new(page); + SetPageUptodate(page); + ClearPageDirty(page); + + ssdfs_unlock_page(page); + end_page_writeback(page); + + return 0; + } + + cur_blk = (index << PAGE_SHIFT) >> fsi->log_pagesize; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_blk %llu\n", (u64)cur_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (!need_add_block(page)) { + is_new_blk = !ssdfs_extents_tree_has_logical_block(cur_blk, + inode); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_blk %llu, is_new_blk %#x\n", + (u64)cur_blk, is_new_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_new_blk) + set_page_new(page); + } + + /* Is the page fully inside @i_size? */ + if (index < end_index) { + err = (*writepage)(page, PAGE_SIZE, wbc, req); + if (unlikely(err)) { + ssdfs_fs_error(inode->i_sb, __FILE__, + __func__, __LINE__, + "fail to write page: " + "ino %lu, page_index %llu, err %d\n", + ino, (u64)index, err); + goto discard_page; + } + + return 0; + } + + /* + * The page straddles @i_size. It must be zeroed out on each and every + * writepage invocation because it may be mmapped. "A file is mapped + * in multiples of the page size. For a file that is not a multiple of + * the page size, the remaining memory is zeroed when mapped, and + * writes to that region are not written out to the file." + */ + zero_user_segment(page, len, PAGE_SIZE); + + err = (*writepage)(page, len, wbc, req); + if (unlikely(err)) { + ssdfs_fs_error(inode->i_sb, __FILE__, + __func__, __LINE__, + "fail to write page: " + "ino %lu, page_index %llu, err %d\n", + ino, (u64)index, err); + goto discard_page; + } + + return 0; + +finish_write_page: + ssdfs_unlock_page(page); + +discard_page: + return err; +} + +/* + * The ssdfs_writepage() is called by the VM to write + * a dirty page to backing store. This may happen for data + * integrity reasons (i.e. 'sync'), or to free up memory + * (flush). The difference can be seen in wbc->sync_mode. 
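+ * In SSDFS, WB_SYNC_NONE produces an asynchronous request that the
+ * flush thread completes on its own, while WB_SYNC_ALL produces a
+ * synchronous request whose completion is awaited before returning.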
+ */ +static +int ssdfs_writepage(struct page *page, struct writeback_control *wbc) +{ + struct ssdfs_segment_request *req = NULL; +#ifdef CONFIG_SSDFS_DEBUG + struct inode *inode = page->mapping->host; + ino_t ino = inode->i_ino; + pgoff_t index = page_index(page); + + SSDFS_DBG("ino %lu, page_index %llu\n", + ino, (u64)index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ssdfs_writepage_wrapper(page, wbc, &req, + __ssdfs_writepage); +} + +/* + * The ssdfs_writepages() is called by the VM to write out pages associated + * with the address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then + * the writeback_control will specify a range of pages that must be + * written out. If it is WBC_SYNC_NONE, then a nr_to_write is given + * and that many pages should be written if possible. + * If no ->writepages is given, then mpage_writepages is used + * instead. This will choose pages from the address space that are + * tagged as DIRTY and will pass them to ->writepage. + */ +static +int ssdfs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct inode *inode = mapping->host; + struct ssdfs_inode_info *ii = SSDFS_I(inode); + ino_t ino = inode->i_ino; + struct ssdfs_segment_request *req = NULL; + struct pagevec pvec; + int nr_pages; + pgoff_t index = 0; + pgoff_t end; /* Inclusive */ + pgoff_t done_index = 0; + int range_whole = 0; + int tag; + int i; + int done = 0; + int ret = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, nr_to_write %lu, " + "range_start %llu, range_end %llu, " + "writeback_index %llu, " + "wbc->range_cyclic %#x\n", + ino, wbc->nr_to_write, + (u64)wbc->range_start, + (u64)wbc->range_end, + (u64)mapping->writeback_index, + wbc->range_cyclic); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* + * No pages to write? + */ + if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) + goto out_writepages; + + pagevec_init(&pvec); + + if (wbc->range_cyclic) { + index = mapping->writeback_index; /* prev offset */ + end = -1; + } else { + index = wbc->range_start >> PAGE_SHIFT; + end = wbc->range_end >> PAGE_SHIFT; + if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) + range_whole = 1; + } + + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) { + tag = PAGECACHE_TAG_TOWRITE; + tag_pages_for_writeback(mapping, index, end); + } else + tag = PAGECACHE_TAG_DIRTY; + + done_index = index; + + while (!done && (index <= end)) { + nr_pages = (int)min_t(pgoff_t, end - index, + (pgoff_t)PAGEVEC_SIZE-1) + 1; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("index %llu, end %llu, " + "nr_pages %d, tag %#x\n", + (u64)index, (u64)end, nr_pages, tag); +#endif /* CONFIG_SSDFS_DEBUG */ + + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, + end, tag); + if (nr_pages == 0) + break; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("FOUND: nr_pages %d\n", nr_pages); +#endif /* CONFIG_SSDFS_DEBUG */ + + for (i = 0; i < nr_pages; i++) { + struct page *page = pvec.pages[i]; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, index %d, page->index %ld\n", + page, i, page->index); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* + * At this point, the page may be truncated or + * invalidated (changing page->mapping to NULL), or + * even swizzled back from swapper_space to tmpfs file + * mapping. However, page->index will not change + * because we have a reference on the page. + */ + if (page->index > end) { + /* + * can't be range_cyclic (1st pass) because + * end == -1 in that case. 
+ */
+				done = 1;
+				break;
+			}
+
+			done_index = page->index + 1;
+
+			ssdfs_lock_page(page);
+
+			/*
+			 * Page truncated or invalidated. We can freely skip it
+			 * then, even for data integrity operations: the page
+			 * has disappeared concurrently, so there could be no
+			 * real expectation of this data integrity operation
+			 * even if there is now a new, dirty page at the same
+			 * pagecache address.
+			 */
+			if (unlikely(page->mapping != mapping)) {
+continue_unlock:
+				ssdfs_unlock_page(page);
+				continue;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, index %d, page->index %ld, "
+				  "PageLocked %#x, PageDirty %#x, "
+				  "PageWriteback %#x\n",
+				  page, i, page->index,
+				  PageLocked(page), PageDirty(page),
+				  PageWriteback(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (!PageDirty(page)) {
+				/* someone wrote it for us */
+				goto continue_unlock;
+			}
+
+			if (PageWriteback(page)) {
+				if (wbc->sync_mode != WB_SYNC_NONE)
+					wait_on_page_writeback(page);
+				else
+					goto continue_unlock;
+			}
+
+			BUG_ON(PageWriteback(page));
+			if (!clear_page_dirty_for_io(page))
+				goto continue_unlock;
+
+			ret = ssdfs_writepage_wrapper(page, wbc, &req,
+						      __ssdfs_writepages);
+			if (unlikely(ret)) {
+				if (ret == -EROFS) {
+					/*
+					 * continue to discard pages
+					 */
+				} else {
+					/*
+					 * done_index is set past this page,
+					 * so media errors will not choke
+					 * background writeout for the entire
+					 * file. This has consequences for
+					 * range_cyclic semantics (ie. it may
+					 * not be suitable for data integrity
+					 * writeout).
+					 */
+					done_index = page->index + 1;
+					done = 1;
+					break;
+				}
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("page %p, index %d, page->index %ld, "
+				  "PageLocked %#x, PageDirty %#x, "
+				  "PageWriteback %#x\n",
+				  page, i, page->index,
+				  PageLocked(page), PageDirty(page),
+				  PageWriteback(page));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			/*
+			 * We stop writing back only if we are not doing
+			 * integrity sync. In case of integrity sync we have to
+			 * keep going until we have written all the pages
+			 * we tagged for writeback prior to entering this loop.
+			 */
+			if (--wbc->nr_to_write <= 0 &&
+			    wbc->sync_mode == WB_SYNC_NONE) {
+				done = 1;
+				break;
+			}
+		}
+
+		if (!is_ssdfs_file_inline(ii)) {
+			ret = ssdfs_issue_write_request(wbc, &req,
+						SSDFS_EXTENT_BASED_REQUEST);
+			if (ret < 0) {
+				SSDFS_ERR("ino %lu, nr_to_write %lu, "
+					  "range_start %llu, range_end %llu, "
+					  "writeback_index %llu, "
+					  "wbc->range_cyclic %#x, "
+					  "index %llu, end %llu, "
+					  "done_index %llu\n",
+					  ino, wbc->nr_to_write,
+					  (u64)wbc->range_start,
+					  (u64)wbc->range_end,
+					  (u64)mapping->writeback_index,
+					  wbc->range_cyclic,
+					  (u64)index, (u64)end,
+					  (u64)done_index);
+
+				for (i = 0; i < pagevec_count(&pvec); i++) {
+					struct page *page;
+
+					page = pvec.pages[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+					BUG_ON(!page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+					SSDFS_ERR("page %p, index %d, "
+						  "page->index %ld, "
+						  "PageLocked %#x, "
+						  "PageDirty %#x, "
+						  "PageWriteback %#x\n",
+						  page, i, page->index,
+						  PageLocked(page),
+						  PageDirty(page),
+						  PageWriteback(page));
+				}
+
+				goto out_writepages;
+			}
+		}
+
+		index = done_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index %llu, end %llu, nr_to_write %lu\n",
+			  (u64)index, (u64)end, wbc->nr_to_write);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		pagevec_reinit(&pvec);
+		cond_resched();
+	}
+
+	/*
+	 * If we hit the last page and there is more work to be done: wrap
+	 * the index back to the start of the file for the next
+	 * time we are called.
+ */ + if (wbc->range_cyclic && !done) + done_index = 0; + +out_writepages: + if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) + mapping->writeback_index = done_index; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, nr_to_write %lu, " + "range_start %llu, range_end %llu, " + "writeback_index %llu\n", + ino, wbc->nr_to_write, + (u64)wbc->range_start, + (u64)wbc->range_end, + (u64)mapping->writeback_index); +#endif /* CONFIG_SSDFS_DEBUG */ + + return ret; +} + +static void ssdfs_write_failed(struct address_space *mapping, loff_t to) +{ + struct inode *inode = mapping->host; + + if (to > inode->i_size) + truncate_pagecache(inode, inode->i_size); +} + +/* + * The ssdfs_write_begin() is called by the generic + * buffered write code to ask the filesystem to prepare + * to write len bytes at the given offset in the file. + */ +static +int ssdfs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, + struct page **pagep, void **fsdata) +{ + struct inode *inode = mapping->host; + struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb); + struct ssdfs_inode_info *ii = SSDFS_I(inode); + struct page *page; + pgoff_t index = pos >> PAGE_SHIFT; + unsigned blks = 0; + loff_t start_blk, end_blk, cur_blk; + u64 last_blk = U64_MAX; +#ifdef CONFIG_SSDFS_DEBUG + u64 free_pages = 0; +#endif /* CONFIG_SSDFS_DEBUG */ + bool is_new_blk = false; + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, pos %llu, len %u\n", + inode->i_ino, pos, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (inode->i_sb->s_flags & SB_RDONLY) + return -EROFS; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, index %lu\n", + inode->i_ino, index); +#endif /* CONFIG_SSDFS_DEBUG */ + + page = grab_cache_page_write_begin(mapping, index); + if (!page) { + SSDFS_ERR("fail to grab page: index %lu\n", + index); + return -ENOMEM; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + ssdfs_account_locked_page(page); + + if (can_file_be_inline(inode, pos + len)) { + if (!ii->inline_file) { + err = ssdfs_allocate_inline_file_buffer(inode); + if (unlikely(err)) { + SSDFS_ERR("fail to allocate inline buffer\n"); + goto try_regular_write; + } + + /* + * TODO: pre-fetch file's content in buffer + * (if inode size > 256 bytes) + */ + } + + atomic_or(SSDFS_INODE_HAS_INLINE_FILE, + &SSDFS_I(inode)->private_flags); + } else { +try_regular_write: + atomic_and(~SSDFS_INODE_HAS_INLINE_FILE, + &SSDFS_I(inode)->private_flags); + + start_blk = pos >> fsi->log_pagesize; + end_blk = (pos + len) >> fsi->log_pagesize; + + if (can_file_be_inline(inode, i_size_read(inode))) { +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("change from inline to regular file: " + "old_size %llu, new_size %llu\n", + (u64)i_size_read(inode), + (u64)(pos + len)); +#endif /* CONFIG_SSDFS_DEBUG */ + + last_blk = U64_MAX; + } else if (i_size_read(inode) > 0) { + last_blk = (i_size_read(inode) - 1) >> + fsi->log_pagesize; + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start_blk %llu, end_blk %llu, last_blk %llu\n", + (u64)start_blk, (u64)end_blk, + (u64)last_blk); +#endif /* CONFIG_SSDFS_DEBUG */ + + cur_blk = start_blk; + do { + if (last_blk >= U64_MAX) + is_new_blk = true; + else + is_new_blk = cur_blk > last_blk; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("cur_blk %llu, is_new_blk %#x, blks %u\n", + (u64)cur_blk, is_new_blk, blks); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (is_new_blk) { + if (!need_add_block(page)) { + set_page_new(page); + err = 
ssdfs_reserve_free_pages(fsi, 1, + SSDFS_USER_DATA_PAGES); + if (!err) + blks++; + } + +#ifdef CONFIG_SSDFS_DEBUG + spin_lock(&fsi->volume_state_lock); + free_pages = fsi->free_pages; + spin_unlock(&fsi->volume_state_lock); + + SSDFS_DBG("free_pages %llu, blks %u, err %d\n", + free_pages, blks, err); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (err) { + spin_lock(&fsi->volume_state_lock); + fsi->free_pages += blks; + spin_unlock(&fsi->volume_state_lock); + + ssdfs_unlock_page(page); + ssdfs_put_page(page); + + ssdfs_write_failed(mapping, pos + len); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); + SSDFS_DBG("volume hasn't free space\n"); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err; + } + } else if (!PageDirty(page)) { + /* + * ssdfs_write_end() marks page as dirty + */ + ssdfs_account_updated_user_data_pages(fsi, 1); + } + + cur_blk++; + } while (cur_blk < end_blk); + } + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + *pagep = page; + + if ((len == PAGE_SIZE) || PageUptodate(page)) + return 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("pos %llu, inode_size %llu\n", + pos, (u64)i_size_read(inode)); +#endif /* CONFIG_SSDFS_DEBUG */ + + if ((pos & PAGE_MASK) >= i_size_read(inode)) { + unsigned start = pos & (PAGE_SIZE - 1); + unsigned end = start + len; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("start %u, end %u, len %u\n", + start, end, len); +#endif /* CONFIG_SSDFS_DEBUG */ + + /* Reading beyond i_size is simple: memset to zero */ + zero_user_segments(page, 0, start, end, PAGE_SIZE); + return 0; + } + + return ssdfs_readpage_nolock(file, page, SSDFS_CURRENT_THREAD_READ); +} + +/* + * After a successful ssdfs_write_begin(), and data copy, + * ssdfs_write_end() must be called. + */ +static +int ssdfs_write_end(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) +{ + struct inode *inode = mapping->host; + pgoff_t index = page->index; + unsigned start = pos & (PAGE_SIZE - 1); + unsigned end = start + copied; + loff_t old_size = i_size_read(inode); + int err = 0; + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("ino %lu, pos %llu, len %u, copied %u, " + "index %lu, start %u, end %u, old_size %llu\n", + inode->i_ino, pos, len, copied, + index, start, end, old_size); +#endif /* CONFIG_SSDFS_DEBUG */ + + if (copied < len) { + /* + * VFS copied less data to the page that it intended and + * declared in its '->write_begin()' call via the @len + * argument. Just tell userspace to retry the entire page. + */ + if (!PageUptodate(page)) { + copied = 0; + goto out; + } + } + + if (old_size < (index << PAGE_SHIFT) + end) { + i_size_write(inode, (index << PAGE_SHIFT) + end); + mark_inode_dirty_sync(inode); + } + + flush_dcache_page(page); + + SetPageUptodate(page); + if (!PageDirty(page)) + __set_page_dirty_nobuffers(page); + +out: + ssdfs_unlock_page(page); + ssdfs_put_page(page); + +#ifdef CONFIG_SSDFS_DEBUG + SSDFS_DBG("page %p, count %d\n", + page, page_ref_count(page)); +#endif /* CONFIG_SSDFS_DEBUG */ + + return err ? err : copied; +} + +/* + * The ssdfs_direct_IO() is called by the generic read/write + * routines to perform direct_IO - that is IO requests which + * bypass the page cache and transfer data directly between + * the storage and the application's address space. 
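+ * Direct I/O is not implemented in SSDFS yet, so the stub below
+ * rejects any O_DIRECT request.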
+ */
+static ssize_t ssdfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
+	/* TODO: implement */
+	return -ERANGE;
+}
+
+/*
+ * The ssdfs_fsync() is called by the fsync(2) system call.
+ */
+int ssdfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct inode *inode = file->f_mapping->host;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, start %llu, end %llu, datasync %#x\n",
+		  (unsigned long)inode->i_ino, (unsigned long long)start,
+		  (unsigned long long)end, datasync);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	trace_ssdfs_sync_file_enter(inode);
+
+	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	if (err) {
+		trace_ssdfs_sync_file_exit(file, datasync, err);
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fsync failed: ino %lu, start %llu, "
+			  "end %llu, err %d\n",
+			  (unsigned long)inode->i_ino,
+			  (unsigned long long)start,
+			  (unsigned long long)end,
+			  err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	inode_lock(inode);
+	sync_inode_metadata(inode, 1);
+	blkdev_issue_flush(inode->i_sb->s_bdev);
+	inode_unlock(inode);
+
+	trace_ssdfs_sync_file_exit(file, datasync, err);
+
+	return err;
+}
+
+const struct file_operations ssdfs_file_operations = {
+	.llseek		= generic_file_llseek,
+	.read_iter	= generic_file_read_iter,
+	.write_iter	= generic_file_write_iter,
+	.unlocked_ioctl	= ssdfs_ioctl,
+	.mmap		= generic_file_mmap,
+	.open		= generic_file_open,
+	.fsync		= ssdfs_fsync,
+	.splice_read	= generic_file_splice_read,
+	.splice_write	= iter_file_splice_write,
+};
+
+const struct inode_operations ssdfs_file_inode_operations = {
+	.getattr	= ssdfs_getattr,
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+	.get_inode_acl	= ssdfs_get_acl,
+	.set_acl	= ssdfs_set_acl,
+};
+
+const struct inode_operations ssdfs_special_inode_operations = {
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+	.get_inode_acl	= ssdfs_get_acl,
+	.set_acl	= ssdfs_set_acl,
+};
+
+const struct inode_operations ssdfs_symlink_inode_operations = {
+	.get_link	= page_get_link,
+	.getattr	= ssdfs_getattr,
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+};
+
+const struct address_space_operations ssdfs_aops = {
+	.read_folio	= ssdfs_read_folio,
+	.readahead	= ssdfs_readahead,
+	.writepage	= ssdfs_writepage,
+	.writepages	= ssdfs_writepages,
+	.write_begin	= ssdfs_write_begin,
+	.write_end	= ssdfs_write_end,
+	.direct_IO	= ssdfs_direct_IO,
+};
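To make the rounding used above concrete, here is a minimal userspace sketch
of the extent-size arithmetic that the extent write helpers
(ssdfs_issue_async_extent_write_request() and its sync twin) perform before
calling inode_add_bytes(). The helper name round_extent_bytes() and the
sample geometries are assumptions for illustration only, not part of the
patch:

#include <stdio.h>
#include <stdint.h>

#define SYS_PAGE_SIZE 4096u	/* stand-in for the kernel's PAGE_SIZE */

static uint32_t round_extent_bytes(uint32_t data_bytes,
				   uint32_t fs_pagesize,
				   uint32_t log_pagesize)
{
	uint32_t extent_bytes = data_bytes;

	/* pad by (max(logical block size, system page size) - 1) ... */
	if (fs_pagesize > SYS_PAGE_SIZE)
		extent_bytes += fs_pagesize - 1;
	else
		extent_bytes += SYS_PAGE_SIZE - 1;

	/* ... then truncate down to a whole number of logical blocks */
	extent_bytes >>= log_pagesize;
	extent_bytes <<= log_pagesize;

	return extent_bytes;
}

int main(void)
{
	/* 5000 bytes on a volume with 8 KiB logical blocks -> 8192 */
	printf("%u\n", round_extent_bytes(5000, 8192, 13));

	/* 5000 bytes on a volume with 4 KiB logical blocks -> 8192 */
	printf("%u\n", round_extent_bytes(5000, 4096, 12));

	return 0;
}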
From patchwork Sat Feb 25 01:09:27 2023
From: Viacheslav Dubeyko
To: linux-fsdevel@vger.kernel.org
Cc: viacheslav.dubeyko@bytedance.com, luka.perkov@sartura.hr,
 bruno.banelli@sartura.hr, Viacheslav Dubeyko
Subject: [RFC PATCH 76/76] introduce SSDFS file system
Date: Fri, 24 Feb 2023 17:09:27 -0800
Message-Id: <20230225010927.813929-77-slava@dubeyko.com>
In-Reply-To: <20230225010927.813929-1-slava@dubeyko.com>
References: <20230225010927.813929-1-slava@dubeyko.com>

Integrate SSDFS file system into kernel infrastructure.
Signed-off-by: Viacheslav Dubeyko
CC: Viacheslav Dubeyko
CC: Luka Perkov
CC: Bruno Banelli
---
 fs/Kconfig                 |   1 +
 fs/Makefile                |   1 +
 fs/ssdfs/Kconfig           | 300 +++++++++++++++++++++++++++++++++++++
 fs/ssdfs/Makefile          |  50 +++++++
 include/uapi/linux/magic.h |   1 +
 5 files changed, 353 insertions(+)
 create mode 100644 fs/ssdfs/Kconfig
 create mode 100644 fs/ssdfs/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index 2685a4d0d353..e969c0564926 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -319,6 +319,7 @@ source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
 source "fs/erofs/Kconfig"
 source "fs/vboxsf/Kconfig"
+source "fs/ssdfs/Kconfig"
 
 endif # MISC_FILESYSTEMS
 
diff --git a/fs/Makefile b/fs/Makefile
index 4dea17840761..61262a487a1f 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -137,3 +137,4 @@ obj-$(CONFIG_EFIVAR_FS)	+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
+obj-$(CONFIG_SSDFS)		+= ssdfs/
diff --git a/fs/ssdfs/Kconfig b/fs/ssdfs/Kconfig
new file mode 100644
index 000000000000..48786339798e
--- /dev/null
+++ b/fs/ssdfs/Kconfig
@@ -0,0 +1,300 @@
+config SSDFS
+	tristate "SSDFS file system support"
+	depends on BLOCK || MTD
+	help
+	  SSDFS is a flash-friendly file system. The file system
+	  architecture has been designed as an LFS (log-structured
+	  file system) that can: (1) exclude the GC overhead,
+	  (2) prolong NAND flash devices' lifetime, (3) achieve a good
+	  performance balance even if the NAND flash device's lifetime
+	  is a priority.
+
+	  If unsure, say N.
+
+config SSDFS_BLOCK_DEVICE
+	bool "Block layer support"
+	depends on BLOCK && SSDFS
+	depends on BLK_DEV_ZONED
+	default y
+	help
+	  This option enables block layer support.
+
+	  If unsure, say N.
+
+config SSDFS_MTD_DEVICE
+	bool "MTD support"
+	depends on !SSDFS_BLOCK_DEVICE && MTD && SSDFS
+	default n
+	help
+	  This option enables MTD layer support.
+
+	  If unsure, say N.
+
+config SSDFS_POSIX_ACL
+	bool "SSDFS POSIX Access Control Lists"
+	depends on SSDFS
+	select FS_POSIX_ACL
+	help
+	  POSIX Access Control Lists (ACLs) support permissions for users and
+	  groups beyond the owner/group/world scheme.
+
+	  To learn more about Access Control Lists, visit the POSIX ACLs for
+	  Linux website.
+
+	  If you don't know what Access Control Lists are, say N.
+
+config SSDFS_SECURITY
+	bool "SSDFS Security Labels"
+	depends on SSDFS
+	help
+	  Security labels support alternative access control models
+	  implemented by security modules like SELinux. This option
+	  enables an extended attribute handler for file security
+	  labels in the SSDFS filesystem.
+
+	  If you are not using a security module that requires using
+	  extended attributes for file security labels, say N.
+
+menu "Write amplification management"
+
+config SSDFS_ZLIB
+	bool "SSDFS ZLIB compression support"
+	select ZLIB_INFLATE
+	select ZLIB_DEFLATE
+	depends on SSDFS
+	default y
+	help
+	  Zlib is designed to be a free, general-purpose, legally unencumbered,
+	  lossless data-compression library for use on virtually any computer
+	  hardware and operating system. It offers a good trade-off between
+	  compression achieved and the amount of CPU time and memory necessary
+	  to compress and decompress. See the zlib documentation for
+	  further information.
+
+	  If unsure, say Y.
+
+config SSDFS_ZLIB_COMR_LEVEL
+	int "Zlib compression level (0 => NO_COMPRESSION, 9 => BEST_COMPRESSION)"
+	depends on SSDFS_ZLIB
+	range 0 9
+	default 9
+	help
+	  Select Zlib compression level.
+	  Examples:
+	    0 => Z_NO_COMPRESSION
+	    1 => Z_BEST_SPEED
+	    9 => Z_BEST_COMPRESSION
+
+config SSDFS_LZO
+	bool "SSDFS LZO compression support"
+	select LZO_COMPRESS
+	select LZO_DECOMPRESS
+	depends on SSDFS
+	default n
+	help
+	  minilzo-based compression. Generally works better than Zlib.
+	  LZO compression is mainly aimed at embedded systems with slower
+	  CPUs where the overheads of zlib are too high.
+
+	  If unsure, say N.
+
+config SSDFS_DIFF_ON_WRITE
+	bool "SSDFS Diff-On-Write support"
+	depends on SSDFS
+	help
+	  This option enables delta-encoding support.
+
+	  If unsure, say N.
+
+config SSDFS_DIFF_ON_WRITE_METADATA
+	bool "SSDFS Diff-On-Write support (metadata case)"
+	depends on SSDFS_DIFF_ON_WRITE
+	help
+	  This option enables delta-encoding support for metadata.
+
+	  If unsure, say N.
+
+config SSDFS_DIFF_ON_WRITE_METADATA_THRESHOLD
+	int "Btree node modification percentage threshold (1% - 50%)"
+	range 1 50
+	default 25
+	depends on SSDFS_DIFF_ON_WRITE_METADATA
+	help
+	  Select the btree node modification percentage threshold as the
+	  upper bound of modified items in a node.
+
+config SSDFS_DIFF_ON_WRITE_USER_DATA
+	bool "SSDFS Diff-On-Write support (user data case)"
+	depends on SSDFS_DIFF_ON_WRITE
+	help
+	  This option enables delta-encoding support for user data.
+
+	  If unsure, say N.
+
+config SSDFS_DIFF_ON_WRITE_USER_DATA_THRESHOLD
+	int "Logical block's modified bits percentage threshold (1% - 50%)"
+	range 1 50
+	default 50
+	depends on SSDFS_DIFF_ON_WRITE_USER_DATA
+	help
+	  Select the logical block modification percentage threshold as the
+	  upper bound of modified bits in the logical block.
+
+endmenu
+
+menu "Performance"
+
+config SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+	bool "SSDFS fixed superblock segments set"
+	depends on SSDFS
+	default y
+	help
+	  This option enables the technique of repeatedly using the
+	  reserved set of superblock segments at the beginning
+	  of a volume.
+
+	  If unsure, say N.
+
+config SSDFS_SAVE_WHOLE_BLK2OFF_TBL_IN_EVERY_LOG
+	bool "Save whole offset translation table in every log"
+	depends on SSDFS
+	help
+	  This option enables the technique of storing the whole
+	  offset translation table in every log. SSDFS can distribute
+	  the complete state of the offset translation table among
+	  multiple logs, which decreases the amount of metadata in
+	  each log. However, that policy increases the number of read
+	  I/O requests because it requires reading multiple log headers
+	  in the same erase block. If a big erase block contains a lot
+	  of small partial logs then it can degrade file system
+	  performance because of the significant amount of read I/O
+	  during the initialization phase.
+
+	  If unsure, say N.
+
+endmenu
+
+menu "Reliability"
+
+config SSDFS_CHECK_LOGICAL_BLOCK_EMPTYNESS
+	bool "SSDFS check of logical block emptiness on every write"
+	depends on SSDFS
+	help
+	  This option enables the technique of checking a logical block's
+	  emptiness on every write. The goal of this technique is to
+	  prevent the re-writing of pages with existing data because the
+	  SSD's FTL can manage this situation. However, this can be the
+	  source of data and metadata corruption in the case of issues
+	  in the file system driver logic. Take into account that this
+	  technique could degrade the write performance of the file
+	  system driver. Also, the file system volume has to be erased
+	  by mkfs during creation. Otherwise, the file system driver
+	  will fail to write even for correct write operations.
+
+	  If unsure, say N.
+
+endmenu
+
+menu "Development"
+
+config SSDFS_DEBUG
+	bool "SSDFS debugging"
+	depends on SSDFS
+	help
+	  This option enables additional pre-condition and post-condition
+	  checking in functions. The main goal of this option is to
+	  provide an environment for debugging code in the SSDFS driver
+	  and to exclude debug checking from end users' kernel builds.
+	  This option also enables debug output by means of pr_debug()
+	  from all files. You can disable debug output from any file via
+	  the 'dynamic_debug/control' file. Please see
+	  Documentation/dynamic-debug-howto.txt for additional
+	  information.
+
+	  If you are going to debug the SSDFS driver then choose Y here.
+	  If unsure, say N.
+
+config SSDFS_TRACK_API_CALL
+	bool "SSDFS API calls tracking"
+	depends on SSDFS
+	help
+	  This option enables output from the key subsystems' functions.
+	  The main goal of this option is to provide visibility into
+	  the file system activity.
+
+	  If you are going to debug the SSDFS driver then choose Y here.
+	  If unsure, say N.
+
+config SSDFS_MEMORY_LEAKS_ACCOUNTING
+	bool "SSDFS memory leaks accounting"
+	depends on SSDFS
+	help
+	  This option enables accounting of memory allocations
+	  (kmalloc, kzalloc, kcalloc, kmem_cache_alloc, alloc_page)
+	  by means of incrementing global counters, and of deallocations
+	  (kfree, kmem_cache_free, free_page) by means of decrementing
+	  the same global counters. Also, there are special global
+	  counters that track the number of locked/unlocked memory
+	  pages. However, global counters have an unpleasant side
+	  effect. If there are several mounted SSDFS partitions in the
+	  system then the memory leaks accounting subsystem miscalculates
+	  the number of memory leaks and triggers false alarms. It makes
+	  sense to use the memory leaks accounting subsystem only with a
+	  single mounted SSDFS partition in the system.
+
+	  If you are going to check memory leaks in the SSDFS driver
+	  then choose Y here. If unsure, say N.
+
+config SSDFS_SHOW_CONSUMED_MEMORY
+	bool "SSDFS shows consumed memory"
+	select SSDFS_MEMORY_LEAKS_ACCOUNTING
+	help
+	  This option enables showing the amount of allocated
+	  memory and memory pages in the form of memory leaks
+	  on every syncfs event.
+
+	  If you are going to check memory consumption in the SSDFS
+	  driver then choose Y here. If unsure, say N.
+
+config SSDFS_BTREE_CONSISTENCY_CHECK
+	bool "SSDFS btree consistency check"
+	depends on SSDFS
+	help
+	  This option enables checking the btree consistency.
+
+	  If you are going to check btree consistency in the SSDFS
+	  driver then choose Y here. If unsure, say N.
+
+config SSDFS_BTREE_STRICT_CONSISTENCY_CHECK
+	bool "SSDFS btree strict consistency check"
+	depends on SSDFS
+	help
+	  This option enables checking the btree consistency
+	  after every btree operation. This option could
+	  seriously degrade the file system performance.
+
+	  If you are going to check btree consistency in the SSDFS
+	  driver then choose Y here. If unsure, say N.
+
+config SSDFS_TESTING
+	bool "SSDFS testing"
+	depends on SSDFS
+	select SSDFS_DEBUG
+	select SSDFS_MEMORY_LEAKS_ACCOUNTING
+	select SSDFS_BTREE_CONSISTENCY_CHECK
+	help
+	  This option enables the testing infrastructure of the SSDFS
+	  filesystem.
+
+	  If you are going to test the SSDFS driver then choose Y here.
+	  If unsure, say N.
+
+config SSDFS_UNDER_DEVELOPMENT_FUNC
+	bool "SSDFS under development functionality"
+	depends on SSDFS
+	help
+	  This option enables functionality that is still under
+	  development.
+
+	  If you are going to check under-development functionality
+	  in the SSDFS driver then choose Y here. If unsure, say N.
+ +endmenu diff --git a/fs/ssdfs/Makefile b/fs/ssdfs/Makefile new file mode 100644 index 000000000000..d910e3e40257 --- /dev/null +++ b/fs/ssdfs/Makefile @@ -0,0 +1,50 @@ +# +# SPDX-License-Identifier: BSD-3-Clause-Clear +# Makefile for the Linux SSD-oriented File System (SSDFS) +# +# + +obj-$(CONFIG_SSDFS) += ssdfs.o + +#ccflags-$(CONFIG_SSDFS_DEBUG) += -DDEBUG + +ssdfs-y := super.o fs_error.o recovery.o \ + recovery_fast_search.o recovery_slow_search.o \ + recovery_thread.o \ + options.o page_array.o page_vector.o \ + dynamic_array.o volume_header.o log_footer.o \ + block_bitmap.o block_bitmap_tables.o \ + peb_block_bitmap.o segment_block_bitmap.o \ + sequence_array.o offset_translation_table.o \ + request_queue.o readwrite.o \ + peb.o peb_gc_thread.o peb_read_thread.o peb_flush_thread.o \ + peb_container.o \ + segment.o segment_tree.o current_segment.o \ + segment_bitmap.o segment_bitmap_tables.o \ + peb_mapping_queue.o \ + peb_mapping_table.o peb_mapping_table_thread.o \ + peb_mapping_table_cache.o peb_migration_scheme.o \ + btree_search.o btree_node.o btree_hierarchy.o btree.o \ + extents_queue.o extents_tree.o \ + shared_extents_tree.o shared_extents_tree_thread.o \ + inodes_tree.o dentries_tree.o \ + shared_dictionary.o shared_dictionary_thread.o \ + xattr_tree.o \ + snapshot_requests_queue.o snapshot_rules.o \ + snapshot.o snapshots_tree.o snapshots_tree_thread.o \ + invalidated_extents_tree.o \ + inode.o file.o dir.o ioctl.o \ + sysfs.o \ + xattr.o xattr_user.o xattr_trusted.o \ + compression.o + +ssdfs-$(CONFIG_SSDFS_POSIX_ACL) += acl.o +ssdfs-$(CONFIG_SSDFS_SECURITY) += xattr_security.o +ssdfs-$(CONFIG_SSDFS_ZLIB) += compr_zlib.o +ssdfs-$(CONFIG_SSDFS_LZO) += compr_lzo.o +ssdfs-$(CONFIG_SSDFS_MTD_DEVICE) += dev_mtd.o +ssdfs-$(CONFIG_SSDFS_BLOCK_DEVICE) += dev_bdev.o dev_zns.o +ssdfs-$(CONFIG_SSDFS_TESTING) += testing.o +ssdfs-$(CONFIG_SSDFS_DIFF_ON_WRITE) += diff_on_write.o +ssdfs-$(CONFIG_SSDFS_DIFF_ON_WRITE_METADATA) += diff_on_write_metadata.o +ssdfs-$(CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA) += diff_on_write_user_data.o diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index 6325d1d0e90f..f49a25d61bc0 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -95,6 +95,7 @@ #define BPF_FS_MAGIC 0xcafe4a11 #define AAFS_MAGIC 0x5a3c69f0 #define ZONEFS_MAGIC 0x5a4f4653 +#define SSDFS_SUPER_MAGIC 0x53734466 /* SsDf */ /* Since UDF 2.01 is ISO 13346 based... */ #define UDF_SUPER_MAGIC 0x15013346
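The new SSDFS_SUPER_MAGIC value gives userspace a way to recognize an SSDFS
mount. Here is a minimal sketch using statfs(2); the mount point path is a
hypothetical example, and the macro is redefined locally in case the
installed uapi headers do not carry it yet:

#include <stdio.h>
#include <sys/vfs.h>

#ifndef SSDFS_SUPER_MAGIC
#define SSDFS_SUPER_MAGIC 0x53734466	/* SsDf */
#endif

int main(void)
{
	struct statfs st;

	if (statfs("/mnt/ssdfs", &st) != 0) {
		perror("statfs");
		return 1;
	}

	if (st.f_type == SSDFS_SUPER_MAGIC)
		printf("SSDFS volume detected\n");
	else
		printf("not SSDFS (f_type %#lx)\n",
		       (unsigned long)st.f_type);

	return 0;
}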