From patchwork Wed Nov 27 00:18:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13886442 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DFB3E53A7 for ; Wed, 27 Nov 2024 00:18:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666699; cv=none; b=us2uIBVw1MSX0P6Z2uAxdCppWfiPceOiAudH5zcs3wLL31zQX/gRSHBIsMEM/lhf/WKe5kphq6StPgug7IFxm3z/vV0ce0a2UbPlH2Nn6YjcnHNgXYwThNwHucZmQjjmdpz1xRylJORFrIPr47/wvnPsUQdL8syy90T6z+5zq8k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666699; c=relaxed/simple; bh=7rlnOMoFsS9aD1uz+ODdmMOzLVl3yHhkZ6Gk1eGk4G4=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=sPMUU3SUkvclCg6/KUoCygaXkQKlcuvR1NvZKUEbY5GbKUo5rjh7GEWdZMRHLKSPAhxQr+1yLCDPh0ZkMiTn+R+9cMQuqI6XePaA7SIpBXlgVF/wT821yviEUfI/qqicB8X2ctO6lr3QAxOnVDlADAupOmH3U2Wlc5Fd8Qkvm24= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=e3dV8Ok3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="e3dV8Ok3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5179DC4CECF; Wed, 27 Nov 2024 00:18:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1732666697; bh=7rlnOMoFsS9aD1uz+ODdmMOzLVl3yHhkZ6Gk1eGk4G4=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=e3dV8Ok3p5UVxsi2/OarCe/7lVNgMNCA/xN0os8R6CNDe+OOm8Y4KFggTDXqaG2sA 
R0BG5UBcUiJp1LoPzXjgCEaGPUwZsdCxZnlwqghDghu0MJ1mSxY2Vk8UnFkVCy5JPQ 8suX/fh6Iv6fDz5IeNbV9hQCxi2Nhus+nMyPVBiy7vg7Rnb1OBzfe48xBY1RppwdZF IW1CiWrFN8l/tlJdnqi1ggruKVcramRQHZbaPc64DUvUEZWcE+Utn3z6KbT+mxCZXe YBgRW2QdHFhko2kgFC5EwIWCJICjePoVTGdV0Xcu1goFzm28wVTP2gLJZD7SSwT6yH zOdq+m9L8L+og== Date: Tue, 26 Nov 2024 16:18:16 -0800 Subject: [PATCH 01/10] design: update metadata reconstruction chapter From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662227.996198.3279322255032676510.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong We've landed online repair and full backrefs in the filesystem, so update the links to the new sections and transform future tense to present tense. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../reconstruction.asciidoc | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/design/XFS_Filesystem_Structure/reconstruction.asciidoc b/design/XFS_Filesystem_Structure/reconstruction.asciidoc index f172e0f8161656..f4c10217910b6c 100644 --- a/design/XFS_Filesystem_Structure/reconstruction.asciidoc +++ b/design/XFS_Filesystem_Structure/reconstruction.asciidoc @@ -1,10 +1,6 @@ [[Reconstruction]] = Metadata Reconstruction -[NOTE] -This is a theoretical discussion of how reconstruction could work; none of this -is implemented as of 2015. - A simple UNIX filesystem can be thought of in terms of a directed acyclic graph. To a first approximation, there exists a root directory node, which points to other nodes. 
Those other nodes can themselves be directories or they can be @@ -45,9 +41,14 @@ The xref:Reverse_Mapping_Btree[reverse-mapping B+tree] fills in part of the puzzle. Since it contains copies of every entry in each inode’s data and attribute forks, we can fix a corrupted block map with these records. Furthermore, if the inode B+trees become corrupt, it is possible to visit all -inode chunks using the reverse-mapping data. Should XFS ever gain the ability -to store parent directory information in each inode, it also becomes possible +inode chunks using the reverse-mapping data. xref:Parent_Pointers[Directory +parent pointers] fill in the rest of the puzzle by mirroring the directory tree +structure with parent directory information in each inode. It is now possible to resurrect damaged directory trees, which should reduce the complaints about inodes ending up in +/lost+found+. Everything else in the per-AG primary -metadata can already be reconstructed via +xfs_repair+. Hopefully, -reconstruction will not turn out to be a fool's errand. +metadata can already be reconstructed via +xfs_repair+. + +See the +https://docs.kernel.org/filesystems/xfs/xfs-online-fsck-design.html[design +document] for online repair for a more thorough discussion of how this metadata +is put to use. From patchwork Wed Nov 27 00:18:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13886443 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BC15881E for ; Wed, 27 Nov 2024 00:18:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666713; cv=none; b=s2NGdyWhn3iSDGM8s8R5RZlXAKyKRBZUseC+FUezKD/STntQ8+C6oVTHP2VoA9zS2SrvXZxF6/h9FY+HTn64ka4CzBWHpuhQuE+OHZl/gzvX/a9qhwLnRIux+3/kGqRVByusmcnqTJCpRG7hfzNGnTHwrENBli4mAL0uHu7nHyo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666713; c=relaxed/simple; bh=l13/TD/1Joj+LQn85oI816+hudg/RxoGoeATTMMw8Gc=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=YTGRL76ZmiZmPlhZkdZin8y1He96QAp1sBAIEzFZM/8PrXUN3e5EothT6lyeoYd+TdnyRv/LXW6DuxwPMWsIMV3mVkQBLm5Oea4xxnCOIb89crfA4qw/tSR4Eb74g0/mlbfUc06ErLxO7idq0DWu+Az39YoEnibZ6qbjC2mSF9Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TnyuuowL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TnyuuowL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 146C1C4CECF; Wed, 27 Nov 2024 00:18:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1732666713; bh=l13/TD/1Joj+LQn85oI816+hudg/RxoGoeATTMMw8Gc=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=TnyuuowLeOskexZ2+FtmkqBECc7PWFlo3VOVVVGQ7F0DX2TngCejLjXxzv+s4A5mb VBR3uJZ4Ap6dLtCr86nHNS6eovHEIDZhoWqvlGkN2+5UVEUVGb/p8XPagdWpF84P6F drEjY3gpEM+UnO+3CusQJRSuuobYJJOd33p5z0ro+j1EO46BlWlDld3LzjGZUtyK7h 
jLPWEdza3MjI69WBvEfbUe08OGf6aD2tlnEaV9lqvF62vc3V3xti4Knmk8DoQgHcAX cIuWmgXEL3ER9oWuGW/MO9Q8dRaZy2aHlunkm2LreDh91TWJ4cGkDBxhQfqUTpN74B ppVGVkrEqyEOg== Date: Tue, 26 Nov 2024 16:18:32 -0800 Subject: [PATCH 02/10] design: document filesystem properties From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662241.996198.11668830686120619782.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that xfsprogs utilities can set properties to coordinate the behavior of other xfsprogs utilities, record them in the ondisk format documentation. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../fs_properties.asciidoc | 28 ++++++++++++++++++++ .../xfs_filesystem_structure.asciidoc | 2 + 2 files changed, 30 insertions(+) create mode 100644 design/XFS_Filesystem_Structure/fs_properties.asciidoc diff --git a/design/XFS_Filesystem_Structure/fs_properties.asciidoc b/design/XFS_Filesystem_Structure/fs_properties.asciidoc new file mode 100644 index 00000000000000..b639aec9ab6366 --- /dev/null +++ b/design/XFS_Filesystem_Structure/fs_properties.asciidoc @@ -0,0 +1,28 @@ +[[Filesystem_Properties]] += Filesystem Properties + +System administrators can set filesystem-wide properties to coordinate the +behavior of userspace XFS administration tools. These properties are recorded +as extended attributes of the +ATTR_ROOT+ namespace that are set on the root +directory. + +[options="header"] +|===== +| Property | Description +| +xfs:autofsck+ | Online fsck background scanning behavior +|===== + +*xfs:autofsck*:: +This property controls the behavior of background online fsck. 
+Unrecognized values are treated as if the property was not set. +Check the +xfs_scrub+ manual page for more information. + +.autofsck property values +[options="header"] +|===== +| Value | Description +| +none+ | Do not perform background scans. +| +check+ | Only check metadata. +| +optimize+ | Check and optimize metadata. +| +repair+ | Check, repair, or optimize metadata. +|===== diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc index a95a5806172a0c..689e2a874c13e9 100644 --- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc +++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc @@ -84,6 +84,8 @@ include::journaling_log.asciidoc[] include::internal_inodes.asciidoc[] +include::fs_properties.asciidoc[] + :leveloffset: 0 Dynamically Allocated Structures From patchwork Wed Nov 27 00:18:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13886444 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 70C1F8BE5 for ; Wed, 27 Nov 2024 00:18:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666729; cv=none; b=Mo1rO788AEaMdWE8KuCBBLFET1TQnZyU5UV+HgVMEy0xmbiHl8VI1y0e47yf/WtPhpOOFEGdaNN+0IvutG2HUxiDPW8YyVjRxmlgpMouwuhcJ00N9oiqzKq1wJZdbvgcAYx2AqByO+WfnzwWVAvcYTBsi9DayoTkBE+3YY3nBNo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666729; c=relaxed/simple; bh=fN35Jos31uX+mHjYz7ERCWMKd6Zv7/sZonufw4Q2CcU=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Xj9D/+cxyeJUeBMEZtlYWkdns+OkvbfqPA+n5InEwsjYrJ4+bZ32i8g3qGupmU5cluvW45rlSd2qFIPx+ggnbhEAFbVVoRTrvmtUi0EeXcvgyVgB2YD3O6AMurFWyQ3kmRfVoDvG+Z/bZNvE8n3A18Bp3UiebS+wgdUaZ3KOp9Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=e0puSpiN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="e0puSpiN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BFD58C4CECF; Wed, 27 Nov 2024 00:18:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1732666728; bh=fN35Jos31uX+mHjYz7ERCWMKd6Zv7/sZonufw4Q2CcU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=e0puSpiN7dzqIh8XPhSBpOuRKbgXVUYo8cUApppUQn/5YVCaOEEoxRVp1nQF8s9mZ LoDGJYxYIyX+53/sZOvXGw3uDvBJNwDJDx63x5BHvDmjYmPtZVYnw6NRrHLKuIrNh2 DkQkSSZ/wygyEAeQTfPQcP5dOSTJSfHevyK4p6kU1ck8DtzGw4ru62jPBY3mcVhkjo 
awBEr4ExS3b29zxI7STrSsc2cHJoNncvdYAvNhHVVJG6xmCe0glHrwTDOVWSUUS0tn BJynN35jy4ALUj5yDxEgmEFbTij8qvOo019kzsZhDGNrWxR08T9CkoswNhuRp/ovQ+ azjan7mcXn1CQ== Date: Tue, 26 Nov 2024 16:18:48 -0800 Subject: [PATCH 03/10] design: move superblock documentation to a separate file From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662256.996198.18138050815254201801.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Move the ondisk superblock docs to a separate file. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../allocation_groups.asciidoc | 550 -------------------- .../XFS_Filesystem_Structure/superblock.asciidoc | 548 ++++++++++++++++++++ 2 files changed, 549 insertions(+), 549 deletions(-) create mode 100644 design/XFS_Filesystem_Structure/superblock.asciidoc diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc index d7fd63ea20a646..e2cdaab5e03d3f 100644 --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc @@ -31,555 +31,7 @@ image::images/6.png[] Each of these structures are expanded upon in the following sections. -[[Superblocks]] -== Superblocks - -Each AG starts with a superblock. The first one, in AG 0, is the primary -superblock which stores aggregate AG information. Secondary superblocks are -only used by xfs_repair when the primary superblock has been corrupted. A -superblock is one sector in length. - -The superblock is defined by the following structure. The description of each -field follows. 
- -[source, c] ----- -struct xfs_sb -{ - __uint32_t sb_magicnum; - __uint32_t sb_blocksize; - xfs_rfsblock_t sb_dblocks; - xfs_rfsblock_t sb_rblocks; - xfs_rtblock_t sb_rextents; - uuid_t sb_uuid; - xfs_fsblock_t sb_logstart; - xfs_ino_t sb_rootino; - xfs_ino_t sb_rbmino; - xfs_ino_t sb_rsumino; - xfs_agblock_t sb_rextsize; - xfs_agblock_t sb_agblocks; - xfs_agnumber_t sb_agcount; - xfs_extlen_t sb_rbmblocks; - xfs_extlen_t sb_logblocks; - __uint16_t sb_versionnum; - __uint16_t sb_sectsize; - __uint16_t sb_inodesize; - __uint16_t sb_inopblock; - char sb_fname[12]; - __uint8_t sb_blocklog; - __uint8_t sb_sectlog; - __uint8_t sb_inodelog; - __uint8_t sb_inopblog; - __uint8_t sb_agblklog; - __uint8_t sb_rextslog; - __uint8_t sb_inprogress; - __uint8_t sb_imax_pct; - __uint64_t sb_icount; - __uint64_t sb_ifree; - __uint64_t sb_fdblocks; - __uint64_t sb_frextents; - xfs_ino_t sb_uquotino; - xfs_ino_t sb_gquotino; - __uint16_t sb_qflags; - __uint8_t sb_flags; - __uint8_t sb_shared_vn; - xfs_extlen_t sb_inoalignmt; - __uint32_t sb_unit; - __uint32_t sb_width; - __uint8_t sb_dirblklog; - __uint8_t sb_logsectlog; - __uint16_t sb_logsectsize; - __uint32_t sb_logsunit; - __uint32_t sb_features2; - __uint32_t sb_bad_features2; - - /* version 5 superblock fields start here */ - __uint32_t sb_features_compat; - __uint32_t sb_features_ro_compat; - __uint32_t sb_features_incompat; - __uint32_t sb_features_log_incompat; - - __uint32_t sb_crc; - xfs_extlen_t sb_spino_align; - - xfs_ino_t sb_pquotino; - xfs_lsn_t sb_lsn; - uuid_t sb_meta_uuid; - xfs_ino_t sb_rrmapino; -}; ----- -*sb_magicnum*:: -Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342). - -*sb_blocksize*:: -The size of a basic unit of space allocation in bytes. Typically, this is 4096 -(4KB) but can range from 512 to 65536 bytes. - -*sb_dblocks*:: -Total number of blocks available for data and metadata on the filesystem. - -*sb_rblocks*:: -Number blocks in the real-time disk device. 
Refer to -xref:Real-time_Devices[real-time sub-volumes] for more information. - -*sb_rextents*:: -Number of extents on the real-time device. - -*sb_uuid*:: -UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by -the UUID instead of device name. - -*sb_logstart*:: -First block number for the journaling log if the log is internal (ie. not on a -separate disk device). For an external log device, this will be zero (the log -will also start on the first block on the log device). The identity of the log -devices is not recorded in the filesystem, but the UUIDs of the filesystem and -the log device are compared to prevent corruption. - -*sb_rootino*:: -Root inode number for the filesystem. Normally, the root inode is at the -start of the first possible inode chunk in AG 0. This is 128 when using a 4KB -block size. - -*sb_rbmino*:: -Bitmap inode for real-time extents. - -*sb_rsumino*:: -Summary inode for real-time bitmap. - -*sb_rextsize*:: -Realtime extent size in blocks. - -*sb_agblocks*:: -Size of each AG in blocks. For the actual size of the last AG, refer to the -xref:AG_Free_Space_Management[free space] +agf_length+ value. - -*sb_agcount*:: -Number of AGs in the filesystem. - -*sb_rbmblocks*:: -Number of real-time bitmap blocks. - -*sb_logblocks*:: -Number of blocks for the journaling log. - -*sb_versionnum*:: -Filesystem version number. This is a bitmask specifying the features enabled -when creating the filesystem. Any disk checking tools or drivers that do not -recognize any set bits must not operate upon the filesystem. Most of the flags -indicate features introduced over time. If the value of the lower nibble is >= -4, the higher bits indicate feature flags as follows: - -.Version 4 Superblock version flags -[options="header"] -|===== -| Flag | Description -| +XFS_SB_VERSION_ATTRBIT+ | -Set if any inode have extended attributes. 
If this bit is set; the -+XFS_SB_VERSION2_ATTR2BIT+ is not set; and the +attr2+ mount flag is not -specified, the +di_forkoff+ inode field will not be dynamically adjusted. -See the section about xref:Extended_Attribute_Versions[extended attribute -versions] for more information. - -| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values. -| +XFS_SB_VERSION_QUOTABIT+ | -Quotas are enabled on the filesystem. This -also brings in the various quota fields in the superblock. - -| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used. -| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used. -| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used. -| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used. -| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512. -| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set. -| +XFS_SB_VERSION_DIRV2BIT+ | -Version 2 directories are used. This is always set. - -| +XFS_SB_VERSION_MOREBITSBIT+ | -Set if the sb_features2 field in the superblock contains more flags. -|===== - -If the lower nibble of this value is 5, then this is a v5 filesystem; the -+XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+. - -*sb_sectsize*:: -Specifies the underlying disk sector size in bytes. Typically this is 512 or -4096 bytes. This determines the minimum I/O alignment, especially for direct I/O. - -*sb_inodesize*:: -Size of the inode in bytes. The default is 256 (2 inodes per standard sector) -but can be made as large as 2048 bytes when creating the filesystem. On a v5 -filesystem, the default and minimum inode size are both 512 bytes. - -*sb_inopblock*:: -Number of inodes per block. This is equivalent to +sb_blocksize / sb_inodesize+. - -*sb_fname[12]*:: -Name for the filesystem. This value can be used in the mount command. - -*sb_blocklog*:: -log~2~ value of +sb_blocksize+. In other terms, +sb_blocksize = 2^sb_blocklog^+. 
- -*sb_sectlog*:: -log~2~ value of +sb_sectsize+. - -*sb_inodelog*:: -log~2~ value of +sb_inodesize+. - -*sb_inopblog*:: -log~2~ value of +sb_inopblock+. - -*sb_agblklog*:: -log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode -numbers and absolute block numbers defined in extent maps. - -*sb_rextslog*:: -log~2~ value of +sb_rextents+. - -*sb_inprogress*:: -Flag specifying that the filesystem is being created. - -*sb_imax_pct*:: -Maximum percentage of filesystem space that can be used for inodes. The default -value is 5%. - -*sb_icount*:: -Global count for number inodes allocated on the filesystem. This is only -maintained in the first superblock. - -*sb_ifree*:: -Global count of free inodes on the filesystem. This is only maintained in the -first superblock. - -*sb_fdblocks*:: -Global count of free data blocks on the filesystem. This is only maintained in -the first superblock. - -*sb_frextents*:: -Global count of free real-time extents on the filesystem. This is only -maintained in the first superblock. - -*sb_uquotino*:: -Inode for user quotas. This and the following two quota fields only apply if -+XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to -xref:Quota_Inodes[quota inodes] for more information. - -*sb_gquotino*:: -Inode for group or project quotas. Group and project quotas cannot be used at -the same time on v4 filesystems. On a v5 filesystem, this inode always stores -group quota information. - -*sb_qflags*:: -Quota flags. It can be a combination of the following flags: - -.Superblock quota flags -[options="header"] -|===== -| Flag | Description -| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled. -| +XFS_UQUOTA_ENFD+ | User quotas are enforced. -| +XFS_UQUOTA_CHKD+ | User quotas have been checked. -| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled. -| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced. -| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked. 
-| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled. -| +XFS_GQUOTA_ENFD+ | Group quotas are enforced. -| +XFS_GQUOTA_CHKD+ | Group quotas have been checked. -| +XFS_PQUOTA_ENFD+ | Project quotas are enforced. -| +XFS_PQUOTA_CHKD+ | Project quotas have been checked. -|===== - -*sb_flags*:: -Miscellaneous flags. - -.Superblock flags -[options="header"] -|===== -| Flag | Description -| +XFS_SBF_READONLY+ | Only read-only mounts allowed. -|===== - -*sb_shared_vn*:: -Reserved and must be zero (``vn'' stands for version number). - -*sb_inoalignmt*:: -Inode chunk alignment in fsblocks. Prior to v5, the default value provided for -inode chunks to have an 8KiB alignment. Starting with v5, the default value -scales with the multiple of the inode size over 256 bytes. Concretely, this -means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes, -etc. If sparse inodes are enabled, the +ir_startino+ field of each inode -B+tree record must be aligned to this block granularity, even if the inode -given by +ir_startino+ itself is sparse. - -*sb_unit*:: -Underlying stripe or raid unit in blocks. - -*sb_width*:: -Underlying stripe or raid width in blocks. - -*sb_dirblklog*:: -log~2~ multiplier that determines the granularity of directory block allocations -in fsblocks. - -*sb_logsectlog*:: -log~2~ value of the log subvolume's sector size. This is only used if the -journaling log is on a separate disk device (i.e. not internal). - -*sb_logsectsize*:: -The log's sector size in bytes if the filesystem uses an external log device. - -*sb_logsunit*:: -The log device's stripe or raid unit size. This only applies to version 2 logs -+XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+. - -*sb_features2*:: -Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in -+sb_versionnum+. 
The currently defined additional features include: - -.Extended Version 4 Superblock flags -[options="header"] -|===== -| Flag | Description -| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ | -Lazy global counters. Making a filesystem with this bit set can improve -performance. The global free space and inode counts are only updated in the -primary superblock when the filesystem is cleanly unmounted. - -| +XFS_SB_VERSION2_ATTR2BIT+ | -Extended attributes version 2. Making a filesystem with this optimises the -inode layout of extended attributes. If this bit is set and the +noattr2+ -mount flag is not specified, the +di_forkoff+ inode field will be dynamically -adjusted. See the section about xref:Extended_Attribute_Versions[extended -attribute versions] for more information. - -| +XFS_SB_VERSION2_PARENTBIT+ | -Parent pointers. All inodes must have an extended attribute that points back to -its parent inode. The primary purpose for this information is in backup systems. - -| +XFS_SB_VERSION2_PROJID32BIT+ | -32-bit Project ID. Inodes can be associated with a project ID number, which -can be used to enforce disk space usage quotas for a particular group of -directories. This flag indicates that project IDs can be 32 bits in size. - -| +XFS_SB_VERSION2_CRCBIT+ | -Metadata checksumming. All metadata blocks have an extended header containing -the block checksum, a copy of the metadata UUID, the log sequence number of the -last update to prevent stale replays, and a back pointer to the owner of the -block. This feature must be and can only be set if the lowest nibble of -+sb_versionnum+ is set to 5. - -| +XFS_SB_VERSION2_FTYPE+ | -Directory file type. Each directory entry records the type of the inode to -which the entry points. This speeds up directory iteration by removing the -need to load every inode into memory. -|===== - -*sb_bad_features2*:: -This field mirrors +sb_features2+, due to past 64-bit alignment errors. - -*sb_features_compat*:: -Read-write compatible feature flags. 
The kernel can still read and write this -FS even if it doesn't understand the flag. Currently, there are no valid -flags. - -*sb_features_ro_compat*:: -Read-only compatible feature flags. The kernel can still read this FS even if -it doesn't understand the flag. - -.Extended Version 5 Superblock Read-Only compatibility flags -[options="header"] -|===== -| Flag | Description -| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ | -Free inode B+tree. Each allocation group contains a B+tree to track inode chunks -containing free inodes. This is a performance optimization to reduce the time -required to allocate inodes. - -| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ | -Reverse mapping B+tree. Each allocation group contains a B+tree containing -records mapping AG blocks to their owners. See the section about -xref:Reconstruction[reconstruction] for more details. - -| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ | -Reference count B+tree. Each allocation group contains a B+tree to track the -reference counts of AG blocks. This enables files to share data blocks safely. -See the section about xref:Reflink_Deduplication[reflink and deduplication] for -more details. - -| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ | -Inode B+tree block counters. Each allocation group's inode (AGI) header -tracks the number of blocks in each of the inode B+trees. This allows us -to have a slightly higher level of redundancy over the shape of the inode -btrees, and decreases the amount of time to compute the metadata B+tree -preallocations at mount time. - -|===== - -*sb_features_incompat*:: -Read-write incompatible feature flags. The kernel cannot read or write this -FS if it doesn't understand the flag. - -.Extended Version 5 Superblock Read-Write incompatibility flags -[options="header"] -|===== -| Flag | Description -| +XFS_SB_FEAT_INCOMPAT_FTYPE+ | -Directory file type. Each directory entry tracks the type of the inode to -which the entry points. 
This is a performance optimization to remove the need -to load every inode into memory to iterate a directory. - -| +XFS_SB_FEAT_INCOMPAT_SPINODES+ | -Sparse inodes. This feature relaxes the requirement to allocate inodes in -chunks of 64. When the free space is heavily fragmented, there might exist -plenty of free space but not enough contiguous free space to allocate a new -inode chunk. With this feature, the user can continue to create files until -all free space is exhausted. - -Unused space in the inode B+tree records are used to track which parts of the -inode chunk are not inodes. - -See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information. - -| +XFS_SB_FEAT_INCOMPAT_META_UUID+ | -Metadata UUID. The UUID stamped into each metadata block must match the value -in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will -without having to rewrite the entire filesystem. - -| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ | -Large timestamps. Inode timestamps and quota expiration timers are extended to -support times through the year 2486. See the section on -xref:Timestamps[timestamps] for more information. - -| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ | -The filesystem is not in operable condition, and must be run through -xfs_repair before it can be mounted. - -| +XFS_SB_FEAT_INCOMPAT_NREXT64+ | -Large file fork extent counts. This greatly expands the maximum number of -space mappings allowed in data and extended attribute file forks. - -| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ | -Atomic file mapping exchanges. The filesystem is capable of exchanging a range -of mappings between two arbitrary ranges of a file's fork by using log intent -items to track the progress of the high level exchange operation. In other -words, the exchange operation can be restarted if the system goes down, which -is necessary for userspace to commit of new file contents atomically. This -flag has user-visible impacts, which is why it is a permanent incompat flag. 
-See the section about xref:XMI_Log_Item[mapping exchange log intents] for more -information. - -| +XFS_SB_FEAT_INCOMPAT_PARENT+ | -Directory parent pointers. See the section about xref:Parent_Pointers[parent -pointers] for more information. - -|===== - -*sb_features_log_incompat*:: -Read-write incompatible feature flags for the log. The kernel cannot recover -the FS log if it doesn't understand the flag. - -.Extended Version 5 Superblock Log incompatibility flags -[options="header"] -|===== -| Flag | Description -| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ | -Extended attribute updates have been committed to the ondisk log. - -|===== - -*sb_crc*:: -Superblock checksum. - -*sb_spino_align*:: -Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a -sparse inode B+tree record must be aligned to this block granularity. - -*sb_pquotino*:: -Project quota inode. - -*sb_lsn*:: -Log sequence number of the last superblock update. - -*sb_meta_uuid*:: -If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in -all metadata blocks must match this UUID. If not, the block header UUID field -must match +sb_uuid+. - -*sb_rrmapino*:: -If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time -device is present (+sb_rblocks+ > 0), this field points to an inode -that contains the root to the -xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree]. -This field is zero otherwise. 
- -=== xfs_db Superblock Example - -A filesystem is made on a single disk with the following command: - ----- -# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7 -meta-data=/dev/sda7 isize=256 agcount=16, agsize=3923122 blks - = sectsz=512 attr=2 -data = bsize=4096 blocks=62769952, imaxpct=25 - = sunit=0 swidth=0 blks, unwritten=1 -naming =version 2 bsize=16384 -log =internal log bsize=4096 blocks=30649, version=1 - = sectsz=512 sunit=0 blks -realtime =none extsz=65536 blocks=0, rtextents=0 ----- - -And in xfs_db, inspecting the superblock: - ----- -xfs_db> sb -xfs_db> p -magicnum = 0x58465342 -blocksize = 4096 -dblocks = 62769952 -rblocks = 0 -rextents = 0 -uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5 -logstart = 33554436 -rootino = 128 -rbmino = 129 -rsumino = 130 -rextsize = 16 -agblocks = 3923122 -agcount = 16 -rbmblocks = 0 -logblocks = 30649 -versionnum = 0xb084 -sectsize = 512 -inodesize = 256 -inopblock = 16 -fname = "\000\000\000\000\000\000\000\000\000\000\000\000" -blocklog = 12 -sectlog = 9 -inodelog = 8 -inopblog = 4 -agblklog = 22 -rextslog = 0 -inprogress = 0 -imax_pct = 25 -icount = 64 -ifree = 61 -fdblocks = 62739235 -frextents = 0 -uquotino = 0 -gquotino = 0 -qflags = 0 -flags = 0 -shared_vn = 0 -inoalignmt = 2 -unit = 0 -width = 0 -dirblklog = 2 -logsectlog = 0 -logsectsize = 0 -logsunit = 0 -features2 = 8 ----- - +include::superblock.asciidoc[] [[AG_Free_Space_Management]] == AG Free Space Management diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc new file mode 100644 index 00000000000000..16c31116ffafd4 --- /dev/null +++ b/design/XFS_Filesystem_Structure/superblock.asciidoc @@ -0,0 +1,548 @@ +[[Superblocks]] +== Superblocks + +Each AG starts with a superblock. The first one, in AG 0, is the primary +superblock which stores aggregate AG information. Secondary superblocks are +only used by xfs_repair when the primary superblock has been corrupted. 
A +superblock is one sector in length. + +The superblock is defined by the following structure. The description of each +field follows. + +[source, c] +---- +struct xfs_sb +{ + __uint32_t sb_magicnum; + __uint32_t sb_blocksize; + xfs_rfsblock_t sb_dblocks; + xfs_rfsblock_t sb_rblocks; + xfs_rtblock_t sb_rextents; + uuid_t sb_uuid; + xfs_fsblock_t sb_logstart; + xfs_ino_t sb_rootino; + xfs_ino_t sb_rbmino; + xfs_ino_t sb_rsumino; + xfs_agblock_t sb_rextsize; + xfs_agblock_t sb_agblocks; + xfs_agnumber_t sb_agcount; + xfs_extlen_t sb_rbmblocks; + xfs_extlen_t sb_logblocks; + __uint16_t sb_versionnum; + __uint16_t sb_sectsize; + __uint16_t sb_inodesize; + __uint16_t sb_inopblock; + char sb_fname[12]; + __uint8_t sb_blocklog; + __uint8_t sb_sectlog; + __uint8_t sb_inodelog; + __uint8_t sb_inopblog; + __uint8_t sb_agblklog; + __uint8_t sb_rextslog; + __uint8_t sb_inprogress; + __uint8_t sb_imax_pct; + __uint64_t sb_icount; + __uint64_t sb_ifree; + __uint64_t sb_fdblocks; + __uint64_t sb_frextents; + xfs_ino_t sb_uquotino; + xfs_ino_t sb_gquotino; + __uint16_t sb_qflags; + __uint8_t sb_flags; + __uint8_t sb_shared_vn; + xfs_extlen_t sb_inoalignmt; + __uint32_t sb_unit; + __uint32_t sb_width; + __uint8_t sb_dirblklog; + __uint8_t sb_logsectlog; + __uint16_t sb_logsectsize; + __uint32_t sb_logsunit; + __uint32_t sb_features2; + __uint32_t sb_bad_features2; + + /* version 5 superblock fields start here */ + __uint32_t sb_features_compat; + __uint32_t sb_features_ro_compat; + __uint32_t sb_features_incompat; + __uint32_t sb_features_log_incompat; + + __uint32_t sb_crc; + xfs_extlen_t sb_spino_align; + + xfs_ino_t sb_pquotino; + xfs_lsn_t sb_lsn; + uuid_t sb_meta_uuid; + xfs_ino_t sb_rrmapino; +}; +---- +*sb_magicnum*:: +Identifies the filesystem. Its value is +XFS_SB_MAGIC+ ``XFSB'' (0x58465342). + +*sb_blocksize*:: +The size of a basic unit of space allocation in bytes. Typically, this is 4096 +(4KB) but can range from 512 to 65536 bytes. 
+ +*sb_dblocks*:: +Total number of blocks available for data and metadata on the filesystem. + +*sb_rblocks*:: +Number of blocks in the real-time disk device. Refer to +xref:Real-time_Devices[real-time sub-volumes] for more information. + +*sb_rextents*:: +Number of extents on the real-time device. + +*sb_uuid*:: +UUID (Universally Unique ID) for the filesystem. Filesystems can be mounted by +the UUID instead of device name. + +*sb_logstart*:: +First block number for the journaling log if the log is internal (i.e. not on a +separate disk device). For an external log device, this will be zero (the log +will also start on the first block on the log device). The identity of the log +device is not recorded in the filesystem, but the UUIDs of the filesystem and +the log device are compared to prevent corruption. + +*sb_rootino*:: +Root inode number for the filesystem. Normally, the root inode is at the +start of the first possible inode chunk in AG 0. This is 128 when using a 4KB +block size. + +*sb_rbmino*:: +Bitmap inode for real-time extents. + +*sb_rsumino*:: +Summary inode for real-time bitmap. + +*sb_rextsize*:: +Realtime extent size in blocks. + +*sb_agblocks*:: +Size of each AG in blocks. For the actual size of the last AG, refer to the +xref:AG_Free_Space_Management[free space] +agf_length+ value. + +*sb_agcount*:: +Number of AGs in the filesystem. + +*sb_rbmblocks*:: +Number of real-time bitmap blocks. + +*sb_logblocks*:: +Number of blocks for the journaling log. + +*sb_versionnum*:: +Filesystem version number. This is a bitmask specifying the features enabled +when creating the filesystem. Any disk checking tools or drivers that do not +recognize any set bits must not operate upon the filesystem. Most of the flags +indicate features introduced over time.
If the value of the lower nibble is >= +4, the higher bits indicate feature flags as follows: + +.Version 4 Superblock version flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_VERSION_ATTRBIT+ | +Set if any inode has extended attributes. If this bit is set; the ++XFS_SB_VERSION2_ATTR2BIT+ is not set; and the +attr2+ mount flag is not +specified, the +di_forkoff+ inode field will not be dynamically adjusted. +See the section about xref:Extended_Attribute_Versions[extended attribute +versions] for more information. + +| +XFS_SB_VERSION_NLINKBIT+ | Set if any inodes use 32-bit di_nlink values. +| +XFS_SB_VERSION_QUOTABIT+ | +Quotas are enabled on the filesystem. This +also brings in the various quota fields in the superblock. + +| +XFS_SB_VERSION_ALIGNBIT+ | Set if sb_inoalignmt is used. +| +XFS_SB_VERSION_DALIGNBIT+ | Set if sb_unit and sb_width are used. +| +XFS_SB_VERSION_SHAREDBIT+ | Set if sb_shared_vn is used. +| +XFS_SB_VERSION_LOGV2BIT+ | Version 2 journaling logs are used. +| +XFS_SB_VERSION_SECTORBIT+ | Set if sb_sectsize is not 512. +| +XFS_SB_VERSION_EXTFLGBIT+ | Unwritten extents are used. This is always set. +| +XFS_SB_VERSION_DIRV2BIT+ | +Version 2 directories are used. This is always set. + +| +XFS_SB_VERSION_MOREBITSBIT+ | +Set if the sb_features2 field in the superblock contains more flags. +|===== + +If the lower nibble of this value is 5, then this is a v5 filesystem; the ++XFS_SB_VERSION2_CRCBIT+ feature must be set in +sb_features2+. + +*sb_sectsize*:: +Specifies the underlying disk sector size in bytes. Typically this is 512 or +4096 bytes. This determines the minimum I/O alignment, especially for direct I/O. + +*sb_inodesize*:: +Size of the inode in bytes. The default is 256 (2 inodes per standard sector) +but can be made as large as 2048 bytes when creating the filesystem. On a v5 +filesystem, the default and minimum inode size are both 512 bytes. + +*sb_inopblock*:: +Number of inodes per block.
This is equivalent to +sb_blocksize / sb_inodesize+. + +*sb_fname[12]*:: +Name for the filesystem. This value can be used in the mount command. + +*sb_blocklog*:: +log~2~ value of +sb_blocksize+. In other words, +sb_blocksize = 2^sb_blocklog^+. + +*sb_sectlog*:: +log~2~ value of +sb_sectsize+. + +*sb_inodelog*:: +log~2~ value of +sb_inodesize+. + +*sb_inopblog*:: +log~2~ value of +sb_inopblock+. + +*sb_agblklog*:: +log~2~ value of +sb_agblocks+ (rounded up). This value is used to generate inode +numbers and absolute block numbers defined in extent maps. + +*sb_rextslog*:: +log~2~ value of +sb_rextents+. + +*sb_inprogress*:: +Flag specifying that the filesystem is being created. + +*sb_imax_pct*:: +Maximum percentage of filesystem space that can be used for inodes. The default +value is 5%. + +*sb_icount*:: +Global count of inodes allocated on the filesystem. This is only +maintained in the first superblock. + +*sb_ifree*:: +Global count of free inodes on the filesystem. This is only maintained in the +first superblock. + +*sb_fdblocks*:: +Global count of free data blocks on the filesystem. This is only maintained in +the first superblock. + +*sb_frextents*:: +Global count of free real-time extents on the filesystem. This is only +maintained in the first superblock. + +*sb_uquotino*:: +Inode for user quotas. This and the following two quota fields only apply if +the +XFS_SB_VERSION_QUOTABIT+ flag is set in +sb_versionnum+. Refer to +xref:Quota_Inodes[quota inodes] for more information. + +*sb_gquotino*:: +Inode for group or project quotas. Group and project quotas cannot be used at +the same time on v4 filesystems. On a v5 filesystem, this inode always stores +group quota information. + +*sb_qflags*:: +Quota flags. It can be a combination of the following flags: + +.Superblock quota flags +[options="header"] +|===== +| Flag | Description +| +XFS_UQUOTA_ACCT+ | User quota accounting is enabled. +| +XFS_UQUOTA_ENFD+ | User quotas are enforced.
+| +XFS_UQUOTA_CHKD+ | User quotas have been checked. +| +XFS_PQUOTA_ACCT+ | Project quota accounting is enabled. +| +XFS_OQUOTA_ENFD+ | Other (group/project) quotas are enforced. +| +XFS_OQUOTA_CHKD+ | Other (group/project) quotas have been checked. +| +XFS_GQUOTA_ACCT+ | Group quota accounting is enabled. +| +XFS_GQUOTA_ENFD+ | Group quotas are enforced. +| +XFS_GQUOTA_CHKD+ | Group quotas have been checked. +| +XFS_PQUOTA_ENFD+ | Project quotas are enforced. +| +XFS_PQUOTA_CHKD+ | Project quotas have been checked. +|===== + +*sb_flags*:: +Miscellaneous flags. + +.Superblock flags +[options="header"] +|===== +| Flag | Description +| +XFS_SBF_READONLY+ | Only read-only mounts allowed. +|===== + +*sb_shared_vn*:: +Reserved and must be zero (``vn'' stands for version number). + +*sb_inoalignmt*:: +Inode chunk alignment in fsblocks. Prior to v5, the default value provided for +inode chunks to have an 8KiB alignment. Starting with v5, the default value +scales with the multiple of the inode size over 256 bytes. Concretely, this +means an alignment of 16KiB for 512-byte inodes, 32KiB for 1024-byte inodes, +etc. If sparse inodes are enabled, the +ir_startino+ field of each inode +B+tree record must be aligned to this block granularity, even if the inode +given by +ir_startino+ itself is sparse. + +*sb_unit*:: +Underlying stripe or raid unit in blocks. + +*sb_width*:: +Underlying stripe or raid width in blocks. + +*sb_dirblklog*:: +log~2~ multiplier that determines the granularity of directory block allocations +in fsblocks. + +*sb_logsectlog*:: +log~2~ value of the log subvolume's sector size. This is only used if the +journaling log is on a separate disk device (i.e. not internal). + +*sb_logsectsize*:: +The log's sector size in bytes if the filesystem uses an external log device. + +*sb_logsunit*:: +The log device's stripe or raid unit size. This only applies to version 2 logs, +that is, when +XFS_SB_VERSION_LOGV2BIT+ is set in +sb_versionnum+.
+ +*sb_features2*:: +Additional version flags if +XFS_SB_VERSION_MOREBITSBIT+ is set in ++sb_versionnum+. The currently defined additional features include: + +.Extended Version 4 Superblock flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ | +Lazy global counters. Making a filesystem with this bit set can improve +performance. The global free space and inode counts are only updated in the +primary superblock when the filesystem is cleanly unmounted. + +| +XFS_SB_VERSION2_ATTR2BIT+ | +Extended attributes version 2. Making a filesystem with this optimises the +inode layout of extended attributes. If this bit is set and the +noattr2+ +mount flag is not specified, the +di_forkoff+ inode field will be dynamically +adjusted. See the section about xref:Extended_Attribute_Versions[extended +attribute versions] for more information. + +| +XFS_SB_VERSION2_PARENTBIT+ | +Parent pointers. All inodes must have an extended attribute that points back to +its parent inode. The primary purpose for this information is in backup systems. + +| +XFS_SB_VERSION2_PROJID32BIT+ | +32-bit Project ID. Inodes can be associated with a project ID number, which +can be used to enforce disk space usage quotas for a particular group of +directories. This flag indicates that project IDs can be 32 bits in size. + +| +XFS_SB_VERSION2_CRCBIT+ | +Metadata checksumming. All metadata blocks have an extended header containing +the block checksum, a copy of the metadata UUID, the log sequence number of the +last update to prevent stale replays, and a back pointer to the owner of the +block. This feature must be and can only be set if the lowest nibble of ++sb_versionnum+ is set to 5. + +| +XFS_SB_VERSION2_FTYPE+ | +Directory file type. Each directory entry records the type of the inode to +which the entry points. This speeds up directory iteration by removing the +need to load every inode into memory. 
+|===== + +*sb_bad_features2*:: +This field mirrors +sb_features2+, due to past 64-bit alignment errors. + +*sb_features_compat*:: +Read-write compatible feature flags. The kernel can still read and write this +FS even if it doesn't understand the flag. Currently, there are no valid +flags. + +*sb_features_ro_compat*:: +Read-only compatible feature flags. The kernel can still read this FS even if +it doesn't understand the flag. + +.Extended Version 5 Superblock Read-Only compatibility flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_FEAT_RO_COMPAT_FINOBT+ | +Free inode B+tree. Each allocation group contains a B+tree to track inode chunks +containing free inodes. This is a performance optimization to reduce the time +required to allocate inodes. + +| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ | +Reverse mapping B+tree. Each allocation group contains a B+tree containing +records mapping AG blocks to their owners. See the section about +xref:Reconstruction[reconstruction] for more details. + +| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ | +Reference count B+tree. Each allocation group contains a B+tree to track the +reference counts of AG blocks. This enables files to share data blocks safely. +See the section about xref:Reflink_Deduplication[reflink and deduplication] for +more details. + +| +XFS_SB_FEAT_RO_COMPAT_INOBTCNT+ | +Inode B+tree block counters. Each allocation group's inode (AGI) header +tracks the number of blocks in each of the inode B+trees. This allows us +to have a slightly higher level of redundancy over the shape of the inode +btrees, and decreases the amount of time to compute the metadata B+tree +preallocations at mount time. + +|===== + +*sb_features_incompat*:: +Read-write incompatible feature flags. The kernel cannot read or write this +FS if it doesn't understand the flag. + +.Extended Version 5 Superblock Read-Write incompatibility flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_FEAT_INCOMPAT_FTYPE+ | +Directory file type. 
Each directory entry tracks the type of the inode to +which the entry points. This is a performance optimization to remove the need +to load every inode into memory to iterate a directory. + +| +XFS_SB_FEAT_INCOMPAT_SPINODES+ | +Sparse inodes. This feature relaxes the requirement to allocate inodes in +chunks of 64. When the free space is heavily fragmented, there might exist +plenty of free space but not enough contiguous free space to allocate a new +inode chunk. With this feature, the user can continue to create files until +all free space is exhausted. + +Unused space in the inode B+tree records is used to track which parts of the +inode chunk are not inodes. + +See the chapter on xref:Sparse_Inodes[Sparse Inodes] for more information. + +| +XFS_SB_FEAT_INCOMPAT_META_UUID+ | +Metadata UUID. The UUID stamped into each metadata block must match the value +in +sb_meta_uuid+. This enables the administrator to change +sb_uuid+ at will +without having to rewrite the entire filesystem. + +| +XFS_SB_FEAT_INCOMPAT_BIGTIME+ | +Large timestamps. Inode timestamps and quota expiration timers are extended to +support times through the year 2486. See the section on +xref:Timestamps[timestamps] for more information. + +| +XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR+ | +The filesystem is not in operable condition, and must be run through +xfs_repair before it can be mounted. + +| +XFS_SB_FEAT_INCOMPAT_NREXT64+ | +Large file fork extent counts. This greatly expands the maximum number of +space mappings allowed in data and extended attribute file forks. + +| +XFS_SB_FEAT_INCOMPAT_EXCHRANGE+ | +Atomic file mapping exchanges. The filesystem is capable of exchanging a range +of mappings between two arbitrary ranges of a file's fork by using log intent +items to track the progress of the high-level exchange operation. In other +words, the exchange operation can be restarted if the system goes down, which +is necessary for userspace to commit new file contents atomically.
This +flag has user-visible impacts, which is why it is a permanent incompat flag. +See the section about xref:XMI_Log_Item[mapping exchange log intents] for more +information. + +| +XFS_SB_FEAT_INCOMPAT_PARENT+ | +Directory parent pointers. See the section about xref:Parent_Pointers[parent +pointers] for more information. + +|===== + +*sb_features_log_incompat*:: +Read-write incompatible feature flags for the log. The kernel cannot recover +the FS log if it doesn't understand the flag. + +.Extended Version 5 Superblock Log incompatibility flags +[options="header"] +|===== +| Flag | Description +| +XFS_SB_FEAT_INCOMPAT_LOG_XATTRS+ | +Extended attribute updates have been committed to the ondisk log. + +|===== + +*sb_crc*:: +Superblock checksum. + +*sb_spino_align*:: +Sparse inode alignment, in fsblocks. Each chunk of inodes referenced by a +sparse inode B+tree record must be aligned to this block granularity. + +*sb_pquotino*:: +Project quota inode. + +*sb_lsn*:: +Log sequence number of the last superblock update. + +*sb_meta_uuid*:: +If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in +all metadata blocks must match this UUID. If not, the block header UUID field +must match +sb_uuid+. + +*sb_rrmapino*:: +If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time +device is present (+sb_rblocks+ > 0), this field points to an inode +that contains the root to the +xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree]. +This field is zero otherwise. 
+ +=== xfs_db Superblock Example + +A filesystem is made on a single disk with the following command: + +---- +# mkfs.xfs -i attr=2 -n size=16384 -f /dev/sda7 +meta-data=/dev/sda7 isize=256 agcount=16, agsize=3923122 blks + = sectsz=512 attr=2 +data = bsize=4096 blocks=62769952, imaxpct=25 + = sunit=0 swidth=0 blks, unwritten=1 +naming =version 2 bsize=16384 +log =internal log bsize=4096 blocks=30649, version=1 + = sectsz=512 sunit=0 blks +realtime =none extsz=65536 blocks=0, rtextents=0 +---- + +And in xfs_db, inspecting the superblock: + +---- +xfs_db> sb +xfs_db> p +magicnum = 0x58465342 +blocksize = 4096 +dblocks = 62769952 +rblocks = 0 +rextents = 0 +uuid = 32b24036-6931-45b4-b68c-cd5e7d9a1ca5 +logstart = 33554436 +rootino = 128 +rbmino = 129 +rsumino = 130 +rextsize = 16 +agblocks = 3923122 +agcount = 16 +rbmblocks = 0 +logblocks = 30649 +versionnum = 0xb084 +sectsize = 512 +inodesize = 256 +inopblock = 16 +fname = "\000\000\000\000\000\000\000\000\000\000\000\000" +blocklog = 12 +sectlog = 9 +inodelog = 8 +inopblog = 4 +agblklog = 22 +rextslog = 0 +inprogress = 0 +imax_pct = 25 +icount = 64 +ifree = 61 +fdblocks = 62739235 +frextents = 0 +uquotino = 0 +gquotino = 0 +qflags = 0 +flags = 0 +shared_vn = 0 +inoalignmt = 2 +unit = 0 +width = 0 +dirblklog = 2 +logsectlog = 0 +logsectsize = 0 +logsunit = 0 +features2 = 8 +---- From patchwork Wed Nov 27 00:19:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13886445
Date: Tue, 26 Nov 2024 16:19:03 -0800 Subject: [PATCH 04/10] design: document the actual ondisk superblock From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662270.996198.11685949385888873946.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong struct xfs_dsb is the ondisk superblock, not struct xfs_sb. Replace the struct definition with the one for the ondisk superblock. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../XFS_Filesystem_Structure/superblock.asciidoc | 117 ++++++++++---------- 1 file changed, 58 insertions(+), 59 deletions(-) diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc index 16c31116ffafd4..79e8c30dc93e79 100644 --- a/design/XFS_Filesystem_Structure/superblock.asciidoc +++ b/design/XFS_Filesystem_Structure/superblock.asciidoc @@ -11,68 +11,67 @@ field follows.
[source, c] ---- -struct xfs_sb -{ - __uint32_t sb_magicnum; - __uint32_t sb_blocksize; - xfs_rfsblock_t sb_dblocks; - xfs_rfsblock_t sb_rblocks; - xfs_rtblock_t sb_rextents; - uuid_t sb_uuid; - xfs_fsblock_t sb_logstart; - xfs_ino_t sb_rootino; - xfs_ino_t sb_rbmino; - xfs_ino_t sb_rsumino; - xfs_agblock_t sb_rextsize; - xfs_agblock_t sb_agblocks; - xfs_agnumber_t sb_agcount; - xfs_extlen_t sb_rbmblocks; - xfs_extlen_t sb_logblocks; - __uint16_t sb_versionnum; - __uint16_t sb_sectsize; - __uint16_t sb_inodesize; - __uint16_t sb_inopblock; - char sb_fname[12]; - __uint8_t sb_blocklog; - __uint8_t sb_sectlog; - __uint8_t sb_inodelog; - __uint8_t sb_inopblog; - __uint8_t sb_agblklog; - __uint8_t sb_rextslog; - __uint8_t sb_inprogress; - __uint8_t sb_imax_pct; - __uint64_t sb_icount; - __uint64_t sb_ifree; - __uint64_t sb_fdblocks; - __uint64_t sb_frextents; - xfs_ino_t sb_uquotino; - xfs_ino_t sb_gquotino; - __uint16_t sb_qflags; - __uint8_t sb_flags; - __uint8_t sb_shared_vn; - xfs_extlen_t sb_inoalignmt; - __uint32_t sb_unit; - __uint32_t sb_width; - __uint8_t sb_dirblklog; - __uint8_t sb_logsectlog; - __uint16_t sb_logsectsize; - __uint32_t sb_logsunit; - __uint32_t sb_features2; - __uint32_t sb_bad_features2; +struct xfs_dsb { + __be32 sb_magicnum; + __be32 sb_blocksize; + __be64 sb_dblocks; + __be64 sb_rblocks; + __be64 sb_rextents; + uuid_t sb_uuid; + __be64 sb_logstart; + __be64 sb_rootino; + __be64 sb_rbmino; + __be64 sb_rsumino; + __be32 sb_rextsize; + __be32 sb_agblocks; + __be32 sb_agcount; + __be32 sb_rbmblocks; + __be32 sb_logblocks; + __be16 sb_versionnum; + __be16 sb_sectsize; + __be16 sb_inodesize; + __be16 sb_inopblock; + char sb_fname[XFSLABEL_MAX]; + __u8 sb_blocklog; + __u8 sb_sectlog; + __u8 sb_inodelog; + __u8 sb_inopblog; + __u8 sb_agblklog; + __u8 sb_rextslog; + __u8 sb_inprogress; + __u8 sb_imax_pct; + __be64 sb_icount; + __be64 sb_ifree; + __be64 sb_fdblocks; + __be64 sb_frextents; + __be64 sb_uquotino; + __be64 sb_gquotino; + __be16 
sb_qflags; + __u8 sb_flags; + __u8 sb_shared_vn; + __be32 sb_inoalignmt; + __be32 sb_unit; + __be32 sb_width; + __u8 sb_dirblklog; + __u8 sb_logsectlog; + __be16 sb_logsectsize; + __be32 sb_logsunit; + __be32 sb_features2; + __be32 sb_bad_features2; /* version 5 superblock fields start here */ - __uint32_t sb_features_compat; - __uint32_t sb_features_ro_compat; - __uint32_t sb_features_incompat; - __uint32_t sb_features_log_incompat; + __be32 sb_features_compat; + __be32 sb_features_ro_compat; + __be32 sb_features_incompat; + __be32 sb_features_log_incompat; + __le32 sb_crc; + __be32 sb_spino_align; + __be64 sb_pquotino; + __be64 sb_lsn; + uuid_t sb_meta_uuid; + __be64 sb_rrmapino; - __uint32_t sb_crc; - xfs_extlen_t sb_spino_align; - - xfs_ino_t sb_pquotino; - xfs_lsn_t sb_lsn; - uuid_t sb_meta_uuid; - xfs_ino_t sb_rrmapino; + /* must be padded to 64 bit alignment */ }; ---- *sb_magicnum*:: From patchwork Wed Nov 27 00:19:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13886446
Date: Tue, 26 Nov 2024 16:19:19 -0800 Subject: [PATCH 05/10] design: document the changes required to handle metadata directories From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662285.996198.10934163595962023057.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Document the ondisk format changes for metadata directories. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../internal_inodes.asciidoc | 113 ++++++++++++++++++++ .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 22 ++++ .../XFS_Filesystem_Structure/superblock.asciidoc | 14 +- 3 files changed, 142 insertions(+), 7 deletions(-) diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index 84e4cb969ce392..eaa0a50aa848f3 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -5,6 +5,119 @@ XFS allocates several inodes when a filesystem is created. These are internal and not accessible from the standard directory structure. These inodes are only accessible from the superblock. +[[Metadata_Directories]] +== Metadata Directory Tree + +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_metadirino+ +field in the superblock points to the root of a directory tree containing +metadata files. This directory tree is completely internal to the filesystem +and must not be exposed to user programs.
+ +When this feature is enabled, metadata files should be found by walking the +metadata directory tree. The superblock fields that formerly pointed to (some) +of those inodes have been deallocated and may be reused by future features. + +.Metadata Directory Paths +[options="header"] +|===== +| Metadata File | Location +|===== + +Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the ++di_flags2+ field. Metadata files must have the following properties: + +* Must be either a directory or a regular file. +* File mode bits set to 0000. +* User and group IDs set to zero. +* The +XFS_DIFLAG_IMMUTABLE+, +XFS_DIFLAG_SYNC+, +XFS_DIFLAG_NOATIME+, +XFS_DIFLAG_NODUMP+, and +XFS_DIFLAG_NODEFRAG+ flags must all be set in +di_flags+. +* For a directory, the +XFS_DIFLAG_NOSYMLINKS+ flag must also be set. +* The +XFS_DIFLAG2_METADATA+ flag must be set in +di_flags2+. +* The +XFS_DIFLAG2_DAX+ flag must not be set. + +=== Metadata Directory Example + +This example shows a metadata directory from a freshly formatted root +filesystem: + +---- +xfs_db> sb 0 +xfs_db> p +magicnum = 0x58465342 +blocksize = 4096 +dblocks = 5192704 +rblocks = 0 +rextents = 0 +uuid = cbf2ceef-658e-46b0-8f96-785661c37976 +logstart = 4194311 +rootino = 128 +rbmino = 130 +rsumino = 131 +... +meta_uuid = 00000000-0000-0000-0000-000000000000 +metadirino = 129 +... +---- + +Notice how the listing includes the root of the metadata directory tree +(+metadirino+). + +---- +xfs_db> path -m / +xfs_db> ls +8 129 directory 0x0000002e 1 . (good) +10 129 directory 0x0000172e 2 .. (good) +12 33685632 directory 0x2d18ab4c 8 rtgroups (good) +---- + +Here we use the +path+ and +ls+ commands to display the root directory of +the metadata directory tree. We can navigate the directory the old way, too: + +---- +xfs_db> p +core.magic = 0x494e +core.mode = 040000 +core.version = 3 +core.format = 1 (local) +core.onlink = 0 +core.uid = 0 +core.gid = 0 +...
+v3.flags2 = 0x8000000000000018 +v3.cowextsize = 0 +v3.crtime.sec = Wed Aug 7 10:22:36 2024 +v3.crtime.nsec = 273744000 +v3.inumber = 129 +v3.uuid = 7e55b909-8728-4d69-a1fa-891427314eea +v3.reflink = 0 +v3.cowextsz = 0 +v3.dax = 0 +v3.bigtime = 1 +v3.nrext64 = 1 +v3.metadata = 1 +u3.sfdir3.hdr.count = 1 +u3.sfdir3.hdr.i8count = 0 +u3.sfdir3.hdr.parent.i4 = 129 +u3.sfdir3.list[0].namelen = 8 +u3.sfdir3.list[0].offset = 0x60 +u3.sfdir3.list[0].name = "rtgroups" +u3.sfdir3.list[0].inumber.i4 = 33685632 +u3.sfdir3.list[0].filetype = 2 +---- + +The root of the metadata directory is a short format directory, and looks just +like any other directory. The only differences are that the metadata flag is +set, and the directory can only be viewed in the XFS debugger. + +---- +xfs_db> path -m /rtgroups/0.rmap +xfs_db> btdump +u3.rtrmapbt.recs[1] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock] +1:[0,1,-3,0,0,0,0] +---- + +Observe that we can use the xfs_db +path+ command to navigate the metadata +directory tree to the realtime group 0 rmap btree file and display its contents. + [[Quota_Inodes]] == Quota Inodes diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc index 34c064871cb255..02ec0d12bb57e5 100644 --- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc +++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc @@ -78,7 +78,10 @@ struct xfs_dinode_core { __uint16_t di_mode; __int8_t di_version; __int8_t di_format; - __uint16_t di_onlink; + union { + __uint16_t di_onlink; + __uint16_t di_metatype; + }; __uint32_t di_uid; __uint32_t di_gid; __uint32_t di_nlink; @@ -188,6 +191,17 @@ In v1 inodes, this specifies the number of links to the inode from directories. When the number exceeds 65535, the inode is converted to v2 and the link count is stored in +di_nlink+.
+*di_metatype*:: +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +di_onlink+ field +is redefined to declare the intended contents of files in the metadata +directory tree. + +[source, c] +---- +enum xfs_metafile_type { +}; +---- + *di_uid*:: Specifies the owner's UID of the inode. @@ -383,6 +397,12 @@ will be copied to all newly created files and directories. Files with this flag set may have up to (2^48^ - 1) extents mapped to the data fork and up to (2^32^ - 1) extents mapped to the attribute fork. This flag requires the +XFS_SB_FEAT_INCOMPAT_NREXT64+ feature to be enabled. +| +XFS_DIFLAG2_METADATA+ | +This file contains filesystem metadata. This feature requires the ++XFS_SB_FEAT_INCOMPAT_METADIR+ feature to be enabled. See the section about +xref:Metadata_Directories[metadata directories] for more information on +metadata inode properties. Only directories and regular files can have this +flag set. |===== *di_cowextsize*:: diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc index 79e8c30dc93e79..56877615ae81bf 100644 --- a/design/XFS_Filesystem_Structure/superblock.asciidoc +++ b/design/XFS_Filesystem_Structure/superblock.asciidoc @@ -69,7 +69,7 @@ struct xfs_dsb { __be64 sb_pquotino; __be64 sb_lsn; uuid_t sb_meta_uuid; - __be64 sb_rrmapino; + __be64 sb_metadirino; /* must be padded to 64 bit alignment */ }; @@ -438,6 +438,10 @@ information. Directory parent pointers. See the section about xref:Parent_Pointers[parent pointers] for more information. +| +XFS_SB_FEAT_INCOMPAT_METADIR+ | +Metadata directory tree. See the section about the xref:Metadata_Directories[ +metadata directory tree] for more information. + |===== *sb_features_log_incompat*:: @@ -471,11 +475,9 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in all metadata blocks must match this UUID. If not, the block header UUID field must match +sb_uuid+. 
-*sb_rrmapino*:: -If the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ feature is set and a real-time -device is present (+sb_rblocks+ > 0), this field points to an inode -that contains the root to the -xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree]. +*sb_metadirino*:: +If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is set, this field points to +the inode of the root directory of the metadata directory tree. This field is zero otherwise. === xfs_db Superblock Example From patchwork Wed Nov 27 00:19:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13886447 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA2B1C8CE for ; Wed, 27 Nov 2024 00:19:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666777; cv=none; b=j1FJd5x89hLufuu57g5CGehRv2PU/t+TpWtr2ik4Af9gXlyKLf456lV1kbwlld4X9YdL5kodTk1SzFTB4OUj8uHde9EovXI3+uj+fKu+I+gtUWbQGYKjiU33ljwZB/xGj9fgNAExg1K6HFWRAw1x/1LJcRZxZrOlF3yQ9Z1FV0A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666777; c=relaxed/simple; bh=IhHD8XpoV3kzwDjfIC15/3ygPjs3+1t+wmvv/TBXTxw=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WRrF63HrRXLdmhdA/wwsAJSfp9bO4rXHXF4XC0M1RQgXd1IR4P6ye5+phtOZI70DBxtCjJyq3o9ijGX9qyAyT6yiEel7zSrri2OksCdJX1cPCgwjueSUNM55EwSAEc6nWuS5RGJQ4Pd9ISOQj3B4hbCkVEvvXdNCMrTBjnqzTHQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PdRBDDeL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PdRBDDeL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B6689C4CECF; Wed, 27 Nov 2024 00:19:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1732666776; bh=IhHD8XpoV3kzwDjfIC15/3ygPjs3+1t+wmvv/TBXTxw=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=PdRBDDeLLJ6e9qwhm9ivU91ijJUzEfGron363t7TZ0IHWXaRs3SgYd4eF36xSALpa NBnYSpoohOFUZCkw/riVf92n38tDQm0MHEQgCrS8Y/FjiEIDjR7HB7IY1oDCpcDBUW nSyE6GlLCtUdSEVHMTkjai6QyEyVWH+ZZw9+2GbM9JMyd0ZNEIcW8RYk1H5cjOzD0K K/C957DuC1Lein+EKncITlG57elnGQIWj4Kvof9mC75dxAHsgm6flr9XM2zkVp8MBG 6aDupIjU39ciIj76BaBPSpI4w2KNZqpqYyPmSrtW9hdT3V8gh+mC2rurMSStEQY/fl WQzh10cIp6KHA== Date: Tue, 26 Nov 2024 16:19:35 -0800 Subject: [PATCH 06/10] design: move discussion of realtime volumes to a separate section From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662299.996198.2514381239709867329.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong In preparation for documenting the realtime modernization project, move the discussions of the realtime-related ondisk metadata to a separate file. Since realtime reverse mapping btrees haven't been added to the filesystem yet, stop including them in the final output. Signed-off-by: Darrick J. 
Wong Reviewed-by: Christoph Hellwig --- .../allocation_groups.asciidoc | 20 -------- .../internal_inodes.asciidoc | 36 +------------- design/XFS_Filesystem_Structure/realtime.asciidoc | 50 ++++++++++++++++++++ .../xfs_filesystem_structure.asciidoc | 2 + 4 files changed, 54 insertions(+), 54 deletions(-) create mode 100644 design/XFS_Filesystem_Structure/realtime.asciidoc diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc index e2cdaab5e03d3f..c746a92ca47dd6 100644 --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc @@ -772,23 +772,3 @@ core.magic = 0x494e The chunk record also indicates that this chunk has 32 inodes, and that the missing inodes are also ``free''. - -[[Real-time_Devices]] -== Real-time Devices - -The performance of the standard XFS allocator varies depending on the internal -state of the various metadata indices enabled on the filesystem. For -applications which need to minimize the jitter of allocation latency, XFS -supports the notion of a ``real-time device''. This is a special device -separate from the regular filesystem where extent allocations are tracked with -a bitmap and free space is indexed with a two-dimensional array. If an inode -is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time -device. The metadata for real time devices is discussed in the section about -xref:Real-time_Inodes[real time inodes]. - -By placing the real time device (and the journal) on separate high-performance -storage devices, it is possible to reduce most of the unpredictability in I/O -response times that come from metadata operations. - -None of the XFS per-AG B+trees are involved with real time files. It is not -possible for real time files to share data blocks. 
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index eaa0a50aa848f3..68c86d30ff8206 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -287,41 +287,9 @@ Log sequence number of the last DQ block write. *dd_crc*:: Checksum of the DQ block. - [[Real-time_Inodes]] == Real-time Inodes There are two inodes allocated to managing the real-time device's space, the -Bitmap Inode and the Summary Inode. - -[[Real-Time_Bitmap_Inode]] -=== Real-Time Bitmap Inode - -The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the -real-time device using an old-style bitmap. One bit is allocated per real-time -extent. The size of an extent is specified by the superblock's +sb_rextsize+ -value. - -The number of blocks used by the bitmap inode is equal to the number of -real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+) -and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and -extent array for the inode should match this. Each real time block gets its -own bit in the bitmap. - -[[Real-Time_Summary_Inode]] -=== Real-Time Summary Inode - -The real time summary inode, +sb_rsumino+, tracks the used and free space -accounting information for the real-time device. This file indexes the -approximate location of each free extent on the real-time device first by -log2(extent size) and then by the real-time bitmap block number. The size of -the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size) -× sizeof(+xfs_suminfo_t+). The entry for a given log2(extent size) and -rtbitmap block number is 0 if there is no free extents of that size at that -rtbitmap location, and positive if there are any. 
- -This data structure is not particularly space efficient, however it is a very -fast way to provide the same data as the two free space B+trees for regular -files since the space is preallocated and metadata maintenance is minimal. - -include::rtrmapbt.asciidoc[] +xref:Real-Time_Bitmap_Inode[Bitmap Inode] and the +xref:Real-Time_Summary_Inode[Summary Inode]. diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc new file mode 100644 index 00000000000000..11426e8fdb632d --- /dev/null +++ b/design/XFS_Filesystem_Structure/realtime.asciidoc @@ -0,0 +1,50 @@ +[[Real-time_Devices]] += Real-time Devices + +The performance of the standard XFS allocator varies depending on the internal +state of the various metadata indices enabled on the filesystem. For +applications which need to minimize the jitter of allocation latency, XFS +supports the notion of a ``real-time device''. This is a special device +separate from the regular filesystem where extent allocations are tracked with +a bitmap and free space is indexed with a two-dimensional array. If an inode +is flagged with +XFS_DIFLAG_REALTIME+, its data will live on the real time +device. + +By placing the real time device (and the journal) on separate high-performance +storage devices, it is possible to reduce most of the unpredictability in I/O +response times that come from metadata operations. + +None of the XFS per-AG B+trees are involved with real time files. It is not +possible for real time files to share data blocks. + +[[Real-Time_Bitmap_Inode]] +== Free Space Bitmap Inode + +The real time bitmap inode, +sb_rbmino+, tracks the used/free space in the +real-time device using an old-style bitmap. One bit is allocated per real-time +extent. The size of an extent is specified by the superblock's +sb_rextsize+ +value. 
+ +The number of blocks used by the bitmap inode is equal to the number of +real-time extents (+sb_rextents+) divided by the block size (+sb_blocksize+) +and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and +extent array for the inode should match this. Each real time block gets its +own bit in the bitmap. + +[[Real-Time_Summary_Inode]] +== Free Space Summary Inode + +The real time summary inode, +sb_rsumino+, tracks the used and free space +accounting information for the real-time device. This file indexes the +approximate location of each free extent on the real-time device first by +log2(extent size) and then by the real-time bitmap block number. The size of +the summary inode file is equal to +sb_rbmblocks+ × log2(realtime device size) +× sizeof(+xfs_suminfo_t+). The entry for a given log2(extent size) and +rtbitmap block number is 0 if there is no free extents of that size at that +rtbitmap location, and positive if there are any. + +This data structure is not particularly space efficient, however it is a very +fast way to provide the same data as the two free space B+trees for regular +files since the space is preallocated and metadata maintenance is minimal. + +include::rtrmapbt.asciidoc[] diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc index 689e2a874c13e9..a643d18add6094 100644 --- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc +++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc @@ -84,6 +84,8 @@ include::journaling_log.asciidoc[] include::internal_inodes.asciidoc[] +include::realtime.asciidoc[] + include::fs_properties.asciidoc[] :leveloffset: 0 From patchwork Wed Nov 27 00:19:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13886448 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CAD641BDE6 for ; Wed, 27 Nov 2024 00:19:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666792; cv=none; b=ierEiouENPJuPBVH0C+PH0sVvouj/dDSxfDL+mQ+GwxmiANl6Bb3pzAtgcQJshQMMMYHg3VeOVCqeJ7S023G2C5iWhmEHy7FKx1Y76U9bpmwOM9om1INKYaJyQKx7WtPlRqDKXoWdjWRdLg/4QDj7ZKIfIgdz+XuXd7gSwoVASM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666792; c=relaxed/simple; bh=PiZS44bkChSuYiSAdonoq3e8qfW8/yzOwPNOsosBl1M=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nVYcraYCQXRANsSjTrJD/rI4DmzwfhmRv6f6ZR5zBghmhnkG2/HbjFCenxmlOH6Pqu3NP3m3I6Nkd8bkl7tCZ8tsfoBSxk12jgD07KaRNqNST+nqwY+mThvq/5YuzbPy4hej4xV1EZNxirYZ30Oo1pCUD3wU3fQXoyvt3TebkyY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=kZeF15Cc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kZeF15Cc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 57D14C4CECF; Wed, 27 Nov 2024 00:19:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1732666792; bh=PiZS44bkChSuYiSAdonoq3e8qfW8/yzOwPNOsosBl1M=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=kZeF15CcSXOubTqJvUtfaEos/OHe3Qc6Q0Lb7PxeV1YdUmkeNSFimKx3ghBUzGYgY D/+hSul9Q5z/A+Y7KUYJF0cplt+Jyohu9IvBcy+X22P/fo6yaVAcKquxNdY+QJb631 YvgAAilO1lOUJvqD7dCfAaRajiJWRQfxdEVtfFIcVVLIWWz5l3z1+Qv/SCA2jwnJ4v 
uc1u8IJLQZLq27QR3jUnczF0rcyflhZmUyO4hS0LyjTYesOugIVMLql7OldCAdNdBq ISdif3XSYEGOEvWXn3EihtWxMHqbIDHNlCENP/lpLVw2JK3exeoOFAaYuWsw3y9y1Z P06kpx0Nwk6mQ== Date: Tue, 26 Nov 2024 16:19:51 -0800 Subject: [PATCH 07/10] design: document realtime groups From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662314.996198.11932771102538214485.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Document the ondisk changes for realtime allocation groups. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../XFS_Filesystem_Structure/common_types.asciidoc | 4 .../internal_inodes.asciidoc | 2 design/XFS_Filesystem_Structure/magic.asciidoc | 3 .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 2 design/XFS_Filesystem_Structure/realtime.asciidoc | 344 ++++++++++++++++++++ .../XFS_Filesystem_Structure/superblock.asciidoc | 22 + 6 files changed, 376 insertions(+), 1 deletion(-) diff --git a/design/XFS_Filesystem_Structure/common_types.asciidoc b/design/XFS_Filesystem_Structure/common_types.asciidoc index 51909be384e273..34cdfdaeccf848 100644 --- a/design/XFS_Filesystem_Structure/common_types.asciidoc +++ b/design/XFS_Filesystem_Structure/common_types.asciidoc @@ -43,7 +43,9 @@ Unsigned 64 bit raw filesystem block number. *xfs_rtblock_t*:: Unsigned 64 bit extent number in the xref:Real-time_Devices[real-time] -sub-volume. +sub-volume. If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, these +values combine an xref:Realtime_Groups[rtgroup number] and block offset into +the realtime group. *xfs_fileoff_t*:: Unsigned 64 bit block offset into a file. 
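As a rough illustration of the segmented +xfs_rtblock_t+ values described in the common_types hunk above, the sketch below splits a value into an rtgroup number and a block offset using a log2 width, which is the role +sb_rgblklog+ plays in the superblock. The helper names are hypothetical, and the bit layout (group number in the high bits, offset in the low bits) is an assumption inferred from the field descriptions in this series, not a copy of the kernel implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for xfs_rtblock_t: an unsigned 64 bit value. */
typedef uint64_t rtblock_t;

/*
 * Assumed layout: the low `rgblklog` bits hold the block offset within the
 * realtime group and the remaining high bits hold the rtgroup number.
 * `rgblklog` is assumed to be smaller than 64.
 */
static inline uint32_t rtb_to_rgno(rtblock_t rtbno, uint8_t rgblklog)
{
	return (uint32_t)(rtbno >> rgblklog);
}

/* Extract the block offset inside the group by masking off the high bits. */
static inline uint64_t rtb_to_rgbno(rtblock_t rtbno, uint8_t rgblklog)
{
	return rtbno & ((UINT64_C(1) << rgblklog) - 1);
}

/* Combine a group number and an in-group offset into a segmented value. */
static inline rtblock_t rgbno_to_rtb(uint32_t rgno, uint64_t rgbno,
				     uint8_t rgblklog)
{
	return ((rtblock_t)rgno << rgblklog) | rgbno;
}
```

Under these assumptions, a group size of 1048576 blocks gives a width of 20 bits, so block 17 of rtgroup 3 would be encoded as (3 << 20) | 17. Because the width is a rounded-up log~2~ value, group sizes that are not powers of two would leave unused gaps in the segmented address space.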
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index 68c86d30ff8206..5f4d62201cbd67 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -21,6 +21,8 @@ of those inodes have been deallocated and may be reused by future features. [options="header"] |===== | Metadata File | Location +| xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap +| xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary |===== Metadata files are flagged by the +XFS_DIFLAG2_METADATA+ flag in the diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc index 60952aeb876ff5..5da29b9ef9f3a8 100644 --- a/design/XFS_Filesystem_Structure/magic.asciidoc +++ b/design/XFS_Filesystem_Structure/magic.asciidoc @@ -45,9 +45,12 @@ relevant chapters. Magic numbers tend to have consistent locations: | +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only | +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only | +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only +| +XFS_RTBITMAP_MAGIC+ | 0x424D505A | BMPZ | xref:Real-Time_Bitmap_Inode[Real-Time Bitmap], metadir only +| +XFS_RTSUMMARY_MAGIC+ | 0x53554D59 | SUMY | xref:Real-Time_Summary_Inode[Real-Time Summary], metadir only | +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only | +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only | +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps] +| +XFS_RTSB_MAGIC+ | 0x46726F67 | Frog | xref:Realtime_Groups[Realtime Groups] |===== The magic numbers for log items are at offset zero in each log item, but items diff --git 
a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc index 02ec0d12bb57e5..e28929907147b7 100644 --- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc +++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc @@ -199,6 +199,8 @@ directory tree. [source, c] ---- enum xfs_metafile_type { + XFS_METAFILE_RTBITMAP, + XFS_METAFILE_RTSUMMARY, }; ---- diff --git a/design/XFS_Filesystem_Structure/realtime.asciidoc b/design/XFS_Filesystem_Structure/realtime.asciidoc index 11426e8fdb632d..3a72eb5175ad89 100644 --- a/design/XFS_Filesystem_Structure/realtime.asciidoc +++ b/design/XFS_Filesystem_Structure/realtime.asciidoc @@ -31,6 +31,146 @@ and bits per byte. This value is stored in +sb_rbmblocks+. The nblocks and extent array for the inode should match this. Each real time block gets its own bit in the bitmap. +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the +realtime bitmap file has a header of the following format: + +[source, c] +---- +struct xfs_rtbuf_blkinfo { + __be32 rt_magic; + __be32 rt_crc; + __be64 rt_owner; + __be64 rt_blkno; + __be64 rt_lsn; + uuid_t rt_uuid; +}; +---- + +*rt_magic*:: +Specifies the magic number for the rtbitmap block: ``BMPZ'' (0x424D505A). + +*rt_crc*:: +Checksum of the block. + +*rt_owner*:: +Specifies the inode number for the file that owns this block. + +*rt_blkno*:: +Disk address of this block. + +*rt_lsn*:: +Log sequence number of the last write to this block. + +*rt_uuid*:: +The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+ +depending on which features are set. + +After the block header, the bitmap data are encoded as be32 word values. 
+ +=== xfs_db rtbitmap Example + +This example shows a real-time bitmap file from a freshly populated filesystem: + +---- +xfs_db> path -m /rtgroups/3.bitmap +xfs_db> p +core.magic = 0x494e +core.mode = 0100000 +core.version = 3 +core.format = 2 (extents) +core.metatype = 5 (rtbitmap) +core.uid = 0 +core.gid = 0 +core.nlinkv2 = 1 +core.projid_lo = 3 +core.projid_hi = 0 +core.nextents = 1 +core.atime.sec = Tue Oct 15 16:04:02 2024 +core.atime.nsec = 769675000 +core.mtime.sec = Tue Oct 15 16:04:02 2024 +core.mtime.nsec = 769675000 +core.ctime.sec = Tue Oct 15 16:04:02 2024 +core.ctime.nsec = 769681000 +core.size = 135168 +core.nblocks = 33 +core.extsize = 0 +core.naextents = 0 +core.forkoff = 24 +core.aformat = 1 (local) +core.dmevmask = 0 +core.dmstate = 0 +core.newrtbm = 0 +core.prealloc = 0 +core.realtime = 0 +core.immutable = 1 +core.append = 0 +core.sync = 1 +core.noatime = 1 +core.nodump = 1 +core.rtinherit = 0 +core.projinherit = 0 +core.nosymlinks = 0 +core.extsz = 0 +core.extszinherit = 0 +core.nodefrag = 1 +core.filestream = 0 +core.gen = 2653591217 +next_unlinked = null +v3.crc = 0x34a17119 (correct) +v3.change_count = 3 +v3.lsn = 0 +v3.flags2 = 0x38 +v3.cowextsize = 0 +v3.crtime.sec = Tue Oct 15 16:04:02 2024 +v3.crtime.nsec = 769675000 +v3.inumber = 33685633 +v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05 +v3.reflink = 0 +v3.cowextsz = 0 +v3.dax = 0 +v3.bigtime = 1 +v3.nrext64 = 1 +v3.metadata = 1 +u3.bmx[0] = [startoff,startblock,blockcount,extentflag] +0:[0,4210712,33,0] +a.sfattr.hdr.totsize = 27 +a.sfattr.hdr.count = 1 +a.sfattr.list[0].namelen = 8 +a.sfattr.list[0].valuelen = 12 +a.sfattr.list[0].root = 0 +a.sfattr.list[0].secure = 0 +a.sfattr.list[0].parent = 1 +a.sfattr.list[0].name = "0.bitmap" +a.sfattr.list[0].parent_dir.inumber = 33685632 +a.sfattr.list[0].parent_dir.gen = 142228546 +xfs_db> dblock 0 +xfs_db> p +magicnum = 0x424d505a +crc = 0xc8b10abf (correct) +owner = 33685633 +bno = 20902080 +lsn = 0x100007696 +uuid = 
a6575f59-1514-445e-883e-211b2c5a0f05 +rtwords[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0 +14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0xfffff800 22:0xffffffff 23:0xffffffff +24:0xffffffff 25:0xffffffff 26:0xffffffff 27:0xffffffff 28:0xffffffff +29:0xffffffff 30:0xffffffff 31:0xffffffff 32:0xffffffff +... +979:0xffffffff 980:0xffffffff 981:0xffffffff 982:0xffffffff 983:0xffffffff +984:0xffffffff 985:0xffffffff 986:0xffffffff 987:0xffffffff 988:0xffffffff +989:0xffffffff 990:0xffffffff 991:0xffffffff 992:0xffffffff 993:0xffffffff +994:0xffffffff 995:0xffffffff 996:0xffffffff 997:0xffffffff 998:0xffffffff +999:0xffffffff 1000:0xffffffff 1001:0xffffffff 1002:0xffffffff 1003:0xffffffff +1004:0xffffffff 1005:0xffffffff 1006:0xffffffff 1007:0xffffffff 1008:0xffffffff +1009:0xffffffff 1010:0xffffffff 1011:0xffffffff +---- + +From this example, we can clearly see that this is a bitmap file in the +metadata directory tree, and that it is the bitmap file for rtgroup 3. When we +access the first block in the bitmap file, we can clearly see the new block +header and that the first 179 extents are allocated. The bitmap words were +excerpted for brevity. + [[Real-Time_Summary_Inode]] == Free Space Summary Inode @@ -47,4 +187,208 @@ This data structure is not particularly space efficient, however it is a very fast way to provide the same data as the two free space B+trees for regular files since the space is preallocated and metadata maintenance is minimal. +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, each block of the +realtime summary file has the same header as rtbitmap file blocks. However, +the magic number will be ``SUMY'' (0x53554D59). After the block header, the +summary counts are encoded as be32 integers. 
+ +=== xfs_db rtsummary Example + +This example shows a real-time summary file from a freshly populated filesystem: + +---- +xfs_db> path -m /rtgroups/3.summary +xfs_db> p +core.magic = 0x494e +core.mode = 0100000 +core.version = 3 +core.format = 2 (extents) +core.metatype = 6 (rtsummary) +core.uid = 0 +core.gid = 0 +core.nlinkv2 = 1 +core.projid_lo = 3 +core.projid_hi = 0 +core.nextents = 1 +core.atime.sec = Tue Oct 15 16:04:02 2024 +core.atime.nsec = 769694000 +core.mtime.sec = Tue Oct 15 16:04:02 2024 +core.mtime.nsec = 769694000 +core.ctime.sec = Tue Oct 15 16:04:02 2024 +core.ctime.nsec = 769699000 +core.size = 4096 +core.nblocks = 1 +core.extsize = 0 +core.naextents = 0 +core.forkoff = 24 +core.aformat = 1 (local) +core.dmevmask = 0 +core.dmstate = 0 +core.newrtbm = 0 +core.prealloc = 0 +core.realtime = 0 +core.immutable = 1 +core.append = 0 +core.sync = 1 +core.noatime = 1 +core.nodump = 1 +core.rtinherit = 0 +core.projinherit = 0 +core.nosymlinks = 0 +core.extsz = 0 +core.extszinherit = 0 +core.nodefrag = 1 +core.filestream = 0 +core.gen = 519466891 +next_unlinked = null +v3.crc = 0x54fc58d0 (correct) +v3.change_count = 3 +v3.lsn = 0 +v3.flags2 = 0x38 +v3.cowextsize = 0 +v3.crtime.sec = Tue Oct 15 16:04:02 2024 +v3.crtime.nsec = 769694000 +v3.inumber = 33685634 +v3.uuid = a6575f59-1514-445e-883e-211b2c5a0f05 +v3.reflink = 0 +v3.cowextsz = 0 +v3.dax = 0 +v3.bigtime = 1 +v3.nrext64 = 1 +v3.metadata = 1 +u3.bmx[0] = [startoff,startblock,blockcount,extentflag] +0:[0,4210703,1,0] +a.sfattr.hdr.totsize = 28 +a.sfattr.hdr.count = 1 +a.sfattr.list[0].namelen = 9 +a.sfattr.list[0].valuelen = 12 +a.sfattr.list[0].root = 0 +a.sfattr.list[0].secure = 0 +a.sfattr.list[0].parent = 1 +a.sfattr.list[0].name = "0.summary" +a.sfattr.list[0].parent_dir.inumber = 33685632 +a.sfattr.list[0].parent_dir.gen = 142228546 +xfs_db> dblock 0 +xfs_db> p +magicnum = 0x53554d59 +crc = 0x473340a8 (correct) +owner = 33685634 +bno = 20902008 +lsn = 0x100007696 +uuid = 
a6575f59-1514-445e-883e-211b2c5a0f05 +suminfo[0-1011] = 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0 +14:0 15:0 16:0 17:0 18:0 19:0 20:0 21:0 22:0 23:0 24:0 25:0 26:0 27:0 28:0 29:0 +30:0 31:0 32:0 +... +618:0 619:0 620:0 621:0 622:0 623:0 624:0 625:0 626:0 627:1 628:0 629:0 630:0 +... +979:0 980:0 981:0 982:0 983:0 984:0 985:0 986:0 987:0 988:0 989:0 990:0 991:0 +992:0 993:0 994:0 995:0 996:0 997:0 998:0 999:0 1000:0 1001:0 1002:0 1003:0 +1004:0 1005:0 1006:0 1007:0 1008:0 1009:0 1010:0 1011:0 +---- + +From this example, we can clearly see that this is a summary file in the +metadata directory tree, and that it is the summary file for rtgroup 3. When +we access the first block in the summary file, we can clearly see the new block +header and the nonzero counter for the one large free extent in this group. +The summary counts were excerpted for brevity. + +[[Realtime_Groups]] +== Realtime Groups + +To reduce metadata contention for space allocation and remapping activities +being applied to realtime files, the realtime volume can be split into +allocation groups, just like the data volume. The free space information is +still contained in a single file that applies to the entire volume. + +Each realtime allocation group can contain up to (2^31^ - 1) filesystem blocks, +regardless of the underlying realtime extent size. + +Each realtime group has the following characteristics: + + * Group 0 has a super block describing overall filesystem info + * Free space bitmap + * Summary of free space + +The free space metadata are the same as described in the previous sections, +except that their scope covers only a single rtgroup. The other structures are +expanded upon in the following sections. + +[[Realtime_Group_Superblocks]] +=== Superblocks + +The first block of each realtime group contains a superblock. These fields +must match their counterparts in the filesystem superblock on the data device. 
+ +[source, c] +---- +struct xfs_rtsb { + __be32 rsb_magicnum; + __le32 rsb_crc; + + __be32 rsb_pad; + unsigned char rsb_fname[XFSLABEL_MAX]; + + uuid_t rsb_uuid; + uuid_t rsb_meta_uuid; + + /* must be padded to 64 bit alignment */ +}; +---- + +*rsb_magicnum*:: +Identifies the filesystem. Its value is +XFS_RTSB_MAGIC+ ``Frog'' (0x46726F67). + +*rsb_crc*:: +Superblock checksum. + +*rsb_pad*:: +Must be zero. + +*rsb_fname[12]*:: +Name for the filesystem. This matches +sb_fname+ in the primary superblock. + +*rsb_uuid*:: +UUID (Universally Unique ID) for the filesystem. This matches +sb_uuid+ in the +primary superblock. + +*rsb_meta_uuid*:: +Metadata UUID for the filesystem. This matches +sb_meta_uuid+ in the primary +superblock. + +==== xfs_db rtgroup Superblock Example + +A filesystem is made on a data device and a separate realtime device with the following command: + +---- +# mkfs.xfs -r rtgroups=1,rgcount=4,rtdev=/dev/sdb /dev/sda -f +meta-data=/dev/sda isize=512 agcount=4, agsize=1298176 blks + = sectsz=512 attr=2, projid32bit=1 + = crc=1 finobt=1, sparse=1, rmapbt=1 + = reflink=1 bigtime=1 inobtcount=1 nrext64=1 + = metadir=1 +data = bsize=4096 blocks=5192704, imaxpct=25 + = sunit=0 swidth=0 blks +naming =version 2 bsize=4096 ascii-ci=0, ftype=1 +log =internal log bsize=4096 blocks=16384, version=2 + = sectsz=512 sunit=0 blks, lazy-count=1 +realtime =/dev/sdb extsz=4096 blocks=5192704, rtextents=5192704 + = rgcount=5 rgsize=1048576 extents +---- + +And in xfs_db, inspecting the realtime group superblock and then the regular +superblock: + +---- +# xfs_db -R /dev/sdb /dev/sda +xfs_db> rtsb +xfs_db> print +magicnum = 0x46726f67 +crc = 0x759a62d4 (correct) +pad = 0 +fname = "\000\000\000\000\000\000\000\000\000\000\000\000" +uuid = 7e55b909-8728-4d69-a1fa-891427314eea +meta_uuid = 7e55b909-8728-4d69-a1fa-891427314eea +---- + include::rtrmapbt.asciidoc[] diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc index 
56877615ae81bf..bffb1659d0ba38 100644 --- a/design/XFS_Filesystem_Structure/superblock.asciidoc +++ b/design/XFS_Filesystem_Structure/superblock.asciidoc @@ -70,6 +70,10 @@ struct xfs_dsb { __be64 sb_lsn; uuid_t sb_meta_uuid; __be64 sb_metadirino; + __be32 sb_rgcount; + __be32 sb_rgextents; + __u8 sb_rgblklog; + __u8 sb_pad[7]; /* must be padded to 64 bit alignment */ }; @@ -480,6 +484,24 @@ If the +XFS_SB_FEAT_RO_INCOMPAT_METADIR+ feature is set, this field points to the inode of the root directory of the metadata directory tree. This field is zero otherwise. +*sb_rgcount*:: +Count of realtime groups in the filesystem, if the ++XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled. If no realtime subvolume +exists, this value will be zero. + +*sb_rgextents*:: +Maximum number of realtime extents that can be contained within a realtime +group, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled. + +*sb_rgblklog*:: +If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, this is the log~2~ +value of +sb_rgextents+ * +sb_rextsize+ (rounded up). This value is used to +generate absolute block numbers defined in extent maps from the segmented ++xfs_rtblock_t+ values. + +*sb_pad[7]*:: +Zeroes, if the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled. + === xfs_db Superblock Example A filesystem is made on a single disk with the following command: From patchwork Wed Nov 27 00:20:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13886449 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 29CC22907 for ; Wed, 27 Nov 2024 00:20:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666808; cv=none; b=I87W/B+q4Bj1SMgvt5PNpyGvhwOGdB6RmQf3dhwF8Ovc9wQjxqauoTs1Q/iTr4x+lvXgsel2dCMCh1LlwwILJqs8q3thZhDzGMw147u7lIy9oBX+1eE1a8ACVeGiNwCOnhpBve9C6y7RIOtvx6Y67dog9yJHBuvkTgM28ertf0g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732666808; c=relaxed/simple; bh=HCPwrhHbI/IW77jVQby1kjgx4Qypa1mZnclvIJS0KXg=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NArrpu0BlcrHQ06OsCVfuCGwVSiF0XtBNi9iFRNPx02Ag6irtTgMCuprereWBgjUpVPhx902K1kRKvoF4l/LxvrvqXJAf8AxWZToPVEX+TG3SR4eb26e5nNGqq2tBarwtLSiA5+aDkk9DBWP5C4s5TKh0vEpOdLP10KuVis122Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SiVqnRxt; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SiVqnRxt" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F23B1C4CECF; Wed, 27 Nov 2024 00:20:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1732666808; bh=HCPwrhHbI/IW77jVQby1kjgx4Qypa1mZnclvIJS0KXg=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=SiVqnRxtwS9875z6ahGNVJoQUMPJH/Qcq3aLuz0TnuLfZrOVYWreyWU5Og7Dlmydp Ynqld1UTPaGo5MYVz7bO66I+RuBXlsf3nCegRbr0WDafuhtaDfPfKoXpafkbRdR/5g oP3jQQTC/pNFLk2Ivgd/Q9lbZfIojmODUhPrp9A7GXZmAXEugwHRhDjBUtgjFwRRC7 
jdmh9ibzacKmfHP6+y1zAV3HAJPtJjV4S6JutMJmHz0PuZNlntDx8ZNScYu/uIFEYz GmtwiF/ctXduX9OcjgxXAmYJivBBI1nqqs6aNQu5TaghDXbnBqurCWs0TCIzbP+qwy g7QOfOoPAWTKA== Date: Tue, 26 Nov 2024 16:20:07 -0800 Subject: [PATCH 08/10] design: document metadata directory tree quota changes From: "Darrick J. Wong" To: djwong@kernel.org Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org Message-ID: <173266662329.996198.2989351246297986467.stgit@frogsfrogsfrogs> In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Document the changes to the ondisk quota metadata that came in with metadata directory trees. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- .../internal_inodes.asciidoc | 3 +++ .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 3 +++ .../XFS_Filesystem_Structure/superblock.asciidoc | 3 +++ 3 files changed, 9 insertions(+) diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc index 5f4d62201cbd67..40eb57233ce7c0 100644 --- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc +++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc @@ -21,6 +21,9 @@ of those inodes have been deallocated and may be reused by future features. 
 [options="header"]
 |=====
 | Metadata File | Location
+| xref:Quota_Inodes[User Quota] | /quota/user
+| xref:Quota_Inodes[Group Quota] | /quota/group
+| xref:Quota_Inodes[Project Quota] | /quota/project
 | xref:Real-Time_Bitmap_Inode[Realtime Bitmap] | /rtgroups/*.bitmap
 | xref:Real-Time_Summary_Inode[Realtime Summary] | /rtgroups/*.summary
 |=====
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index e28929907147b7..6e52e5fd3d6c1e 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -199,6 +199,9 @@ directory tree.
 [source, c]
 ----
 enum xfs_metafile_type {
+	XFS_METAFILE_USRQUOTA,
+	XFS_METAFILE_GRPQUOTA,
+	XFS_METAFILE_PRJQUOTA,
 	XFS_METAFILE_RTBITMAP,
 	XFS_METAFILE_RTSUMMARY,
 };
diff --git a/design/XFS_Filesystem_Structure/superblock.asciidoc b/design/XFS_Filesystem_Structure/superblock.asciidoc
index bffb1659d0ba38..f0455304635737 100644
--- a/design/XFS_Filesystem_Structure/superblock.asciidoc
+++ b/design/XFS_Filesystem_Structure/superblock.asciidoc
@@ -259,6 +259,9 @@ Quota flags. It can be a combination of the following flags:
 | +XFS_PQUOTA_CHKD+ | Project quotas have been checked.
 |=====
 
+If the +XFS_SB_FEAT_INCOMPAT_METADIR+ feature is enabled, the +sb_qflags+ field
+will persist across mounts if no quota mount options are provided.
+
 *sb_flags*::
 Miscellaneous flags.
 

From patchwork Wed Nov 27 00:20:23 2024
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13886450
Date: Tue, 26 Nov 2024 16:20:23 -0800
Subject: [PATCH 09/10] design: update metadump v2 format to reflect rt dumps
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org
Message-ID: <173266662344.996198.10859077130378156318.stgit@frogsfrogsfrogs>
In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs>
References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Update the metadump v2 format documentation to add realtime device dumps.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 design/XFS_Filesystem_Structure/metadump.asciidoc | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
index a32d6423ea6e75..226622c0d2f20e 100644
--- a/design/XFS_Filesystem_Structure/metadump.asciidoc
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -119,7 +119,16 @@ Dump contains external log contents.
 |=====
 
 *xmh_incompat_flags*::
-Must be zero.
+A combination of the following flags:
+
+.Metadump v2 incompat flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_MD2_INCOMPAT_RTDEVICE+ |
+Dump contains realtime device contents.
+
+|=====
 
 *xmh_reserved*::
 Must be zero.
@@ -143,6 +152,7 @@ Bits 55-56 determine the device from which the metadata dump data was extracted.
 | Value | Description
 | 0 | Data device
 | 1 | External log
+| 2 | Realtime device
 |=====
 
 The lower 54 bits determine the device address from which the dump data was

From patchwork Wed Nov 27 00:20:38 2024
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13886451
Date: Tue, 26 Nov 2024 16:20:38 -0800
Subject: [PATCH 10/10] xfs-documentation: release for 6.1[23]
From: "Darrick J. Wong"
To: djwong@kernel.org
Cc: hch@lst.de, cem@kernel.org, linux-xfs@vger.kernel.org
Message-ID: <173266662358.996198.14378980413890439472.stgit@frogsfrogsfrogs>
In-Reply-To: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs>
References: <173266662205.996198.11304294193325450774.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Make a new release since we've just landed ondisk format changes for 6.12 and
6.13.

Signed-off-by: "Darrick J. Wong"
---
 design/XFS_Filesystem_Structure/docinfo.xml | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 1eddb1f42f11a1..3aadb6637070d2 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -230,4 +230,23 @@
+
+ 3.1415926535
+ November 2024
+
+ Darrick
+ Wong
+ djwong@kernel.org
+
+
+
+ update online fsck docs
+ filesystem properties
+ metadata directory tree
+ realtime groups
+ metadir and quota
+ realtime sb metadump
+
+
+
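
Illustrative note on the +sb_rgblklog+ text added to superblock.asciidoc earlier in this series: the arithmetic can be sketched in C as below. This is a minimal sketch and is not taken from the kernel or xfsprogs sources. The helper names rg_blklog() and rtb_to_absolute() are hypothetical, and the split of a segmented +xfs_rtblock_t+ into "group number in the high bits, block offset in the low sb_rgblklog bits" is an assumption inferred from the field description, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: log2 of (rgextents * rextsize), rounded up.
 * Returns the smallest n such that (1 << n) >= blocks per group.
 */
static uint8_t rg_blklog(uint32_t rgextents, uint32_t rextsize)
{
	uint64_t blocks = (uint64_t)rgextents * rextsize;
	uint8_t log = 0;

	assert(blocks > 0);
	/* Stop at 63 so the shift below never reaches the type width. */
	while (log < 63 && ((uint64_t)1 << log) < blocks)
		log++;
	return log;
}

/*
 * Hypothetical helper, ASSUMED layout: the bits above rgblklog hold the
 * realtime group number and the low rgblklog bits hold the block offset
 * within that group.  The absolute block number is then the start of
 * the group plus the offset.
 */
static uint64_t rtb_to_absolute(uint64_t rtbno, uint32_t rgextents,
				uint32_t rextsize, uint8_t rgblklog)
{
	uint64_t rgno = rtbno >> rgblklog;
	uint64_t off = rtbno & (((uint64_t)1 << rgblklog) - 1);

	return rgno * rgextents * rextsize + off;
}
```

For example, with sb_rgextents = 1000 and sb_rextsize = 4 a group holds 4000 blocks, so the rounded-up log is 12; under the assumed layout, the segmented value (2 << 12) | 5 then refers to block 5 of group 2, which is absolute block 2 * 4000 + 5 = 8005.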