From patchwork Wed May 5 19:20:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 12240901 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC9E9C433ED for ; Wed, 5 May 2021 19:20:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 99F4E613C4 for ; Wed, 5 May 2021 19:20:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234973AbhEETVo (ORCPT ); Wed, 5 May 2021 15:21:44 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:43739 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234664AbhEETVn (ORCPT ); Wed, 5 May 2021 15:21:43 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id ACE1B5C019C; Wed, 5 May 2021 15:20:46 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 05 May 2021 15:20:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=from :to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=fm3; bh=jYFAQOEyxqJ794q2Ch6wlVGm7q rChZ39ti+IGTaPbKc=; b=fs79QkAw0rs2Ocq7dM7BIT+Fz+nT5ezRqqJqXs5tnK ZqJFBJCREtTRB4FDmQCTpMEELtTkc1b+hk9ZtJh5Y5Q0bERmy1EladRNvIt1p7HB Qc+LX1g8hWDJJLxvU6HNFHZNbsyKE97LmBiIRfnGMpUviwwQ8jq0UrH030BYVCAH XLkSP7QQnSHvW1FNuxHmd1etRCZhSJsV2oBqgKRygm4ZbIQUBCyS/Tw1oUMN/Wd4 928Llq9s1OzOGooyczuwoMl+nJKfkLLg81YztAQyK8lDsMHrAxVSAmQY3nuZubn8 yEN3dwT48Rn9ROjfCxATYjgNb1+1JjGzfY21HYmDeRlQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; bh=jYFAQOEyxqJ794q2Ch6wlVGm7qrChZ39ti+IGTaPbKc=; b=TKYKKHkA 8p2RmFp6CN5Glj6Hn1QY+ObHQ/HTu6yM5KNeigCxRV8B8z3iCHscjkQxY1FsH6gF apBFEaSuprDLiP3GAyQFtF49IQnSiQp9pqGFXym5XDSQUV1cR9842GnrVDU/neaI +un9JQ+N3PNBSG8YGz3bCCKE/6JWc1K54Yrf0VvpsUud6vbEqEZOYreWLA+JyW68 8mQM3Yceq7AboKM2UfyzrjO2WbXZgQa3k6s9CQ1ZHpOxFZujEuEdY//YIfM2XWJ6 RwkA8fY+UWAow3q2Hi+5PHf+p7R3GEKH9zokdqdWJHWsed+pOgbWB6Vy17kVOFSa kgWn1o3BFjaXFw== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefkedgudefvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecukfhppedvtdejrdehfedrvdehfedrjeenucevlhhu shhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegsohhrihhssegsuh hrrdhioh X-ME-Proxy: Received: from localhost (unknown [207.53.253.7]) by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 May 2021 15:20:46 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, linux-fscrypt@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v4 1/5] btrfs: add compat_flags to btrfs_inode_item Date: Wed, 5 May 2021 12:20:39 -0700 Message-Id: <6ed83a27f88e18f295f7d661e9c87e7ec7d33428.1620241221.git.boris@bur.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The tree checker currently rejects unrecognized flags when it reads btrfs_inode_item. Practically, this means that adding a new flag makes the change backwards incompatible if the flag is ever set on a file. Take up one of the 4 reserved u64 fields in the btrfs_inode_item as a new "compat_flags". These flags are zero on inode creation in btrfs and mkfs and are ignored by an older kernel, so it should be safe to use them in this way. Signed-off-by: Boris Burkov --- fs/btrfs/btrfs_inode.h | 1 + fs/btrfs/ctree.h | 2 ++ fs/btrfs/delayed-inode.c | 2 ++ fs/btrfs/inode.c | 3 +++ fs/btrfs/ioctl.c | 7 ++++--- fs/btrfs/tree-log.c | 1 + include/uapi/linux/btrfs_tree.h | 7 ++++++- 7 files changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index c652e19ad74e..e8dbc8e848ce 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -191,6 +191,7 @@ struct btrfs_inode { /* flags field from the on disk inode */ u32 flags; + u64 compat_flags; /* * Counters to keep track of the number of extent item's we may use due diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 0f5b0b12762b..0546273a520b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1786,6 +1786,7 @@ BTRFS_SETGET_FUNCS(inode_gid, struct btrfs_inode_item, gid, 32); BTRFS_SETGET_FUNCS(inode_mode, struct btrfs_inode_item, mode, 32); BTRFS_SETGET_FUNCS(inode_rdev, struct btrfs_inode_item, rdev, 64); BTRFS_SETGET_FUNCS(inode_flags, struct btrfs_inode_item, flags, 64); +BTRFS_SETGET_FUNCS(inode_compat_flags, struct btrfs_inode_item, compat_flags, 64); BTRFS_SETGET_STACK_FUNCS(stack_inode_generation, struct btrfs_inode_item, generation, 64); BTRFS_SETGET_STACK_FUNCS(stack_inode_sequence, struct btrfs_inode_item, @@ -1803,6 +1804,7 @@ BTRFS_SETGET_STACK_FUNCS(stack_inode_gid, struct btrfs_inode_item, gid, 32); BTRFS_SETGET_STACK_FUNCS(stack_inode_mode, struct btrfs_inode_item, mode, 32); BTRFS_SETGET_STACK_FUNCS(stack_inode_rdev, struct btrfs_inode_item, rdev, 64); BTRFS_SETGET_STACK_FUNCS(stack_inode_flags, struct btrfs_inode_item, flags, 64); +BTRFS_SETGET_STACK_FUNCS(stack_inode_compat_flags, struct btrfs_inode_item, compat_flags, 64); BTRFS_SETGET_FUNCS(timespec_sec, struct btrfs_timespec, sec, 64); BTRFS_SETGET_FUNCS(timespec_nsec, struct btrfs_timespec, nsec, 32); BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64); diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 1a88f6214ebc..ef4e0265dbe3 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1718,6 +1718,7 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans, btrfs_set_stack_inode_transid(inode_item, trans->transid); btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev); btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags); + btrfs_set_stack_inode_compat_flags(inode_item, BTRFS_I(inode)->compat_flags); btrfs_set_stack_inode_block_group(inode_item, 0); btrfs_set_stack_timespec_sec(&inode_item->atime, @@ -1776,6 +1777,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev) inode->i_rdev = 0; *rdev = btrfs_stack_inode_rdev(inode_item); BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item); + BTRFS_I(inode)->compat_flags = btrfs_stack_inode_compat_flags(inode_item); inode->i_atime.tv_sec = btrfs_stack_timespec_sec(&inode_item->atime); inode->i_atime.tv_nsec = btrfs_stack_timespec_nsec(&inode_item->atime); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 69fcdf8f0b1c..d89000577f7f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3627,6 +3627,7 @@ static int btrfs_read_locked_inode(struct inode *inode, BTRFS_I(inode)->index_cnt = (u64)-1; BTRFS_I(inode)->flags = btrfs_inode_flags(leaf, inode_item); + BTRFS_I(inode)->compat_flags = btrfs_inode_compat_flags(leaf, inode_item); cache_index: /* @@ -3793,6 +3794,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans, btrfs_set_token_inode_transid(&token, item, trans->transid); btrfs_set_token_inode_rdev(&token, item, inode->i_rdev); btrfs_set_token_inode_flags(&token, item, BTRFS_I(inode)->flags); + btrfs_set_token_inode_compat_flags(&token, item, BTRFS_I(inode)->compat_flags); btrfs_set_token_inode_block_group(&token, item, 0); } @@ -8857,6 +8859,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->defrag_bytes = 0; ei->disk_i_size = 0; ei->flags = 0; + ei->compat_flags = 0; ei->csum_bytes = 0; ei->index_cnt = (u64)-1; ei->dir_index = 0; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0ba0e4ddaf6b..ff335c192170 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -102,8 +102,9 @@ static unsigned int btrfs_mask_fsflags_for_type(struct inode *inode, * Export internal inode flags to the format expected by the FS_IOC_GETFLAGS * ioctl. */ -static unsigned int btrfs_inode_flags_to_fsflags(unsigned int flags) +static unsigned int btrfs_inode_flags_to_fsflags(struct btrfs_inode *binode) { + unsigned int flags = binode->flags; unsigned int iflags = 0; if (flags & BTRFS_INODE_SYNC) @@ -156,7 +157,7 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode) static int btrfs_ioctl_getflags(struct file *file, void __user *arg) { struct btrfs_inode *binode = BTRFS_I(file_inode(file)); - unsigned int flags = btrfs_inode_flags_to_fsflags(binode->flags); + unsigned int flags = btrfs_inode_flags_to_fsflags(binode); if (copy_to_user(arg, &flags, sizeof(flags))) return -EFAULT; @@ -228,7 +229,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg) btrfs_inode_lock(inode, 0); fsflags = btrfs_mask_fsflags_for_type(inode, fsflags); - old_fsflags = btrfs_inode_flags_to_fsflags(binode->flags); + old_fsflags = btrfs_inode_flags_to_fsflags(binode); ret = vfs_ioc_setflags_prepare(inode, old_fsflags, fsflags); if (ret) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index a0fc3a1390ab..3ef166a3485a 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3944,6 +3944,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans, btrfs_set_token_inode_transid(&token, item, trans->transid); btrfs_set_token_inode_rdev(&token, item, inode->i_rdev); btrfs_set_token_inode_flags(&token, item, BTRFS_I(inode)->flags); + btrfs_set_token_inode_compat_flags(&token, item, BTRFS_I(inode)->compat_flags); btrfs_set_token_inode_block_group(&token, item, 0); } diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 58d7cff9afb1..ae25280316bd 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -574,11 +574,16 @@ struct btrfs_inode_item { /* modification sequence number for NFS */ __le64 sequence; + /* + * flags which aren't checked for corruption at mount + * and can be added in a backwards compatible way + */ + __le64 compat_flags; /* * a little future expansion, for more than this we can * just grow the inode item and version it */ - __le64 reserved[4]; + __le64 reserved[3]; struct btrfs_timespec atime; struct btrfs_timespec ctime; struct btrfs_timespec mtime; From patchwork Wed May 5 19:20:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 12240905 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02BA3C433ED for ; Wed, 5 May 2021 19:20:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B6D63613C7 for ; Wed, 5 May 2021 19:20:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235387AbhEETVs (ORCPT ); Wed, 5 May 2021 15:21:48 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:58321 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235294AbhEETVp (ORCPT ); Wed, 5 May 2021 15:21:45 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 5BEC65C0199; Wed, 5 May 2021 15:20:48 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute4.internal (MEProxy); Wed, 05 May 2021 15:20:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=from :to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=fm3; bh=Qve82dkqGYNNAgvgI5fCva8Yij EBHTmY0eIufByMC7s=; b=c/nNJ9iI24OlkZn1gC6IziONfDzbq/oDZaGo25BSbf 9lPpm7aFmuLMhlKijBGqUxeGH5vuQejGDp/k5tc8QF7pMkK94GoscjEIyzyF+ZOK a69G+uI2ahoVbzd3/FSULTL7jV1dH+PIjzzt+J3umOOBJEF0FKXgmCwrmynTyiHX Zfib5K6/FahkAQHXI3wSe4aKpA/fi8FF1SLRMRdIzK5XMsg3Nv/sMngAoMZ+mQnl kyEQqjtcjd3BydEg4NPaRyriUKCG6UxeklRCAbwRFuyotsrrgaa0F98PCSNiOkqs JV8raYtFnCV7nvkSckZ2/NEaxWqJ15T42IUvQB/JydTw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; bh=Qve82dkqGYNNAgvgI5fCva8YijEBHTmY0eIufByMC7s=; b=Z8Tv8Ptg kTbEkPfBHHYBNaFFd7YdJ52A0UIuSg/bGQ0ebYgTxZXU2UoN3wFEXcDAYiENIsLr QhBhoEt20Gkk1IlGOtu3ZPgKQ1buWEdXhj/KC7j9Yt3AynzuM3WGgBVESufLyGLJ UMbFDdpFwOo90cwG+RZz7gmEJK8APzORrwjnHZTrSqII7ZDp4obaPZZRLmGCaZI9 7VwW/RRR7qpNt/kPuRmul3cDF1Bb/ERJ4jBsGF0SD7ylhn4dO1ZMLlnUrH2zfct7 r5iMGQFcM3FllDTMAgP2+CwqCt2921jH1LN84YhuDCvzagl0dNgQ9knAkQF30mOQ 1PcVFdQB/CGoUA== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefkedgudefvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecukfhppedvtdejrdehfedrvdehfedrjeenucevlhhu shhtvghrufhiiigvpedunecurfgrrhgrmhepmhgrihhlfhhrohhmpegsohhrihhssegsuh hrrdhioh X-ME-Proxy: Received: from localhost (unknown [207.53.253.7]) by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 May 2021 15:20:47 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, linux-fscrypt@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v4 2/5] btrfs: initial fsverity support Date: Wed, 5 May 2021 12:20:40 -0700 Message-Id: X-Mailer: git-send-email 2.30.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Chris Mason Add support for fsverity in btrfs. To support the generic interface in fs/verity, we add two new item types in the fs tree for inodes with verity enabled. One stores the per-file verity descriptor and the other stores the Merkle tree data itself. Verity checking is done at the end of IOs to ensure each page is checked before it is marked uptodate. Verity relies on PageChecked for the Merkle tree data itself to avoid re-walking up shared paths in the tree. For this reason, we need to cache the Merkle tree data. Since the file is immutable after verity is turned on, we can cache it at an index past EOF. Use the new inode compat_flags to store verity on the inode item, so that we can enable verity on a file, then rollback to an older kernel and still mount the file system and read the file. Since we can't safely write the file anymore without ruining the invariants of the Merkle tree, we mark a ro_compat flag on the file system when a file has verity enabled. Signed-off-by: Chris Mason Signed-off-by: Boris Burkov Reported-by: kernel test robot --- fs/btrfs/Makefile | 1 + fs/btrfs/btrfs_inode.h | 1 + fs/btrfs/ctree.h | 30 +- fs/btrfs/extent_io.c | 27 +- fs/btrfs/file.c | 6 + fs/btrfs/inode.c | 7 + fs/btrfs/ioctl.c | 14 +- fs/btrfs/super.c | 3 + fs/btrfs/sysfs.c | 6 + fs/btrfs/verity.c | 617 ++++++++++++++++++++++++++++++++ include/uapi/linux/btrfs.h | 2 +- include/uapi/linux/btrfs_tree.h | 15 + 12 files changed, 718 insertions(+), 11 deletions(-) create mode 100644 fs/btrfs/verity.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index cec88a66bd6c..3dcf9bcc2326 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -36,6 +36,7 @@ btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o btrfs-$(CONFIG_BTRFS_FS_REF_VERIFY) += ref-verify.o btrfs-$(CONFIG_BLK_DEV_ZONED) += zoned.o +btrfs-$(CONFIG_FS_VERITY) += verity.o btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \ tests/extent-buffer-tests.o tests/btrfs-tests.o \ diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index e8dbc8e848ce..4536548b9e79 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -51,6 +51,7 @@ enum { * the file range, inode's io_tree). */ BTRFS_INODE_NO_DELALLOC_FLUSH, + BTRFS_INODE_VERITY_IN_PROGRESS, }; /* in memory btrfs inode */ diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 0546273a520b..c5aab6a639ef 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -279,9 +279,10 @@ struct btrfs_super_block { #define BTRFS_FEATURE_COMPAT_SAFE_SET 0ULL #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR 0ULL -#define BTRFS_FEATURE_COMPAT_RO_SUPP \ - (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \ - BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID) +#define BTRFS_FEATURE_COMPAT_RO_SUPP \ + (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \ + BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \ + BTRFS_FEATURE_COMPAT_RO_VERITY) #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET 0ULL #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR 0ULL @@ -1505,6 +1506,11 @@ do { \ BTRFS_INODE_COMPRESS | \ BTRFS_INODE_ROOT_ITEM_INIT) +/* + * Inode compat flags + */ +#define BTRFS_INODE_VERITY (1 << 0) + struct btrfs_map_token { struct extent_buffer *eb; char *kaddr; @@ -3766,6 +3772,24 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info) return signal_pending(current); } +/* verity.c */ +#ifdef CONFIG_FS_VERITY +extern const struct fsverity_operations btrfs_verityops; +int btrfs_drop_verity_items(struct btrfs_inode *inode); +BTRFS_SETGET_FUNCS(verity_descriptor_encryption, struct btrfs_verity_descriptor_item, + encryption, 8); +BTRFS_SETGET_FUNCS(verity_descriptor_size, struct btrfs_verity_descriptor_item, size, 64); +BTRFS_SETGET_STACK_FUNCS(stack_verity_descriptor_encryption, struct btrfs_verity_descriptor_item, + encryption, 8); +BTRFS_SETGET_STACK_FUNCS(stack_verity_descriptor_size, struct btrfs_verity_descriptor_item, + size, 64); +#else +static inline int btrfs_drop_verity_items(struct btrfs_inode *inode) +{ + return 0; +} +#endif + /* Sanity test specific functions */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS void btrfs_test_destroy_inode(struct inode *inode); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4fb33cadc41a..d1f57a4ad2fb 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "misc.h" #include "extent_io.h" #include "extent-io-tree.h" @@ -2862,15 +2863,28 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page) btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE); } -static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len) +static int end_page_read(struct page *page, bool uptodate, u64 start, u32 len) { - struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb); + int ret = 0; + struct inode *inode = page->mapping->host; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); ASSERT(page_offset(page) <= start && start + len <= page_offset(page) + PAGE_SIZE); if (uptodate) { - btrfs_page_set_uptodate(fs_info, page, start, len); + /* + * buffered reads of a file with page alignment will issue a + * 0 length read for one page past the end of file, so we must + * explicitly skip checking verity on that page of zeros. + */ + if (!PageError(page) && !PageUptodate(page) && + start < i_size_read(inode) && + fsverity_active(inode) && + !fsverity_verify_page(page)) + ret = -EIO; + else + btrfs_page_set_uptodate(fs_info, page, start, len); } else { btrfs_page_clear_uptodate(fs_info, page, start, len); btrfs_page_set_error(fs_info, page, start, len); @@ -2878,12 +2892,13 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len) if (fs_info->sectorsize == PAGE_SIZE) unlock_page(page); - else if (is_data_inode(page->mapping->host)) + else if (is_data_inode(inode)) /* * For subpage data, unlock the page if we're the last reader. * For subpage metadata, page lock is not utilized for read. */ btrfs_subpage_end_reader(fs_info, page, start, len); + return ret; } /* @@ -3059,7 +3074,9 @@ static void end_bio_extent_readpage(struct bio *bio) bio_offset += len; /* Update page status and unlock */ - end_page_read(page, uptodate, start, len); + ret = end_page_read(page, uptodate, start, len); + if (ret) + uptodate = 0; endio_readpage_release_extent(&processed, BTRFS_I(inode), start, end, uptodate); } diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 3b10d98b4ebb..a99470303bd9 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "transaction.h" @@ -3593,7 +3594,12 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence) static int btrfs_file_open(struct inode *inode, struct file *filp) { + int ret; filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC; + + ret = fsverity_file_open(inode, filp); + if (ret) + return ret; return generic_file_open(inode, filp); } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d89000577f7f..1b1101369777 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -32,6 +32,7 @@ #include #include #include +#include #include "misc.h" #include "ctree.h" #include "disk-io.h" @@ -5405,7 +5406,9 @@ void btrfs_evict_inode(struct inode *inode) trace_btrfs_inode_evict(inode); + if (!root) { + fsverity_cleanup_inode(inode); clear_inode(inode); return; } @@ -5488,6 +5491,7 @@ void btrfs_evict_inode(struct inode *inode) * to retry these periodically in the future. */ btrfs_remove_delayed_node(BTRFS_I(inode)); + fsverity_cleanup_inode(inode); clear_inode(inode); } @@ -9041,6 +9045,7 @@ static int btrfs_getattr(struct user_namespace *mnt_userns, struct inode *inode = d_inode(path->dentry); u32 blocksize = inode->i_sb->s_blocksize; u32 bi_flags = BTRFS_I(inode)->flags; + u32 bi_compat_flags = BTRFS_I(inode)->compat_flags; stat->result_mask |= STATX_BTIME; stat->btime.tv_sec = BTRFS_I(inode)->i_otime.tv_sec; @@ -9053,6 +9058,8 @@ static int btrfs_getattr(struct user_namespace *mnt_userns, stat->attributes |= STATX_ATTR_IMMUTABLE; if (bi_flags & BTRFS_INODE_NODUMP) stat->attributes |= STATX_ATTR_NODUMP; + if (bi_compat_flags & BTRFS_INODE_VERITY) + stat->attributes |= STATX_ATTR_VERITY; stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_COMPRESSED | diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index ff335c192170..4b8f38fe4226 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "export.h" @@ -105,6 +106,7 @@ static unsigned int btrfs_mask_fsflags_for_type(struct inode *inode, static unsigned int btrfs_inode_flags_to_fsflags(struct btrfs_inode *binode) { unsigned int flags = binode->flags; + unsigned int compat_flags = binode->compat_flags; unsigned int iflags = 0; if (flags & BTRFS_INODE_SYNC) @@ -121,6 +123,8 @@ static unsigned int btrfs_inode_flags_to_fsflags(struct btrfs_inode *binode) iflags |= FS_DIRSYNC_FL; if (flags & BTRFS_INODE_NODATACOW) iflags |= FS_NOCOW_FL; + if (compat_flags & BTRFS_INODE_VERITY) + iflags |= FS_VERITY_FL; if (flags & BTRFS_INODE_NOCOMPRESS) iflags |= FS_NOCOMP_FL; @@ -148,10 +152,12 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode) new_fl |= S_NOATIME; if (binode->flags & BTRFS_INODE_DIRSYNC) new_fl |= S_DIRSYNC; + if (binode->compat_flags & BTRFS_INODE_VERITY) + new_fl |= S_VERITY; set_mask_bits(&inode->i_flags, - S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC, - new_fl); + S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC | + S_VERITY, new_fl); } static int btrfs_ioctl_getflags(struct file *file, void __user *arg) @@ -5072,6 +5078,10 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_get_subvol_rootref(file, argp); case BTRFS_IOC_INO_LOOKUP_USER: return btrfs_ioctl_ino_lookup_user(file, argp); + case FS_IOC_ENABLE_VERITY: + return fsverity_ioctl_enable(file, (const void __user *)argp); + case FS_IOC_MEASURE_VERITY: + return fsverity_ioctl_measure(file, argp); } return -ENOTTY; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4a396c1147f1..aa41ee30e3ca 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1365,6 +1365,9 @@ static int btrfs_fill_super(struct super_block *sb, sb->s_op = &btrfs_super_ops; sb->s_d_op = &btrfs_dentry_operations; sb->s_export_op = &btrfs_export_ops; +#ifdef CONFIG_FS_VERITY + sb->s_vop = &btrfs_verityops; +#endif sb->s_xattr = btrfs_xattr_handlers; sb->s_time_gran = 1; #ifdef CONFIG_BTRFS_FS_POSIX_ACL diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 436ac7b4b334..331ea4febcb1 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -267,6 +267,9 @@ BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34); #ifdef CONFIG_BTRFS_DEBUG BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED); #endif +#ifdef CONFIG_FS_VERITY +BTRFS_FEAT_ATTR_COMPAT_RO(verity, VERITY); +#endif static struct attribute *btrfs_supported_feature_attrs[] = { BTRFS_FEAT_ATTR_PTR(mixed_backref), @@ -284,6 +287,9 @@ static struct attribute *btrfs_supported_feature_attrs[] = { BTRFS_FEAT_ATTR_PTR(raid1c34), #ifdef CONFIG_BTRFS_DEBUG BTRFS_FEAT_ATTR_PTR(zoned), +#endif +#ifdef CONFIG_FS_VERITY + BTRFS_FEAT_ATTR_PTR(verity), #endif NULL }; diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c new file mode 100644 index 000000000000..feaf5908b3d3 --- /dev/null +++ b/fs/btrfs/verity.c @@ -0,0 +1,617 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2020 Facebook. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "ctree.h" +#include "btrfs_inode.h" +#include "transaction.h" +#include "disk-io.h" +#include "locking.h" + +/* + * Just like ext4, we cache the merkle tree in pages after EOF in the page + * cache. Unlike ext4, we're storing these in dedicated btree items and + * not just shoving them after EOF in the file. This means we'll need to + * do extra work to encrypt them once encryption is supported in btrfs, + * but btrfs has a lot of careful code around i_size and it seems better + * to make a new key type than try and adjust all of our expectations + * for i_size. + * + * fs verity items are stored under two different key types on disk. + * + * The descriptor items: + * [ inode objectid, BTRFS_VERITY_DESC_ITEM_KEY, offset ] + * + * At offset 0, we store a btrfs_verity_descriptor_item which tracks the + * size of the descriptor item and some extra data for encryption. + * Starting at offset 1, these hold the generic fs verity descriptor. + * These are opaque to btrfs, we just read and write them as a blob for + * the higher level verity code. The most common size for this is 256 bytes. + * + * The merkle tree items: + * [ inode objectid, BTRFS_VERITY_MERKLE_ITEM_KEY, offset ] + * + * These also start at offset 0, and correspond to the merkle tree bytes. + * So when fsverity asks for page 0 of the merkle tree, we pull up one page + * starting at offset 0 for this key type. These are also opaque to btrfs, + * we're blindly storing whatever fsverity sends down. + */ + +/* + * Compute the logical file offset where we cache the Merkle tree. + * + * @inode: the inode of the verity file + * + * For the purposes of caching the Merkle tree pages, as required by + * fs-verity, it is convenient to do size computations in terms of a file + * offset, rather than in terms of page indices. + * + * Returns the file offset on success, negative error code on failure. + */ +static loff_t merkle_file_pos(const struct inode *inode) +{ + u64 sz = inode->i_size; + u64 ret = round_up(sz, 65536); + + if (ret > inode->i_sb->s_maxbytes) + return -EFBIG; + return ret; +} + +/* + * Drop all the items for this inode with this key_type. + * @inode: The inode to drop items for + * @key_type: The type of items to drop (VERITY_DESC_ITEM or + * VERITY_MERKLE_ITEM) + * + * Before doing a verity enable we cleanup any existing verity items. + * + * This is also used to clean up if a verity enable failed half way + * through. + * + * Returns 0 on success, negative error code on failure. + */ +static int drop_verity_items(struct btrfs_inode *inode, u8 key_type) +{ + struct btrfs_trans_handle *trans; + struct btrfs_root *root = inode->root; + struct btrfs_path *path; + struct btrfs_key key; + int ret; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + while (1) { + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + + /* + * walk backwards through all the items until we find one + * that isn't from our key type or objectid + */ + key.objectid = btrfs_ino(inode); + key.offset = (u64)-1; + key.type = key_type; + + ret = btrfs_search_slot(trans, root, &key, path, -1, 1); + if (ret > 0) { + ret = 0; + /* no more keys of this type, we're done */ + if (path->slots[0] == 0) + break; + path->slots[0]--; + } else if (ret < 0) { + break; + } + + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + + /* no more keys of this type, we're done */ + if (key.objectid != btrfs_ino(inode) || key.type != key_type) + break; + + /* + * this shouldn't be a performance sensitive function because + * it's not used as part of truncate. If it ever becomes + * perf sensitive, change this to walk forward and bulk delete + * items + */ + ret = btrfs_del_items(trans, root, path, + path->slots[0], 1); + btrfs_release_path(path); + btrfs_end_transaction(trans); + + if (ret) + goto out; + } + + btrfs_end_transaction(trans); +out: + btrfs_free_path(path); + return ret; + +} + +/* + * Insert and write inode items with a given key type and offset. + * @inode: The inode to insert for. + * @key_type: The key type to insert. + * @offset: The item offset to insert at. + * @src: Source data to write. + * @len: Length of source data to write. + * + * Write len bytes from src into items of up to 1k length. + * The inserted items will have key where + * off is consecutively increasing from 0 up to the last item ending at + * offset + len. + * + * Returns 0 on success and a negative error code on failure. + */ +static int write_key_bytes(struct btrfs_inode *inode, u8 key_type, u64 offset, + const char *src, u64 len) +{ + struct btrfs_trans_handle *trans; + struct btrfs_path *path; + struct btrfs_root *root = inode->root; + struct extent_buffer *leaf; + struct btrfs_key key; + u64 copied = 0; + unsigned long copy_bytes; + unsigned long src_offset = 0; + void *data; + int ret; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + while (len > 0) { + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + break; + } + + key.objectid = btrfs_ino(inode); + key.offset = offset; + key.type = key_type; + + /* + * insert 1K at a time mostly to be friendly for smaller + * leaf size filesystems + */ + copy_bytes = min_t(u64, len, 1024); + + ret = btrfs_insert_empty_item(trans, root, path, &key, copy_bytes); + if (ret) { + btrfs_end_transaction(trans); + break; + } + + leaf = path->nodes[0]; + + data = btrfs_item_ptr(leaf, path->slots[0], void); + write_extent_buffer(leaf, src + src_offset, + (unsigned long)data, copy_bytes); + offset += copy_bytes; + src_offset += copy_bytes; + len -= copy_bytes; + copied += copy_bytes; + + btrfs_release_path(path); + btrfs_end_transaction(trans); + } + + btrfs_free_path(path); + return ret; +} + +/* + * Read inode items of the given key type and offset from the btree. + * @inode: The inode to read items of. + * @key_type: The key type to read. + * @offset: The item offset to read from. + * @dest: The buffer to read into. This parameter has slightly tricky + * semantics. If it is NULL, the function will not do any copying + * and will just return the size of all the items up to len bytes. + * If dest_page is passed, then the function will kmap_atomic the + * page and ignore dest, but it must still be non-NULL to avoid the + * counting-only behavior. + * @len: Length in bytes to read. + * @dest_page: Copy into this page instead of the dest buffer. + * + * Helper function to read items from the btree. This returns the number + * of bytes read or < 0 for errors. We can return short reads if the + * items don't exist on disk or aren't big enough to fill the desired length. + * + * Supports reading into a provided buffer (dest) or into the page cache + * + * Returns number of bytes read or a negative error code on failure. + */ +static ssize_t read_key_bytes(struct btrfs_inode *inode, u8 key_type, u64 offset, + char *dest, u64 len, struct page *dest_page) +{ + struct btrfs_path *path; + struct btrfs_root *root = inode->root; + struct extent_buffer *leaf; + struct btrfs_key key; + u64 item_end; + u64 copy_end; + u64 copied = 0; + u32 copy_offset; + unsigned long copy_bytes; + unsigned long dest_offset = 0; + void *data; + char *kaddr = dest; + int ret; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + if (dest_page) + path->reada = READA_FORWARD; + + key.objectid = btrfs_ino(inode); + key.offset = offset; + key.type = key_type; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) { + goto out; + } else if (ret > 0) { + ret = 0; + if (path->slots[0] == 0) + goto out; + path->slots[0]--; + } + + while (len > 0) { + leaf = path->nodes[0]; + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); + + if (key.objectid != btrfs_ino(inode) || + key.type != key_type) + break; + + item_end = btrfs_item_size_nr(leaf, path->slots[0]) + key.offset; + + if (copied > 0) { + /* + * once we've copied something, we want all of the items + * to be sequential + */ + if (key.offset != offset) + break; + } else { + /* + * our initial offset might be in the middle of an + * item. Make sure it all makes sense + */ + if (key.offset > offset) + break; + if (item_end <= offset) + break; + } + + /* desc = NULL to just sum all the item lengths */ + if (!dest) + copy_end = item_end; + else + copy_end = min(offset + len, item_end); + + /* number of bytes in this item we want to copy */ + copy_bytes = copy_end - offset; + + /* offset from the start of item for copying */ + copy_offset = offset - key.offset; + + if (dest) { + if (dest_page) + kaddr = kmap_atomic(dest_page); + + data = btrfs_item_ptr(leaf, path->slots[0], void); + read_extent_buffer(leaf, kaddr + dest_offset, + (unsigned long)data + copy_offset, + copy_bytes); + + if (dest_page) + kunmap_atomic(kaddr); + } + + offset += copy_bytes; + dest_offset += copy_bytes; + len -= copy_bytes; + copied += copy_bytes; + + path->slots[0]++; + if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) { + /* + * we've reached the last slot in this leaf and we need + * to go to the next leaf. + */ + ret = btrfs_next_leaf(root, path); + if (ret < 0) { + break; + } else if (ret > 0) { + ret = 0; + break; + } + } + } +out: + btrfs_free_path(path); + if (!ret) + ret = copied; + return ret; +} + +/* + * Drop verity items from the btree and from the page cache + * + * @inode: the inode to drop items for + * + * If we fail partway through enabling verity, enable verity and have some + * partial data extant, or cleanup orphaned verity data, we need to truncate it + * from the cache and delete the items themselves from the btree. + * + * Returns 0 on success, negative error code on failure. + */ +int btrfs_drop_verity_items(struct btrfs_inode *inode) +{ + int ret; + struct inode *ino = &inode->vfs_inode; + + truncate_inode_pages(ino->i_mapping, ino->i_size); + ret = drop_verity_items(inode, BTRFS_VERITY_DESC_ITEM_KEY); + if (ret) + return ret; + return drop_verity_items(inode, BTRFS_VERITY_MERKLE_ITEM_KEY); +} + +/* + * fsverity op that begins enabling verity. + * fsverity calls this to ask us to setup the inode for enabling. We + * drop any existing verity items and set the in progress bit. + */ +static int btrfs_begin_enable_verity(struct file *filp) +{ + struct inode *inode = file_inode(filp); + int ret; + + if (test_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags)) + return -EBUSY; + + set_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags); + ret = drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_DESC_ITEM_KEY); + if (ret) + goto err; + + ret = drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY); + if (ret) + goto err; + + return 0; + +err: + clear_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags); + return ret; + +} + +/* + * fsverity op that ends enabling verity. + * fsverity calls this when it's done with all of the pages in the file + * and all of the merkle items have been inserted. We write the + * descriptor and update the inode in the btree to reflect its new life + * as a verity file. + */ +static int btrfs_end_enable_verity(struct file *filp, const void *desc, + size_t desc_size, u64 merkle_tree_size) +{ + struct btrfs_trans_handle *trans; + struct inode *inode = file_inode(filp); + struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_verity_descriptor_item item; + int ret; + + if (desc != NULL) { + /* write out the descriptor item */ + memset(&item, 0, sizeof(item)); + btrfs_set_stack_verity_descriptor_size(&item, desc_size); + ret = write_key_bytes(BTRFS_I(inode), + BTRFS_VERITY_DESC_ITEM_KEY, 0, + (const char *)&item, sizeof(item)); + if (ret) + goto out; + /* write out the descriptor itself */ + ret = write_key_bytes(BTRFS_I(inode), + BTRFS_VERITY_DESC_ITEM_KEY, 1, + desc, desc_size); + if (ret) + goto out; + + /* update our inode flags to include fs verity */ + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + BTRFS_I(inode)->compat_flags |= BTRFS_INODE_VERITY; + btrfs_sync_inode_flags_to_i_flags(inode); + ret = btrfs_update_inode(trans, root, BTRFS_I(inode)); + btrfs_end_transaction(trans); + } + +out: + if (desc == NULL || ret) { + /* If we failed, drop all the verity items */ + drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_DESC_ITEM_KEY); + drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY); + } else + btrfs_set_fs_compat_ro(root->fs_info, VERITY); + clear_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags); + return ret; +} + +/* + * fsverity op that gets the struct fsverity_descriptor. + * fsverity does a two pass setup for reading the descriptor, in the first pass + * it calls with buf_size = 0 to query the size of the descriptor, + * and then in the second pass it actually reads the descriptor off + * disk. + */ +static int btrfs_get_verity_descriptor(struct inode *inode, void *buf, + size_t buf_size) +{ + u64 true_size; + ssize_t ret = 0; + struct btrfs_verity_descriptor_item item; + + memset(&item, 0, sizeof(item)); + ret = read_key_bytes(BTRFS_I(inode), BTRFS_VERITY_DESC_ITEM_KEY, + 0, (char *)&item, sizeof(item), NULL); + if (ret < 0) + return ret; + + if (item.reserved[0] != 0 || item.reserved[1] != 0) + return -EUCLEAN; + + true_size = btrfs_stack_verity_descriptor_size(&item); + if (true_size > INT_MAX) + return -EUCLEAN; + + if (!buf_size) + return true_size; + if (buf_size < true_size) + return -ERANGE; + + ret = read_key_bytes(BTRFS_I(inode), + BTRFS_VERITY_DESC_ITEM_KEY, 1, + buf, buf_size, NULL); + if (ret < 0) + return ret; + if (ret != true_size) + return -EIO; + + return true_size; +} + +/* + * fsverity op that reads and caches a merkle tree page. These are stored + * in the btree, but we cache them in the inode's address space after EOF. + */ +static struct page *btrfs_read_merkle_tree_page(struct inode *inode, + pgoff_t index, + unsigned long num_ra_pages) +{ + struct page *p; + u64 off = index << PAGE_SHIFT; + loff_t merkle_pos = merkle_file_pos(inode); + ssize_t ret; + int err; + + if (merkle_pos > inode->i_sb->s_maxbytes - off - PAGE_SIZE) + return ERR_PTR(-EFBIG); + index += merkle_pos >> PAGE_SHIFT; +again: + p = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED); + if (p) { + if (PageUptodate(p)) + return p; + + lock_page(p); + /* + * we only insert uptodate pages, so !Uptodate has to be + * an error + */ + if (!PageUptodate(p)) { + unlock_page(p); + put_page(p); + return ERR_PTR(-EIO); + } + unlock_page(p); + return p; + } + + p = page_cache_alloc(inode->i_mapping); + if (!p) + return ERR_PTR(-ENOMEM); + + /* + * merkle item keys are indexed from byte 0 in the merkle tree. + * they have the form: + * + * [ inode objectid, BTRFS_MERKLE_ITEM_KEY, offset in bytes ] + */ + ret = read_key_bytes(BTRFS_I(inode), + BTRFS_VERITY_MERKLE_ITEM_KEY, off, + page_address(p), PAGE_SIZE, p); + if (ret < 0) { + put_page(p); + return ERR_PTR(ret); + } + + /* zero fill any bytes we didn't write into the page */ + if (ret < PAGE_SIZE) { + char *kaddr = kmap_atomic(p); + + memset(kaddr + ret, 0, PAGE_SIZE - ret); + kunmap_atomic(kaddr); + } + SetPageUptodate(p); + err = add_to_page_cache_lru(p, inode->i_mapping, index, + mapping_gfp_mask(inode->i_mapping)); + + if (!err) { + /* inserted and ready for fsverity */ + unlock_page(p); + } else { + put_page(p); + /* did someone race us into inserting this page? */ + if (err == -EEXIST) + goto again; + p = ERR_PTR(err); + } + return p; +} + +/* + * fsverity op that writes a merkle tree block into the btree in 1k chunks. + */ +static int btrfs_write_merkle_tree_block(struct inode *inode, const void *buf, + u64 index, int log_blocksize) +{ + u64 off = index << log_blocksize; + u64 len = 1 << log_blocksize; + + if (merkle_file_pos(inode) > inode->i_sb->s_maxbytes - off - len) + return -EFBIG; + + return write_key_bytes(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY, + off, buf, len); +} + +const struct fsverity_operations btrfs_verityops = { + .begin_enable_verity = btrfs_begin_enable_verity, + .end_enable_verity = btrfs_end_enable_verity, + .get_verity_descriptor = btrfs_get_verity_descriptor, + .read_merkle_tree_page = btrfs_read_merkle_tree_page, + .write_merkle_tree_block = btrfs_write_merkle_tree_block, +}; diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 5df73001aad4..fa21c8aac78d 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -288,6 +288,7 @@ struct btrfs_ioctl_fs_info_args { * first mount when booting older kernel versions. */ #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID (1ULL << 1) +#define BTRFS_FEATURE_COMPAT_RO_VERITY (1ULL << 2) #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0) #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1) @@ -308,7 +309,6 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10) #define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11) #define BTRFS_FEATURE_INCOMPAT_ZONED (1ULL << 12) - struct btrfs_ioctl_feature_flags { __u64 compat_flags; __u64 compat_ro_flags; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index ae25280316bd..2be57416f886 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -118,6 +118,14 @@ #define BTRFS_INODE_REF_KEY 12 #define BTRFS_INODE_EXTREF_KEY 13 #define BTRFS_XATTR_ITEM_KEY 24 + +/* + * fsverity has a descriptor per file, and then + * a number of sha or csum items indexed by offset in to the file. + */ +#define BTRFS_VERITY_DESC_ITEM_KEY 36 +#define BTRFS_VERITY_MERKLE_ITEM_KEY 37 + #define BTRFS_ORPHAN_ITEM_KEY 48 /* reserve 2-15 close to the inode for later flexibility */ @@ -996,4 +1004,11 @@ struct btrfs_qgroup_limit_item { __le64 rsv_excl; } __attribute__ ((__packed__)); +struct btrfs_verity_descriptor_item { + /* size of the verity descriptor in bytes */ + __le64 size; + __le64 reserved[2]; + __u8 encryption; +} __attribute__ ((__packed__)); + #endif /* _BTRFS_CTREE_H_ */ From patchwork Wed May 5 19:20:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 12240903 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A20BC43470 for ; Wed, 5 May 2021 19:20:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E9DCC613C7 for ; Wed, 5 May 2021 19:20:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235417AbhEETVv (ORCPT ); Wed, 5 May 2021 15:21:51 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:41365 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234664AbhEETVr (ORCPT ); Wed, 5 May 2021 15:21:47 -0400 Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id F25835C0129; Wed, 5 May 2021 15:20:49 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Wed, 05 May 2021 15:20:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=from :to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=fm3; bh=caQZjyJskGUp44yY38RSJmT9Zh kRy2L2kQ0XXqbIUjs=; b=C/31vazFwbfpcq9ndL0hT/MpVJEJ0bgYnsi7S4bpgG Jun+lsGBdVAN9VrwZ1pcmUIPH7CmYRMKRNM4ws4a24xWdLd7+g3+a3Rzj/J8cWjV eOMxeIy0ChtIa44iKLTUktUGTpNJApcfrjbqSicFTYtRD78QIBGkpbhWeXIO64aU IChD/cExwYtA8UcGeNi3m/fkKTitvRSd63ifSjLPYeMK137T1e8Xa9ONlMvqf+y7 RoCHzHE2/+6rpcqEYa6zsXCwqvIaKW3pahMs1qhkKB+CrnX5kSDxS+3QmPP77GPh Y1gNEYZ6at1t5fRRpYjAkVPUOhN99/162u9pkyH0J+Cg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; bh=caQZjyJskGUp44yY38RSJmT9ZhkRy2L2kQ0XXqbIUjs=; b=LMjixGP+ 3nSRskiQfc5MIK8V1PMbdOIJZWlG9N0O8dL5JZPEhdCBa/xI4frjyURofIt4TGe1 RBUz3uHWi1oRy87GJrRek4qn7Htg5rE8MPEUslPF4tDweucYxdWwqPSrhA2goozo L7nm7+YvDECPqYM4L2Pndj7WMxGLTkvYZb+neKakjVSGxBZPV20x80HlLC3s7gB3 Zm2X3Jz06QCbjkIU67tavxEuyM0EdnQcHr81fI78hArRop9w4EMTqsVlQR26YpXB uEaLfcfdsuQ+QUWRbsnfS8sxxauNh6Gq2PGiPXlzqcsjXMyhTyBWLleXU0WlnYG7 MYPx2mkJP1fV4g== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefkedgudefvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecukfhppedvtdejrdehfedrvdehfedrjeenucevlhhu shhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegsohhrihhssegsuh hrrdhioh X-ME-Proxy: Received: from localhost (unknown [207.53.253.7]) by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 May 2021 15:20:49 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, linux-fscrypt@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v4 3/5] btrfs: check verity for reads of inline extents and holes Date: Wed, 5 May 2021 12:20:41 -0700 Message-Id: <0cf02de467f18881ed84e483e21975ffdc86abca.1620241221.git.boris@bur.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The majority of reads receive a verity check after the bio is complete as the page is marked uptodate. However, there is a class of reads which are handled with btrfs logic in readpage, rather than by submitting a bio. Specifically, these are inline extents, preallocated extents, and holes. Tweak readpage so that if it is going to mark such a page uptodate, it first checks verity on it. Now if a veritied file has corruption to this class of EXTENT_DATA items, it will be detected at read time. There is one annoying edge case that requires checking for start < last_byte: if userspace reads to the end of a file with page aligned size and then tries to keep reading (as cat does), the buffered read code will try to read the page past the end of the file, and expects it to be filled with 0s and marked uptodate. That bogus page is not part of the data hashed by verity, so we have to ignore it. Signed-off-by: Boris Burkov --- fs/btrfs/extent_io.c | 26 +++++++------------------- 1 file changed, 7 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d1f57a4ad2fb..d1493a876915 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2202,18 +2202,6 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end, return bitset; } -/* - * helper function to set a given page up to date if all the - * extents in the tree for that page are up to date - */ -static void check_page_uptodate(struct extent_io_tree *tree, struct page *page) -{ - u64 start = page_offset(page); - u64 end = start + PAGE_SIZE - 1; - if (test_range_bit(tree, start, end, EXTENT_UPTODATE, 1, NULL)) - SetPageUptodate(page); -} - int free_io_failure(struct extent_io_tree *failure_tree, struct extent_io_tree *io_tree, struct io_failure_record *rec) @@ -3467,14 +3455,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, &cached, GFP_NOFS); unlock_extent_cached(tree, cur, cur + iosize - 1, &cached); - end_page_read(page, true, cur, iosize); + ret = end_page_read(page, true, cur, iosize); break; } em = __get_extent_map(inode, page, pg_offset, cur, end - cur + 1, em_cached); if (IS_ERR_OR_NULL(em)) { unlock_extent(tree, cur, end); - end_page_read(page, false, cur, end + 1 - cur); + ret = end_page_read(page, false, cur, end + 1 - cur); break; } extent_offset = cur - em->start; @@ -3555,9 +3543,10 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, set_extent_uptodate(tree, cur, cur + iosize - 1, &cached, GFP_NOFS); + unlock_extent_cached(tree, cur, cur + iosize - 1, &cached); - end_page_read(page, true, cur, iosize); + ret = end_page_read(page, true, cur, iosize); cur = cur + iosize; pg_offset += iosize; continue; @@ -3565,9 +3554,8 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, /* the get_extent function already copied into the page */ if (test_range_bit(tree, cur, cur_end, EXTENT_UPTODATE, 1, NULL)) { - check_page_uptodate(tree, page); unlock_extent(tree, cur, cur + iosize - 1); - end_page_read(page, true, cur, iosize); + ret = end_page_read(page, true, cur, iosize); cur = cur + iosize; pg_offset += iosize; continue; @@ -3577,7 +3565,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, */ if (block_start == EXTENT_MAP_INLINE) { unlock_extent(tree, cur, cur + iosize - 1); - end_page_read(page, false, cur, iosize); + ret = end_page_read(page, false, cur, iosize); cur = cur + iosize; pg_offset += iosize; continue; @@ -3595,7 +3583,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached, *bio_flags = this_bio_flag; } else { unlock_extent(tree, cur, cur + iosize - 1); - end_page_read(page, false, cur, iosize); + ret = end_page_read(page, false, cur, iosize); goto out; } cur = cur + iosize; From patchwork Wed May 5 19:20:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 12240907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E22EDC433B4 for ; Wed, 5 May 2021 19:20:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C915C613C4 for ; Wed, 5 May 2021 19:20:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235421AbhEETVw (ORCPT ); Wed, 5 May 2021 15:21:52 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:53665 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235377AbhEETVs (ORCPT ); Wed, 5 May 2021 15:21:48 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 6DE5B5C0199; Wed, 5 May 2021 15:20:51 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 05 May 2021 15:20:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=from :to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=fm3; bh=OTn2VJGUKEh++E08+3NbIo7Gy2 0jA/Y9uR37p8Tn0Kc=; b=FTcgRhX3EZlZd8POnj528VDWUT8fOKEk4m88hT4EfF 87zlBXiBd2hzz7JCgBiZ5ikqG5Pe03tn+215u+/lSYFZSR5oym4MvQ/Dv2LhoDyH V4WJjtJ8lTNe/S4THXHJLKimtdJTMqVeCZnemo61+Kixzlb526EMHV9icnzjOE43 DiO234Sr5kn5zjwTagAlp4jKhLUNNEq5Jpr0v1OTHqozXumGlPehwQ2sFJxmq7ss oMudCt92xVUjHTFIgOQT7w22Va694fBJnrasCmM/CLxPWfo9UG8Qz8fKDSuX4z8S y8glTQ4hXYdCDZk9cybYq/qLRXNTmFWhx++2fYVKCqkA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; bh=OTn2VJGUKEh++E08+3NbIo7Gy20jA/Y9uR37p8Tn0Kc=; b=jUOHsVsm jIwYaRJmSVfAgcZdmeWGv5NprJiyKGjMML/8M5P/HmBSSjvRB57LT59J8RiCtsQ+ wkuFcujlD1EERYswizHJNaE/28wYHFwK7eHLazL2UTVKFEeLP4xeNJF4ruYNLv/j sCU0+Rk3cR6HtUdRlheWytquLI/Cy6Z/5zvGporHQSTO3kv17FiOmAv+2DSdSgqs Fo45gHYeI6MhiVX6uyp4cs5HG5MgSfffOmgLyQdPgbrUibPcbbOhJZt1uGlGLxoB oaA+xiVtA8eXnQph2mgIdY5druXmrnet1hEC48WFxhIoCLxBbBuFlB81V+EUxwNZ IdN1MjZl3wKGjA== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefkedgudefvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecukfhppedvtdejrdehfedrvdehfedrjeenucevlhhu shhtvghrufhiiigvpedunecurfgrrhgrmhepmhgrihhlfhhrohhmpegsohhrihhssegsuh hrrdhioh X-ME-Proxy: Received: from localhost (unknown [207.53.253.7]) by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 May 2021 15:20:51 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, linux-fscrypt@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v4 4/5] btrfs: fallback to buffered io for verity files Date: Wed, 5 May 2021 12:20:42 -0700 Message-Id: <6d7ffab08e76b18a5e29eff7268f860803a82c61.1620241221.git.boris@bur.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Reading the contents with direct IO would circumvent verity checks, so fallback to buffered reads. For what it's worth, this is how ext4 handles it as well. Signed-off-by: Boris Burkov --- fs/btrfs/file.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index a99470303bd9..34bc22fa6b1f 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3628,6 +3628,9 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to) struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; + if (fsverity_active(inode)) + return 0; + if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos)) return 0; From patchwork Wed May 5 19:20:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Burkov X-Patchwork-Id: 12240909 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE13DC43600 for ; Wed, 5 May 2021 19:20:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7FF8761166 for ; Wed, 5 May 2021 19:20:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235428AbhEETVw (ORCPT ); Wed, 5 May 2021 15:21:52 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:57183 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234247AbhEETVt (ORCPT ); Wed, 5 May 2021 15:21:49 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id D37285C0129; Wed, 5 May 2021 15:20:52 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 05 May 2021 15:20:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bur.io; h=from :to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=fm3; bh=82mRpWlF6ngrJB4Vkx//tRpSfl ycsilwNAuK/EhW7Z4=; b=rgCUWQrJkvEuoexHBOcvij5EYujbfz5oMvOIkZMGXU fdLpZU1DuWuBmX0CbDGSSJ7bMPe54sL9X4M2p1BJwhHDvCcotaFzR0kZl2t3PMoE ptJj6uBCUA8An6It3yqPRMynNBrNKwQDUb+RQD1h6cg0TlaDl2N+DcKX6msX6WhM vSKrkxVVAUCwxrxjsSRBkJl5DIgRVwHB3mvnUJV8kBhZC/f5AG3l81l8tgRIgiqU vXTU/7ZEEFFRv2IC6etaGcJ5ZBuAYPlpuqGtz+U+bGz81/qubvQQjWKhavT/WVYI 93dCmevUP+Ryo21aUvRXCe2+QawGkRLezGFWx21i9Jng== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; bh=82mRpWlF6ngrJB4Vkx//tRpSflycsilwNAuK/EhW7Z4=; b=QfKpAINT dy3VMlGL4h3saEvsL+VifML4FcakFRzuzxEymDAD3YXj7xkRQcpmH4o6ClpJXGRY he0tlmMoWYbx0i4gWIVeVfiPnx+hDoZAd5iv1+sbTq4eRS7pK0OC00iAYNOYkbVs JlmTLJr71k6mHoLDA1xxOo8jfYoamhr7ehXRdl28ZJuBqrC3X0cZcfNNbkGO0gao /wDKqKekDhcBQtCf65dTUDRl0iJ0Cn7Oze12+vdozq0wwGy8an9JyD3GxqRyAGZ3 zDtXB/cxj+7on70XCOvtqlc3EM+OCuNLyEo3SH/++37h8jyBbOOK0LFuH0WIl54u AjldzelseFkEEQ== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefkedgudefvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecunecujfgurhephffvufffkffojghfggfgsedtke ertdertddtnecuhfhrohhmpeeuohhrihhsuceuuhhrkhhovhcuoegsohhrihhssegsuhhr rdhioheqnecuggftrfgrthhtvghrnhepieeuffeuvdeiueejhfehiefgkeevudejjeejff evvdehtddufeeihfekgeeuheelnecukfhppedvtdejrdehfedrvdehfedrjeenucevlhhu shhtvghrufhiiigvpedunecurfgrrhgrmhepmhgrihhlfhhrohhmpegsohhrihhssegsuh hrrdhioh X-ME-Proxy: Received: from localhost (unknown [207.53.253.7]) by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 5 May 2021 15:20:52 -0400 (EDT) From: Boris Burkov To: linux-btrfs@vger.kernel.org, linux-fscrypt@vger.kernel.org, kernel-team@fb.com Subject: [PATCH v4 5/5] btrfs: verity metadata orphan items Date: Wed, 5 May 2021 12:20:43 -0700 Message-Id: <8e7e0d3dd84f729d86e7f1a466fe8828f0e7ba58.1620241221.git.boris@bur.io> X-Mailer: git-send-email 2.30.2 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org If we don't finish creating fsverity metadata for a file, or fail to clean up already created metadata after a failure, we could leak the verity items. To address this issue, we use the orphan mechanism. When we start enabling verity on a file, we also add an orphan item for that inode. When we are finished, we delete the orphan. However, if we are interrupted midway, the orphan will be present at mount and we can cleanup the half-formed verity state. There is a possible race with a normal unlink operation: if unlink and verity run on the same file in parallel, it is possible for verity to succeed and delete the still legitimate orphan added by unlink. Then, if we are interrupted and mount in that state, we will never clean up the inode properly. This is also possible for a file created with O_TMPFILE. Check nlink==0 before deleting to avoid this race. A final thing to note is that this is a resurrection of using orphans to signal orphaned metadata that isn't the inode itself. This makes the comment discussing deprecating that concept a bit messy in full context. Signed-off-by: Boris Burkov --- fs/btrfs/inode.c | 15 +++++++-- fs/btrfs/verity.c | 79 ++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 87 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1b1101369777..67eba8db4b65 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3419,7 +3419,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) /* * If we have an inode with links, there are a couple of - * possibilities. Old kernels (before v3.12) used to create an + * possibilities: + * + * 1. Old kernels (before v3.12) used to create an * orphan item for truncate indicating that there were possibly * extent items past i_size that needed to be deleted. In v3.12, * truncate was changed to update i_size in sync with the extent @@ -3432,13 +3434,22 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * slim, and it's a pain to do the truncate now, so just delete * the orphan item. * + * 2. We were halfway through creating fsverity metadata for the + * file. In that case, the orphan item represents incomplete + * fsverity metadata which must be cleaned up with + * btrfs_drop_verity_items. + * * It's also possible that this orphan item was supposed to be * deleted but wasn't. The inode number may have been reused, * but either way, we can delete the orphan item. */ if (ret == -ENOENT || inode->i_nlink) { - if (!ret) + if (!ret) { + ret = btrfs_drop_verity_items(BTRFS_I(inode)); iput(inode); + if (ret) + goto out; + } trans = btrfs_start_transaction(root, 1); if (IS_ERR(trans)) { ret = PTR_ERR(trans); diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c index feaf5908b3d3..3a115cdca018 100644 --- a/fs/btrfs/verity.c +++ b/fs/btrfs/verity.c @@ -362,6 +362,64 @@ static ssize_t read_key_bytes(struct btrfs_inode *inode, u8 key_type, u64 offset return ret; } +/* + * Helper to manage the transaction for adding an orphan item. + */ +static int add_orphan(struct btrfs_inode *inode) +{ + struct btrfs_trans_handle *trans; + struct btrfs_root *root = inode->root; + int ret = 0; + + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto out; + } + ret = btrfs_orphan_add(trans, inode); + if (ret) { + btrfs_abort_transaction(trans, ret); + goto out; + } + btrfs_end_transaction(trans); + +out: + return ret; +} + +/* + * Helper to manage the transaction for deleting an orphan item. + */ +static int del_orphan(struct btrfs_inode *inode) +{ + struct btrfs_trans_handle *trans; + struct btrfs_root *root = inode->root; + int ret; + + /* + * If the inode has no links, it is either already unlinked, or was + * created with O_TMPFILE. In either case, it should have an orphan from + * that other operation. Rather than reference count the orphans, we + * simply ignore them here, because we only invoke the verity path in + * the orphan logic when i_nlink is 0. + */ + if (!inode->vfs_inode.i_nlink) + return 0; + + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode)); + if (ret) { + btrfs_abort_transaction(trans, ret); + return ret; + } + + btrfs_end_transaction(trans); + return ret; +} + /* * Drop verity items from the btree and from the page cache * @@ -399,11 +457,12 @@ static int btrfs_begin_enable_verity(struct file *filp) return -EBUSY; set_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags); - ret = drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_DESC_ITEM_KEY); + + ret = btrfs_drop_verity_items(BTRFS_I(inode)); if (ret) goto err; - ret = drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY); + ret = add_orphan(BTRFS_I(inode)); if (ret) goto err; @@ -430,6 +489,7 @@ static int btrfs_end_enable_verity(struct file *filp, const void *desc, struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_verity_descriptor_item item; int ret; + int keep_orphan = 0; if (desc != NULL) { /* write out the descriptor item */ @@ -461,11 +521,20 @@ static int btrfs_end_enable_verity(struct file *filp, const void *desc, out: if (desc == NULL || ret) { - /* If we failed, drop all the verity items */ - drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_DESC_ITEM_KEY); - drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY); + /* + * If verity failed (here or in the generic code), drop all the + * verity items. + */ + keep_orphan = btrfs_drop_verity_items(BTRFS_I(inode)); } else btrfs_set_fs_compat_ro(root->fs_info, VERITY); + /* + * If we are handling an error, but failed to drop the verity items, + * we still need the orphan. + */ + if (!keep_orphan) + del_orphan(BTRFS_I(inode)); + clear_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags); return ret; }