From patchwork Tue Oct 22 14:50:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Harmstone X-Patchwork-Id: 13845784 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6F6E7DA7C for ; Tue, 22 Oct 2024 14:50:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608643; cv=none; b=avvloX+ddaoulbyRIwtNDnz6ULGHY6EIRtkkh9bg66aKMW56dWEfiaHsg9XcavU5+QO9mxqK+Qy9cximyN4LKCV04QpORb0MVgy8yYYFfqAI77CLCNTVKajKyMWrNh4YEW4NiXVcRDVVRICTrLxlc9kbRUlSP/Zep6W48mCM3rM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608643; c=relaxed/simple; bh=87TBmgu+PMGIOZvrk+fDZikTiy3fhPgRa9D3VjaDTNE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=VCeMvW0Z4uUcvdGV20huOlwEANqhbFa2lS8ZA5UAgHksucS0y44zrdcoxTdAaKv2VpohZFIqNp2xMsyEhq05+F4q1jULYq9zu/VGLEtyvsyPcYz2+crYaXcF4RcRKUN46DqCkfeujGnTFijPfyCQlQR3M/Tvzrn/0koXuZCCnn4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b=bDQbfcZZ; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b="bDQbfcZZ" Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49MDn55l012750 for ; Tue, 22 Oct 2024 07:50:40 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=facebook; bh=G mZA9KDJveq2hQvMo9nOsPOr/lSBv0lDAATJFn4uOgE=; b=bDQbfcZZemBc/9WzW 5Uc/8iq5NPtRg+DyyPamm1zBif0FA2079IsQLaJC2OnqzaYJ4ySYyqnZ3NJo8JKY 6toizIEYM8iGituBa2G5KF/TCKPuU94tGHYeLTWTkaB8sAuAg4apcvcsGAN5z76r 12YjkRhBw4Ux6kW6dABPvsqXY0= Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42ecjd8rw5-20 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 22 Oct 2024 07:50:40 -0700 (PDT) Received: from twshared23455.15.frc2.facebook.com (2620:10d:c0a8:1b::8e35) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Tue, 22 Oct 2024 14:50:37 +0000 Received: by devbig276.nha1.facebook.com (Postfix, from userid 660015) id 935FE7FDCDA8; Tue, 22 Oct 2024 15:50:32 +0100 (BST) From: Mark Harmstone To: CC: , Mark Harmstone Subject: [PATCH 1/5] btrfs: remove pointless addition in btrfs_encoded_read Date: Tue, 22 Oct 2024 15:50:16 +0100 Message-ID: <20241022145024.1046883-2-maharmstone@fb.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241022145024.1046883-1-maharmstone@fb.com> References: <20241022145024.1046883-1-maharmstone@fb.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: xp8iwgl8_amxLFBjzOgVYq95dP9kg36t X-Proofpoint-GUID: xp8iwgl8_amxLFBjzOgVYq95dP9kg36t X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01 iocb->ki_pos isn't used after this function, so there's no point in changing its value. Signed-off-by: Mark Harmstone --- fs/btrfs/inode.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7c5ef2c5c7e8..94098a4c782d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9252,7 +9252,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, ret = btrfs_encoded_read_inline(iocb, iter, start, lockend, &cached_state, extent_start, count, encoded, &unlocked); - goto out; + goto out_em; } /* @@ -9318,9 +9318,6 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, &unlocked); } -out: - if (ret >= 0) - iocb->ki_pos += encoded->len; out_em: free_extent_map(em); out_unlock_extent: From patchwork Tue Oct 22 14:50:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Harmstone X-Patchwork-Id: 13845789 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B09DA1A2562 for ; Tue, 22 Oct 2024 14:50:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608659; cv=none; b=iNj6Ybtno5qN9idUOnxo3NhX3IFL590/WB6L0AsLZ+7VCOTizQcovK3bgqQ/8yli6xFJMpk6jifcJkjIKs1rjrWbKy3jGw/+kyHBMTvFG5JKTaapqUPCt9wgyLP4qG+3uF0jMYVdbTm2qBFrpDqjMhb9UpPXPufIBh0wkVa5Tzo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608659; c=relaxed/simple; bh=jcAWiMqBTPTGbxhO5biUmLCEYzANTGnD1124VL8NmS4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=lxL3f/RI2LJ9dZIVVWEe5BUL7hrHse4i4GNy9gQQFcsRsNyY7CiVaMGMKc986yvGQ1H7MOtQSVAmdqtTZaBgvKJj430mpapwA/MDHlyo9MZhx2YX0sml+XCtJWp6sCBWSThGjyZ+Xms+QCfCauN4CdoLZ60aGFv0G6vxN3hZUWg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b=MVV4u6Fy; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b="MVV4u6Fy" Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49MELR9P024551 for ; Tue, 22 Oct 2024 07:50:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=facebook; bh=O FFWFj88xL2Dj/HElq1YlDCuz0JjMvCuTyHJHqZEU1k=; b=MVV4u6FyeZWMzAJrx vWJLM3sGLkY2aHyZ8lU2x6vbnuGyR2+zbd9ZWQhk1Wb6iWR2a7rxOru/+tzGMolu Hcx6rinORQey+t9KfAOV5gKvREdG95roW+pZb6v0h9ktc/05evB2TarcLib6Qvfa cs+tB7bU3ScFA6Qvcr9bx0f5R4= Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42edr5r6a8-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 22 Oct 2024 07:50:56 -0700 (PDT) Received: from twshared11671.02.ash9.facebook.com (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Tue, 22 Oct 2024 14:50:43 +0000 Received: by devbig276.nha1.facebook.com (Postfix, from userid 660015) id AF85B7FDCDA9; Tue, 22 Oct 2024 15:50:32 +0100 (BST) From: Mark Harmstone To: CC: , Mark Harmstone Subject: [PATCH 2/5] btrfs: change btrfs_encoded_read so that reading of extent is done by caller Date: Tue, 22 Oct 2024 15:50:17 +0100 Message-ID: <20241022145024.1046883-3-maharmstone@fb.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241022145024.1046883-1-maharmstone@fb.com> References: <20241022145024.1046883-1-maharmstone@fb.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: hklde1IG62jciJh3R-gZ2JJtc4x5CC1f X-Proofpoint-ORIG-GUID: hklde1IG62jciJh3R-gZ2JJtc4x5CC1f X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01 Change the behaviour of btrfs_encoded_read so that if it needs to read an extent from disk, it leaves the extent and inode locked and returns -EIOCBQUEUED. The caller is then responsible for doing the I/O via btrfs_encoded_read_regular and unlocking the extent and inode. Signed-off-by: Mark Harmstone --- fs/btrfs/btrfs_inode.h | 10 +++++++- fs/btrfs/inode.c | 58 ++++++++++++++++++++---------------------- fs/btrfs/ioctl.c | 33 +++++++++++++++++++++++- 3 files changed, 69 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 157fd3f4cb33..ab1fbde97cee 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -615,7 +615,15 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, u64 disk_bytenr, u64 disk_io_size, struct page **pages); ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, - struct btrfs_ioctl_encoded_io_args *encoded); + struct btrfs_ioctl_encoded_io_args *encoded, + struct extent_state **cached_state, + u64 *disk_bytenr, u64 *disk_io_size); +ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter, + u64 start, u64 lockend, + struct extent_state **cached_state, + u64 disk_bytenr, u64 disk_io_size, + size_t count, bool compressed, + bool *unlocked); ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from, const struct btrfs_ioctl_encoded_io_args *encoded); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 94098a4c782d..0a4dc85769c7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9123,13 +9123,12 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, return blk_status_to_errno(READ_ONCE(priv.status)); } -static ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, - struct iov_iter *iter, - u64 start, u64 lockend, - struct extent_state **cached_state, - u64 disk_bytenr, u64 disk_io_size, - size_t count, bool compressed, - bool *unlocked) +ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter, + u64 start, u64 lockend, + struct extent_state **cached_state, + u64 disk_bytenr, u64 disk_io_size, + size_t count, bool compressed, + bool *unlocked) { struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); struct extent_io_tree *io_tree = &inode->io_tree; @@ -9190,15 +9189,16 @@ static ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, } ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, - struct btrfs_ioctl_encoded_io_args *encoded) + struct btrfs_ioctl_encoded_io_args *encoded, + struct extent_state **cached_state, + u64 *disk_bytenr, u64 *disk_io_size) { struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); struct btrfs_fs_info *fs_info = inode->root->fs_info; struct extent_io_tree *io_tree = &inode->io_tree; ssize_t ret; size_t count = iov_iter_count(iter); - u64 start, lockend, disk_bytenr, disk_io_size; - struct extent_state *cached_state = NULL; + u64 start, lockend; struct extent_map *em; bool unlocked = false; @@ -9224,13 +9224,13 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, lockend - start + 1); if (ret) goto out_unlock_inode; - lock_extent(io_tree, start, lockend, &cached_state); + lock_extent(io_tree, start, lockend, cached_state); ordered = btrfs_lookup_ordered_range(inode, start, lockend - start + 1); if (!ordered) break; btrfs_put_ordered_extent(ordered); - unlock_extent(io_tree, start, lockend, &cached_state); + unlock_extent(io_tree, start, lockend, cached_state); cond_resched(); } @@ -9250,7 +9250,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, free_extent_map(em); em = NULL; ret = btrfs_encoded_read_inline(iocb, iter, start, lockend, - &cached_state, extent_start, + cached_state, extent_start, count, encoded, &unlocked); goto out_em; } @@ -9263,12 +9263,12 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, inode->vfs_inode.i_size) - iocb->ki_pos; if (em->disk_bytenr == EXTENT_MAP_HOLE || (em->flags & EXTENT_FLAG_PREALLOC)) { - disk_bytenr = EXTENT_MAP_HOLE; + *disk_bytenr = EXTENT_MAP_HOLE; count = min_t(u64, count, encoded->len); encoded->len = count; encoded->unencoded_len = count; } else if (extent_map_is_compressed(em)) { - disk_bytenr = em->disk_bytenr; + *disk_bytenr = em->disk_bytenr; /* * Bail if the buffer isn't large enough to return the whole * compressed extent. @@ -9277,7 +9277,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, ret = -ENOBUFS; goto out_em; } - disk_io_size = em->disk_num_bytes; + *disk_io_size = em->disk_num_bytes; count = em->disk_num_bytes; encoded->unencoded_len = em->ram_bytes; encoded->unencoded_offset = iocb->ki_pos - (em->start - em->offset); @@ -9287,44 +9287,42 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, goto out_em; encoded->compression = ret; } else { - disk_bytenr = extent_map_block_start(em) + (start - em->start); + *disk_bytenr = extent_map_block_start(em) + (start - em->start); if (encoded->len > count) encoded->len = count; /* * Don't read beyond what we locked. This also limits the page * allocations that we'll do. */ - disk_io_size = min(lockend + 1, iocb->ki_pos + encoded->len) - start; - count = start + disk_io_size - iocb->ki_pos; + *disk_io_size = min(lockend + 1, iocb->ki_pos + encoded->len) - start; + count = start + *disk_io_size - iocb->ki_pos; encoded->len = count; encoded->unencoded_len = count; - disk_io_size = ALIGN(disk_io_size, fs_info->sectorsize); + *disk_io_size = ALIGN(*disk_io_size, fs_info->sectorsize); } free_extent_map(em); em = NULL; - if (disk_bytenr == EXTENT_MAP_HOLE) { - unlock_extent(io_tree, start, lockend, &cached_state); + if (*disk_bytenr == EXTENT_MAP_HOLE) { + unlock_extent(io_tree, start, lockend, cached_state); btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); unlocked = true; ret = iov_iter_zero(count, iter); if (ret != count) ret = -EFAULT; } else { - ret = btrfs_encoded_read_regular(iocb, iter, start, lockend, - &cached_state, disk_bytenr, - disk_io_size, count, - encoded->compression, - &unlocked); + ret = -EIOCBQUEUED; + goto out_em; } out_em: free_extent_map(em); out_unlock_extent: - if (!unlocked) - unlock_extent(io_tree, start, lockend, &cached_state); + /* Leave inode and extent locked if we need to do a read */ + if (!unlocked && ret != -EIOCBQUEUED) + unlock_extent(io_tree, start, lockend, cached_state); out_unlock_inode: - if (!unlocked) + if (!unlocked && ret != -EIOCBQUEUED) btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); return ret; } diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 28b9b7fda578..d502b31010bc 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -4513,12 +4513,17 @@ static int btrfs_ioctl_encoded_read(struct file *file, void __user *argp, size_t copy_end_kernel = offsetofend(struct btrfs_ioctl_encoded_io_args, flags); size_t copy_end; + struct btrfs_inode *inode = BTRFS_I(file_inode(file)); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct extent_io_tree *io_tree = &inode->io_tree; struct iovec iovstack[UIO_FASTIOV]; struct iovec *iov = iovstack; struct iov_iter iter; loff_t pos; struct kiocb kiocb; ssize_t ret; + u64 disk_bytenr, disk_io_size; + struct extent_state *cached_state = NULL; if (!capable(CAP_SYS_ADMIN)) { ret = -EPERM; @@ -4571,7 +4576,33 @@ static int btrfs_ioctl_encoded_read(struct file *file, void __user *argp, init_sync_kiocb(&kiocb, file); kiocb.ki_pos = pos; - ret = btrfs_encoded_read(&kiocb, &iter, &args); + ret = btrfs_encoded_read(&kiocb, &iter, &args, &cached_state, + &disk_bytenr, &disk_io_size); + + if (ret == -EIOCBQUEUED) { + bool unlocked = false; + u64 start, lockend, count; + + start = ALIGN_DOWN(kiocb.ki_pos, fs_info->sectorsize); + lockend = start + BTRFS_MAX_UNCOMPRESSED - 1; + + if (args.compression) + count = disk_io_size; + else + count = args.len; + + ret = btrfs_encoded_read_regular(&kiocb, &iter, start, lockend, + &cached_state, disk_bytenr, + disk_io_size, count, + args.compression, + &unlocked); + + if (!unlocked) { + unlock_extent(io_tree, start, lockend, &cached_state); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); + } + } + if (ret >= 0) { fsnotify_access(file); if (copy_to_user(argp + copy_end, From patchwork Tue Oct 22 14:50:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Harmstone X-Patchwork-Id: 13845786 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71A851A0BD1 for ; Tue, 22 Oct 2024 14:50:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608647; cv=none; b=DLncSS15OXW/ouMXgGeMXiELGkMd86ijG1lrVyDmDREmsM/clF00Ty9NjcFPW90ywUsm6uwrGtgMG6MyDgJYFTRnixliikYfywTxp6Tuxowi/H0M444fYVuCW3+75yWMj5rrsX3KuO7YSvnCVWMGOYOqcuyIvy9gECN5Fe2NZSc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608647; c=relaxed/simple; bh=f9wbl/1eX7VxXVjF/NIOrNOYgbnJ+/NGVcPbU1q1Fm8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=iois8rrPPqE+2HunUJLnkgAEcSkaFFl7StLB8B9R30xMa6dXomCQgxHomK8+Fqll+gjJeDOA1FT7oLTxtCz4PlcjTXEHatYNMp0Sl6SJBiHCR390sY0/YpiMeub7Q0tkUTekaID0APiYObyVprh55cqlIg+Asv0VzZY8Bdef+VI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b=JqT0438M; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b="JqT0438M" Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.18.1.2/8.18.1.2) with ESMTP id 49MDn5MB003018 for ; Tue, 22 Oct 2024 07:50:44 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=facebook; bh=3 gyl93d9CXuarbYsxeBNAg2bX1ItLNvyUtgtZltQC3k=; b=JqT0438MXUS7tUeWz iQLRPEGWPYT1f7g/jwC0xChz4QGmSFBmw8kN1CkJR4AWpZ54KUh6lV/zqIb42J/t fDW0J4Id6As77AsHvVbxpffGxIjJ/FwXzZWaL1HBHho6PG/CIGegLpcgjY//nqqh ZCY69ibb71db2vMdmaaX5flrvg= Received: from maileast.thefacebook.com ([163.114.135.16]) by m0001303.ppops.net (PPS) with ESMTPS id 42ea7hhgw1-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 22 Oct 2024 07:50:44 -0700 (PDT) Received: from twshared11671.02.ash9.facebook.com (2620:10d:c0a8:fe::f072) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Tue, 22 Oct 2024 14:50:43 +0000 Received: by devbig276.nha1.facebook.com (Postfix, from userid 660015) id B2FCF7FDCDAB; Tue, 22 Oct 2024 15:50:32 +0100 (BST) From: Mark Harmstone To: CC: , Mark Harmstone Subject: [PATCH 3/5] btrfs: don't sleep in btrfs_encoded_read if IOCB_NOWAIT set Date: Tue, 22 Oct 2024 15:50:18 +0100 Message-ID: <20241022145024.1046883-4-maharmstone@fb.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241022145024.1046883-1-maharmstone@fb.com> References: <20241022145024.1046883-1-maharmstone@fb.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: acHS66dZMVVphSb-ZKZ80XMJEeHLmk0a X-Proofpoint-ORIG-GUID: acHS66dZMVVphSb-ZKZ80XMJEeHLmk0a X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01 Change btrfs_encoded_read so that it returns -EAGAIN rather than sleeps if IOCB_NOWAIT is set in iocb->ki_flags. Signed-off-by: Mark Harmstone --- fs/btrfs/inode.c | 54 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 43 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0a4dc85769c7..0c0753f20d54 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8984,12 +8984,16 @@ static ssize_t btrfs_encoded_read_inline( unsigned long ptr; void *tmp; ssize_t ret; + bool nowait = iocb->ki_flags & IOCB_NOWAIT; path = btrfs_alloc_path(); if (!path) { ret = -ENOMEM; goto out; } + + path->nowait = !!nowait; + ret = btrfs_lookup_file_extent(NULL, root, path, btrfs_ino(inode), extent_start, 0); if (ret) { @@ -9200,11 +9204,15 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, size_t count = iov_iter_count(iter); u64 start, lockend; struct extent_map *em; + bool nowait = iocb->ki_flags & IOCB_NOWAIT; bool unlocked = false; file_accessed(iocb->ki_filp); - btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED); + ret = btrfs_inode_lock(inode, + BTRFS_ILOCK_SHARED | (nowait ? BTRFS_ILOCK_TRY : 0)); + if (ret) + return ret; if (iocb->ki_pos >= inode->vfs_inode.i_size) { btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); @@ -9217,21 +9225,45 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, */ lockend = start + BTRFS_MAX_UNCOMPRESSED - 1; - for (;;) { + if (nowait) { struct btrfs_ordered_extent *ordered; - ret = btrfs_wait_ordered_range(inode, start, - lockend - start + 1); - if (ret) + if (filemap_range_needs_writeback(inode->vfs_inode.i_mapping, + start, lockend)) { + ret = -EAGAIN; goto out_unlock_inode; - lock_extent(io_tree, start, lockend, cached_state); + } + + if (!try_lock_extent(io_tree, start, lockend, cached_state)) { + ret = -EAGAIN; + goto out_unlock_inode; + } + ordered = btrfs_lookup_ordered_range(inode, start, lockend - start + 1); - if (!ordered) - break; - btrfs_put_ordered_extent(ordered); - unlock_extent(io_tree, start, lockend, cached_state); - cond_resched(); + if (ordered) { + btrfs_put_ordered_extent(ordered); + unlock_extent(io_tree, start, lockend, cached_state); + ret = -EAGAIN; + goto out_unlock_inode; + } + } else { + for (;;) { + struct btrfs_ordered_extent *ordered; + + ret = btrfs_wait_ordered_range(inode, start, + lockend - start + 1); + if (ret) + goto out_unlock_inode; + lock_extent(io_tree, start, lockend, cached_state); + ordered = btrfs_lookup_ordered_range(inode, start, + lockend - start + 1); + if (!ordered) + break; + btrfs_put_ordered_extent(ordered); + unlock_extent(io_tree, start, lockend, cached_state); + cond_resched(); + } } em = btrfs_get_extent(inode, NULL, start, lockend - start + 1); From patchwork Tue Oct 22 14:50:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Harmstone X-Patchwork-Id: 13845787 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 23F4E7DA7C for ; Tue, 22 Oct 2024 14:50:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608652; cv=none; b=nQUGxz7eCTS8iBAS+0YeSeIIztLL+n67QD2hVoiI2A/sSAl0RqX9jVZT34k7WPGUiZMUUGiDNsGeoFanstMv3KcEpPtvN6s0xLqQbu1+qsh/cXylzr9AbvR22UcK393GyiSAssuN3bb6f93ZYBgHbe3nbs+TSsgd3IsRRUprtEE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608652; c=relaxed/simple; bh=ZQqI/idDGj2CcTB3dUSpIKbEX1BYlGP7SaJ9M8tuCHw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=M+H1B/QPbrmJI82g35mVY9kotcQ6axEKUBnFz4p4E6fZw+zM7orUcOyYRDTwIn/qBZxke4U7S4CBh1Q1gUOOT08AO8YIO76vTvWdwT9xOPOYKbSyLzYBA+QipwKFWn4gt9mi51yxRc7OfiGevRLQusFVTjm2PCdmNd6jqBpQvWQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b=LVGMSqG8; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b="LVGMSqG8" Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49MDnDJs014113 for ; Tue, 22 Oct 2024 07:50:50 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=facebook; bh=T XiqcxvRqPMP5YK4fmGwC9B8y6cWCpznsjSL1sQcVq0=; b=LVGMSqG898tmEsf3+ tLj+mkUC70JSfigWrsKUJ+M8pHkz8m2LcJPtS7Q5SbX5RcziGTjh5BjYWC2scxuX 4nyAfMATxlL0XESdgvqAC23E8NIIeACUHqXU7W2nE+OT5EhSvhj5hyglsV6NGyBw sq6cjdm0Shhe60Xiogp5wV2kOk= Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42ea7psg41-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 22 Oct 2024 07:50:49 -0700 (PDT) Received: from twshared17102.15.frc2.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Tue, 22 Oct 2024 14:50:36 +0000 Received: by devbig276.nha1.facebook.com (Postfix, from userid 660015) id B90F97FDCDAE; Tue, 22 Oct 2024 15:50:32 +0100 (BST) From: Mark Harmstone To: CC: , Mark Harmstone Subject: [PATCH 4/5] btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages Date: Tue, 22 Oct 2024 15:50:19 +0100 Message-ID: <20241022145024.1046883-5-maharmstone@fb.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241022145024.1046883-1-maharmstone@fb.com> References: <20241022145024.1046883-1-maharmstone@fb.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: skdKsG9DGPbEJGzvsCzo64PskL7lQHth X-Proofpoint-ORIG-GUID: skdKsG9DGPbEJGzvsCzo64PskL7lQHth X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01 Change btrfs_encoded_read_regular_fill_pages so that the priv struct is allocated rather than stored on the stack, in preparation for adding an asynchronous mode to the function. Signed-off-by: Mark Harmstone --- fs/btrfs/inode.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0c0753f20d54..5aedb85696f4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9086,16 +9086,21 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, struct page **pages) { struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct btrfs_encoded_read_private priv = { - .pending = ATOMIC_INIT(1), - }; + struct btrfs_encoded_read_private *priv; unsigned long i = 0; struct btrfs_bio *bbio; + int ret; - init_waitqueue_head(&priv.wait); + priv = kmalloc(sizeof(struct btrfs_encoded_read_private), GFP_NOFS); + if (!priv) + return -ENOMEM; + + init_waitqueue_head(&priv->wait); + atomic_set(&priv->pending, 1); + priv->status = 0; bbio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, fs_info, - btrfs_encoded_read_endio, &priv); + btrfs_encoded_read_endio, priv); bbio->bio.bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT; bbio->inode = inode; @@ -9103,11 +9108,11 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, size_t bytes = min_t(u64, disk_io_size, PAGE_SIZE); if (bio_add_page(&bbio->bio, pages[i], bytes, 0) < bytes) { - atomic_inc(&priv.pending); + atomic_inc(&priv->pending); btrfs_submit_bbio(bbio, 0); bbio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, fs_info, - btrfs_encoded_read_endio, &priv); + btrfs_encoded_read_endio, priv); bbio->bio.bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT; bbio->inode = inode; continue; @@ -9118,13 +9123,15 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, disk_io_size -= bytes; } while (disk_io_size); - atomic_inc(&priv.pending); + atomic_inc(&priv->pending); btrfs_submit_bbio(bbio, 0); - if (atomic_dec_return(&priv.pending)) - io_wait_event(priv.wait, !atomic_read(&priv.pending)); + if (atomic_dec_return(&priv->pending)) + io_wait_event(priv->wait, !atomic_read(&priv->pending)); /* See btrfs_encoded_read_endio() for ordering. */ - return blk_status_to_errno(READ_ONCE(priv.status)); + ret = blk_status_to_errno(READ_ONCE(priv->status)); + kfree(priv); + return ret; } ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter, From patchwork Tue Oct 22 14:50:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Harmstone X-Patchwork-Id: 13845788 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 139377DA7C for ; Tue, 22 Oct 2024 14:50:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608655; cv=none; b=Yi1DTjCdHr46tvtNV3GpDY3NmbuSVklFIBL0hp9b47bhPNd+0q037Ld6HOfJamdAILFgTROPhErS9xp2dJzPVI9pGSM7ilE9j/fNnU8czcJCfT4jxhM0zi4WbQHNvNmrzO2m/aKCr0abDEXVr7JgF8sPHL6LbNH7yUjlZYvvbwA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729608655; c=relaxed/simple; bh=gdmi89HQ7TNSYH7JlAJVYY3f6nr2PfZf9Mi+IV8vrkg=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=HGv3GK2+HA5JxIdBToOEIErFxzEnbhD7MpgrhPvrpEonuO0KgeKgphTWvQ16+pAll489Ah11SW4cSR2kUUgly1f05wdgtfjX27HqxPbDhVFn2AU2CuFqYd9lUKMmvqkAebca4CjmkH0MCEFEBjaVZKebQMTLexFeCrqFk/RBi/k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b=owvaURYa; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=fb.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=fb.com header.i=@fb.com header.b="owvaURYa" Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49MDmlOZ001952 for ; Tue, 22 Oct 2024 07:50:52 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=facebook; bh=Z K6rUQeoLW8URSECIC2LEbudIAhAq48FcZfX9+aUF1g=; b=owvaURYa7hDSKQdbd B2pl9k0tu75Fmm32tlmjdXGt6VXGQtEl8Ffy1sotzJbBG/Ksg8TuWKNN5LtR26z6 8GwYfjPbHnBY5CjdnbTDofrXSqqNBg7TfZy1IjfP7WYc6bzo/u/ApFB8NVROubOE 9athDKIhzcpBrlp8HF0r4XMVUM= Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42e7ktact3-20 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 22 Oct 2024 07:50:51 -0700 (PDT) Received: from twshared4239.05.ash9.facebook.com (2620:10d:c085:208::7cb7) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Tue, 22 Oct 2024 14:50:39 +0000 Received: by devbig276.nha1.facebook.com (Postfix, from userid 660015) id BD32E7FDCDB0; Tue, 22 Oct 2024 15:50:32 +0100 (BST) From: Mark Harmstone To: CC: , Mark Harmstone Subject: [PATCH 5/5] btrfs: add io_uring command for encoded reads Date: Tue, 22 Oct 2024 15:50:20 +0100 Message-ID: <20241022145024.1046883-6-maharmstone@fb.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241022145024.1046883-1-maharmstone@fb.com> References: <20241022145024.1046883-1-maharmstone@fb.com> Precedence: bulk X-Mailing-List: linux-btrfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: v084s209jMUtEdJoSJ1a6rUAxN56n3vu X-Proofpoint-GUID: v084s209jMUtEdJoSJ1a6rUAxN56n3vu X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01 Adds an io_uring command for encoded reads, using the same interface as the existing BTRFS_IOC_ENCODED_READ ioctl. btrfs_uring_encoded_read is an io_uring version of btrfs_ioctl_encoded_read, which validates the user input and calls btrfs_encoded_read to read the appropriate metadata. If we determine that we need to read an extent from disk, we call btrfs_encoded_read_regular_fill_pages through btrfs_uring_read_extent to prepare the bio. The existing btrfs_encoded_read_regular_fill_pages is changed so that if it is passed a uring_ctx value, rather than waking up any waiting threads it calls btrfs_uring_read_extent_endio. This in turn copies the read data back to userspace, and calls io_uring_cmd_done to complete the io_uring command. Because we're potentially doing a non-blocking read, btrfs_uring_read_extent doesn't clean up after itself if it returns -EIOCBQUEUED. Instead, it allocates a priv struct, populates the fields there that we will need to unlock the inode and free our allocations, and defers this to the btrfs_uring_read_finished that gets called when the bio completes. Signed-off-by: Mark Harmstone --- fs/btrfs/btrfs_inode.h | 3 +- fs/btrfs/file.c | 1 + fs/btrfs/inode.c | 43 ++++-- fs/btrfs/ioctl.c | 309 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ioctl.h | 2 + fs/btrfs/send.c | 3 +- 6 files changed, 349 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index ab1fbde97cee..f551444b498e 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -613,7 +613,8 @@ int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info, int compress_type); int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, u64 disk_bytenr, u64 disk_io_size, - struct page **pages); + struct page **pages, + void *uring_ctx); ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, struct btrfs_ioctl_encoded_io_args *encoded, struct extent_state **cached_state, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 5e0a1805e897..fbb753300071 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -3710,6 +3710,7 @@ const struct file_operations btrfs_file_operations = { .compat_ioctl = btrfs_compat_ioctl, #endif .remap_file_range = btrfs_remap_file_range, + .uring_cmd = btrfs_uring_cmd, .fop_flags = FOP_BUFFER_RASYNC | FOP_BUFFER_WASYNC, }; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5aedb85696f4..759ae076f90b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9057,6 +9057,7 @@ static ssize_t btrfs_encoded_read_inline( struct btrfs_encoded_read_private { wait_queue_head_t wait; + void *uring_ctx; atomic_t pending; blk_status_t status; }; @@ -9076,14 +9077,23 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) */ WRITE_ONCE(priv->status, bbio->bio.bi_status); } - if (!atomic_dec_return(&priv->pending)) - wake_up(&priv->wait); + if (atomic_dec_return(&priv->pending) == 0) { + int err = blk_status_to_errno(READ_ONCE(priv->status)); + + if (priv->uring_ctx) { + btrfs_uring_read_extent_endio(priv->uring_ctx, err); + kfree(priv); + } else { + wake_up(&priv->wait); + } + } bio_put(&bbio->bio); } int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, u64 disk_bytenr, u64 disk_io_size, - struct page **pages) + struct page **pages, + void *uring_ctx) { struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_encoded_read_private *priv; @@ -9098,6 +9108,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, init_waitqueue_head(&priv->wait); atomic_set(&priv->pending, 1); priv->status = 0; + priv->uring_ctx = uring_ctx; bbio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, fs_info, btrfs_encoded_read_endio, priv); @@ -9126,12 +9137,23 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, atomic_inc(&priv->pending); btrfs_submit_bbio(bbio, 0); - if (atomic_dec_return(&priv->pending)) - io_wait_event(priv->wait, !atomic_read(&priv->pending)); - /* See btrfs_encoded_read_endio() for ordering. */ - ret = blk_status_to_errno(READ_ONCE(priv->status)); - kfree(priv); - return ret; + if (uring_ctx) { + if (atomic_dec_return(&priv->pending) == 0) { + ret = blk_status_to_errno(READ_ONCE(priv->status)); + btrfs_uring_read_extent_endio(uring_ctx, ret); + kfree(priv); + return ret; + } + + return -EIOCBQUEUED; + } else { + if (atomic_dec_return(&priv->pending) != 0) + io_wait_event(priv->wait, !atomic_read(&priv->pending)); + /* See btrfs_encoded_read_endio() for ordering. */ + ret = blk_status_to_errno(READ_ONCE(priv->status)); + kfree(priv); + return ret; + } } ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter, @@ -9160,7 +9182,8 @@ ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter, } ret = btrfs_encoded_read_regular_fill_pages(inode, disk_bytenr, - disk_io_size, pages); + disk_io_size, pages, + NULL); if (ret) goto out; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d502b31010bc..7f2731ef3dbb 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "export.h" @@ -4720,6 +4721,314 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool return ret; } +/* + * struct btrfs_uring_priv is the context that's attached to an encoded read + * io_uring command, in cmd->pdu. It contains the fields in + * btrfs_uring_read_extent that are necessary to finish off and cleanup the I/O + * in btrfs_uring_read_finished. + */ +struct btrfs_uring_priv { + struct io_uring_cmd *cmd; + struct page **pages; + unsigned long nr_pages; + struct kiocb iocb; + struct iovec *iov; + struct iov_iter iter; + struct extent_state *cached_state; + u64 count; + u64 start; + u64 lockend; + int err; + bool compressed; +}; + +static void btrfs_uring_read_finished(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + struct btrfs_uring_priv *priv = + *io_uring_cmd_to_pdu(cmd, struct btrfs_uring_priv *); + struct btrfs_inode *inode = BTRFS_I(file_inode(priv->iocb.ki_filp)); + struct extent_io_tree *io_tree = &inode->io_tree; + unsigned long i; + u64 cur; + size_t page_offset; + ssize_t ret; + + if (priv->err) { + ret = priv->err; + goto out; + } + + if (priv->compressed) { + i = 0; + page_offset = 0; + } else { + i = (priv->iocb.ki_pos - priv->start) >> PAGE_SHIFT; + page_offset = offset_in_page(priv->iocb.ki_pos - priv->start); + } + cur = 0; + while (cur < priv->count) { + size_t bytes = min_t(size_t, priv->count - cur, + PAGE_SIZE - page_offset); + + if (copy_page_to_iter(priv->pages[i], page_offset, bytes, + &priv->iter) != bytes) { + ret = -EFAULT; + goto out; + } + + i++; + cur += bytes; + page_offset = 0; + } + ret = priv->count; + +out: + unlock_extent(io_tree, priv->start, priv->lockend, &priv->cached_state); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); + + io_uring_cmd_done(cmd, ret, 0, issue_flags); + add_rchar(current, ret); + + for (unsigned long index = 0; index < priv->nr_pages; index++) + __free_page(priv->pages[index]); + + kfree(priv->pages); + kfree(priv->iov); + kfree(priv); +} + +void btrfs_uring_read_extent_endio(void *ctx, int err) +{ + struct btrfs_uring_priv *priv = ctx; + + priv->err = err; + + *io_uring_cmd_to_pdu(priv->cmd, struct btrfs_uring_priv *) = priv; + io_uring_cmd_complete_in_task(priv->cmd, btrfs_uring_read_finished); +} + +static int btrfs_uring_read_extent(struct kiocb *iocb, struct iov_iter *iter, + u64 start, u64 lockend, + struct extent_state *cached_state, + u64 disk_bytenr, u64 disk_io_size, + size_t count, bool compressed, + struct iovec *iov, + struct io_uring_cmd *cmd) +{ + struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); + struct extent_io_tree *io_tree = &inode->io_tree; + struct page **pages; + struct btrfs_uring_priv *priv = NULL; + unsigned long nr_pages; + int ret; + + nr_pages = DIV_ROUND_UP(disk_io_size, PAGE_SIZE); + pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS); + if (!pages) + return -ENOMEM; + ret = btrfs_alloc_page_array(nr_pages, pages, 0); + if (ret) { + ret = -ENOMEM; + goto fail; + } + + priv = kmalloc(sizeof(*priv), GFP_NOFS); + if (!priv) { + ret = -ENOMEM; + goto fail; + } + + priv->iocb = *iocb; + priv->iov = iov; + priv->iter = *iter; + priv->count = count; + priv->cmd = cmd; + priv->cached_state = cached_state; + priv->compressed = compressed; + priv->nr_pages = nr_pages; + priv->pages = pages; + priv->start = start; + priv->lockend = lockend; + priv->err = 0; + + ret = btrfs_encoded_read_regular_fill_pages(inode, disk_bytenr, + disk_io_size, pages, + priv); + if (ret && ret != -EIOCBQUEUED) + goto fail; + + /* + * If we return -EIOCBQUEUED, we're deferring the cleanup to + * btrfs_uring_read_finished, which will handle unlocking the extent + * and inode and freeing the allocations. + */ + + return -EIOCBQUEUED; + +fail: + unlock_extent(io_tree, start, lockend, &cached_state); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); + kfree(priv); + return ret; +} + +static int btrfs_uring_encoded_read(struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + size_t copy_end_kernel = offsetofend(struct btrfs_ioctl_encoded_io_args, + flags); + size_t copy_end; + struct btrfs_ioctl_encoded_io_args args = { 0 }; + int ret; + u64 disk_bytenr, disk_io_size; + struct file *file = cmd->file; + struct btrfs_inode *inode = BTRFS_I(file->f_inode); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct extent_io_tree *io_tree = &inode->io_tree; + struct iovec iovstack[UIO_FASTIOV]; + struct iovec *iov = iovstack; + struct iov_iter iter; + loff_t pos; + struct kiocb kiocb; + struct extent_state *cached_state = NULL; + u64 start, lockend; + void __user *sqe_addr = u64_to_user_ptr(READ_ONCE(cmd->sqe->addr)); + + if (!capable(CAP_SYS_ADMIN)) { + ret = -EPERM; + goto out_acct; + } + + if (issue_flags & IO_URING_F_COMPAT) { +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) + struct btrfs_ioctl_encoded_io_args_32 args32; + + copy_end = offsetofend(struct btrfs_ioctl_encoded_io_args_32, + flags); + if (copy_from_user(&args32, sqe_addr, copy_end)) { + ret = -EFAULT; + goto out_acct; + } + args.iov = compat_ptr(args32.iov); + args.iovcnt = args32.iovcnt; + args.offset = args32.offset; + args.flags = args32.flags; +#else + return -ENOTTY; +#endif + } else { + copy_end = copy_end_kernel; + if (copy_from_user(&args, sqe_addr, copy_end)) { + ret = -EFAULT; + goto out_acct; + } + } + + if (args.flags != 0) + return -EINVAL; + + ret = import_iovec(ITER_DEST, args.iov, args.iovcnt, ARRAY_SIZE(iovstack), + &iov, &iter); + if (ret < 0) + goto out_acct; + + if (iov_iter_count(&iter) == 0) { + ret = 0; + goto out_free; + } + + pos = args.offset; + ret = rw_verify_area(READ, file, &pos, args.len); + if (ret < 0) + goto out_free; + + init_sync_kiocb(&kiocb, file); + kiocb.ki_pos = pos; + + if (issue_flags & IO_URING_F_NONBLOCK) + kiocb.ki_flags |= IOCB_NOWAIT; + + start = ALIGN_DOWN(pos, fs_info->sectorsize); + lockend = start + BTRFS_MAX_UNCOMPRESSED - 1; + + ret = btrfs_encoded_read(&kiocb, &iter, &args, &cached_state, + &disk_bytenr, &disk_io_size); + if (ret < 0 && ret != -EIOCBQUEUED) + goto out_free; + + file_accessed(file); + + if (copy_to_user(sqe_addr + copy_end, (char *)&args + copy_end_kernel, + sizeof(args) - copy_end_kernel)) { + if (ret == -EIOCBQUEUED) { + unlock_extent(io_tree, start, lockend, &cached_state); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); + } + ret = -EFAULT; + goto out_free; + } + + if (ret == -EIOCBQUEUED) { + u64 count; + + /* + * If we've optimized things by storing the iovecs on the stack, + * undo this. + */ + if (!iov) { + iov = kmalloc(sizeof(struct iovec) * args.iovcnt, + GFP_NOFS); + if (!iov) { + unlock_extent(io_tree, start, lockend, + &cached_state); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); + ret = -ENOMEM; + goto out_acct; + } + + memcpy(iov, iovstack, + sizeof(struct iovec) * args.iovcnt); + } + + count = min_t(u64, iov_iter_count(&iter), disk_io_size); + + /* Match ioctl by not returning past EOF if uncompressed */ + if (!args.compression) + count = min_t(u64, count, args.len); + + ret = btrfs_uring_read_extent(&kiocb, &iter, start, lockend, + cached_state, disk_bytenr, + disk_io_size, count, + args.compression, iov, cmd); + + goto out_acct; + } + +out_free: + kfree(iov); + +out_acct: + if (ret > 0) + add_rchar(current, ret); + inc_syscr(current); + + return ret; +} + +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) +{ + switch (cmd->cmd_op) { + case BTRFS_IOC_ENCODED_READ: +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) + case BTRFS_IOC_ENCODED_READ_32: +#endif + return btrfs_uring_encoded_read(cmd, issue_flags); + } + + return -EINVAL; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 19cd26b0244a..2b760c8778f8 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -22,5 +22,7 @@ void btrfs_sync_inode_flags_to_i_flags(struct inode *inode); int __pure btrfs_is_empty_uuid(const u8 *uuid); void btrfs_update_ioctl_balance_args(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_balance_args *bargs); +int btrfs_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags); +void btrfs_uring_read_extent_endio(void *ctx, int err); #endif diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 03b31b1c39be..cadb945bb345 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5669,7 +5669,8 @@ static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path, ret = btrfs_encoded_read_regular_fill_pages(BTRFS_I(inode), disk_bytenr, disk_num_bytes, sctx->send_buf_pages + - (data_offset >> PAGE_SHIFT)); + (data_offset >> PAGE_SHIFT), + NULL); if (ret) goto out;