From patchwork Thu Feb 16 21:47:36 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13143992
From: David Howells
To: Steve French
Cc: David Howells, Jens Axboe, Al Viro, Shyam Prasad N,
 Rohith Surabattula, Tom Talpey, Stefan Metzmacher, Christoph Hellwig,
 Matthew Wilcox, Jeff Layton, linux-cifs@vger.kernel.org,
 linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve French,
 linux-cachefs@redhat.com
Subject: [PATCH 08/17] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
Date: Thu, 16 Feb 2023 21:47:36 +0000
Message-Id: <20230216214745.3985496-9-dhowells@redhat.com>
In-Reply-To: <20230216214745.3985496-1-dhowells@redhat.com>
References: <20230216214745.3985496-1-dhowells@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by
getting a pin on them (as FOLL_PIN) as we go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
     process's VM during the operation, changing the assignment of the
     pages and potentially causing an error.
     Retaining the pages keeps some pages around, even if this occurs;
     further, we find out at the point of extraction if EFAULT is going
     to be incurred.

 (2) Pages might get swapped out/discarded if not retained, so we want to
     retain them to avoid the reload causing a deadlock due to a DIO
     from/to an mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
     fault occurs, we may get a short write to a TCP stream that's then
     tricky to recover from.

We don't deal with other types of iterator here, leaving it to other
mechanisms to retain the pages (e.g. PG_locked, PG_writeback and the pipe
lock).

Signed-off-by: David Howells
cc: Jeff Layton
cc: Steve French
cc: Shyam Prasad N
cc: Rohith Surabattula
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/Makefile     |   1 +
 fs/netfs/iterator.c   | 103 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/netfs.h |   4 ++
 3 files changed, 108 insertions(+)
 create mode 100644 fs/netfs/iterator.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index f684c0cd1ec5..386d6fb92793 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
 	buffered_read.o \
 	io.o \
+	iterator.o \
 	main.o \
 	objects.o
 
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
new file mode 100644
index 000000000000..6f0d79080abc
--- /dev/null
+++ b/fs/netfs/iterator.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Iterator helpers.
+ *
+ * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+/**
+ * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ * @extraction_flags: Flags to qualify the request
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits.  This allows
+ * the original iterator to be disposed of.
+ *
+ * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be
+ * allowed on the pages extracted.
+ *
+ * On success, the number of elements in the bvec is returned, the original
+ * iterator will have been advanced by the amount extracted.
+ *
+ * The iov_iter_extract_mode() function should be used to query how cleanup
+ * should be performed.
+ */
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new,
+				iov_iter_extraction_t extraction_flags)
+{
+	struct bio_vec *bv = NULL;
+	struct page **pages;
+	unsigned int cur_npages;
+	unsigned int max_pages;
+	unsigned int npages = 0;
+	unsigned int i;
+	ssize_t ret;
+	size_t count = orig_len, offset, len;
+	size_t bv_size, pg_size;
+
+	if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig)))
+		return -EIO;
+
+	max_pages = iov_iter_npages(orig, INT_MAX);
+	bv_size = array_size(max_pages, sizeof(*bv));
+	bv = kvmalloc(bv_size, GFP_KERNEL);
+	if (!bv)
+		return -ENOMEM;
+
+	/* Put the page list at the end of the bvec list storage.  bvec
+	 * elements are larger than page pointers, so as long as we work
+	 * 0->last, we should be fine.
+	 */
+	pg_size = array_size(max_pages, sizeof(*pages));
+	pages = (void *)bv + bv_size - pg_size;
+
+	while (count && npages < max_pages) {
+		ret = iov_iter_extract_pages(orig, &pages, count,
+					     max_pages - npages, extraction_flags,
+					     &offset);
+		if (ret < 0) {
+			pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+			break;
+		}
+
+		if (ret > count) {
+			pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+			break;
+		}
+
+		count -= ret;
+		ret += offset;
+		cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+		if (npages + cur_npages > max_pages) {
+			pr_err("Out of bvec array capacity (%u vs %u)\n",
+			       npages + cur_npages, max_pages);
+			break;
+		}
+
+		for (i = 0; i < cur_npages; i++) {
+			len = ret > PAGE_SIZE ? PAGE_SIZE : ret;
+			bv[npages + i].bv_page = *pages++;
+			bv[npages + i].bv_offset = offset;
+			bv[npages + i].bv_len = len - offset;
+			ret -= len;
+			offset = 0;
+		}
+
+		npages += cur_npages;
+	}
+
+	iov_iter_bvec(new, orig->data_source, bv, npages, orig_len - count);
+	return npages;
+}
+EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 4c76ddfb6a67..b11a84f6c32b 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -17,6 +17,7 @@
 #include <linux/workqueue.h>
 #include <linux/fs.h>
 #include <linux/pagemap.h>
+#include <linux/uio.h>
 
 enum netfs_sreq_ref_trace;
 
@@ -296,6 +297,9 @@ void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
 void netfs_put_subrequest(struct netfs_io_subrequest *subreq,
 			  bool was_async, enum netfs_sreq_ref_trace what);
 void netfs_stats_show(struct seq_file *);
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+				struct iov_iter *new,
+				iov_iter_extraction_t extraction_flags);
 
 /**
  * netfs_inode - Get the netfs inode context from the inode
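
Reader's note: the inner for-loop in netfs_extract_user_iter() turns one extracted run of bytes, which may start part-way into its first page, into one bvec element per page. The user-space sketch below mirrors only that arithmetic so it can be checked outside the kernel. It is not part of the patch: `struct seg`, `split_run()` and the fixed 4096-byte `PAGE_SIZE` are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch only: 4 KiB pages. */
#define PAGE_SIZE 4096u

/* Hypothetical stand-in for the bv_offset/bv_len pair of a struct bio_vec. */
struct seg {
	size_t offset;	/* offset of the data within its page */
	size_t len;	/* number of bytes of data in that page */
};

/*
 * Split a run of @bytes bytes that begins @offset bytes into its first page
 * into per-page segments, following the same steps as the loop in the patch.
 * Returns the number of segments written, or 0 if @max is too small.
 */
static unsigned int split_run(size_t bytes, size_t offset,
			      struct seg *segs, unsigned int max)
{
	size_t span = bytes + offset;	/* corresponds to "ret += offset" */
	unsigned int n = (span + PAGE_SIZE - 1) / PAGE_SIZE; /* DIV_ROUND_UP */
	unsigned int i;

	if (n > max)
		return 0;

	for (i = 0; i < n; i++) {
		size_t len = span > PAGE_SIZE ? PAGE_SIZE : span;

		segs[i].offset = offset;
		segs[i].len = len - offset;
		span -= len;
		offset = 0;	/* only the first page can start mid-page */
	}
	return n;
}
```

For instance, a 6000-byte run starting 100 bytes into a page should yield two segments under these assumptions: 3996 bytes at offset 100, then 2004 bytes at offset 0, which together cover the original 6000 bytes.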