From patchwork Thu Jun 29 15:54:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13297119
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
 Linus Torvalds, Jens Axboe, linux-fsdevel@kvack.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Christoph Hellwig,
 linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 1/4] splice: Fix corruption of spliced data after
 splice() returns
Date: Thu, 29 Jun 2023 16:54:30 +0100
Message-ID: <20230629155433.4170837-2-dhowells@redhat.com>
In-Reply-To: <20230629155433.4170837-1-dhowells@redhat.com>
References: <20230629155433.4170837-1-dhowells@redhat.com>

Splicing data from, say, a file into a pipe currently leaves the source
pages in the pipe after splice() returns - but this means that those pages
can be subsequently modified by shared-writable mmap(), write(),
fallocate(), etc. before they're consumed.

Fix this by stealing the pages in splice() before they're added to the pipe
if no one else is using them or has them mapped and copying them otherwise.
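
For illustration only (this is not part of the patch), the scenario above
can be sketched from userspace roughly as follows.  The helper name
splice_then_modify() and the temporary file path are made up for this
example; it assumes a Linux system with glibc.  Which bytes the final
read() sees depends on the kernel in use, so the sketch deliberately does
not claim a particular result.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for splice() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch: write `before` to a
 * temporary file, splice it into a pipe, overwrite the file with `after`
 * and only then read the pipe into `out`.  Returns the number of bytes
 * read from the pipe, or -1 on error.
 */
static ssize_t splice_then_modify(const char *before, const char *after,
				  char *out, size_t len)
{
	char path[] = "/tmp/splice-demo-XXXXXX";
	int pfd[2] = { -1, -1 };
	ssize_t n = -1;
	int fd;

	assert(out != NULL && len > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open fd keeps the file alive */

	if (pipe(pfd) < 0)
		goto out;
	if (write(fd, before, len) != (ssize_t)len)
		goto out;
	if (lseek(fd, 0, SEEK_SET) < 0)
		goto out;

	/* Queue the file data in the pipe; splice() returns here. */
	if (splice(fd, NULL, pfd[1], NULL, len, 0) != (ssize_t)len)
		goto out;

	/* Modify the source after splice() has already returned. */
	if (pwrite(fd, after, len, 0) != (ssize_t)len)
		goto out;

	/* Consume the pipe; old or new data may be seen here. */
	n = read(pfd[0], out, len);
out:
	if (pfd[0] >= 0)
		close(pfd[0]);
	if (pfd[1] >= 0)
		close(pfd[1]);
	close(fd);
	return n;
}
```

Calling splice_then_modify("AAAA", "BBBB", out, 4) and inspecting out shows
which version of the data the pipe delivered on a given kernel.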

Reported-by: Matt Whitlock
Link: https://lore.kernel.org/r/ec804f26-fa76-4fbe-9b1c-8fbbd829b735@mattwhitlock.name/
Signed-off-by: David Howells
cc: Matthew Wilcox
cc: Dave Chinner
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-fsdevel@vger.kernel.org
---
 mm/filemap.c  | 92 ++++++++++++++++++++++++++++++++++++++++++++++++---
 mm/internal.h |  4 +--
 mm/shmem.c    |  8 +++--
 3 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..a002df515966 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2838,15 +2838,87 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 }
 EXPORT_SYMBOL(generic_file_read_iter);
 
+static inline void copy_folio_to_folio(struct folio *src, size_t src_offset,
+				       struct folio *dst, size_t dst_offset,
+				       size_t size)
+{
+	void *p, *q;
+
+	while (size > 0) {
+		size_t part = min3(PAGE_SIZE - src_offset % PAGE_SIZE,
+				   PAGE_SIZE - dst_offset % PAGE_SIZE,
+				   size);
+
+		p = kmap_local_folio(src, src_offset);
+		q = kmap_local_folio(dst, dst_offset);
+		memcpy(q, p, part);
+		kunmap_local(p);
+		kunmap_local(q);
+		src_offset += part;
+		dst_offset += part;
+		size -= part;
+	}
+}
+
 /*
- * Splice subpages from a folio into a pipe.
+ * Splice data from a folio into a pipe. The folio is stolen if no one else is
+ * using it and copied otherwise. We can't put the folio into the pipe still
+ * attached to the pagecache as that allows someone to modify it after the
+ * splice.
  */
-size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
-			      struct folio *folio, loff_t fpos, size_t size)
+ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+			       struct folio *folio, loff_t fpos, size_t size)
 {
+	struct address_space *mapping;
+	struct folio *copy = NULL;
 	struct page *page;
+	unsigned int flags = 0;
+	ssize_t ret;
 	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
 
+	folio_lock(folio);
+
+	mapping = folio_mapping(folio);
+	ret = -ENODATA;
+	if (!folio->mapping)
+		goto err_unlock; /* Truncated */
+	ret = -EIO;
+	if (!folio_test_uptodate(folio))
+		goto err_unlock;
+
+	/*
+	 * At least for ext2 with nobh option, we need to wait on writeback
+	 * completing on this folio, since we'll remove it from the pagecache.
+	 * Otherwise truncate won't wait on the folio, allowing the disk blocks
+	 * to be reused by someone else before we actually wrote our data to
+	 * them. fs corruption ensues.
+	 */
+	folio_wait_writeback(folio);
+
+	if (folio_has_private(folio) &&
+	    !filemap_release_folio(folio, GFP_KERNEL))
+		goto need_copy;
+
+	/* If we succeed in removing the mapping, set LRU flag and add it. */
+	if (remove_mapping(mapping, folio)) {
+		folio_unlock(folio);
+		flags = PIPE_BUF_FLAG_LRU;
+		goto add_to_pipe;
+	}
+
+need_copy:
+	folio_unlock(folio);
+
+	copy = folio_alloc(GFP_KERNEL, 0);
+	if (!copy)
+		return -ENOMEM;
+
+	size = min(size, PAGE_SIZE - offset % PAGE_SIZE);
+	copy_folio_to_folio(folio, offset, copy, 0, size);
+	folio = copy;
+	offset = 0;
+
+add_to_pipe:
 	page = folio_page(folio, offset / PAGE_SIZE);
 	size = min(size, folio_size(folio) - offset);
 	offset %= PAGE_SIZE;
@@ -2861,6 +2933,7 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 			.page	= page,
 			.offset	= offset,
 			.len	= part,
+			.flags	= flags,
 		};
 		folio_get(folio);
 		pipe->head++;
@@ -2869,7 +2942,13 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 		offset = 0;
 	}
 
+	if (copy)
+		folio_put(copy);
 	return spliced;
+
+err_unlock:
+	folio_unlock(folio);
+	return ret;
 }
 
 /**
@@ -2947,7 +3026,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
 			struct folio *folio = fbatch.folios[i];
-			size_t n;
+			ssize_t n;
 
 			if (folio_pos(folio) >= end_offset)
 				goto out;
@@ -2963,8 +3042,11 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 			n = min_t(loff_t, len, isize - *ppos);
 			n = splice_folio_into_pipe(pipe, folio, *ppos, n);
-			if (!n)
+			if (n <= 0) {
+				if (n < 0)
+					error = n;
 				goto out;
+			}
 			len -= n;
 			total_spliced += n;
 			*ppos += n;
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..ae395e0f31d5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -881,8 +881,8 @@ struct migration_target_control {
 /*
  * mm/filemap.c
  */
-size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
-			      struct folio *folio, loff_t fpos, size_t size);
+ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+			       struct folio *folio, loff_t fpos, size_t size);
 
 /*
  * mm/vmalloc.c
diff --git a/mm/shmem.c b/mm/shmem.c
index 2f2e0e618072..969931b0f00e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2783,7 +2783,8 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 	struct inode *inode = file_inode(in);
 	struct address_space *mapping = inode->i_mapping;
 	struct folio *folio = NULL;
-	size_t total_spliced = 0, used, npages, n, part;
+	ssize_t n;
+	size_t total_spliced = 0, used, npages, part;
 	loff_t isize;
 	int error = 0;
 
@@ -2844,8 +2845,11 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 			n = splice_zeropage_into_pipe(pipe, *ppos, len);
 		}
 
-		if (!n)
+		if (n <= 0) {
+			if (n < 0)
+				error = n;
 			break;
+		}
 		len -= n;
 		total_spliced += n;
 		*ppos += n;

From patchwork Thu Jun 29 15:54:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13297122
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
 Linus Torvalds, Jens Axboe, linux-fsdevel@kvack.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Christoph Hellwig,
 linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 2/4] splice: Make vmsplice() steal or copy
Date: Thu, 29 Jun 2023 16:54:31 +0100
Message-ID: <20230629155433.4170837-3-dhowells@redhat.com>
In-Reply-To: <20230629155433.4170837-1-dhowells@redhat.com>
References: <20230629155433.4170837-1-dhowells@redhat.com>

Make vmsplice()-to-pipe try to steal gifted data or else copy the source
data immediately before adding it to the pipe. This prevents the data
added to the pipe from being modified by write(), by shared-writable mmap
and by fallocate().

[!] Note: I'm using unmap_mapping_folio() and remove_mapping() to steal a
gifted page on behalf of vmsplice().
It partly works, but after a large batch of stealing it will oops; I can't
tell why, as it dies in the middle of a huge chunk of macro-generated
interval tree code.

[!] Note: I'm only allowing theft of pages with refcount <= 4. refcount
== 3 would actually seem to be the right thing (one for the caller, one for
the pagecache and one for our page table), but sometimes a fourth ref is
held transiently (possibly deferred put from page-in).

Reported-by: Matt Whitlock
Link: https://lore.kernel.org/r/ec804f26-fa76-4fbe-9b1c-8fbbd829b735@mattwhitlock.name/
Signed-off-by: David Howells
cc: Matthew Wilcox
cc: Dave Chinner
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 113 insertions(+), 10 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 004eb1c4ce31..42af642c0ff8 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -37,6 +37,7 @@
 #include
 #include
 
+#include "../mm/internal.h"
 #include "internal.h"
 
 /*
@@ -1382,14 +1383,117 @@ static long __do_splice(struct file *in, loff_t __user *off_in,
 	return ret;
 }
 
+static void copy_folio_to_folio(struct folio *src, size_t src_offset,
+				struct folio *dst, size_t dst_offset,
+				size_t size)
+{
+	void *p, *q;
+
+	while (size > 0) {
+		size_t part = min3(PAGE_SIZE - src_offset % PAGE_SIZE,
+				   PAGE_SIZE - dst_offset % PAGE_SIZE,
+				   size);
+
+		p = kmap_local_folio(src, src_offset);
+		q = kmap_local_folio(dst, dst_offset);
+		memcpy(q, p, part);
+		kunmap_local(p);
+		kunmap_local(q);
+		src_offset += part;
+		dst_offset += part;
+		size -= part;
+	}
+}
+
+static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
+				    struct page *page, size_t offset,
+				    size_t size, unsigned int splice_flags)
+{
+	struct folio *folio = page_folio(page), *copy;
+	unsigned int flags = 0;
+	size_t fsize = folio_size(folio), spliced = 0;
+
+	if (!(splice_flags & SPLICE_F_GIFT) ||
+	    fsize != PAGE_SIZE || offset != 0 || size != fsize)
+		goto need_copy;
+
+	/*
+	 * For a folio to be stealable, the caller holds a ref, the mapping
+	 * holds a ref and the page tables hold a ref; it may or may not also
+	 * be on the LRU. Anything else and someone else has access to it.
+	 */
+	if (folio_ref_count(folio) > 4 || folio_mapcount(folio) != 1 ||
+	    folio_maybe_dma_pinned(folio))
+		goto need_copy;
+
+	/* Try to steal. */
+	folio_lock(folio);
+
+	if (folio_ref_count(folio) > 4 || folio_mapcount(folio) != 1 ||
+	    folio_maybe_dma_pinned(folio))
+		goto need_copy_unlock;
+	if (!folio->mapping)
+		goto need_copy_unlock; /* vmsplice race? */
+
+	/*
+	 * Remove the folio from the process VM and then try to remove
+	 * it from the mapping. If we can't remove it, we'll have to
+	 * copy it instead.
+	 */
+	unmap_mapping_folio(folio);
+	if (remove_mapping(folio->mapping, folio)) {
+		folio_clear_mappedtodisk(folio);
+		flags |= PIPE_BUF_FLAG_LRU;
+		goto add_to_pipe;
+	}
+
+need_copy_unlock:
+	folio_unlock(folio);
+need_copy:
+
+	copy = folio_alloc(GFP_KERNEL, 0);
+	if (!copy)
+		return -ENOMEM;
+
+	size = min(size, PAGE_SIZE - offset % PAGE_SIZE);
+	copy_folio_to_folio(folio, offset, copy, 0, size);
+	folio_mark_uptodate(copy);
+	folio_put(folio);
+	folio = copy;
+	offset = 0;
+
+add_to_pipe:
+	page = folio_page(folio, offset / PAGE_SIZE);
+	size = min(size, folio_size(folio) - offset);
+	offset %= PAGE_SIZE;
+
+	while (spliced < size &&
+	       !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct pipe_buffer *buf = pipe_head_buf(pipe);
+		size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
+
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= page,
+			.offset	= offset,
+			.len	= part,
+			.flags	= flags,
+		};
+		folio_get(folio);
+		pipe->head++;
+		page++;
+		spliced += part;
+		offset = 0;
+	}
+
+	folio_put(folio);
+	return spliced;
+}
+
 static int iter_to_pipe(struct iov_iter *from,
 			struct pipe_inode_info *pipe,
 			unsigned flags)
 {
-	struct pipe_buffer buf = {
-		.ops = &user_page_pipe_buf_ops,
-		.flags = flags
-	};
 	size_t total = 0;
 	int ret = 0;
 
@@ -1407,12 +1511,11 @@ static int iter_to_pipe(struct iov_iter *from,
 
 		n = DIV_ROUND_UP(left + start, PAGE_SIZE);
 		for (i = 0; i < n; i++) {
-			int size = min_t(int, left, PAGE_SIZE - start);
+			size_t part = min_t(size_t, left,
+					    PAGE_SIZE - start % PAGE_SIZE);
 
-			buf.page = pages[i];
-			buf.offset = start;
-			buf.len = size;
-			ret = add_to_pipe(pipe, &buf);
+			ret = splice_try_to_steal_page(pipe, pages[i], start,
+						       part, flags);
 			if (unlikely(ret < 0)) {
 				iov_iter_revert(from, left);
 				// this one got dropped by add_to_pipe()
@@ -1421,7 +1524,7 @@ static int iter_to_pipe(struct iov_iter *from,
 				goto out;
 			}
 			total += ret;
-			left -= size;
+			left -= part;
 			start = 0;
 		}
 	}

From patchwork Thu Jun 29 15:54:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13297120
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
 Linus Torvalds, Jens Axboe, linux-fsdevel@kvack.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Christoph Hellwig,
 linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 3/4] splice: Remove some now-unused bits
Date: Thu, 29 Jun 2023 16:54:32 +0100
Message-ID: <20230629155433.4170837-4-dhowells@redhat.com>
In-Reply-To: <20230629155433.4170837-1-dhowells@redhat.com>
References: <20230629155433.4170837-1-dhowells@redhat.com>

Remove some code that's no longer used as the ->confirm() op is no longer
used and pages spliced in from the pagecache and process VM are now
pre-stolen or copied.

Signed-off-by: David Howells
cc: Matthew Wilcox
cc: Dave Chinner
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dev.c             |  37 ---------
 fs/pipe.c                 |  12 ---
 fs/splice.c               | 155 +-------------------------------------
 include/linux/pipe_fs_i.h |  14 ----
 include/linux/splice.h    |   1 -
 mm/filemap.c              |   2 +-
 6 files changed, 3 insertions(+), 218 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..9718dce0f0d9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -700,10 +700,6 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 		struct pipe_buffer *buf = cs->pipebufs;
 
 		if (!cs->write) {
-			err = pipe_buf_confirm(cs->pipe, buf);
-			if (err)
-				return err;
-
 			BUG_ON(!cs->nr_segs);
 			cs->currbuf = buf;
 			cs->pg = buf->page;
@@ -766,26 +762,6 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
 	return ncpy;
 }
 
-static int fuse_check_folio(struct folio *folio)
-{
-	if (folio_mapped(folio) ||
-	    folio->mapping != NULL ||
-	    (folio->flags & PAGE_FLAGS_CHECK_AT_PREP &
-	     ~(1 << PG_locked |
-	       1 << PG_referenced |
-	       1 << PG_uptodate |
-	       1 << PG_lru |
-	       1 << PG_active |
-	       1 << PG_workingset |
-	       1 << PG_reclaim |
-	       1 << PG_waiters |
-	       LRU_GEN_MASK | LRU_REFS_MASK))) {
-		dump_page(&folio->page, "fuse: trying to steal weird page");
-		return 1;
-	}
-	return 0;
-}
-
 static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 {
 	int err;
@@ -800,10 +776,6 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 
 	fuse_copy_finish(cs);
 
-	err = pipe_buf_confirm(cs->pipe, buf);
-	if (err)
-		goto out_put_old;
-
 	BUG_ON(!cs->nr_segs);
 	cs->currbuf = buf;
 	cs->len = buf->len;
@@ -818,14 +790,6 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 
 	newfolio = page_folio(buf->page);
 
-	if (!folio_test_uptodate(newfolio))
-		folio_mark_uptodate(newfolio);
-
-	folio_clear_mappedtodisk(newfolio);
-
-	if (fuse_check_folio(newfolio) != 0)
-		goto out_fallback_unlock;
-
 	/*
 	 * This is a new and locked page, it shouldn't be mapped or
 	 * have any special flags on it
@@ -2020,7 +1984,6 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 				goto out_free;
 
 			*obuf = *ibuf;
-			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
 			ibuf->offset += obuf->len;
 			ibuf->len -= obuf->len;
diff --git a/fs/pipe.c b/fs/pipe.c
index 2d88f73f585a..d5c86eb20f29 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -286,7 +286,6 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 			struct pipe_buffer *buf = &pipe->bufs[tail & mask];
 			size_t chars = buf->len;
 			size_t written;
-			int error;
 
 			if (chars > total_len) {
 				if (buf->flags & PIPE_BUF_FLAG_WHOLE) {
@@ -297,13 +296,6 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 				chars = total_len;
 			}
 
-			error = pipe_buf_confirm(pipe, buf);
-			if (error) {
-				if (!ret)
-					ret = error;
-				break;
-			}
-
 			written = copy_page_to_iter(buf->page, buf->offset, chars, to);
 			if (unlikely(written < chars)) {
 				if (!ret)
@@ -462,10 +454,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 
 		if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
 		    offset + chars <= PAGE_SIZE) {
-			ret = pipe_buf_confirm(pipe, buf);
-			if (ret)
-				goto out;
-
 			ret = copy_page_from_iter(buf->page, offset, chars, from);
 			if (unlikely(ret < chars)) {
 				ret = -EFAULT;
diff --git a/fs/splice.c b/fs/splice.c
index 42af642c0ff8..2b1f109a7d4f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -56,129 +56,6 @@ static noinline void noinline pipe_clear_nowait(struct file *file)
 	} while (!try_cmpxchg(&file->f_mode, &fmode, fmode & ~FMODE_NOWAIT));
 }
 
-/*
- * Attempt to steal a page from a pipe buffer. This should perhaps go into
- * a vm helper function, it's already simplified quite a bit by the
- * addition of remove_mapping(). If success is returned, the caller may
- * attempt to reuse this page for another destination.
- */
-static bool page_cache_pipe_buf_try_steal(struct pipe_inode_info *pipe,
-		struct pipe_buffer *buf)
-{
-	struct folio *folio = page_folio(buf->page);
-	struct address_space *mapping;
-
-	folio_lock(folio);
-
-	mapping = folio_mapping(folio);
-	if (mapping) {
-		WARN_ON(!folio_test_uptodate(folio));
-
-		/*
-		 * At least for ext2 with nobh option, we need to wait on
-		 * writeback completing on this folio, since we'll remove it
-		 * from the pagecache. Otherwise truncate wont wait on the
-		 * folio, allowing the disk blocks to be reused by someone else
-		 * before we actually wrote our data to them. fs corruption
-		 * ensues.
-		 */
-		folio_wait_writeback(folio);
-
-		if (folio_has_private(folio) &&
-		    !filemap_release_folio(folio, GFP_KERNEL))
-			goto out_unlock;
-
-		/*
-		 * If we succeeded in removing the mapping, set LRU flag
-		 * and return good.
-		 */
-		if (remove_mapping(mapping, folio)) {
-			buf->flags |= PIPE_BUF_FLAG_LRU;
-			return true;
-		}
-	}
-
-	/*
-	 * Raced with truncate or failed to remove folio from current
-	 * address space, unlock and return failure.
-	 */
-out_unlock:
-	folio_unlock(folio);
-	return false;
-}
-
-static void page_cache_pipe_buf_release(struct pipe_inode_info *pipe,
-					struct pipe_buffer *buf)
-{
-	put_page(buf->page);
-	buf->flags &= ~PIPE_BUF_FLAG_LRU;
-}
-
-/*
- * Check whether the contents of buf is OK to access. Since the content
- * is a page cache page, IO may be in flight.
- */
-static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
-				       struct pipe_buffer *buf)
-{
-	struct page *page = buf->page;
-	int err;
-
-	if (!PageUptodate(page)) {
-		lock_page(page);
-
-		/*
-		 * Page got truncated/unhashed. This will cause a 0-byte
-		 * splice, if this is the first page.
-		 */
-		if (!page->mapping) {
-			err = -ENODATA;
-			goto error;
-		}
-
-		/*
-		 * Uh oh, read-error from disk.
-		 */
-		if (!PageUptodate(page)) {
-			err = -EIO;
-			goto error;
-		}
-
-		/*
-		 * Page is ok afterall, we are done.
- */ - unlock_page(page); - } - - return 0; -error: - unlock_page(page); - return err; -} - -const struct pipe_buf_operations page_cache_pipe_buf_ops = { - .confirm = page_cache_pipe_buf_confirm, - .release = page_cache_pipe_buf_release, - .try_steal = page_cache_pipe_buf_try_steal, - .get = generic_pipe_buf_get, -}; - -static bool user_page_pipe_buf_try_steal(struct pipe_inode_info *pipe, - struct pipe_buffer *buf) -{ - if (!(buf->flags & PIPE_BUF_FLAG_GIFT)) - return false; - - buf->flags |= PIPE_BUF_FLAG_LRU; - return generic_pipe_buf_try_steal(pipe, buf); -} - -static const struct pipe_buf_operations user_page_pipe_buf_ops = { - .release = page_cache_pipe_buf_release, - .try_steal = user_page_pipe_buf_try_steal, - .get = generic_pipe_buf_get, -}; - static void wakeup_pipe_readers(struct pipe_inode_info *pipe) { smp_mb(); @@ -460,13 +337,6 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des if (sd->len > sd->total_len) sd->len = sd->total_len; - ret = pipe_buf_confirm(pipe, buf); - if (unlikely(ret)) { - if (ret == -ENODATA) - ret = 0; - return ret; - } - ret = actor(pipe, buf, sd); if (ret <= 0) return ret; @@ -723,13 +593,6 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, continue; this_len = min(this_len, left); - ret = pipe_buf_confirm(pipe, buf); - if (unlikely(ret)) { - if (ret == -ENODATA) - ret = 0; - goto done; - } - bvec_set_page(&array[n], buf->page, this_len, buf->offset); left -= this_len; @@ -764,7 +627,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, } } } -done: + kfree(array); splice_from_pipe_end(pipe, &sd); @@ -855,13 +718,6 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, seg = min_t(size_t, remain, buf->len); - ret = pipe_buf_confirm(pipe, buf); - if (unlikely(ret)) { - if (ret == -ENODATA) - ret = 0; - break; - } - bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset); remain -= seg; if (remain == 0 || bc >= ARRAY_SIZE(bvec)) @@ 
-1450,7 +1306,6 @@ static int splice_try_to_steal_page(struct pipe_inode_info *pipe, need_copy_unlock: folio_unlock(folio); need_copy: - copy = folio_alloc(GFP_KERNEL, 0); if (!copy) return -ENOMEM; @@ -1578,10 +1433,6 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, { struct pipe_inode_info *pipe; long ret = 0; - unsigned buf_flag = 0; - - if (flags & SPLICE_F_GIFT) - buf_flag = PIPE_BUF_FLAG_GIFT; pipe = get_pipe_info(file, true); if (!pipe) @@ -1592,7 +1443,7 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter, pipe_lock(pipe); ret = wait_for_space(pipe, flags); if (!ret) - ret = iter_to_pipe(iter, pipe, buf_flag); + ret = iter_to_pipe(iter, pipe, flags); pipe_unlock(pipe); if (ret > 0) wakeup_pipe_readers(pipe); @@ -1876,7 +1727,6 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, * Don't inherit the gift and merge flags, we need to * prevent multiple steals of this page. */ - obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->flags &= ~PIPE_BUF_FLAG_CAN_MERGE; obuf->len = len; @@ -1968,7 +1818,6 @@ static int link_pipe(struct pipe_inode_info *ipipe, * Don't inherit the gift and merge flag, we need to prevent * multiple steals of this page. 
*/ - obuf->flags &= ~PIPE_BUF_FLAG_GIFT; obuf->flags &= ~PIPE_BUF_FLAG_CAN_MERGE; if (obuf->len > len) diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 02e0086b10f6..9cfbefd7ba31 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -6,7 +6,6 @@ #define PIPE_BUF_FLAG_LRU 0x01 /* page is on the LRU */ #define PIPE_BUF_FLAG_ATOMIC 0x02 /* was atomically mapped */ -#define PIPE_BUF_FLAG_GIFT 0x04 /* page is a gift */ #define PIPE_BUF_FLAG_PACKET 0x08 /* read() as a packet */ #define PIPE_BUF_FLAG_CAN_MERGE 0x10 /* can merge buffers */ #define PIPE_BUF_FLAG_WHOLE 0x20 /* read() must return entire buffer or error */ @@ -203,19 +202,6 @@ static inline void pipe_buf_release(struct pipe_inode_info *pipe, ops->release(pipe, buf); } -/** - * pipe_buf_confirm - verify contents of the pipe buffer - * @pipe: the pipe that the buffer belongs to - * @buf: the buffer to confirm - */ -static inline int pipe_buf_confirm(struct pipe_inode_info *pipe, - struct pipe_buffer *buf) -{ - if (!buf->ops->confirm) - return 0; - return buf->ops->confirm(pipe, buf); -} - /** * pipe_buf_try_steal - attempt to take ownership of a pipe_buffer * @pipe: the pipe that the buffer belongs to diff --git a/include/linux/splice.h b/include/linux/splice.h index 6c461573434d..3c5abbd49ff2 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -97,6 +97,5 @@ extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, extern int splice_grow_spd(const struct pipe_inode_info *, struct splice_pipe_desc *); extern void splice_shrink_spd(struct splice_pipe_desc *); -extern const struct pipe_buf_operations page_cache_pipe_buf_ops; extern const struct pipe_buf_operations default_pipe_buf_ops; #endif diff --git a/mm/filemap.c b/mm/filemap.c index a002df515966..dd144b0dab69 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2929,7 +2929,7 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe, size_t part = min_t(size_t, PAGE_SIZE - 
offset, size - spliced); *buf = (struct pipe_buffer) { - .ops = &page_cache_pipe_buf_ops, + .ops = &default_pipe_buf_ops, .page = page, .offset = offset, .len = part, From patchwork Thu Jun 29 15:54:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 13297123 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68CA2EB64D9 for ; Thu, 29 Jun 2023 15:55:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 071F88E0002; Thu, 29 Jun 2023 11:55:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F3D198E0001; Thu, 29 Jun 2023 11:55:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB6DC8E0002; Thu, 29 Jun 2023 11:55:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id C563C8E0001 for ; Thu, 29 Jun 2023 11:55:29 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 80E281603B5 for ; Thu, 29 Jun 2023 15:55:29 +0000 (UTC) X-FDA: 80956235178.05.B173362 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf21.hostedemail.com (Postfix) with ESMTP id C230F1C0011 for ; Thu, 29 Jun 2023 15:55:27 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=UaOvH6QW; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf21.hostedemail.com: domain of dhowells@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=dhowells@redhat.com ARC-Message-Signature: i=1; 
From: David Howells
To: netdev@vger.kernel.org
Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
    Linus Torvalds, Jens Axboe, linux-fsdevel@kvack.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Christoph Hellwig, linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 4/4] splice: Record some statistics
Date: Thu, 29 Jun 2023 16:54:33 +0100
Message-ID: <20230629155433.4170837-5-dhowells@redhat.com>
In-Reply-To: <20230629155433.4170837-1-dhowells@redhat.com>
References: <20230629155433.4170837-1-dhowells@redhat.com>

Add a proc file to export some statistics for debugging purposes.

Signed-off-by: David Howells
cc: Matthew Wilcox
cc: Dave Chinner
cc: Christoph Hellwig
cc: Jens Axboe
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c            | 28 ++++++++++++++++++++++++++++
 include/linux/splice.h |  3 +++
 mm/filemap.c           |  6 +++++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 2b1f109a7d4f..831973ea6b3f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -36,10 +36,15 @@
 #include
 #include
 #include
+#include
 
 #include "../mm/internal.h"
 #include "internal.h"
 
+atomic_t splice_stat_filemap_copied, splice_stat_filemap_moved;
+static atomic_t splice_stat_directly_copied;
+static atomic_t vmsplice_stat_copied, vmsplice_stat_stole;
+
 /*
  * Splice doesn't support FMODE_NOWAIT. Since pipes may set this flag to
  * indicate they support non-blocking reads or writes, we must clear it
@@ -276,6 +281,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 		remain -= chunk;
 	}
 
+	atomic_add(keep, &splice_stat_directly_copied);
 	kfree(bv);
 	return ret;
 }
@@ -1299,6 +1305,7 @@ static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
 	unmap_mapping_folio(folio);
 	if (remove_mapping(folio->mapping, folio)) {
 		folio_clear_mappedtodisk(folio);
+		atomic_inc(&vmsplice_stat_stole);
 		flags |= PIPE_BUF_FLAG_LRU;
 		goto add_to_pipe;
 	}
@@ -1316,6 +1323,7 @@ static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
 	folio_put(folio);
 	folio = copy;
 	offset = 0;
+	atomic_inc(&vmsplice_stat_copied);
 
 add_to_pipe:
 	page = folio_page(folio, offset / PAGE_SIZE);
@@ -1905,3 +1913,23 @@ SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 
 	return error;
 }
+
+static int splice_stats_show(struct seq_file *m, void *data)
+{
+	seq_printf(m, "filemap: copied=%u moved=%u\n",
+		   atomic_read(&splice_stat_filemap_copied),
+		   atomic_read(&splice_stat_filemap_moved));
+	seq_printf(m, "direct : copied=%u\n",
+		   atomic_read(&splice_stat_directly_copied));
+	seq_printf(m, "vmsplice: copied=%u stole=%u\n",
+		   atomic_read(&vmsplice_stat_copied),
+		   atomic_read(&vmsplice_stat_stole));
+	return 0;
+}
+
+static int splice_stats_init(void)
+{
+	proc_create_single("fs/splice", S_IFREG | 0444, NULL, splice_stats_show);
+	return 0;
+}
+late_initcall(splice_stats_init);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 3c5abbd49ff2..4f04dc338010 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -98,4 +98,7 @@ extern int splice_grow_spd(const struct pipe_inode_info *, struct splice_pipe_de
 extern void splice_shrink_spd(struct splice_pipe_desc *);
 
 extern const struct pipe_buf_operations default_pipe_buf_ops;
+
+extern atomic_t splice_stat_filemap_copied, splice_stat_filemap_moved;
+
 #endif
diff --git a/mm/filemap.c b/mm/filemap.c
index dd144b0dab69..38d38cc826fa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2872,7 +2872,8 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 	struct address_space *mapping;
 	struct folio *copy = NULL;
 	struct page *page;
-	unsigned int flags = 0;
+	unsigned int flags = 0, count = 0;
+	atomic_t *stat = &splice_stat_filemap_copied;
 	ssize_t ret;
 	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
 
@@ -2902,6 +2903,7 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 	/* If we succeed in removing the mapping, set LRU flag and add it. */
 	if (remove_mapping(mapping, folio)) {
 		folio_unlock(folio);
+		stat = &splice_stat_filemap_moved;
 		flags = PIPE_BUF_FLAG_LRU;
 		goto add_to_pipe;
 	}
@@ -2940,8 +2942,10 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 		page++;
 		spliced += part;
 		offset = 0;
+		count++;
 	}
 
+	atomic_add(count, stat);
 	if (copy)
 		folio_put(copy);
 	return spliced;
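
For anyone wanting to consume these counters from userspace, here is a minimal sketch of a parser for the three lines emitted by splice_stats_show(). It assumes the file shows up as /proc/fs/splice (proc_create_single() is passed "fs/splice" with a NULL parent) and that the line formats are exactly those in the seq_printf() calls of this patch. struct splice_stats, parse_splice_stats_line() and read_splice_stats() are names made up for this illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the five counters; not part of the patch. */
struct splice_stats {
	unsigned int filemap_copied, filemap_moved;
	unsigned int directly_copied;
	unsigned int vmsplice_copied, vmsplice_stole;
};

/*
 * Parse one line of the stats file into *st.
 * Returns 0 if the line matched one of the three formats printed by
 * splice_stats_show(), -1 otherwise.
 */
int parse_splice_stats_line(const char *line, struct splice_stats *st)
{
	assert(line && st);

	if (sscanf(line, "filemap: copied=%u moved=%u",
		   &st->filemap_copied, &st->filemap_moved) == 2)
		return 0;
	/* A space in a scanf format matches any run of whitespace. */
	if (sscanf(line, "direct : copied=%u", &st->directly_copied) == 1)
		return 0;
	if (sscanf(line, "vmsplice: copied=%u stole=%u",
		   &st->vmsplice_copied, &st->vmsplice_stole) == 2)
		return 0;
	return -1;
}

/* Read the whole file; returns 0 on success, -1 if it cannot be opened. */
int read_splice_stats(const char *path, struct splice_stats *st)
{
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		parse_splice_stats_line(line, st);	/* unknown lines are skipped */
	fclose(f);
	return 0;
}
```

Calling read_splice_stats("/proc/fs/splice", &st) on a kernel carrying this patch should fill in all five fields; on a kernel without it the call returns -1 because the file does not exist. The counters are plain atomic_t values printed with %u, so a long-running system may see them wrap.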